Advanced Research Computing – Technology Services (ARC-TS) has an exciting opportunity for those who wish to impact our world through science and research by using computational and data tools such as machine learning, statistical analysis, High Performance Computing (HPC), big data (Hadoop, Spark, DBMS, etc.), cloud computing services (AWS, Azure, GCP), and more.

This position will be part of a team supporting all areas of research at the University of Michigan that utilize data. The primary responsibilities will be consulting on, and providing services for, collecting, discovering, cataloging, manipulating, and transforming data. This role may work closely with multiple projects at once. Other responsibilities will include making presentations and providing training to students and researchers on the use, cataloging, and manipulation of data.

The successful candidate should be comfortable with Linux systems and with common data manipulation tools and languages such as Python and SQL, and should be able to pick up new tools quickly as needed for the currently assigned project.

Note: Technical training will be provided to address specific gaps in desired qualifications.

Responsibilities

  • Data Preparation and Identification: Help users through the lifecycle of their datasets. This includes helping users understand the data they have, determining programmatic ways to clean the data, preparing the data for analysis, and annotating datasets with descriptions for multiple uses. We also foresee this role helping to identify existing datasets around the University that could be used by courses and for research.
  • Data Collection and Programming: Assist in the creation of tools that collect data from many disparate sources, such as SQL and NoSQL databases, APIs, web scraping, flat files, and other file formats. Interaction with research projects may include extending tools to manipulate data, identify duplicates, remove identifying data, and more.
  • Documentation and Training of Tools: Participate in a larger group providing workshops on the use of data and data manipulation tools. This will include creating documentation on how to use tools in our supported environments.
  • Documentation and Cataloging of Data: Document data (metadata, schemas, and more) so that researchers may consume prepared data for use in their own analysis. This documentation will include how the data were manipulated and the assumptions used for any summaries or statistics.
  • Development of Self and Others: Explore new tools and technologies through formal and self-directed learning. Research and advise the team on the latest application technology trends to support ongoing development of existing tools and services.
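For illustration only (not part of the formal posting), the deduplication and de-identification work described above might look like the following minimal Python sketch; the record structure and field names ("name", "email", "score") are hypothetical.

```python
# Hypothetical sketch of data-cleaning tooling: deduplicate a list of
# records, then strip identifying fields before sharing the data.
# Field names here are illustrative, not drawn from any real project.

def deduplicate(records, key_fields):
    """Keep the first record seen for each unique combination of key fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def deidentify(records, identifying_fields):
    """Return copies of records with identifying fields removed."""
    return [{k: v for k, v in rec.items() if k not in identifying_fields}
            for rec in records]

rows = [
    {"name": "Ana", "email": "ana@example.edu", "score": 91},
    {"name": "Ana", "email": "ana@example.edu", "score": 91},  # duplicate
    {"name": "Ben", "email": "ben@example.edu", "score": 78},
]
clean = deidentify(deduplicate(rows, ["name", "email"]), {"name", "email"})
# clean is [{"score": 91}, {"score": 78}]
```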

Required Qualifications

  • Bachelor's degree in a related field and/or an equivalent combination of education, certification, and experience
  • Two (2) years of experience in collecting, discovering, cataloging, manipulating, and transforming data
  • Proficiency in Python
  • Basic SQL experience
  • Proficiency in Linux
  • Experience with data from different fields and domains
  • Comfortable supporting a broad range of research (students, researchers, and faculty)
  • Ability to communicate effectively via email, letters, and in person to teams and customers
  • Ability to work independently and collaboratively

Desired Qualifications

  • Master's degree or PhD in a related area
  • Experience working in an academic environment
  • Familiarity with big data tools from the Hadoop ecosystem, such as MapReduce, Spark, Hive, Impala, etc.
  • Understanding of any of the following numerical techniques: causal inference, selection bias, dimensionality reduction (singular value decomposition, principal component analysis)
  • Understanding of machine learning tools such as TensorFlow, PyTorch, scikit-learn, CNTK/Microsoft Cognitive Toolkit, PowerAI, Theano, Caffe, etc.
  • Understanding of machine learning/AI methods such as random forests, neural networks, Markov models, etc.
  • Proficiency in any of the following: R, SAS, SPSS, Tableau, Perl, C/C++, Go, etc.
  • Advanced SQL experience
  • Experience with any of the following: compilers, Makefiles, and common build chains (autoconf/automake, CMake, pip, EasyBuild, Spack)

Diversity, Equity and Inclusion

The University of Michigan Information and Technology Services seeks to recruit and retain a diverse workforce as a reflection of our commitment to serve the diverse people of Michigan, to maintain the excellence of the University and to offer our students richly varied disciplines, perspectives and ways of knowing and learning.

Comprehensive Benefits

The University of Michigan Benefits Office is committed to offering a high-quality benefits package to support faculty, staff, and their families. Highlights include 2:1 retirement matching, healthcare plans with nationwide coverage including prescription drug coverage, three dental plans, a vision plan, a flexible spending account, well-being programs, long-term disability, automatic life insurance, general legal services, three early childhood centers, and time away from work and work-life programs to promote balance. Learn more at hr.umich.edu/benefits-wellness.

Application Procedure

To be considered, you must submit a cover letter and resume. The cover letter must be the leading page of your resume and should:

  • Specifically outline the reasons for your interest in the position, and
  • Outline your particular skills and experience that directly relate to this position.
  • For more information, and to apply, use the link found here

Starting salaries will vary depending upon the qualifications and experience of the selected candidate.

Salary: $68,462.00 – $89,000.00

Work Location: Ann Arbor Campus, Ann Arbor, MI

Full-Time Position