Researchers can use high-performance computing (HPC) to pose and probe unique and interesting questions. To answer those questions, they need to extract meaning from the data they generate. Data science has recently risen to prominence for its ability not only to extract this meaning, but also to turn that meaning into actionable, data-driven predictions. Combining HPC and data science lets a researcher more easily ask questions and discover answers that lead to even deeper questions.
Despite these benefits, it can be difficult for full-time researchers to learn an emerging field without guidance while still producing research.

To reduce this barrier, the Scientific Computing Student Club (SC2) presents the HPC and Data Science Workshop Series. For eight weeks in the Fall semester, we will meet once a week to discuss HPC and data science topics, focusing on how elements of HPC and data science can reinforce research and discovery. There will be no single instructor for this workshop; instead, we will have guest speakers from SC2, ARC-TS, MICDE, and even MathWorks (MATLAB). Attendance at all workshops is encouraged, but not required. The planned syllabus is below.

Everyone is welcome to join at any time, but be prepared to actively participate.

Pizza will be provided thanks to the generous support of U-M College of Engineering through the Graduate Student Community Grant Program – just bring your own drink!

Syllabus

The following is a tentative syllabus for the 2018 High Performance Computing and Data Science Workshop. It is tentative in that the topics listed below are suggestions of what could be covered in the eight-week workshop. Because instruction will rotate between participants and invited speakers, each instructor is welcome and encouraged to add topics to or remove topics from their lesson as they see fit.

{Oct 4, 2018} Introduction to HPC and Data Science: Bash, conda and git

  1. Workshop Overview and Intention
  2. What is HPC? What is Data Science?
  3. Setting up your computing environment

Follow this link to see the meeting’s slides.

{Oct 11, 2018} Utilizing the Cluster Part I

Note: ARC-TS guest will teach this session. We may extend this session as it covers a lot of material.

  • Supercomputing available on campus
    • Flux and the Flux Hadoop cluster
  • Signac flow

{Oct 18, 2018} Utilizing the Cluster Part II

Note: ARC-TS guest will teach this session. We may extend this session as it covers a lot of material.

  • Supercomputing available on campus
    • Flux and the Flux Hadoop cluster
  • Signac flow

{Oct 25, 2018} Hadoop & Spark

Note: ARC-TS guest will lead this session.

  • What is Hadoop? When should I use Hadoop?
  • What is Spark? When should I use Spark?

{Nov 8, 2018} Data Management Best Practices

This session will be led by signac developers Bradley Dice and Vyas Ramasubramani. They will use signac in their presentation, but the topics covered apply more generally.

  • Computational research project management
  • Storing and querying data and metadata

Prof. Shedden, director of CSCAR, will give an overview of the free workshops and consulting services available at U-M related to HPC and data science.

{Nov 15, 2018} ML Methods - Deep Learning

Note: This session will be led by a MathWorks guest.

{Dec 6, 2018; Note Date Change} Data Science Ethics

  • ‘Privacy,’ ‘fairness,’ and ‘representation’ in the age of Big Data

Workshop WRAP UP!

LOGISTICS

TIME: Thursdays @ 6 – 7 PM

ROOM: 3150 Dow (2300 Hayward St)

The workshop will run every Thursday from October 4 to November 29, 2018, except November 22.

Pizza provided – just bring your own drink

RSVP
Questions?