XSEDE: Python Tools for Data Science

OVERVIEW

Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work and highlight some of their capabilities. This webinar will also serve as an introduction to, and overview of, topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2.

See https://portal.xsede.org/course-calendar/-/training-user/class/2467/session/4161 for more information and registration.

Register via the XSEDE Portal:

If you do not currently have an XSEDE Portal account, you will need to create one:

https://portal.xsede.org/my-xsede?p_p_id=58&p_p_lifecycle=0&p_p_state=maximized&p_p_mode=view&_58_struts_action=%2Flogin%2Fcreate_account

Should you have any problems with that process, please contact help@xsede.org and they will provide assistance.

 

Geostatistics – III

Many environmental variables such as temperature, rainfall, air pollutants, and soil nutrients are measured at sampled point locations. We often need to estimate these variables at one or more unsampled locations. Geostatistics provides tools and techniques to carry out this task.

In a series of three workshops, we will cover the basics of Geostatistics. In this third workshop, we will combine the material we covered in the first two workshops and develop the geostatistical modeling approach. This is mainly a lecture-style workshop, but it will include an example in R. The material will also help you understand the basics of Gaussian Process Regression, a commonly used modeling technique in Machine Learning.

Geostatistics – II

Many environmental variables such as temperature, rainfall, air pollutants, and soil nutrients are measured at sampled point locations. We often need to estimate these variables at one or more unsampled locations. Geostatistics provides tools and techniques to carry out this task.

In a series of three workshops, we are covering the basics of Geostatistics. In this second workshop, we will focus on the covariance and the variogram, and on their estimation in the context of geostatistical modeling. This is mainly a lecture-style workshop, but we will also execute some examples in R. The material will also help you understand the basics of Gaussian Process Regression, a commonly used modeling technique in Machine Learning.
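To give a flavor of variogram estimation, here is a minimal sketch of the classical (Matheron) empirical semivariogram. It is not taken from the workshop materials, which use R. The function name `empirical_semivariogram`, the binning scheme, and the toy transect data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    Pairs are grouped by separation distance; bin b covers distances in
    [b * bin_width, (b + 1) * bin_width). Returns {bin index: gamma}.
    """
    squared_diffs = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(coords[i], coords[j])
            b = int(distance // bin_width)
            squared_diffs[b] += (values[i] - values[j]) ** 2
            pair_counts[b] += 1
    return {b: squared_diffs[b] / (2 * pair_counts[b]) for b in sorted(pair_counts)}


# Hypothetical toy data: four measurements along a 1-D transect at unit spacing.
coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
values = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariogram(coords, values, bin_width=1.0)
```

In this toy example the semivariance grows with separation distance, which is the pattern expected when nearby locations are more alike than distant ones.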

Geostatistics – I

Many environmental variables such as temperature, rainfall, air pollutants, and soil nutrients are measured at sparsely sampled point locations. We often need to estimate these variables at one or more unsampled locations. Geostatistics provides tools and techniques to carry out this task.

In a series of three workshops, we will cover the basics of Geostatistics. In this first workshop, we will introduce stationary random fields, positive definite functions, and the fundamental building blocks of Gaussian random fields. This is mainly a lecture-style workshop, but we will also execute some examples in R. The material will also help you understand the foundations of Gaussian Process Regression, a commonly used technique in Machine Learning and AI.
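As a rough illustration of how these ideas fit together (a sketch only, not drawn from the workshop materials, which use R): a stationary covariance function depends only on the separation between locations, positive definiteness is what allows its covariance matrix to be Cholesky-factored, and that factor turns independent normal draws into one realization of a Gaussian random field. The exponential covariance model, the parameter values, and all function names below are assumptions chosen for the example.

```python
import math
import random


def exponential_covariance(h, sill=1.0, length_scale=2.0):
    """Stationary covariance: a function of the separation h only."""
    return sill * math.exp(-abs(h) / length_scale)


def covariance_matrix(locations, cov):
    """Matrix of covariances between every pair of 1-D locations."""
    return [[cov(a - b) for b in locations] for a in locations]


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix.

    Raises ValueError if the matrix is not positive definite.
    """
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def sample_gaussian_field(locations, cov, rng):
    """One zero-mean realization: z = L @ eps with eps ~ N(0, I)."""
    L = cholesky(covariance_matrix(locations, cov))
    eps = [rng.gauss(0.0, 1.0) for _ in locations]
    return [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(locations))]


# Hypothetical 1-D sampling locations.
locations = [0.0, 1.0, 2.0, 3.0, 4.0]
C = covariance_matrix(locations, exponential_covariance)
L = cholesky(C)
field = sample_gaussian_field(locations, exponential_covariance, random.Random(0))
```

If a candidate function were not positive definite, the `cholesky` step would fail, which is one practical reason the property matters.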

XSEDE HPC Summer Boot Camp

OVERVIEW

XSEDE, along with the Pittsburgh Supercomputing Center, is pleased to present a Hybrid Computing workshop.

This four-day event will cover MPI, OpenMP, and GPU programming using OpenACC and accelerators.

This workshop will be remote to desktop only due to the COVID-19 pandemic. Once registration has filled, no more students will be added due to our current limits.

The schedule can be found here: https://www.psc.edu/resources/training/xsede-hpc-workshop-june-8-11-2021-summer-boot-camp/

Register via the XSEDE Portal:

https://portal.xsede.org/course-calendar/-/training-user/class/2338/session/4002

If you do not currently have an XSEDE Portal account, you will need to create one:

https://portal.xsede.org/my-xsede?p_p_id=58&p_p_lifecycle=0&p_p_state=maximized&p_p_mode=view&_58_struts_action=%2Flogin%2Fcreate_account

Should you have any problems with that process, please contact help@xsede.org and they will provide assistance.

Questions

Please address any questions to Tom Maiden at tmaiden@psc.edu.

Advanced ML topics: Algorithms, writing ML code, comparing implementations

OVERVIEW

This workshop is designed as a follow-up to the basic introduction to machine learning earlier in this series. We will cover several examples in Python and compare different implementations. We will also look at advanced topics in machine learning, such as GPU optimization, parallel processing, and deep learning. A basic understanding of Python is required.

INSTRUCTORS

Meghan Richey
Machine Learning Specialist
Information and Technology Services – Advanced Research Computing – Technology Services

Meghan Richey is a machine learning specialist in the Advanced Research Computing – Technology Services department at the University of Michigan. She consults on several faculty and student machine learning applications and research studies, specializing in natural language processing and convolutional neural networks. Before her position at the university, Ms. Richey worked for a defense contractor as a software engineer to design and implement software solutions for DoD-funded artificial intelligence efforts.

A Zoom link will be provided to the participants the day before the class. Registration is required.

The instructor will be available at the Zoom link from 9 to 10 AM for computer setup assistance.

Please note, this session will be recorded.  

Register here

If you have questions about this workshop, please send an email to the instructor at richeym@umich.edu

Introduction to Machine Learning

OVERVIEW

Machine learning is becoming an increasingly popular tool in several fields, including data science, medicine, engineering, and business. This workshop will cover basic concepts related to machine learning, including definitions of basic terms, sample applications, and methods for deciding whether your project is a good fit for machine learning. No prior knowledge or coding experience is required.

INSTRUCTORS

Meghan Richey
Machine Learning Specialist
Information and Technology Services – Advanced Research Computing – Technology Services

Meghan Richey is a machine learning specialist in the Advanced Research Computing – Technology Services department at the University of Michigan. She consults on several faculty and student machine learning applications and research studies, specializing in natural language processing and convolutional neural networks. Before her position at the university, Ms. Richey worked for a defense contractor as a software engineer to design and implement software solutions for DoD-funded artificial intelligence efforts.

MATERIALS

A Zoom link will be provided to the participants the day before the class. Registration is required.

The instructor will be available at the Zoom link from 9 to 10 AM for computer setup assistance.

Please note, this session will be recorded.  

Register here

If you have questions about this workshop, please send an email to the instructor at richeym@umich.edu

Balzano wins NSF CAREER award for research on machine learning and big data involving physical, biological and social phenomena

Prof. Laura Balzano received an NSF CAREER award to support research that aims to improve the use of machine learning in big data problems involving elaborate physical, biological, and social phenomena. The project, called “Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering,” is expected to have broad applicability in data science.

Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data, says Balzano. Typically the data is broken down in one of two ways. Dimensionality reduction uses an algorithm to break down high-dimensional data into the low-dimensional structure that is most relevant to the problem being solved. Clustering, on the other hand, attempts to group pieces of data into meaningful clusters of information.
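To make the two approaches concrete, here is a minimal sketch that applies them one after the other to toy data. It is an illustration only and does not represent Balzano's K-set clustering method. Dimensionality reduction is shown as a projection onto the first principal direction (found by power iteration), and clustering as a one-dimensional k-means. The function names and the six data points are assumptions made for the example.

```python
import math


def center(rows):
    """Subtract the per-column mean so the data cloud sits at the origin."""
    dim = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return [[r[d] - means[d] for d in range(dim)] for r in rows]


def first_principal_direction(centered, iterations=100):
    """Power iteration on X^T X: unit vector along the largest variance."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(r[d] * v[d] for d in range(dim)) for r in centered]
        w = [sum(s * r[d] for s, r in zip(scores, centered)) for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def kmeans_1d(points, k=2, iterations=20):
    """Lloyd's algorithm in 1-D; centers start evenly spaced from min to max."""
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical toy data: two well-separated groups of 3-D points.
data = [
    [0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.2, 0.1],
    [10.0, 10.0, 10.0], [11.0, 10.0, 10.1], [10.0, 10.2, 9.9],
]
centered = center(data)
direction = first_principal_direction(centered)
# Dimensionality reduction: each 3-D point becomes a single score.
scores = [sum(r[d] * direction[d] for d in range(3)) for r in centered]
# Clustering: group the 1-D scores into two clusters.
labels, centers = kmeans_1d(scores, k=2)
```

Here the two steps are run separately on clean data. The sketch does not handle missing or corrupted entries, and it assumes the points are not all identical.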

However, explains Balzano, “as increasingly higher-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that aim at both dimensionality reduction and clustering are often highly applicable, yet hard to find.”

Balzano plans to develop techniques that combine the two key approaches used in machine learning to decipher data, while being applicable to data that is considered “messy.” Messy data is data that has missing elements, may be somewhat corrupted, or is filled with heterogeneous information – in other words, it describes most data sets in today’s world.

Balzano is an affiliated faculty member of both the Michigan Institute for Data Science (MIDAS) and the Michigan Institute for Computational Discovery and Engineering (MICDE). She is part of a MIDAS-supported research team working on single-cell genomic data analysis.

U-M fosters thriving artificial intelligence and machine learning research

Research using machine learning and artificial intelligence — tools that allow computers to learn about and predict outcomes from massive datasets — has been booming at the University of Michigan. The potential societal benefits being explored on campus are numerous, from on-demand transportation systems to self-driving vehicles to individualized medical treatments to improved battery capabilities.

The ability of computers and machines generally to learn from their environments is having transformative effects on a host of industries — including finance, healthcare, manufacturing, and transportation — and could have an economic impact of $15 trillion globally according to one estimate.

But as these methods become more accurate and refined, and as the datasets needed become bigger and bigger, keeping up with the latest developments and identifying and securing the necessary resources — whether that means computing power, data storage services, or software development — can be complicated and time-consuming. And that’s not to mention complying with privacy regulations when medical data is involved.

“Machine learning tools have gotten a lot better in the last 10 years,” said Matthew Johnson-Roberson, Assistant Professor of Engineering in the Department of Naval Architecture & Marine Engineering and the Department of Electrical Engineering and Computer Science. “The field is changing now at such a rapid pace compared to what it used to be. It takes a lot of time and energy to stay current.”

Diagram representing the knowledge graph of an artificial intelligence system, courtesy of Jason Mars, assistant professor, Electrical Engineering and Computer Science, U-M

Johnson-Roberson’s research is focused on getting computers and robots to better recognize and adapt to the world, whether in driverless cars or deep-sea mapping robots.

“The goal in general is to enable robots to operate in more challenging environments with high levels of reliability,” he said.

Johnson-Roberson said that U-M has many of the crucial ingredients for success in this area — a deep pool of talented researchers across many disciplines ready to collaborate, flexible and personalized support, and the availability of computing resources that can handle storing the large datasets and heavy computing load necessary for machine learning.

“The people is one of the reasons I came here,” he said. “There’s a broad and diverse set of talented researchers across the university, and I can interface with lots of other domains, whether it’s the environment, health care, transportation or energy.”

“Access to high powered computing is critical for the computing-intensive tasks, and being able to leverage that is important,” he continued. “One of the challenges is the data. A major driver in machine learning is data, and as the datasets get more and more voluminous, so does the storage needs.”

Yuekai Sun, an assistant professor in the Statistics Department, develops algorithms and other computational tools to help researchers analyze large datasets, for example, in natural language processing. He agreed that being able to work with scientists from many different disciplines is crucial to his research.

“I certainly find the size of Michigan and the inherent diversity that comes with it very stimulating,” he said. “Having people around who are actually working in these application areas helps guide the direction and the questions that you ask.”

Sun is also working on analyzing the potential discriminatory effects of algorithms used in decisions like whether to give someone a loan or to grant prisoners parole.

“If you use machine learning, how do you hold an algorithm or the people who apply it accountable? What does it mean for an algorithm to be fair?” he said. “Can you check whether this notion of non-discrimination is satisfied?”

Jason Mars, an assistant professor in the Electrical Engineering and Computer Science department and co-founder of a successful spinoff called Clinc, is applying artificial intelligence to driverless car technology and a mobile banking app that has been adopted by several large financial institutions. The app, called Finie, provides a much more conversational interface between users and their financial information than other apps in the field.

“There is going to be an expansion of the number of problems solved and number of contributions that are AI-based,” Mars said. He predicted that more researchers at U-M will begin exploring AI and ML as they understand the potential.

“It’s going to require having the right partner, the right experts, the right infrastructure, and the best practices of how to use them,” he said.

He added that U-M does a “phenomenal job” in supporting researchers conducting AI and ML research.

“The level of support and service is awesome here,” he said. “Not to mention that the infrastructure is state of the art. We stay relevant to the best techniques and practices out there.”

Advanced Research Computing at U-M, in part through resources from the university-wide Data Science Initiative, provides computing infrastructure, consulting expertise, and support for interdisciplinary research projects to help scientists conducting artificial intelligence and machine learning research.

For example, Consulting for Statistics, Computing and Analytics Research, an ARC unit, has several consultants on staff with expertise in machine learning and predictive analysis with large, complex, and heterogeneous data. CSCAR recently expanded capacity to support very large-scale machine learning using tools such as Google’s TensorFlow.

CSCAR consultants are available by appointment or on a drop-in basis, free of charge. See cscar.research.umich.edu or email cscar@umich.edu for more information.

CSCAR also provides workshops on topics in machine learning and other areas of data science, including sessions on Machine Learning in Python, and an upcoming workshop in March titled “Machine Learning, Concepts and Applications.”

The computing resources available to machine learning and artificial intelligence researchers are significant and diverse. Along with the campus-wide high performance computing cluster known as Flux, the recently announced Big Data cluster Cavium ThunderX will give researchers a powerful new platform for hosting artificial intelligence and machine learning work. Both clusters are provided by Advanced Research Computing – Technology Services (ARC-TS).

All allocations on ARC-TS clusters include access to software packages that support AI/ML research, including TensorFlow, Torch, and Spark ML, among others.

ARC-TS also operates the Yottabyte Research Cloud (YBRC), a customizable computing platform that recently gained the capacity to host and analyze data governed by the HIPAA federal privacy law.

In addition, the Michigan Institute for Data Science (MIDAS), which is also a unit of ARC, has supported several AI/ML projects through its Challenge Initiative program, which has awarded more than $10 million in research support since 2015.

For example, the Analytics for Learners as People project is using sensor-based machine learning tools to translate academic performance, social media, and survey data into attributes that will form student profiles. Those profiles will help link academic performance and mental health with the personal attributes of students, including values, beliefs, interests, behaviors, background, and emotional state.

Another example is the Reinventing Public Urban Transportation and Mobility project, which is using predictive models based on machine learning to develop on-demand, multi-modal transportation systems for urban areas.

In addition, MIDAS supports student groups involved in this type of research such as the Michigan Student Artificial Intelligence Lab (MSAIL) and the Michigan Data Science Team (MDST).

(A version of this piece appeared in the University Record.)