“Get non-Real”: Department of Energy grant funds novel research in High-Performance Algorithms at U-M


“Preparing for the future means that we must continue to invest in the development of next-generation algorithms for scientific computing.”

Barbara Helland, Associate Director for Advanced Scientific Computing Research, DOE Office of Science
Source: www.energy.gov/science/articles/department-energy-invests-28-million-novel-research-high-performance-algorithms

New research from the University of Michigan will help revolutionize the data processing pipeline with state-of-the-art algorithms that optimize the collection and processing of many kinds of data. The algorithms available today are built for real data, meaning real numbers; however, much of the data we encounter, on the internet and elsewhere, is non-real, such as discrete or categorical data. The project is part of a $2.8 million Department of Energy grant for algorithms research, which is the backbone of predictive modeling and simulation. The research will enable DOE to set new frontiers in physics, chemistry, biology, and other domains.

“Preparing for the future means that we must continue to invest in the development of next-generation algorithms for scientific computing,” said Barbara Helland, Associate Director for Advanced Scientific Computing Research, DOE Office of Science. “Foundational research in algorithms is essential for ensuring their efficiency and reliability in meeting the emerging scientific needs of the DOE and the United States.”

The U-M project, led by associate professor Laura Balzano and assistant professor Hessam Mahdavifar, both of electrical engineering and computer science, is one of six chosen by DOE to cover several topics at the leading edge of algorithms research. According to the DOE, researchers will explore algorithms for analyzing data from biology, energy storage, and other applications. They will develop fast and efficient algorithms as building blocks for tackling increasingly large data analysis problems from scientific measurements, simulations, and experiments. Projects will also address challenges in solving large-scale computational fluid dynamics and related problems.


Laura Balzano, associate professor of electrical engineering and computer science (left); Hessam Mahdavifar, assistant professor of electrical engineering and computer science (right)

Balzano and Mahdavifar, both Michigan Institute for Computational Discovery and Engineering (MICDE) affiliated faculty members, will use a $300,000 portion of the overall grant to study randomized sketching and compression for high-dimensional non-real-valued data with low-dimensional structures.

“Randomized sketching and subsampling algorithms are revolutionizing the data processing pipeline by allowing significant compression of redundant information,” said Balzano. “Sketches work well because scientific data are generally highly redundant in nature, often following a perturbed low-dimensional structure. Hence, low-rank models and sketching that preserves those model structures are ubiquitous in many machine learning and signal processing applications.” 

Although much of the data used and processed in scientific and technological applications is best modeled mathematically as discrete, categorical, or ordinal, most state-of-the-art randomized sketching algorithms focus on real-valued data. Moreover, in practical applications, handling high-dimensional data can be challenging in terms of computational and memory demands. The proposed project will therefore significantly expand the applicability of randomized sketching.
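To make the idea concrete, here is a minimal sketch of real-valued randomized compression, the setting today's algorithms already handle well; it illustrates the general technique rather than the team's actual method, and the dimensions and noise level are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scientific data are often approximately low-rank: build a tall matrix
# with rank-r structure plus a small perturbation.
n, d, r = 5000, 200, 5
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
A += 0.01 * rng.standard_normal((n, d))

# Randomized sketching: compress the n rows down to k rows with a
# random Gaussian projection.
k = 50
S = rng.standard_normal((k, n)) / np.sqrt(k)
SA = S @ A                                   # the sketch (k x d)

# The sketch preserves the low-dimensional row space, so a rank-r
# model recovered from the sketch alone still explains the full data.
_, _, Vt = np.linalg.svd(SA, full_matrices=False)
V_r = Vt[:r].T                               # top-r right singular vectors
A_hat = (A @ V_r) @ V_r.T                    # project data onto that subspace

rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print(f"compressed {n} rows to {k}; relative error: {rel_err:.4f}")
```

A Gaussian projection like this one assumes real-valued entries; when the entries are discrete or categorical it no longer respects the data's structure, which is precisely the gap the project targets.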

“A key to data-driven modeling is to carefully reformulate the computational and data analysis challenges and take full advantage of the underlying mathematical structure that is often common across application areas,” said Krishna Garikipati, MICDE director and professor of mechanical engineering and mathematics. “This research and the work that Laura and Hessam are doing is critically important to the advancement of computational discovery.”

Introduction to Research Computing on the Great Lakes Cluster


OVERVIEW

This workshop will introduce you to high performance computing on the Great Lakes cluster. After a brief overview of the components of the cluster and the resources available there, the main body of the workshop will cover creating batch scripts and the options available for running jobs, with hands-on experience in submitting, tracking, and interpreting the results of submitted jobs. By the end of the workshop, every participant should have created a submission script, submitted a job, tracked its progress, and collected its output. Additional tools, including high-performance data transfer services and interactive use of the cluster, will also be covered.
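As a preview of the batch scripting material, here is a minimal Slurm submission script of the kind the workshop builds up (Great Lakes uses the Slurm scheduler); the account and partition names are placeholders to replace with your own:

```bash
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --account=your_account   # placeholder: your Slurm account
#SBATCH --partition=standard     # placeholder: pick the partition you need
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1g
#SBATCH --time=00:10:00

# The commands below run on a compute node once the job is scheduled.
echo "Running on $(hostname)"
```

You would submit this with `sbatch example.sh`, watch its progress with `squeue -u $USER`, and find its output in a `slurm-<jobid>.out` file in the submission directory.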

To register and view more details, please refer to the linked TTC page.

Introduction to the Linux Command Line


OVERVIEW

This course will familiarize the student with the basics of accessing and interacting with Linux computers using the GNU/Linux operating system’s Bash shell, also generically referred to as “the command line”. Topics include a brief overview of Linux, the Bash shell, navigating the file system, basic commands, shell redirection, permissions, processes, and the command environment. The workshop will also provide a quick introduction to nano, a simple text editor that will be used in subsequent workshops to edit files.
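For a taste of what those topics look like in practice, here are a few representative commands (the file and directory names are invented for illustration):

```bash
pwd                        # where am I? print the working directory
ls -l                      # list files with permissions, owners, and sizes
cd projects                # navigate into a directory (hypothetical name)
echo "hello" > notes.txt   # redirection: send output to a file
cat notes.txt              # print the file's contents
chmod u+x run.sh           # permissions: let the owner execute a script
ps aux | head              # processes: show a few currently running ones
nano notes.txt             # edit the file in the nano text editor
```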

 

To register and view more details, please refer to the linked TTC page (https://ttc.iss.lsa.umich.edu/ttc/sessions/introduction-to-the-linux-command-line-33/).

Advanced Research Computing on the Great Lakes Cluster


OVERVIEW

This workshop will cover more advanced topics in computing on the U-M Great Lakes Cluster. Topics include a review of common parallel programming models and basic use of Great Lakes; dependent and array scheduling; workflow scripting using bash; high-throughput computing using launcher; parallel processing in one or more of Python, R, and MATLAB; and profiling of parallel code using Allinea Performance Reports and Allinea MAP.
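To give a flavor of the scheduling topics, here is how dependent and array jobs look with Slurm's sbatch (the script names are placeholders):

```bash
# Array scheduling: run the same script as tasks 1-10; inside the
# script, $SLURM_ARRAY_TASK_ID selects each task's input.
sbatch --array=1-10 process_chunk.sh

# Dependent scheduling: start the analysis only after the
# preprocessing job finishes successfully.
jobid=$(sbatch --parsable preprocess.sh)
sbatch --dependency=afterok:"$jobid" analyze.sh
```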

 

Please register at https://ttc.iss.lsa.umich.edu/ttc/sessions/advanced-research-computing-on-the-great-lakes-cluster-12/register/


Data Sharing and Archiving


OVERVIEW

As data volumes grow, how we manage data becomes more important. This session will cover the basics of managing data in a research environment such as those at ARC and nationally. Attendees will be introduced to recommended tools for sharing and transferring data on campus, off campus, and in the cloud. They will learn how to prepare data for archiving, including high-performance versions of tar and compression tools that offer significant performance benefits over the standard versions. An example archiving command appears below.
Lastly, we will cover how to select appropriate general-purpose storage for data that requires long-term preservation and active archiving, supporting the largest data volumes while controlling costs and simplifying management.
Prerequisites: basic command-line skills.
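As a small preview of the archiving material, here is one way to create an archive with a parallel compressor; pigz is shown as a common multi-threaded stand-in for gzip, though the workshop's specific tool recommendations may differ:

```bash
# tar streams the directory while pigz compresses on all CPU cores.
tar -cf results.tar.gz --use-compress-program=pigz results/

# List the archive's contents to verify it before sharing or archiving.
tar -tf results.tar.gz | head
```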
To register and view more details, please refer to the linked TTC page.  

 
