New research from the University of Michigan will help revolutionize the data processing pipeline with state-of-the-art algorithms to optimize the collection and processing of any kind of data. Algorithms available now are built for real-valued data, meaning data made up of real numbers. However, most of the data we see on the internet is not real-valued; it is discrete or categorical. This project is part of a $2.8 million grant from the Department of Energy (DOE) on algorithms research, which is the backbone of predictive modeling and simulation. The research will enable DOE to set new frontiers in physics, chemistry, biology, and other domains.
“Preparing for the future means that we must continue to invest in the development of next-generation algorithms for scientific computing,” said Barbara Helland, Associate Director for Advanced Scientific Computing Research, DOE Office of Science. “Foundational research in algorithms is essential for ensuring their efficiency and reliability in meeting the emerging scientific needs of the DOE and the United States.”
The U-M project, led by associate professor Laura Balzano and assistant professor Hessam Mahdavifar, both of electrical engineering and computer science, is one of six chosen by DOE to cover several topics at the leading edge of algorithms research. According to the DOE, researchers will explore algorithms for analyzing data from biology, energy storage, and other applications. They will develop fast and efficient algorithms as building blocks for tackling increasingly large data analysis problems from scientific measurements, simulations, and experiments. Projects will also address challenges in solving large-scale computational fluid dynamics and related problems.
Laura Balzano, associate professor of electrical engineering and computer science (left); Hessam Mahdavifar, assistant professor of electrical engineering and computer science (right)
Balzano and Mahdavifar, both Michigan Institute for Computational Discovery and Engineering (MICDE) affiliated faculty members, will use a $300,000 portion of the overall grant to study randomized sketching and compression for high-dimensional non-real-valued data with low-dimensional structures.
“Randomized sketching and subsampling algorithms are revolutionizing the data processing pipeline by allowing significant compression of redundant information,” said Balzano. “Sketches work well because scientific data are generally highly redundant in nature, often following a perturbed low-dimensional structure. Hence, low-rank models and sketching that preserves those model structures are ubiquitous in many machine learning and signal processing applications.”
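To make the idea of sketching concrete, here is a minimal Python sketch of the standard real-valued case, assuming NumPy is available. It is not the U-M team's method: the matrix sizes, the noise level, the Gaussian test matrix, and the helper name `sketch_and_reconstruct` are all illustrative assumptions. It builds a data matrix that follows a perturbed low-dimensional (low-rank) structure, compresses it with a random projection, and checks how much of the original survives.

```python
import numpy as np


def sketch_and_reconstruct(A, sketch_size, rng):
    """Hypothetical helper: compress A with a random Gaussian sketch.

    Returns Q (orthonormal basis for the sketch's column space) and
    B = Q^T A, so that Q @ B approximates A.
    """
    n = A.shape[1]
    omega = rng.standard_normal((n, sketch_size))  # random test matrix
    Y = A @ omega                                  # the sketch: m x sketch_size
    Q, _ = np.linalg.qr(Y)                         # orthonormalize the sketch
    B = Q.T @ A                                    # coefficients of A in that basis
    return Q, B


rng = np.random.default_rng(0)

# Assumed toy data: a rank-5 matrix plus small noise, standing in for
# redundant scientific data with a perturbed low-dimensional structure.
m, n, rank = 200, 500, 5
low_rank = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
A = low_rank + 1e-3 * rng.standard_normal((m, n))

# Sketch with a few more columns than the assumed rank (oversampling).
Q, B = sketch_and_reconstruct(A, sketch_size=10, rng=rng)

rel_err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
stored = Q.size + B.size  # numbers kept after compression

print(f"original entries:   {A.size}")
print(f"stored entries:     {stored}")
print(f"relative error:     {rel_err:.2e}")
```

In this toy setting the compressed form keeps 7,000 numbers in place of 100,000, and the reconstruction error stays close to the size of the added noise. The example relies on real-valued arithmetic throughout, which is the limitation the project aims to move beyond for discrete, categorical, and ordinal data.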
Even though a lot of the data used and processed in scientific and technological applications are best modeled mathematically as discrete, categorical, or ordinal data, most state-of-the-art randomized sketching algorithms focus on real-valued data. In addition, handling high-dimensional data in practical applications can be challenging in terms of computational and memory demands. By extending sketching and compression to non-real-valued data, the proposed project will significantly expand the applicability of randomized sketching.
“A key to data-driven modeling is to carefully reformulate the computational and data analysis challenges and take full advantage of the underlying mathematical structure that is often common across application areas,” said Krishna Garikipati, MICDE director and professor of mechanical engineering and mathematics. “This research and the work that Laura and Hessam are doing is critically important to the advancement of computational discovery.”