Balzano wins NSF CAREER award for research on machine learning and big data involving physical, biological and social phenomena

By | General Interest, Happenings, News, Research

Prof. Laura Balzano received an NSF CAREER award to support research that aims to improve the use of machine learning in big data problems involving elaborate physical, biological, and social phenomena. The project, called “Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering,” is expected to have broad applicability in data science.

Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data, says Balzano. Typically the data is broken down in one of two ways. Dimensionality-reduction uses an algorithm to break down high-dimensional data into low-dimensional structure that is most relevant to the problem being solved. Clustering, on the other hand, attempts to group pieces of data into meaningful clusters of information.

However, explains Balzano, “as increasingly higher-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that aim at both dimensionality reduction and clustering are often highly applicable, yet hard to find.”

Balzano plans to develop techniques that combine the two key approaches used in machine learning to decipher data, while being applicable to data that is considered “messy.” Messy data is data that has missing elements, may be somewhat corrupted, or is filled with heterogeneous information – in other words, it describes most data sets in today’s world.
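To make the two approaches concrete, here is a minimal Python sketch that first reduces dimensionality and then clusters. It is an illustration only, not the method Balzano's project will develop: it chains a textbook principal-component projection (estimated by power iteration) with a simple one-dimensional k-means. The synthetic data, the function names and every parameter value are assumptions made up for this example, and the data is clean rather than "messy."

```python
import random

def top_component(points, iters=50):
    """Estimate the mean and first principal direction by power iteration."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    v = [1.0] * dim  # arbitrary starting direction
    for _ in range(iters):
        # Apply the (unnormalized) covariance matrix to v: X^T (X v).
        proj = [sum(row[j] * v[j] for j in range(dim)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, v):
    """Dimensionality reduction: map each point to one coordinate along v."""
    return [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]

def kmeans_1d(values, k=2, iters=20):
    """Clustering: plain k-means on scalar values (requires k >= 2)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers

# Hypothetical data: two groups of 5-dimensional points that differ
# only in their first two coordinates.
rng = random.Random(0)

def make_group(center, n):
    return [[c + rng.gauss(0, 0.5) for c in center] for _ in range(n)]

points = make_group([0, 0, 0, 0, 0], 20) + make_group([5, 5, 0, 0, 0], 20)

mean, direction = top_component(points)
projected = project(points, mean, direction)   # 5 dimensions -> 1
labels, centers = kmeans_1d(projected, k=2)    # group the reduced data
print(labels)
```

The sketch runs the two steps one after the other, whereas the article describes algorithms that aim at both goals together and tolerate missing or corrupted entries; handling that is outside the scope of this example.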

Balzano is an affiliated faculty member of both the Michigan Institute for Data Science (MIDAS) and the Michigan Institute for Computational Discovery and Engineering (MICDE). She is part of a MIDAS-supported research team working on single-cell genomic data analysis.

Read more about the NSF CAREER award…

U-M partners with Cavium on Big Data computing platform

By | Feature, General Interest, Happenings, HPC, News

A new partnership between the University of Michigan and Cavium Inc., a San Jose-based provider of semiconductor products, will create a powerful new Big Data computing cluster available to all U-M researchers.

The $3.5 million ThunderX computing cluster will enable U-M researchers to, for example, process massive amounts of data generated by remote sensors in distributed manufacturing environments, or by test fleets of automated and connected vehicles.

The cluster will run the Hortonworks Data Platform providing Spark, Hadoop MapReduce and other tools for large-scale data processing.

“U-M scientists are conducting groundbreaking research in Big Data already, in areas like connected and automated transportation, learning analytics, precision medicine and social science. This partnership with Cavium will accelerate the pace of data-driven research and open up new avenues of inquiry,” said Eric Michielssen, U-M associate vice president for advanced research computing and the Louise Ganiard Johnson Professor of Engineering in the Department of Electrical Engineering and Computer Science.

“I know from experience that U-M researchers are capable of amazing discoveries. Cavium is honored to help break new ground in Big Data research at one of the top universities in the world,” said Cavium founder and CEO Syed Ali, who received a master of science in electrical engineering from U-M in 1981.

Cavium Inc. is a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking. The new U-M system will use dual socket servers powered by Cavium’s ThunderX ARMv8-A workload optimized processors.

The ThunderX product family is Cavium’s 64-bit ARMv8-A server processor for next generation Data Center and Cloud applications, and features high performance custom cores, single and dual socket configurations, high memory bandwidth and large memory capacity.

Alec Gallimore, the Robert J. Vlasic Dean of Engineering at U-M, said the Cavium partnership represents a milestone in the development of the College of Engineering and the university.

“It is clear that the ability to rapidly gain insights into vast amounts of data is key to the next wave of engineering and science breakthroughs. Without a doubt, the Cavium platform will allow our faculty and researchers to harness the power of Big Data, both in the classroom and in their research,” said Gallimore, who is also the Richard F. and Eleanor A. Towner Professor, an Arthur F. Thurnau Professor, and a professor both of aerospace engineering and of applied physics.

Along with applications in fields like manufacturing and transportation, the platform will enable researchers in the social, health and information sciences to more easily mine large, structured and unstructured datasets. This will eventually allow researchers, for example, to correlate health outcomes and disease outbreaks with information derived from socioeconomic, geospatial and environmental data streams.

U-M and Cavium chose to run the cluster on Hortonworks Data Platform, which is based on open source Apache Hadoop. The ThunderX cluster will deliver high-performance computing services for Hadoop analytics and, ultimately, a total of three petabytes of storage space.

“Hortonworks is excited to be a part of forward-leading research at the University of Michigan exploring low-powered, high-performance computing,” said Nadeem Asghar, vice president and global head of technical alliances at Hortonworks. “We see this as a great opportunity to further expand the platform and segment enablement for Hortonworks and the ARM community.”

Workshop co-chaired by MIDAS co-director Prof. Hero releases proceedings on inference in big data

By | Al Hero, Educational, General Interest, Research

The National Academies Committee on Applied and Theoretical Statistics has released proceedings from its June 2016 workshop titled “Refining the Concept of Scientific Inference When Working with Big Data,” co-chaired by Alfred Hero, MIDAS co-director and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science.

The report can be downloaded from the National Academies website.

The workshop explored four key issues in scientific inference:

  • Inference about causal discoveries driven by large observational data
  • Inference about discoveries from data on large networks
  • Inference about discoveries based on integration of diverse datasets
  • Inference when regularization is used to simplify fitting of high-dimensional models.

The workshop brought together statisticians, data scientists and domain researchers from different biomedical disciplines in order to identify new methodological developments that hold significant promise, and to highlight potential research areas for the future. It was partially funded by the National Institutes of Health Big Data to Knowledge Program, and the National Science Foundation Division of Mathematical Sciences.

Big Data: Improving the Scope, Quality and Accessibility of Financial Data

The Office of Financial Research and the University of Michigan will host a joint conference, “Big Data: Improving the Scope, Quality, and Accessibility of Financial Data,” in Ann Arbor, Michigan. The conference will bring together a wide range of scholars, regulators, policymakers, and practitioners to explore how Big Data can be used to enhance financial stability and address other challenges in financial markets.

MIDAS Seminar: Sandy Pentland, MIT, on “Social Physics: Harnessing Big Data and Machine Learning in Support of Human Goals” — Dec. 11

By | Educational, Events

The convergence of big data and machine learning is giving rise to a new computational social science. Dr. Sandy Pentland, Director of the MIT Connection Science and Human Dynamics labs, will describe some of the insights gained in the areas of collective action, management, and decision science, and how his group is integrating these insights into monitoring and shaping investment in support of the UN’s Sustainable Development Goals, and formulation of new privacy and security regulations in both the EU and US.

U-M plays leading role in NSF Regional Big Data Hub

By | General Interest, News

A “big data brain trust” has been established by the National Science Foundation to bring together industry, government and academia to accelerate this emerging field and harness it to solve some of society’s toughest problems.

The University of Michigan will play a leading role in the new Midwest Big Data Innovation Hub—one of four that NSF has set up across the nation. U-M is one of five universities that will lead the Midwest hub. Professor Brian Athey, co-director of U-M’s Michigan Institute for Data Science, will lead the effort at U-M.

“We’re thrilled to be a part of this effort, and are looking forward to establishing dynamic partnerships that will coordinate big data expertise and resources to improve the region’s quality of life,” said Athey, who is the Michael Savageau Collegiate Professor and chair of the Department of Computational Medicine & Bioinformatics in the U-M Medical School and also a professor of psychiatry and internal medicine.

These hubs aim to develop partnerships that will use big data to address region-specific problems. Athey will lead a subgroup of the Midwest Hub that will address health sciences. H.V. Jagadish, U-M professor of electrical engineering and computer science, will lead a subgroup on transportation.

The Midwest Hub will focus its efforts in three areas:

  • Society, including smart cities and communities; network science; and business analytics
  • The natural and built world, including water, food and energy; digital agriculture; transportation; and advanced manufacturing
  • Health care and biomedical research

Other universities involved in the Midwest Hub are Illinois, Indiana, North Dakota and Iowa State. Partners include the city of Detroit, Ford Motor Co., General Motors, Domino’s Pizza, TechTown Detroit, Quicken Loans and the Henry Ford Health System.

The NSF award provides $1.25 million to set up the framework for bringing partners together to develop, plan and support regional big data partnerships and activities to address regional challenges.

“The Big Data Hubs program represents a unique approach to improving the impact of data science by establishing partnerships among like-minded stakeholders,” said Jim Kurose, NSF’s head of Computer and Information Science and Engineering. “In doing so, it enables teams of data science researchers to come together with domain experts, with cities and municipalities, and with anchor institutions to establish and grow collaborations that will accelerate progress in a wide range of science and education domains with the potential for great societal benefit.”

For more information:

Midwest Big Data Hub

Michigan Institute for Data Science

Midwest Big Data Hub press release from the University of Illinois

NSF press release

Ann Arbor Data Dive — Nov. 14

By | Educational, Events

The annual Ann Arbor Data Dive is scheduled for Nov. 14 at North Quad, 105 S. State St.

According to its website, the Data Dive is about empowering people to use data; connecting experts and professionals with novices and students; and providing nonprofits and social service organizations the tools to work with data.

The daylong event is open and free to students and community members. Stories and datasets for local nonprofits are provided, and the attendees take on the rest. Data Divers are free to collaborate and work on the data in whatever way they choose. In the end, everyone has hands-on experience with real data and nonprofits receive all ideas or results.

Agenda:

8:30am-9:00am Light breakfast and check-in
9:00am-10:00am Introductions to Data Dive and our 2015 clients
10:00am-1:00pm Work time
1:00pm-2:00pm Lunch is available
2:00pm-5:00pm Work time
5:00pm-6:00pm Wrap up data work/create presentation
6:00pm-6:30pm Snacks are available
6:30pm-7:30pm Presentations
7:30pm-9:00pm Post-Dive informal outings

For more information and to register, visit http://a2datadive.org/.

Slides/video available from MIDAS kickoff symposium

By | News

Slides and video are now available from the kickoff symposium for the Michigan Institute for Data Science (MIDAS) held on Oct. 6, 2015, in the Rackham Auditorium.

U-M speakers included Provost Martha Pollack, Interim Vice President for Research S. Jack Hu, MIDAS co-directors Brian Athey and Al Hero, and Associate Vice President for Advanced Research Computing Eric Michielssen.

The keynote address, titled “Privacy and Reproducibility in Data Science,” was given by Daniel Goroff, Vice President of the Alfred P. Sloan Foundation.

RFPs available for MIDAS Challenge Thrust awards — Nov. 30 submission deadline

By | General Interest, News

The Michigan Institute for Data Science (MIDAS) is pleased to announce the first competition for MIDAS Challenge Thrust awards. These awards are intended to stimulate research in key areas identified at the recent symposium and will lay the foundation for future funding from government, private foundations, or industry.

Requests for Proposals (RFPs) are available for awards in Learning Analytics and Data Science for Transportation. Up to two projects will be funded in each of these Challenge Thrust areas, at a level of approximately $1.25 million per project.

View the Requests for Proposals here.

White papers describing project goals and teaming arrangements are due November 30, 2015, and full proposals are due January 18, 2016. Awards will be announced on February 15, 2016.

Successful research projects will cut across disciplines, have the potential for disruptive impact in the field, and hold promise for advancing the methodological foundations of data science. Interested researchers can learn more about these two MIDAS Challenge Thrust areas and connect with potential collaborators at four upcoming town hall meetings.

Learning Analytics

  • Wednesday, October 21, 2015, 5:00 p.m. – 6:30 p.m., 1109 François-Xavier Bagnoud (FXB) Building
  • Tuesday, November 17, 2015, 5:00 p.m. – 6:30 p.m., Kalamazoo Room, Michigan League

Data Science for Transportation

  • Thursday, October 22, 2015, 5:00 p.m. – 6:30 p.m., 1311 EECS
  • Tuesday, November 10, 2015, 5:00 p.m. – 6:30 p.m., Rackham Amphitheater

RFPs for the MIDAS Challenge Thrust awards in the Social Science and Health Science areas will be released in early 2016.

For more information, email midas-rfp@umich.edu or visit midas.umich.edu/rfp.