Workshop co-chaired by MIDAS co-director Prof. Hero releases proceedings on inference in big data


The National Academies Committee on Applied and Theoretical Statistics has released the proceedings of its June 2016 workshop, “Refining the Concept of Scientific Inference When Working with Big Data,” co-chaired by Alfred Hero, MIDAS co-director and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science.

The report can be downloaded from the National Academies website.

The workshop explored four key issues in scientific inference:

  • Inference about causal discoveries driven by large observational data
  • Inference about discoveries from data on large networks
  • Inference about discoveries based on integration of diverse datasets
  • Inference when regularization is used to simplify the fitting of high-dimensional models (see the sketch below)
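
To make that last issue concrete, here is a minimal sketch (our illustration, not material from the workshop proceedings) of fitting a high-dimensional regression with an L1 penalty using scikit-learn; the variable names and parameter values are ours, chosen only for demonstration.

    # Regularized fitting when predictors far outnumber samples (p >> n).
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n_samples, n_features = 50, 200            # p >> n: ill-posed without regularization
    X = rng.standard_normal((n_samples, n_features))
    true_coef = np.zeros(n_features)
    true_coef[:5] = 2.0                        # only five predictors actually matter
    y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

    model = Lasso(alpha=0.1).fit(X, y)         # L1 penalty zeroes out most coefficients
    print("nonzero coefficients:", np.count_nonzero(model.coef_))

The penalty makes the fit well-posed, but it also biases the estimates, so standard p-values and confidence intervals no longer apply directly; that gap between fitting and valid inference is the kind of question the workshop examined.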

The workshop brought together statisticians, data scientists and domain researchers from different biomedical disciplines to identify new methodological developments that hold significant promise and to highlight potential research areas for the future. It was partially funded by the National Institutes of Health Big Data to Knowledge Program and the National Science Foundation Division of Mathematical Sciences.

Combining simulation and experimentation yields complex crystal nanoparticle


The most complex crystal designed and built from nanoparticles has been reported by researchers at Northwestern University and the University of Michigan. The work demonstrates that some of nature’s most complicated structures can be deliberately assembled if researchers can control the shapes of the particles and the way they connect using DNA.

The U-M researcher is Sharon C. Glotzer, the John W. Cahn Distinguished University Professor of Engineering and the Stuart W. Churchill Collegiate Professor of Chemical Engineering. The work is published in the March 3 issue of Science. ARC’s computational resources supported the work.

MIDAS announces second round of Data Science Challenge Initiative awards, in health and social science


Five research projects — three in health and two in social science — have been awarded funding in the second round of the Michigan Institute for Data Science Challenge Initiative program.

The projects will receive funding from MIDAS as part of the Data Science Initiative announced in fall 2015.

The goal of the multiyear MIDAS Challenge Initiatives program is to foster data science projects that have the potential to prompt new partnerships between U-M, federal research agencies and industry. The challenges are focused on four areas: transportation, learning analytics, social science and health science. For more information, visit midas.umich.edu/challenges.

The projects, determined by a competitive submission process, are:

  • Title: Michigan Center for Single-Cell Genomic Data Analysis
    Description: The center will establish methodologies to analyze sparse data collected from single-cell genome sequencing technologies. The center will bring together experts in mathematics, statistics and computer science with biomedical researchers.
    Lead researchers: Jun Li, Department of Human Genetics; Anna Gilbert, Mathematics
    Research team: Laura Balzano, Electrical Engineering and Computer Science; Justin Colacino, Environmental Health Sciences; Johann Gagnon-Bartsch, Statistics; Yuanfang Guan, Computational Medicine and Bioinformatics; Sue Hammoud, Human Genetics; Gil Omenn, Computational Medicine and Bioinformatics; Clay Scott, Electrical Engineering and Computer Science; Roman Vershynin, Mathematics; Max Wicha, Oncology.
  • Title: From Big Data to Vital Insights: Michigan Center for Health Analytics and Medical Prediction (M-CHAMP)
    Description: The center will house a multidisciplinary team that will confront a core methodological problem that currently limits health research — exploiting temporal patterns in longitudinal data for novel discovery and prediction.
    Lead researchers: Brahmajee Nallamothu, Internal Medicine; Ji Zhu, Statistics; Jenna Wiens, Electrical Engineering and Computer Science; Marcelline Harris, Nursing.
    Research team: T. Jack Iwashyna, Internal Medicine; Jeffrey McCullough, Health Management and Policy (SPH); Kayvan Najarian, Computational Medicine and Bioinformatics; Hallie Prescott, Internal Medicine; Andrew Ryan, Health Management and Policy (SPH); Michael Sjoding, Internal Medicine; Karandeep Singh, Learning Health Sciences (Medical School); Kerby Shedden, Statistics; Jeremy Sussman, Internal Medicine; Vinod Vydiswaran, Learning Health Sciences (Medical School); Akbar Waljee, Internal Medicine.
  • Title: Identifying Real-Time Data Predictors of Stress and Depression Using Mobile Technology
    Description: Using an app platform that integrates signals from both mobile phones and wearable sensors, the project will collect data from over 1,000 medical interns to identify the dynamic relationships between mood, sleep and circadian rhythms. These relationships will be utilized to inform the type and timing of personalized data feedback for a mobile micro-randomized intervention trial for depression under stress.
    Lead researchers: Srijan Sen, Psychiatry; Margit Burmeister, Molecular and Behavioral Neuroscience.
    Research team: Lawrence An, Internal Medicine; Amy Cochran, Mathematics; Elena Frank, Molecular and Behavioral Neuroscience; Daniel Forger, Mathematics; Thomas Insel, Verily Life Sciences; Susan Murphy, Statistics; Maureen Walton, Psychiatry; Zhou Zhao, Molecular and Behavioral Neuroscience.
  • Title: Computational Approaches for the Construction of Novel Macroeconomic Data
    Description: This project will develop an economic dataset construction system that takes as input economic expertise as well as social media data; will deploy a data construction service that hosts this construction tool; and will use this tool and service to build an “economic datapedia,” a compendium of user-curated economic datasets that are collectively published online.
    Lead researcher: Matthew Shapiro, Department of Economics
    Research team: Michael Cafarella, Computer Science and Engineering; Jia Deng, Electrical Engineering and Computer Science; Margaret Levenstein, Inter-university Consortium for Political and Social Research.
  • Title: A Social Science Collaboration for Research on Communication and Learning based upon Big Data
    Description: This project is a multidisciplinary collaboration meant to introduce social scientists, computer scientists and statisticians to the methods and theories of engaging observational data and the results of structured data collections in two pilot projects in the area of political communication and one investigating parenting issues. The projects involve the integration of geospatial, social media and longitudinal data.
    Lead researchers: Michael Traugott, Center for Political Studies, ISR; Trivellore Raghunathan, Biostatistics
    Research team: Leticia Bode, Communications, Georgetown University; Ceren Budak, U-M School of Information; Pamela Davis-Keane, U-M Psychology, ISR; Jonathan Ladd, Public Policy, Georgetown; Zeina Mneimneh, U-M Survey Research Center; Josh Pasek, U-M Communications; Rebecca Ryan, Public Policy, Georgetown; Lisa Singh, Public Policy, Georgetown; Stuart Soroka, U-M Communications.

For more details, see the press releases on the social science and health science projects.

MIDAS to host faculty meeting on NSF BIGDATA solicitation


The Michigan Institute for Data Science (MIDAS) will hold a faculty meeting at noon on Thursday, January 19 (Suite 7625, School of Public Health I, 1415 Washington Heights) for the NSF 17-534 “Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA)” solicitation.

The meeting will include an overview of the NSF solicitation, U-M Data Science Resources (MIDAS, CSCAR, ARC-TS) available to faculty responding to the NSF call, and an opportunity to network with other faculty.

MIDAS has also arranged for Sylvia Spengler, NSF CISE Program Director, to be available at 1:30 pm to answer questions regarding the BIGDATA solicitation.

We invite you to participate in the faculty meeting to share your ideas and interest in responding to the BIGDATA solicitation, and to interact with other faculty planning to respond to this funding mechanism.

For those unable to participate in person, a GoToMeeting option will be available for joining virtually.

A box lunch will be provided at the faculty meeting. Your RSVP (https://goo.gl/forms/OYAuB8mWCOlx3fw73) is appreciated.

ARC Director Sharon Broude Geva elected vice-chair of Coalition for Academic Scientific Computing


Sharon Broude Geva, the Director of Advanced Research Computing at the University of Michigan, has been elected vice-chair of the Coalition for Academic Scientific Computation (CASC).

Founded in 1989, CASC advocates for the use of advanced computing technology to accelerate scientific discovery for national competitiveness, global security, and economic success. The organization’s members represent 83 institutions of higher education and national labs.

The vice-chair position is one of four elected CASC executive officers. The officers work closely as a team with the director of CASC. The vice-chair also leads CASC meeting program committees, is responsible for recruitment of new members, substitutes for the chair in his or her absences, and assists with moderating CASC meetings.

Geva served as CASC secretary in 2015 and 2016. Her term as vice-chair is effective for the 2017 calendar year.

The other executive officers for 2017 are Rajendra Bose, Chair, Columbia University; Neil Bright, Secretary, Georgia Institute of Technology; and Andrew Sherman, Treasurer, Yale University. Curt Hillegas of Princeton University is the immediate past chair.

Video, slides available from U-M presentations at SC16


Several University of Michigan researchers and research IT staff made presentations at the SC16 conference in Salt Lake City Nov. 13-17. Material from many of the talks is now available for viewing online:

  • Shawn McKee (Physics) and Ben Meekhof (ARC-TS) presented a demonstration of the Open Storage Research Infrastructure (OSiRIS) project at the U-M booth. The demonstration extended the OSiRIS network from its participating institutions in Michigan to the conference center in Utah. Meekhof also presented at a “Birds of a Feather” session on Ceph in HPC environments. More information, including slides, is available on the OSiRIS website.
  • Todd Raeker (ARC-TS) made a presentation on ConFlux, U-M’s new computational physics cluster, at the NVIDIA booth. Slides and video are available.
  • Nilmini Abeyratne, a Ph.D. student in computer science, presented her project “Low Design-Risk Checkpointing Storage Solution for Exascale Supercomputers” at the Doctoral Showcase. A summary, slides, and poster can be viewed on the SC16 website.
  • Jeremy Hallum (ARC-TS) presented information on the Yottabyte Research Cloud at the U-M booth. His slides are available here.

Other U-M activity at the conference included Sharon Broude Geva, Director of Advanced Research Computing, participating in a panel titled “HPC Workforce Development: How Do We Find Them, Recruit Them, and Teach Them to Be Today’s Practitioners and Tomorrow’s Leaders?”, and Quentin Stout (EECS) and Christiane Jablonowski (CLaSP) teaching the “Parallel Computing 101” tutorial.

NVIDIA accepting applications for Graduate Fellowship Program


NVIDIA has launched its 16th Annual Graduate Fellowship Program, which awards grants and technical support to graduate students who are doing outstanding GPU-based research.

This year NVIDIA is especially seeking doctoral students pushing the envelope in artificial intelligence, deep neural networks, autonomous vehicles, and related fields. The Graduate Fellowship awards are now up to $50,000 per student. These grants will be awarded in the 2017-2018 academic year.

Since its inception in 2002, the NVIDIA Graduate Fellowship Program has awarded over 130 Ph.D. graduate students with grants that have helped accelerate their research efforts.

The NVIDIA Graduate Fellowship Program is open to applicants worldwide. The deadline for submitting applications is Jan. 16, 2017. Eligible graduate students will have already completed their first year of Ph.D.-level studies in computer science, computer engineering, system architecture, electrical engineering or a related area, and must be engaged in active research as part of their thesis work.

For more information on eligibility and how to apply, visit http://research.nvidia.com/relevant/graduate-fellowship-program or email fellowship@nvidia.com.

Blue Waters accepting proposals for allocations, fellowships, and undergrad internships


The Great Lakes Consortium for Petascale Computation (GLCPC) recently posted its call for proposals. Researchers from member institutions (including the University of Michigan) are eligible to apply for a Blue Waters allocation. The application deadline is Friday, December 2. More information can be found at: http://www.greatlakesconsortium.org/2016cfp.htm

Applications are also being accepted for Blue Waters Fellowships. Applications are due February 3, 2017. More information is available at: https://bluewaters.ncsa.illinois.edu/fellowships

Applications are now being accepted for Blue Waters undergraduate internships. Applications are due February 3, 2017.  More information is available at: https://bluewaters.ncsa.illinois.edu/internships

U-M professor Quentin Stout, a veteran of all 28 Supercomputing conferences, reflects on SC through the years


Quentin Stout and Christiane Jablonowski teaching the Parallel Computing 101 tutorial at SC07.

Quentin Stout, University of Michigan Professor of Computer Science and Engineering (CSE) and Climate and Space Sciences and Engineering (CLaSP), has attended all 28 Supercomputing conferences since the event began in 1988. Stout is one of fewer than 20 so-called “SC Perennials” who have attended every one. He and Christiane Jablonowski, associate professor in CLaSP, have taught the Parallel Computing 101 tutorial at the conference for many years and are teaching it again this year. Stout, who has been at U-M since 1984, recently answered some questions about the evolution of computer science and supercomputing over the decades.

Question: What was the first SC conference like, and how has it changed over the years?

Stout: The first conference, in 1988, had about 1,500 people, compared to the over 10,000 now. Its focus was on supercomputing and the large centers at DOE, NASA, NSF, etc., along with the companies that were making these systems. There were also some researchers from academia and a few industrial users. The largest supercomputer user, NSA, had people at the conference, but they didn’t have a booth and their badge listed “Fort Meade” as their affiliation.

Over the years the conference has greatly broadened its scope, with a much stronger international focus and more participation by universities, cluster vendors of all sizes, networking, storage, commercial software, educational efforts, etc. …

Originally I went to learn more about the field, meet people, see what the emerging areas were, and learn about the latest machines. I still go for these reasons, but now machines and software are improving in a more evolutionary fashion than the somewhat revolutionary changes at the beginning. Going from serial computers to vector or parallel ones was more exciting and groundbreaking than going from 100,000 cores to 1,000,000, though the latter is still challenging. Some things have stayed the same: the parties are still good, and companies are still entering and leaving the supercomputing area. For quite some time, if I brought home a coffee mug from a company, the company would go bankrupt in a few years. More recently, IBM developed the BlueGene series of machines and grabbed the #1 spot in the Top500 ranking, but then dropped out of the market because it wasn’t selling enough machines to recoup the tremendous design cost.

One thing that has happened in the computing field, not just the conference, is that scientific computing now has a far smaller share of the market, even if you only consider the market for large systems. There have always been large database systems in corporations, but data analytics has greatly expanded the possibilities for profit, and hence there is more investment.

Question: What do you predict for the future of supercomputing?

Stout: The most “super” computers aren’t really single computers, but systems such as Google’s, which continually process a vast number of queries, answering them in fractions of a second with sophisticated algorithms that combine myriad sources from throughout the world, all run on highly tuned systems that keep running even though they have so many components that some are always failing. The production users of supercomputers tend to submit a job, let it run for a long time, analyze the results (perhaps using sophisticated graphics), fix some errors or change some parameters, and repeat. This isn’t the same as systems that are constantly ingesting data, analyzing it with algorithms that incorporate learning components, and responding to increasingly complex queries. Academics, including some at U-M, are involved in this, but it is difficult to create even a scaled-down version of a complete system in an academic computing center. You can view IBM’s Watson as being in this arena, and IBM is now betting that Watson will be a large part of its future.

Here’s an interesting cycle in computing: for over a decade some computational scientists and engineers have been using GPUs (graphics processing units). They are very difficult to use, and only applicable to certain types of problems, but inexpensive in terms of flops/$. However, many scientific computations require double precision arithmetic, which isn’t needed for graphics. Companies like NVIDIA, responding to the scientific computing market, began producing more expensive GPUs with double precision, and now systems such as U-M’s Flux computing cluster include GPUs on some of their boards.

However, there is a very rapidly growing demand for “deep learning.” The computationally intensive components of this can be run on GPUs relatively easily, but they don’t need double precision, just speed and plenty of parallelism. This summer NVIDIA released a new high-end chip with good double precision performance, but also added half precision, since that is all that is needed for deep learning. Deep learning might well surpass scientific computing as a GPU market.
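
To ground the half/single/double distinction, here is a small sketch (our addition, not part of the interview) that uses NumPy’s floating-point metadata to show how much accuracy each format carries.

    # Approximate decimal precision of the three floating-point formats.
    import numpy as np

    for dtype in (np.float16, np.float32, np.float64):
        info = np.finfo(dtype)
        print(f"{info.dtype}: ~{info.precision} decimal digits, eps = {info.eps}")

    # float16: ~3 digits  -- enough for deep-learning weights and activations
    # float32: ~6 digits  -- enough for graphics
    # float64: ~15 digits -- needed when long simulations accumulate billions
    #                        of rounding errors

Roughly three decimal digits suffice to train a neural network, while a simulation that compounds rounding error over billions of operations needs the fifteen digits of double precision, which is why the same chips now serve both markets at different precisions.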

 [NOTE: Visit the University of Michigan at SC16 at booth 1543.]