This project develops an instrument called ConFlux, hosted at the University of Michigan and specifically designed to enable High Performance Computing (HPC) applications to interact seamlessly, and at interactive speeds, with data-intensive operations. The project establishes a hardware and software ecosystem for large-scale, data-driven modeling of multiscale physical systems. ConFlux will produce advances in predictive modeling in several disciplines, including turbulent flows, materials physics, cosmology, climate science, and cardiovascular flow modeling.
A wide range of physical phenomena exhibit emergent behavior that makes modeling very challenging. In this project, physics-constrained, data-driven modeling approaches are pursued to account for this underlying complexity. These techniques require HPC applications (running on external clusters) to interact with large data sets at run time. ConFlux provides low-latency communication for in- and out-of-core data, cross-platform storage, high-throughput interconnects, and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.
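The tightly integrated workflow described above can be illustrated with a minimal sketch. The code below is purely hypothetical (the function names and the toy physics are illustrative, not ConFlux APIs): a solver queries a data-driven closure model in memory at every time step, instead of writing data to disk between separate simulation and learning jobs.

```python
# Hypothetical sketch of an in-core, physics + machine-learning loop.
# All names and the toy dynamics below are illustrative assumptions,
# not part of the ConFlux software stack.

def ml_closure(state):
    """Stand-in for a learned correction term (here: simple damping)."""
    return [-0.1 * s for s in state]

def solver_step(state, dt=0.01):
    """Stand-in physics update: linear growth plus the learned closure."""
    correction = ml_closure(state)
    return [s + dt * (0.5 * s + c) for s, c in zip(state, correction)]

def run(state, steps=100):
    # The closure model is queried in-core at every step, rather than
    # in a segregated post-processing pass over archived data.
    for _ in range(steps):
        state = solver_step(state)
    return state

final = run([1.0, 2.0])
```

In a segregated workflow, the solver and the learning component would exchange data through the file system; the pattern sketched here keeps that exchange in memory, which is the kind of coupling the low-latency hardware is meant to accelerate.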
The ConFlux cluster consists of 43 two-socket IBM Power8 “Firestone” S822LC compute nodes and 17 two-socket IBM Power8 “Garrison” compute nodes. Each Garrison node also hosts four NVIDIA Pascal GPUs connected to the Power8 system bus via NVIDIA’s NVLink technology. Each node has local high-speed flash storage for random access. ConFlux also has two large-memory nodes with 2 TB of RAM each.
All compute and storage nodes are connected via a 100 Gb/s InfiniBand fabric. The InfiniBand and NVLink connectivity, combined with IBM CAPI technology, will provide the unprecedented data-transfer throughput required for the data-driven computational physics research to be conducted on the system.
80% of ConFlux resources are reserved for research groups within the University of Michigan. The remaining 20% will be allocated to the external community based on data-driven modeling needs.
Course material developed from the use of ConFlux is being integrated into the educational curriculum via several degree and certificate programs offered by two U-M institutes dedicated to computational and data sciences. Use of the ConFlux cluster will be extended to research groups outside of U-M through a number of Extreme Science and Engineering Discovery Environment (XSEDE) bridging tools and file systems. Connections established through U-M’s Office of Outreach and Diversity are being leveraged to extend the use of ConFlux to minority-serving institutions and Historically Black Colleges and Universities. Through programs developed by the Society of Women Engineers at U-M, middle and high school students will be engaged in hands-on educational modules in computing, physics, and data.
Funding Organization: National Science Foundation, Division of Advanced Cyberinfrastructure
Funding Program: Major Research Instrumentation (MRI)
Term: Sept. 1, 2015 – Aug. 31, 2018 (estimated)
Amount: $3.46 million
Principal Investigator: Karthik Duraisamy is the principal investigator of the ConFlux project. He is a professor in the U-M Department of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics in the Michigan Institute for Computational Discovery and Engineering.