Funding Organization: National Science Foundation, Division of Advanced Cyberinfrastructure

Funding Program: Campus Cyberinfrastructure — Data, Networking and Innovation (CC*DNI)

Term: Sept. 1, 2015 – Aug. 31, 2020 (estimated)

Amount: $4.9 million

Abstract: Every field of science generates and utilizes data in various forms: programs, instrument outputs, papers, notes, applications, simulations, video and audio recordings, etc. The continuing and evolving challenge for scientists is how to store, access, transform, manage and curate the variety of data required to effectively conduct their research, and transparently share it with other researchers across campus or at other institutions. The OSiRIS project is addressing that challenge by combining an object-based software-defined storage technology with a monitored, managed network infrastructure to give scientists a distributed storage system which allows them to directly access their data from resources at any of the participating institutions. Furthermore, OSiRIS utilizes each institution’s existing authentication infrastructure to allow scientists to provide controlled access to their data across all participating institutions. By documenting and publishing designs, code, and operational experiences, the OSiRIS project serves as a replicable model for supporting data-intensive, multi-institutional science collaborations.

OSiRIS implements a Ceph-based, petabyte-scale distributed data system by deploying object storage servers at each participating institution, connecting them via a managed high-speed network, and distributing data according to the specific requirements of each science research domain. Ceph, an open source storage platform, supports multiple data access methods (traditional file, native object, and block) and allows access, replication, distribution, and integrity to be configured on a per-research-domain basis. OSiRIS is built on low-cost commodity hardware and can deliver gigabytes per second of I/O bandwidth per node. The system monitors and manages the network paths between its partner institutions, science users, and Ceph storage components by strategically deploying perfSONAR instances that have been augmented with a network discovery, monitoring, and management platform (the Network Management Abstraction Layer). Globus Online servers provide access to OSiRIS data from outside the system. In addition, OSiRIS leverages Ceph's software-defined storage features to automate some data-lifecycle management tasks.
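To make the per-research-domain idea more concrete, here is a minimal Python sketch, assuming a simple replicated layout. The `DomainPolicy` class, the domain names, the site labels, and the capacity figures are all hypothetical illustrations, not OSiRIS code or a Ceph API. The only Ceph behavior relied on is that a replicated pool stores one full copy of each object per replica, so raw disk usage scales with the replica count.

```python
from dataclasses import dataclass

TB = 10**12  # decimal terabyte, used only for the illustrative figures below


@dataclass(frozen=True)
class DomainPolicy:
    """Hypothetical per-research-domain storage policy (illustrative only)."""
    name: str
    replicas: int            # full copies kept, like a Ceph replicated pool's "size"
    sites: tuple             # institutions expected to hold a copy (hypothetical labels)

    def raw_bytes_needed(self, logical_bytes: int) -> int:
        # A replicated pool stores one complete copy per replica.
        return logical_bytes * self.replicas

    def validate(self) -> None:
        # Require at least one copy, and no more copies than distinct sites,
        # so that losing a single institution never removes every copy.
        if self.replicas < 1:
            raise ValueError(f"{self.name}: replicas must be at least 1")
        if self.replicas > len(set(self.sites)):
            raise ValueError(f"{self.name}: more replicas than distinct sites")


# Two hypothetical research domains with different durability needs.
policies = [
    DomainPolicy("domain-a", replicas=3, sites=("UM", "MSU", "WSU")),
    DomainPolicy("domain-b", replicas=2, sites=("UM", "WSU")),
]

for p in policies:
    p.validate()
    print(p.name, p.raw_bytes_needed(100 * TB) // TB, "TB raw for 100 TB stored")
```

Capping the replica count at the number of distinct sites reflects the goal of surviving the loss of a whole institution. In an actual Ceph deployment, that placement is expressed through CRUSH rules and failure domains rather than application code.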

Principal Investigators

Shawn McKee is the principal investigator of the OSiRIS project. He is a research scientist in the U-M Department of Physics, and Director of the Center for Network and Storage Enabled Collaborative Computational Science in the Michigan Institute for Computational Discovery and Engineering. Contact: smckee@umich.edu

Co-PIs:

  • Douglas Swany, Indiana University
  • Patrick Gossman, Wayne State University
  • Kenneth Merz, Michigan State University

Top Level Code and Documentation Repository: https://github.com/ShawnMcKee/OSiRIS