MICDE to provide data analysis and dissemination support for $18 million tobacco research center

By | General Interest, Happenings, News, Research

The University of Michigan School of Public Health will house a new, multi-institutional center focusing on modeling and predicting the impact of tobacco regulation, funded with an $18 million federal grant from the National Institutes of Health and the Food and Drug Administration.

The Center for the Assessment of the Public Health Impact of Tobacco Regulations will be part of the NIH and FDA’s Tobacco Centers of Regulatory Science, the centerpiece of an ongoing partnership formed in 2013 to generate critical research that informs the regulation of tobacco products.

The Michigan Institute for Computational Discovery and Engineering (MICDE) will support the center’s Data Analysis and Dissemination core by collecting national and regional survey data, analyzing the use of tobacco products including vaping and e-cigarettes, and disseminating the resulting tobacco modeling parameters to other research centers and the Food and Drug Administration.

The center is led by MICDE-affiliated faculty member Rafael Meza, associate professor of Epidemiology, and David Levy, professor of Oncology at Georgetown University.

For more on the center, see the press release from the U-M School of Public Health: https://sph.umich.edu/news/2018posts/tcors-091718.html

New private insurance claims dataset and analytic support now available to health care researchers

By | General Interest, Happenings, HPC, News

The Institute for Healthcare Policy and Innovation (IHPI) is partnering with Advanced Research Computing (ARC) to bring two commercial claims datasets to campus researchers.

The OptumInsight and Truven Marketscan datasets contain nearly complete insurance claims and other health data on tens of millions of people representing the US private insurance population. Within each dataset, records can be linked longitudinally for over 5 years.  

To begin working with the data, researchers should submit a brief analysis plan for review by IHPI staff, who will create extracts or grant access to primary data as appropriate.

CSCAR consultants are available to provide guidance on computational and analytic methods for a variety of research aims, including the use of Flux and other U-M computing infrastructure for working with these large and complex repositories.

Contact Patrick Brady (pgbrady@umich.edu) at IHPI or James Henderson (jbhender@umich.edu) at CSCAR for more information.

Acquisition of the data and its availability to campus researchers were funded by IHPI and the U-M Data Science Initiative.

[SC2] Launch of 2017 Visualization Challenge + Presentation: Simple Data Management with Signac

By |

Simon Adorf (PhD Candidate, Chem. Eng.) will give a presentation about “Simple Data Management with Signac.”

ABSTRACT: We will give a brief introduction to the signac data management framework for agile computational workflows, followed by interactive examples using Jupyter notebooks hosted online. The signac framework aids in the management of large and heterogeneous data spaces. It provides a simple and robust data model to create a well-defined, indexable storage layout for data and metadata. This makes it easier to operate on large data spaces, streamlines post-processing and analysis, and makes data collectively accessible.
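As a rough illustration of what an "indexable storage layout for data and metadata" can look like, here is a minimal sketch using only the Python standard library. It is not signac's API: the function names (`state_point_id`, `init_job`, `find_jobs`), the hashing scheme, and the `state_point.json` file name are all hypothetical and chosen only for illustration. The idea is that each set of parameters maps to its own directory, and the parameters are stored alongside the data so that directories can be searched by metadata.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def state_point_id(state_point: dict) -> str:
    """Derive a stable directory name from a parameter set (hypothetical scheme)."""
    # Sorting keys makes the id independent of the order parameters were given in.
    canonical = json.dumps(state_point, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


def init_job(root: Path, state_point: dict) -> Path:
    """Create (or reuse) the directory for one parameter set and record its metadata."""
    job_dir = root / state_point_id(state_point)
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / "state_point.json").write_text(json.dumps(state_point, sort_keys=True))
    return job_dir


def find_jobs(root: Path, **criteria) -> list:
    """Return directories whose stored metadata matches every given key/value pair."""
    matches = []
    for meta in sorted(root.glob("*/state_point.json")):
        state_point = json.loads(meta.read_text())
        if all(state_point.get(key) == value for key, value in criteria.items()):
            matches.append(meta.parent)
    return matches


if __name__ == "__main__":
    workspace = Path(tempfile.mkdtemp())
    for temperature in (300, 350, 400):
        init_job(workspace, {"temperature": temperature, "pressure": 1.0})
    for job_dir in find_jobs(workspace, temperature=350):
        print(job_dir.name)
```

Because the directory name is derived from the parameters themselves, asking for the same parameter set twice returns the same directory, which is what keeps the layout well defined as the data space grows.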

Everyone is encouraged to bring a laptop in order to be able to follow along.


The Scientific Computing Student Club, in partnership with MICDE, the U-M 3D Lab and NVIDIA, will officially launch the 2017 NVIDIA Visualization Challenge, which invites students to use the latest visualization tools and technology to show their research data in creative ways. The first prize will include sponsorship to show the winning work at the Supercomputing ’17 Visualization Showcase, and more. Join us at the meeting to learn more.



ICPSR webinar: Introduction to SEAD 2.0, The Next Generation of SEAD Data Services — Jan. 21

By | Educational, Events

The Inter-university Consortium for Political and Social Research (ICPSR) is hosting a webinar to introduce the next phase of the Sustainable Environmental Actionable Data (SEAD) project. SEAD is an NSF-sponsored project to create data services that meet the needs of sustainability science research.

Date: Thursday, Jan. 21, 2016

Time: 1:30 – 2:30 p.m., EST

Presenters: Anna Ovchinnikova, SEAD Training and User Support Specialist, and James Myers, SEAD Co-PI

Registration and more information: Webinar web site.

ARC-TS launches high-speed research storage service

By | General Interest, News

U-M investigators involved in data-intensive research are getting a new tool to help them store, manage and analyze large data sets.

Advanced Research Computing – Technology Services announced that a new service, Turbo Research Storage, is available to researchers on all U-M campuses.

Turbo allows researchers to access their data in place, making real-time analysis of large data sets possible. Learn more or order the Turbo service.

Researchers no longer will need to spend time and resources building their own storage or looking for solutions outside campus. Instead, they can access, process, and analyze data with Turbo, allowing them to focus on their science.

Turbo Research Storage provides scalable storage and is capable of moving data at speeds of up to 40 gigabits per second. This matches the high performance capabilities of Flux, the shared U-M computing cluster.
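To put the 40 gigabits per second figure in perspective, the following sketch estimates a best-case transfer time. It assumes the link runs at its full peak rate with no protocol, disk, or network overhead, so real transfers will take longer; the function name and the use of decimal terabytes (10^12 bytes) are assumptions made for this illustration.

```python
def ideal_transfer_seconds(size_bytes: int, link_gbps: float = 40.0) -> float:
    """Best-case time to move size_bytes over a link of link_gbps gigabits per second."""
    if size_bytes < 0 or link_gbps <= 0:
        raise ValueError("size must be non-negative and link speed positive")
    bits = size_bytes * 8                  # bytes to bits
    bits_per_second = link_gbps * 1e9      # gigabits to bits
    return bits / bits_per_second


if __name__ == "__main__":
    one_terabyte = 10**12  # decimal terabyte, an assumption for this sketch
    seconds = ideal_transfer_seconds(one_terabyte)
    print(f"1 TB at 40 Gb/s: about {seconds:.0f} seconds at peak rate")
```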

Turbo also gives researchers the option of two security levels, one for some types of sensitive data and one for non-sensitive data. For questions regarding use of Turbo for sensitive data, please visit the ITS Sensitive Data Guide.

“Turbo provides a solution for researchers looking to take advantage of Big Data, high performance computing, and roaming,” said Brock Palen, associate director of Advanced Research Computing Technology Services (ARC-TS). “It also meets increasing security requirements without the worry about where and how data is stored, accessed, and shared.”

Collaboration and sharing of information are key characteristics of today’s research activities. Turbo is designed for joint work on shared files across a research group, so there is no need for multiple copies of important datasets or complex permission configurations for individual users within a group.

The service is designed to easily connect with Flux, as well as off-campus computing systems and collaborators.

Access to Turbo is limited to researchers. The service is funded by an IT capital request specifically targeted for research use.

Turbo Research Storage is provided by the newly formed ARC-TS, the research computing arm of Information and Technology Services, operating under the auspices of Advanced Research Computing in the Office of Research. ARC-TS is the one-stop destination for delivery of research computing services to researchers across campus.

Turbo is available in increments of 1 terabyte, at a cost of $19.20 per replicated TB per month.
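For budgeting, the quoted rate can be turned into a simple estimate. The sketch below assumes the price stays at $19.20 per replicated terabyte per month for the whole period and that storage is bought in whole terabytes, as stated above; the function name is hypothetical.

```python
from decimal import Decimal

# Rate quoted in the announcement: dollars per replicated TB per month.
PRICE_PER_TB_MONTH = Decimal("19.20")


def turbo_cost(terabytes: int, months: int = 1) -> Decimal:
    """Estimated cost in dollars for a whole number of terabytes over some months."""
    if not isinstance(terabytes, int) or terabytes < 1:
        raise ValueError("Turbo is sold in whole 1 TB increments")
    if not isinstance(months, int) or months < 1:
        raise ValueError("months must be a positive whole number")
    return PRICE_PER_TB_MONTH * terabytes * months


if __name__ == "__main__":
    print(f"5 TB for one year: ${turbo_cost(5, months=12)}")
```

`Decimal` is used instead of floating point so that dollar amounts are not affected by binary rounding.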

Seminar: Brad Malin, Vanderbilt University: “Learning to Love De-identification in Biomedical Research” — May 11, Palmer Commons

By | Educational, Events
Brad Malin, PhD (@bradmalin) is Vice Chair for Research in the Dept. of Biomedical Informatics and Director of the Health Information Privacy Laboratory at Vanderbilt University. Dr. Malin describes his work as “bringing you better health through data, analytics, and policy.” He adds, “I am not, however, a conventional data scientist. I am driven by a concern that our society lacks the infrastructure to make the most of the data we generate. As such, I complemented my education with training in public policy and management to investigate how biology, computer science, and societal affairs can be blended to maximize the potential.”
Key details
  • “Learning to Love De-identification in Biomedical Research”
  • Brad Malin, PhD, Vanderbilt University (Guest Speaker)
  • Monday, May 11, Noon – 1 pm
  • Palmer Commons, Floor 5, Forum Hall
  • Lunch will be provided (near Floor 5 entrance to Forum Hall)
  • RSVP requested.
The talk is sponsored by the Department of Learning Health Sciences, University of Michigan Medical School.
Over the past several decades, numerous approaches have been developed to remove and obscure patient identifying information in the context of biomedical research. Generally, this approach to privacy protection, which is often called “de-identification,” has been codified in regulations and laws, including the Common Rule and the Privacy Rule of the Health Insurance Portability and Accountability Act of 1996. There is now a great opportunity to erect learning health systems on top of de-identified medical record systems; however, there is trepidation because the past decade has also witnessed a number of investigations into how to “re-identify” such information to the patients from whom it was derived. These demonstration attacks have called the strength of such privacy protections into question. The goal of this talk is to review why re-identification happens from both computational and policy perspectives, the extent to which such violations can be averted using risk analysis strategies, and how we can leverage de-identified patient data en masse to support large-scale association studies. In this talk, Brad Malin, PhD, from Vanderbilt University will draw upon his experience in building one of the world’s largest de-identified electronic medical record systems and the experiences of the NIH-sponsored Electronic Medical Records and Genomics (eMERGE) Consortium.

Data will be deleted from /scratch on Flux if unused for 90 days

By | General Interest, News

Over the past several months, a huge amount of data (491 TB) has accumulated in the /scratch directory on the Flux computing cluster. /scratch is meant for data relating to currently running jobs, and the buildup of data is threatening the performance of Flux for all users.

Therefore, ARC will begin deleting data from /scratch that have not been accessed for 90 consecutive days.

Flux account owners with unused data have begun receiving emails warning that their data will be deleted.

Account owners in this situation can move their data to another system, such as ITS Value Storage or their own equipment, using the dedicated transfer nodes on Flux, which have high-speed network connections available for that purpose.
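Account owners who want to check which of their files fall under the 90-day rule before moving them could start from something like the sketch below. It relies on the file system's recorded last-access time, which is an assumption: the announcement does not say how ARC determines whether data has been accessed, and some file systems update access times lazily or not at all. The function name and the example path are hypothetical.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def stale_files(root, max_idle_days=90, now=None):
    """List files under root whose last-access time is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat avoids following symlinks out of the tree being scanned.
                last_access = path.lstat().st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_access < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical location; substitute your own directory under /scratch.
    for path in stale_files("/scratch/example_account/example_user"):
        print(path)
```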

For more information on Value Storage, see the ITS website.

For more information on transfer nodes, see the ARC website.

If you have any questions, please contact hpc-support@umich.edu.