Research highlights: Running climate models in the cloud

By | General Interest, News, Research

Xianglei Huang

Can cloud computing systems help make climate models easier to run? Assistant research scientist Xiuhong Chen and MICDE affiliated faculty Xianglei Huang, from Climate and Space Sciences and Engineering (CLASP), provide some answers to this question in an upcoming issue of Computers & Geosciences (Vol. 98, Jan. 2017, online publication link: http://dx.doi.org/10.1016/j.cageo.2016.09.014).

Teaming up with co-authors Dr. Chaoyi Jiao and Prof. Mark Flanner, also in CLASP, as well as Brock Palen and Todd Raeker from U-M’s Advanced Research Computing – Technology Services (ARC-TS), they compared the reliability and efficiency of Amazon Web Services Elastic Compute Cloud (AWS EC2) with U-M’s Flux high performance computing (HPC) cluster in running the Community Earth System Model (CESM), a flagship climate model in the U.S. developed by the National Center for Atmospheric Research.

The team was able to run the CESM in parallel on an AWS EC2 virtual cluster with minimal packaging and code compiling effort, finding that AWS EC2 delivers a parallelization efficiency comparable to that of Flux, the U-M HPC cluster, when using up to 64 cores. Beyond 64 cores, the communication time between virtual EC2 nodes exceeded the communication time on Flux.
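For readers unfamiliar with the term, parallelization efficiency is commonly defined as the measured speedup divided by the ideal speedup for the same increase in core count. The sketch below is only an illustration of that common definition; the function name and the timings are hypothetical and are not taken from the paper.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Efficiency of an n-core run relative to a reference run on n_ref cores.

    1.0 means perfect scaling; values fall below 1.0 as communication
    and other overheads grow.
    """
    speedup = t_ref / t_n      # how much faster the larger run actually was
    ideal_speedup = n / n_ref  # how much faster it would be with perfect scaling
    return speedup / ideal_speedup


# Hypothetical wall-clock times in seconds (NOT measurements from the study).
timings = {16: 1000.0, 32: 520.0, 64: 280.0, 128: 190.0}
baseline_cores = 16

for cores, seconds in sorted(timings.items()):
    eff = parallel_efficiency(timings[baseline_cores], baseline_cores, seconds, cores)
    print(f"{cores:4d} cores: efficiency {eff:.2f}")
```

In made-up numbers like these, efficiency drops sharply at the largest core count, which is the kind of pattern that rising inter-node communication time would produce.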

Until now, climate and earth systems simulations have relied on numerical model suites that run on thousands of dedicated HPC cores for hours, days or weeks, depending on the size and scale of each model. Although these HPC resources have the advantage of being supported and maintained by trained IT support staff, making them easier to use, they are expensive and not readily available to every investigator who needs them.

Furthermore, the systems within reach are sometimes not large enough to run simulations at the desired scales. Commercial cloud systems, on the other hand, are cheaper and accessible to everyone, and have grown significantly in the last few years. One potential drawback of cloud systems is that users must install all the software themselves and supply the IT expertise needed to run the simulation packages.

Chen and Huang’s work represents an important first step in the use of cloud computing in large-scale climate simulations. Now, cloud computing systems can be considered a viable alternative to traditional HPC clusters for computational research, potentially allowing researchers to leverage the computational power offered by a cloud environment.

This study was sponsored by the Amazon Climate Initiative through a grant awarded to Prof. Huang. The local simulation at U-M was made possible by a DoE grant awarded to Prof. Huang.

Top image: http://www.cesm.ucar.edu/

Video, slides available: “Advanced Research Computing at Michigan, An Overview,” Brock Palen, ARC-TS

By | General Interest, News

Video (http://myumi.ch/aAG7x) and slides (http://myumi.ch/aV7kz) are now available from Advanced Research Computing – Technology Services (ARC-TS) Associate Director Brock Palen’s presentation “Advanced Research Computing at Michigan, An Overview.”

Palen gave the talk on June 27, 2016, outlining the resources and services available from ARC-TS as well as from off-campus resource providers.

Student groups can access Flux at no charge under Flux Academic Use program

By | Educational

Undergraduate groups can now access Flux, U-M’s shared computing cluster, at no cost under the new Flux Academic Use program from Advanced Research Computing (ARC). Flux Academic Use aims to provide undergraduates with experience in high performance computing and access to computational resources for their projects; it is not meant for faculty-led research. Jobs submitted under Flux Academic Use will run only when unused cycles are available. To be most efficient, student groups should use short or checkpointed jobs to take advantage of these available cycles. Student groups can also purchase Flux allocations for jobs that are higher priority or time constrained; those allocations can also work in conjunction with the free Flux Academic Use jobs. Undergraduate groups must have a faculty sponsor to be eligible for Flux Academic Use. For more information, or to request time under Flux Academic Use, please email hpc-support@umich.edu.

HPC outage due to power maintenance, Saturday, April 2

By | Events

The Modular Data Center, which houses the Flux HPC cluster, will be without power from 6 a.m. to approximately 6 p.m. on Saturday, April 2, for preventative maintenance of electrical infrastructure on North Campus. Additional networking maintenance for campus storage systems will start at 11 p.m. Saturday night at the Administrative Services Building machine room, which will also impact the HPC clusters.

Therefore, we expect Flux, Armis, /scratch and transfer hosts to be out of service from 6 a.m. until at least midnight.

During the outage, annual preventative maintenance on the MDC will also take place. ARC-TS will use the outage to install firmware updates to our new InfiniBand switch as well.

We will update the outage schedule as needed on Twitter at @ARCTS_UM.

New nodes added to increase capacity of Flux HPC cluster

By | General Interest, News

As part of regular upgrades to our high performance computing resources, Advanced Research Computing – Technology Services (ARC-TS) has increased the capacity of the Flux HPC cluster.

These newest (seventh generation) nodes include:

  • 167 Standard Flux nodes with 24 cores and 128 GB RAM each
  • 4 Large Memory Flux nodes with 56 cores and 1.5 TB RAM each
  • Connected via EDR (100 Gbps) InfiniBand
  • Adds a total of 4,232 cores and 27 TB RAM to Flux.
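
The totals above can be checked with a little arithmetic. The snippet below is a quick sanity check rather than an official accounting; it assumes 1.5 TB means 1,536 GB and converts GB to TB by dividing by 1,024.

```python
# Node counts and per-node specs from the list above.
standard_nodes, standard_cores, standard_ram_gb = 167, 24, 128
large_nodes, large_cores, large_ram_gb = 4, 56, 1536  # assumption: 1.5 TB = 1,536 GB

total_cores = standard_nodes * standard_cores + large_nodes * large_cores
total_ram_gb = standard_nodes * standard_ram_gb + large_nodes * large_ram_gb
total_ram_tb = total_ram_gb / 1024  # assumption: 1 TB = 1,024 GB

print(total_cores)             # 4232
print(round(total_ram_tb, 1))  # about 27 TB after rounding
```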

For complete descriptions of the various Flux configurations, see our HPC Resources page.

The additional resources allow Flux to handle more jobs simultaneously and increase the number of cores and amount of memory on the largest nodes.

Jobs can request the new cores by using the standard PBS options (http://arc-ts.umich.edu/software/torque/) and/or requesting the node property “haswell”. Please note that the new nodes are likely to be popular and jobs that request the new nodes may wait longer to start than jobs with less-specific resource requirements.

New ARC-TS program offers free cycles on Flux to undergraduates

By | General Interest, News

Undergraduates working on research that requires high performance computing resources can now use the Flux HPC cluster at no cost.

Flux is the shared computing cluster available across campus, operated by Advanced Research Computing – Technology Services (ARC-TS). Under ARC-TS’s new Flux for Undergraduates program, student groups and individuals with faculty sponsors can access unused computing cycles on Flux for free.

The first student group to take advantage of this program is the Michigan Data Science Team, which was created in Fall 2015 with the goal of helping U-M students enter Big Data competitions. The team enters competitions through sites like Kaggle, and is one of the first such teams affiliated with a university.

The group’s organizer, Jonathan Stroud, a Computer Science and Engineering graduate student, said team members were maxing out the capabilities of their laptops when they first started.

“For the first couple of competitions, we made sure we picked a problem that people could do on their laptops,” Stroud said. “Still, every night before bed, they would set up their experiments and they ran all night.”

L-R: Anthony Kremin, Ben Bray, Wei Lee, Curtis Fenner, Jimmy Hsu, Alex Chojnacki, Alexander Zaitzeff, Jonathan Stroud, Jared Webb, Tianpei Xie, Helena Zeng, Xiang Li, Xinyu Tan, Jianming Sang, Guangsha Shi

He said success in the data science competitions typically depends on trying several approaches simultaneously, which can be taxing on computing resources. Stroud said the team typically uses software such as Python, R, and Matlab. Team members come from a wide range of disciplines, including Engineering, Applied Math, Physics, and one from the Music School, Stroud said.

Jacob Abernethy, assistant professor of Electrical Engineering and Computer Science, is the group’s faculty advisor. He wrote some funding for the group into his NSF CAREER proposal that was awarded in 2015. He said after the group’s first competition, he surveyed the students as to what worked and what didn’t. He said one of the clearest responses was the need for more robust computing resources.

“Our top two competitors talked about maxing out the resources on not only their own laptop, but also on the clusters provided them by their advisors,” Abernethy said. “It became clear that we needed to talk about Flux.”

He said a key method in the machine learning and data science experimentation process is the use of cross-validation, that is, testing the performance of a set of parameters on several subsets of data simultaneously. “This leads to a very obvious need for a distributed system in which we can execute a large number of ‘embarrassingly parallel’ tasks quickly,” Abernethy said.
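
To illustrate why cross-validation is “embarrassingly parallel,” here is a minimal sketch in plain Python. It is not the team’s code: the toy “model” (predict the training mean plus a tunable `shift`), the function names, and the use of a local thread pool are all assumptions made for illustration. On a cluster, each fold would more likely be submitted as its own job.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def make_folds(n_items, k):
    """Split indices 0..n_items-1 into k contiguous, near-equal folds."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def score_fold(data, held_out, shift):
    """Fit on everything except the held-out fold; return negative MSE on it."""
    held = set(held_out)
    train = [y for i, y in enumerate(data) if i not in held]
    prediction = mean(train) + shift  # toy model with one parameter
    return -mean((data[i] - prediction) ** 2 for i in held_out)


def cross_validate(data, shift, k=5):
    """Score one parameter value; folds are independent, so they run concurrently."""
    folds = make_folds(len(data), k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        scores = list(pool.map(lambda fold: score_fold(data, fold, shift), folds))
    return mean(scores)


data = [float(v) for v in range(1, 11)]
best_shift = max([0.0, 5.0], key=lambda s: cross_validate(data, s))
print(best_shift)
```

No fold needs results from any other fold, which is what makes the workload easy to spread across many cores.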

Being able to use Flux “has been helping us a lot,” Stroud added. “We’ve been contacted by other schools to see how they can do the same thing.”

Jobs submitted under Flux for Undergraduates will run only when unused cycles are available and will be requeued when those resources are needed by standard Flux jobs. To be most efficient, student groups should use short or checkpointed jobs to take advantage of these available cycles.
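
As a rough illustration of what “checkpointed” means in practice, the sketch below saves progress after every step so that a requeued job can pick up where it left off. The function name, the JSON file format, and the toy workload are hypothetical; real applications typically checkpoint less often and save their own state.

```python
import json
import os
import tempfile


def run_with_checkpoint(path, total_steps, max_steps_this_run):
    """Advance a toy computation, resuming from the checkpoint file if one exists."""
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last saved step

    steps_done = 0
    while state["step"] < total_steps and steps_done < max_steps_this_run:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        steps_done += 1
        # Write to a temporary file, then rename, so an interruption
        # in the middle of a write cannot corrupt the checkpoint.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    return state


# Simulate a job that is interrupted after 4 steps and later requeued.
ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
first = run_with_checkpoint(ckpt, total_steps=10, max_steps_this_run=4)
final = run_with_checkpoint(ckpt, total_steps=10, max_steps_this_run=100)
print(first["step"], final["step"], final["total"])
```

The second call does not repeat the first four steps, so little work is lost when a job is requeued.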

Student groups can also purchase Flux allocations for jobs that are higher priority or time constrained; those allocations can also work in conjunction with the free Flux for Undergraduates jobs.

“The goal is to provide undergraduates with experience in high performance computing, and access to computational resources for their projects,” said Brock Palen, Associate Director of ARC-TS.

Undergraduate groups and individuals must have sponsorship from a faculty member. To request resources through Flux for Undergraduates, please fill out this form. An abstract of the intended activity must be submitted.

Questions can be directed to arc-contact@umich.edu.

HPC workshops (introductory, intermediate, and advanced) scheduled for Jan. 19 through Feb. 9

By | Educational, Events
A series of on-campus high performance computing workshops sponsored by ARC will be held in the coming weeks:

HPC100 — Introduction to the Linux Command Line for HPC
Tuesday, Jan. 19, 9 a.m. – noon
Thursday, Jan. 21, 1 – 4 p.m.
All sessions in B250 East Hall
This course will familiarize students with the basics of accessing and interacting with high-performance computers using the GNU/Linux operating system’s command line. For more information, and to register, visit this page. (Please sign up for only one session.)

HPC101 — High Performance Computing Workshop
Wednesday, Jan. 20, 1 – 5 p.m.
Wednesday, Jan. 27, 1 – 5 p.m.
All sessions in B250 East Hall
This course provides an overview of cluster computing in general and how to use the Flux cluster in particular. (Prerequisite: HPC 100 or equivalent.)
For more information, and to register, visit this page. (Please sign up for only one session.)

HPC201 — Advanced High Performance Computing Workshop
Friday, Feb. 2, 1 – 5 p.m., B250 East Hall
Tuesday, Feb. 9, 1 – 5 p.m., B254 East Hall
This course will cover some more advanced topics in cluster computing on the U-M Flux Cluster. Topics to be covered include a review of common parallel programming models and basic use of Flux; dependent and array scheduling; advanced troubleshooting and analysis using checkjob, qstat, and other tools; use of common scientific applications including Python, MATLAB, and R in parallel environments; parallel debugging and profiling of C and Fortran code, including logging, gdb (line-oriented debugging), ddt (GUI-based debugging) and map (GUI-based profiling) of MPI and OpenMP programs; and an introduction to using GPUs. (Prerequisite: HPC101 or equivalent.)
For more information, and to register, visit this page. (Please sign up for only one session.)

HPC User Meetup — Fri., Dec. 18

By | Educational, Events

Users of high performance computing resources are invited to meet Flux operators and support staff in person at an upcoming user meeting:

  • Friday, December 18, 1 – 5 p.m., 1180 Duderstadt Center

There is no set agenda; come at any time and stay as long as you please. You can come and talk about your use of any sort of computational resource: Flux, Hadoop, XSEDE, Amazon, or other.

Ask any questions you may have. The Flux staff will work with you on your specific projects, or just show you new things that can help you optimize your research.

This is also a good time to meet other researchers doing similar work.

This is open to anyone interested; it is not limited to Flux users.

Examples of potential topics:

• What ARC-TS services are there, and how to access them?
• How to make the most of PBS and learn its features specific to your work?
• I want to do X, do you have software capable of it?
• What is special about GPU/Xeon Phi/Accelerators?
• Are there resources for people without budgets?
• I want to apply for grant X, but it has certain limitations. What support can ARC-TS provide?
• I want to learn more about the compiler and debugging.
• I want to learn more about performance tuning, can you look at my code with me?
• Etc.

Researchers in School of Public Health now pay lower costs for Flux

By | News

The School of Public Health will now share the costs of access to the Flux shared computing cluster with its researchers. The Medical School, the College of Engineering, and the College of Literature, Science and the Arts also share costs of Flux access.

Updated rates for SPH researchers can be found on the Ordering Services page of the Advanced Research Computing – Technology Services website.

Please email hpc-support@umich.edu with any questions.