New HPC resources to replace Flux and updates to Armis are coming. They will run a new scheduling system (Slurm). You will need to learn the commands in this system and update your batch files to successfully run jobs. Read on to learn the details and how to get training and adapt your files.
In anticipation of these changes, ARC-TS has created the test cluster “Beta,” which will provide a testing environment for the transition to Slurm. Slurm will be used on Great Lakes; the Armis HIPAA-aligned cluster; and a new cluster called “Lighthouse” which will succeed the Flux Operating Environment in early 2019.
Currently, Flux and Armis use the Torque (PBS) resource manager and the Moab scheduling system; when completed, Great Lakes and Lighthouse will use the Slurm scheduler and resource manager, which will enhance the performance and reliability of the new resources. Armis will transition from Torque to Slurm in early 2019.
The Beta test cluster is available to all Flux users, who can login via ssh at ‘beta.arc-ts.umich.edu’. Beta has its own /home directory, so users will need to create or transfer any files they need, via scp/sftp or Globus.
Support staff from ARC-TS and individual academic units will conduct several in-person and online training sessions to help users become familiar with Slurm. We have been testing Slurm for several months, and believe the performance gains, user communications, and increased reliability will significantly improve the efficiency and effectiveness of the HPC environment at U-M.
The tentative time frame for replacing or transitioning current ARC-TS resources is:
- Flux to Great Lakes, first half of 2019
- Armis from Torque to Slurm, January 2019
- Flux Operating Environment to Lighthouse, first half of 2019
- Open OnDemand on Beta, which replaces ARC Connect for web-based job submissions, Jupyter Notebooks, Matlab, and additional software packages, fall 2018