Next Generation HPC? What Spark, TensorFlow, and Chapel are teaching us about large-scale numerical computing

Tue., March 22, 2016

For years, the academic science and engineering community was almost alone in pursuing very large-scale numerical computing, and MPI – the 1990s-era message-passing library – was the lingua franca for such work. But starting in the mid-2000s, others became interested in large-scale computing on data. First, internet-scale companies like Google and Yahoo! started performing fairly basic analytics tasks at enormous scale, and now many others are tackling increasingly complex and data-heavy machine-learning computations, which involve very familiar scientific computing tasks such as linear algebra, unstructured mesh decomposition, and numerical optimization. But these new communities have created programming environments which emphasize what we’ve learned about computer science and programmability since 1994 – with greater levels of abstraction and encapsulation, separating the high-level computation from the low-level implementation details – and some in HPC are starting to notice. This talk will give a brief introduction to the Apache Spark environment and Google’s TensorFlow machine-learning package for high-level numerical computation, as well as the HPC-focused Chapel language from Cray, to show where each can be used today and how they might be used in the future. The slides for this talk, and examples for each package along with a virtual machine which can be used for running them, will be available at https://github.com/ljdursi/Spark-Chapel-TF-UMich-2016 .
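To make the abstraction-level contrast concrete, here is a minimal PySpark sketch (a hypothetical illustration, not one of the talk’s own examples from the repository above): a distributed mean computed over a cluster with no explicit message passing in sight, because the runtime handles partitioning, communication, and fault tolerance.

```python
# Hypothetical PySpark sketch: a distributed mean with no explicit
# message passing -- partitioning and communication are the runtime's job.
from pyspark import SparkContext

sc = SparkContext(appName="mean-example")

# Distribute the data across however many nodes the cluster has.
rdd = sc.parallelize(range(1, 1000001))

# map() and sum() describe *what* to compute; Spark decides *how*,
# including the cross-node reduction that MPI code would spell out by hand.
mean = rdd.map(float).sum() / rdd.count()
print(mean)  # 500000.5

sc.stop()
```

TensorFlow takes a similar what-not-how approach for numerical computation: the program declares a dataflow graph of operations such as matrix multiplies, and the framework maps that graph onto CPUs, GPUs, or distributed workers. A minimal sketch, assuming the graph-and-session API TensorFlow used at the time of this talk:

```python
# Hypothetical TensorFlow sketch (1.x-era graph API): the computation is
# declared symbolically; the framework chooses where and how to execute it.
import tensorflow as tf

A = tf.constant([[1.0, 2.0], [3.0, 4.0]])
x = tf.constant([[1.0], [1.0]])
y = tf.matmul(A, x)  # a node in the dataflow graph, not an eager multiply

with tf.Session() as sess:
    print(sess.run(y))  # [[3.], [7.]]
```

Chapel builds the same separation of concerns into the language itself, with data-parallel loops over distributed arrays; the repository linked above has runnable examples for all three packages.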

References:
– “HPC Is Dying, and MPI Is Killing It”: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/
– Apache Spark: http://spark.apache.org
– Chapel: http://chapel.cray.com
– TensorFlow: https://www.tensorflow.org

Jonathan Dursi (Scientific Associate and Software Engineer, Ontario Institute for Cancer Research) has spent over twenty-five years using scientific computing at the largest scale to advance science and R&D across a range of disciplines. He received his Ph.D. in astrophysics from the University of Chicago in 2004, doing very large-scale supernova simulations with the DOE ASCI ASAP program at the Flash Center, and since then has been: a senior research associate at the Canadian Institute for Theoretical Astrophysics; an analyst for SciNet, Canada’s largest HPC centre; and then, on secondment, the first (interim) CTO for Compute Canada (which is a bit like Canada’s XSEDE). In 2015 he moved into cancer bioinformatics at the Ontario Institute for Cancer Research, where he is a Scientific Associate and Software Engineer. At OICR, he currently works with very large traditional genomics data sets for the international TCGA+ICGC Pancancer consortium, and still manages to do some number-crunching for a new genome sequencing device, the Oxford Nanopore Technologies MinION, which generates floating-point signal-strength data as a strand of DNA passes through a nanopore.