Venue: 2636 GGBA
Bio: Anoushka is a third-year PhD student in Prof. Venkat Viswanathan’s group at the University of Michigan. Her research interests include machine learning for materials design and electrochemical battery modeling.
Abstract: Molecular machine learning for materials screening has accelerated material development cycles, improved efficiency, and reduced costs. However, current state-of-the-art molecular property prediction models still require labeled training data generated through wet-lab experiments or Density Functional Theory (DFT) calculations, so their utility is limited by the scarcity and heterogeneity of labeled materials datasets. Foundation models (FMs) offer a solution: they use self-supervised pre-training strategies to leverage unlabeled datasets and learn representations that can be applied to downstream tasks. Large unlabeled datasets of billions of synthesizable molecules are readily available. Prior attempts to train FMs for molecular property prediction show promise; however, equivariant geometric models trained using supervised learning are still more accurate. This can be attributed to the fact that foundation models are extremely expensive to train and can be difficult to interpret: they require huge computing budgets, complex distributed computing techniques, and extensive hyperparameter searches. Our work addresses these challenges on three fronts: (1) we have prototyped a scalable workflow for distributed training of molecular foundation models, (2) we have used this workflow to train large foundation models that demonstrate state-of-the-art molecular property prediction capabilities across several benchmarks, and (3) we have applied model interpretability strategies such as attention visualization to shed light on the molecular structure relationships learned by the transformer.