Published January 25, 2023 | Version v1
Journal article Open

Rapid Approximate Subset-Based Spectra Prediction for Electron Ionization–Mass Spectrometry

  • 1. University of Chicago

Description

Mass spectrometry is a vital tool in the analytical chemist's toolkit, commonly used to identify the presence of known compounds and elucidate unknown chemical structures. Both applications rely on having previously measured spectra for known substances. Computational methods for predicting mass spectra from chemical structures can be used to augment existing spectral databases with predicted spectra from previously unmeasured molecules. In this paper, we present a method for prediction of electron ionization–mass spectra (EI–MS) of small molecules that combines physically plausible substructure enumeration and deep learning, which we term rapid approximate subset-based spectra prediction (RASSP). The first of our two models, FormulaNet, produces a probability distribution over chemical subformulae to achieve a state-of-the-art forward prediction accuracy of 92.9% weighted (Stein) dot product and database lookup recall (within top 10 ranked spectra) of 98.0% when evaluated against the NIST 2017 Mass Spectral Library. The second model, SubsetNet, produces a probability distribution over vertex subsets of the original molecule graph to achieve similar forward prediction accuracy and superior generalization in the high-resolution, low-data regime. Spectra predicted by our best model improve upon the previous state-of-the-art spectral database lookup error rate by a factor of 2.9, reducing the lookup error (top 10) from 5.7% to 2.0%. Both models can train on and predict spectral data at arbitrary resolution. Source code and predicted EI–MS spectra for 73.2M small molecules from PubChem will be made freely accessible online.
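
The two metrics quoted in the description, a weighted dot product between spectra and top-10 lookup recall, can be illustrated with a minimal Python sketch. This is not taken from the RASSP code base: the function names, the dictionary representation of a spectrum, and the default exponents `mass_power` and `intensity_power` are all assumptions made for illustration, and this record does not state the weighting the authors actually used.

```python
import math


def weighted_dot_product(spec_a, spec_b, mass_power=1.0, intensity_power=0.5):
    """Weighted cosine similarity between two spectra.

    Each spectrum is a dict mapping m/z to intensity. Every peak is
    reweighted as (m/z ** mass_power) * (intensity ** intensity_power)
    before the cosine is taken. The default exponents are illustrative
    assumptions, not values confirmed by the paper.
    """
    def reweight(spec):
        return {
            mz: (mz ** mass_power) * (inten ** intensity_power)
            for mz, inten in spec.items()
            if inten > 0
        }

    wa, wb = reweight(spec_a), reweight(spec_b)
    # Only peaks present in both spectra contribute to the numerator.
    dot = sum(wa[mz] * wb[mz] for mz in wa.keys() & wb.keys())
    norm = math.sqrt(sum(v * v for v in wa.values())) * math.sqrt(
        sum(v * v for v in wb.values())
    )
    return dot / norm if norm > 0 else 0.0


def in_top_k(query, library, true_id, k=10):
    """Return True if library entry `true_id` is among the k best matches.

    `library` maps an identifier to a spectrum. Averaging this boolean
    over many queries gives a top-k lookup recall; one minus that value
    is the lookup error.
    """
    ranked = sorted(
        library,
        key=lambda name: weighted_dot_product(query, library[name]),
        reverse=True,
    )
    return true_id in ranked[:k]
```

Under this reading, a top-10 recall of 98.0% corresponds to a lookup error of 2.0%, and the quoted improvement factor is the ratio of the two error rates, 5.7 / 2.0 = 2.85, which rounds to 2.9.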

Data availability

The code for this work can be found at github.com/thejonaslab/rassp-public. Predicted spectra for the 70M+ small molecules in PubChem can be found at spectroscopy.ai.

Files

Rapid-Approximate-Subset-Based-Spectra-Prediction-for-Electron-Ionization-Mass-Spectrometry.pdf

Files (2.3 MB)

  • Supporting information (540.4 kB), md5:a5eefd8bd505e433b2b3ba945f689d03
  • Article (1.7 MB), md5:11d6bce542c274c5cd35f06c3de6e4d9

Additional details

Identifiers

DOI
10.1021/acs.analchem.2c02093
Other
oai:uchicago.tind.io:5439

Funding

University of Chicago
Startup funds

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computational and Applied Mathematics, Computer Science, Statistics