Published January 25, 2023 | Version v1
Journal article
Open
Rapid Approximate Subset-Based Spectra Prediction for Electron Ionization–Mass Spectrometry
Description
Mass spectrometry is a vital tool in the analytical chemist's toolkit, commonly used to identify the presence of known compounds and elucidate unknown chemical structures. These applications rely on having previously measured spectra for known substances. Computational methods for predicting mass spectra from chemical structures can be used to augment existing spectral databases with predicted spectra from previously unmeasured molecules. In this paper, we present a method for prediction of electron ionization–mass spectra (EI–MS) of small molecules that combines physically plausible substructure enumeration and deep learning, which we term rapid approximate subset-based spectra prediction (RASSP). The first of our two models, FormulaNet, produces a probability distribution over chemical subformulae to achieve a state-of-the-art forward prediction accuracy of 92.9% weighted (Stein) dot product and database lookup recall (within top 10 ranked spectra) of 98.0% when evaluated against the NIST 2017 Mass Spectral Library. The second model, SubsetNet, produces a probability distribution over vertex subsets of the original molecule graph to achieve similar forward prediction accuracy and superior generalization in the high-resolution, low-data regime. Spectra predicted by our best model improve upon the previous state-of-the-art spectral database lookup error rate by a factor of 2.9, reducing the lookup error (top 10) from 5.7% to 2.0%. Both models can train on and predict spectral data at arbitrary resolution. Source code and predicted EI–MS spectra for 73.2M small molecules from PubChem will be made freely accessible online.
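The two headline metrics in the description, a weighted dot product between spectra and top-10 database lookup recall, can be illustrated with a minimal sketch. It assumes spectra are stored as dictionaries mapping an m/z bin to an intensity. The function names and the default exponents (`mass_power=1.0`, `intensity_power=0.5`) are illustrative assumptions, not values confirmed by this record, and the code is not taken from the RASSP repository.

```python
import math


def weighted_dot_product(spec_a, spec_b, mass_power=1.0, intensity_power=0.5):
    """Weighted cosine similarity between two spectra ({m/z bin: intensity}).

    Each peak is reweighted as (m/z ** mass_power) * (intensity ** intensity_power)
    before the cosine is taken. The default exponents are assumptions.
    """
    def weight(spec):
        return {mz: (mz ** mass_power) * (inten ** intensity_power)
                for mz, inten in spec.items() if inten > 0}

    wa, wb = weight(spec_a), weight(spec_b)
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Only peaks present in both spectra contribute to the numerator.
    shared = wa.keys() & wb.keys()
    return sum(wa[mz] * wb[mz] for mz in shared) / (norm_a * norm_b)


def top_k_recall(queries, library, k=10):
    """Fraction of queries whose true compound ranks in the top k library hits.

    queries: list of (true_id, spectrum) pairs, e.g. measured spectra.
    library: dict mapping compound id -> spectrum, e.g. predicted spectra.
    """
    hits = 0
    for true_id, spectrum in queries:
        ranked = sorted(
            library,
            key=lambda cid: weighted_dot_product(spectrum, library[cid]),
            reverse=True,
        )
        hits += true_id in ranked[:k]
    return hits / len(queries)
```

Under this definition the lookup error is one minus the recall, so a top-10 recall of 98.0% corresponds to the 2.0% lookup error quoted in the description.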
Data availability
The code for this work can be found at github.com/thejonaslab/rassp-public. Predicted spectra for the 70M+ small molecules in PubChem can be found at spectroscopy.ai.
Rapid-Approximate-Subset-Based-Spectra-Prediction-for-Electron-Ionization-Mass-Spectrometry.pdf
Files
(2.3 MB)
| Name | Size |
|---|---|
| Supporting information (md5:a5eefd8bd505e433b2b3ba945f689d03) | 540.4 kB |
| Article (md5:11d6bce542c274c5cd35f06c3de6e4d9) | 1.7 MB |
Additional details
Identifiers
- DOI: 10.1021/acs.analchem.2c02093
- Other: oai:uchicago.tind.io:5439
Funding
- University of Chicago: Startup funds