Published September 27, 2024 | Version v1
Journal article | Open access

Utilization of a natural language processing-based approach to determine the composition of artifact residues

  • 1. University of Chicago

Description

Background: Determining the composition of artifact residues is a central problem in ancient residue metabolomics. The standard method addresses it by comparing the mass spectral features that an ancient artifact shares with an experimental artifact. Although this method is simple and straightforward, we sought to improve the accuracy of predicting which plant species had been used in which artifacts.
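The standard comparison can be pictured as a shared-feature count. The sketch below is a minimal illustration, not the authors' code. It assumes each artifact has already been reduced to a set of detected mass spectral feature identifiers (here, hypothetical m/z values rounded to two decimals). It scores each experimental artifact by how many features it shares with the ancient artifact and predicts the species of the best-scoring reference. The function name, the species labels, and all feature values are hypothetical.

```python
def predict_by_shared_features(ancient, references):
    """Return (best_species, scores), where scores[species] is the number of
    features the ancient artifact shares with that species' experimental artifact.
    Ties are broken in favor of the first species listed."""
    scores = {species: len(ancient & feats) for species, feats in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical feature sets: rounded m/z values detected in each residue.
reference_features = {
    "species_A": {163.12, 177.10, 305.20, 411.30},
    "species_B": {169.01, 305.20, 447.09},
    "species_C": {271.06, 305.20, 447.09, 463.09},
}
ancient_features = {305.20, 447.09, 463.09, 500.00}

species, scores = predict_by_shared_features(ancient_features, reference_features)
print(scores)   # {'species_A': 1, 'species_B': 2, 'species_C': 3}
print(species)  # species_C
```

In this toy data, the feature at 305.20 appears in every reference, yet it counts as much as a rare, diagnostic feature. A plain overlap count cannot distinguish the two.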

Results: Here, we introduce a new algorithm, based on ideas from natural language processing (NLP), to solve this problem. We tested our strategy on a set of modern clay pipes. To limit bias, the analysis was blinded: we were not told which plant species had been smoked in which pipes. The results indicate that our new method performed 12.5% better than the standard method at predicting the plant species smoked in each artifact.
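The abstract does not specify which NLP ideas the new method borrows. As an illustration only, the sketch below applies one standard NLP technique: TF-IDF weighting with cosine similarity. Each artifact is treated as a "document" and each mass spectral feature as a "word", so features shared by many references count for less than rare ones. This is an assumed stand-in and not necessarily the authors' algorithm; the scripts in the GitHub repository are the authoritative source. The function names, the intensity values, and the use of relative intensity as "term frequency" are hypothetical choices. The smoothed IDF, ln((1 + n) / (1 + df)) + 1, is the same form scikit-learn uses by default.

```python
import math


def idf_weights(references):
    """Smoothed inverse document frequency of each feature across the reference artifacts."""
    n_docs = len(references)
    vocab = set().union(*(feats.keys() for feats in references.values()))
    return {
        f: math.log((1 + n_docs) / (1 + sum(f in feats for feats in references.values()))) + 1
        for f in vocab
    }


def tfidf_vector(intensities, idf):
    """Weight each feature's relative intensity (used here as term frequency) by its IDF.
    Features never seen in any reference are dropped."""
    total = sum(intensities.values())
    if total == 0:
        return {}
    return {f: (v / total) * idf[f] for f, v in intensities.items() if f in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_by_tfidf(ancient, references):
    """Return (best_species, scores) using TF-IDF-weighted cosine similarity."""
    idf = idf_weights(references)
    ancient_vec = tfidf_vector(ancient, idf)
    scores = {s: cosine(ancient_vec, tfidf_vector(feats, idf)) for s, feats in references.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: {rounded m/z: intensity} for each residue.
reference_intensities = {
    "species_A": {163.12: 5.0, 177.10: 3.0, 305.20: 10.0},
    "species_B": {169.01: 4.0, 305.20: 10.0, 447.09: 2.0},
    "species_C": {271.06: 6.0, 305.20: 10.0, 463.09: 1.0},
}
ancient_intensities = {305.20: 20.0, 169.01: 3.0, 500.00: 1.0}

best, scores = predict_by_tfidf(ancient_intensities, reference_intensities)
print(best)  # species_B
```

Because 305.20 occurs in all three references, its IDF is 1. The rarer feature 169.01, which occurs only in species_B's reference, gets a larger weight, and that drives the match.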

Conclusions: Using an NLP-based approach, we developed a robust algorithm for characterizing the composition of artifact residues. We also discuss other applications of the algorithm in metabolomics, such as datasets with a limited number of replicates.

Data availability

All scripts and datasets used in this study are freely available on GitHub: https://github.com/tungprime/NLP_and_composition_of_artifact_residues.

Files

Utilization-of-a-natural-language-processing-based-approach-to-determine-the-composition-of-artifact-residues.pdf

Additional details

Identifiers

DOI
10.1186/s12859-024-05888-2
Other
oai:uchicago.tind.io:13599

Funding

Burroughs Wellcome Fund: Postdoctoral Enrichment Award
National Science Foundation: 1906607
National Science Foundation: 1419506

UChicago Information

Division(s)
Biological Sciences Division, Physical Sciences Division
Department(s)
Mathematics, Molecular Genetics and Cell Biology