Published March 4, 2024 | Version v1
Journal article Open

Diverse plasmid systems and their ecology across human gut metagenomes revealed by PlasX and MobMess

  • 1. Toyota Technological Institute at Chicago
  • 2. University of Chicago

Description

Plasmids alter microbial evolution and lifestyles by mobilizing genes that often confer fitness in changing environments across clades. Yet our ecological and evolutionary understanding of naturally occurring plasmids is far from complete. Here we developed a machine-learning model, PlasX, which identified 68,350 non-redundant plasmids across human gut metagenomes and organized them into 1,169 evolutionarily cohesive 'plasmid systems' using our sequence containment-aware network-partitioning algorithm, MobMess. Individual plasmids were often country specific, yet most plasmid systems spanned geographically distinct human populations. Cargo genes in plasmid systems included well-known determinants of fitness, such as antibiotic resistance, but also many others, including enzymes involved in the biosynthesis of essential nutrients and the modification of transfer RNAs, revealing a wide repertoire of likely fitness determinants in complex environments. Our study introduces computational tools to recognize and organize plasmids, and uncovers the ecological and evolutionary patterns of diverse plasmids in naturally occurring habitats through plasmid systems.
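The abstract describes MobMess only as a "sequence containment-aware network-partitioning algorithm". To illustrate what containment-aware grouping means in general, here is a deliberately simplified Python sketch. It assumes each plasmid is reduced to a set of gene-family labels and that two plasmids are linked when one gene set is at least 90% contained in the other. Both the gene-set representation and the 0.9 threshold are hypothetical choices. Linked plasmids are then merged into connected components with union-find. This is not MobMess's algorithm: the published tool works on sequence containment rather than gene sets, and its partitioning procedure is not reproduced here.

```python
from __future__ import annotations

from itertools import combinations


def containment(a: set[str], b: set[str]) -> float:
    """Fraction of the elements of `a` that also occur in `b` (0.0 if `a` is empty)."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def group_by_containment(plasmids: dict[str, set[str]],
                         threshold: float = 0.9) -> list[list[str]]:
    """Toy grouping (hypothetical, not MobMess): link two plasmids if either
    gene set is >= `threshold` contained in the other, then return the
    connected components as sorted lists of plasmid names."""
    parent = {name: name for name in plasmids}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for p, q in combinations(plasmids, 2):
        a, b = plasmids[p], plasmids[q]
        # Containment is asymmetric, so test both directions.
        if max(containment(a, b), containment(b, a)) >= threshold:
            union(p, q)

    groups: dict[str, list[str]] = {}
    for name in plasmids:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    toy = {
        "pA": {"repA", "mobA", "traB"},                    # shared core only
        "pB": {"repA", "mobA", "traB", "blaTEM"},          # core + one cargo gene
        "pC": {"repA", "mobA", "traB", "tetM", "sul1"},    # core + other cargo
        "pD": {"cas9", "xerC"},                            # unrelated
    }
    print(group_by_containment(toy))  # → [['pA', 'pB', 'pC'], ['pD']]
```

In the toy input, pB and pC fall below the threshold against each other (0.75 and 0.6), yet both fully contain pA, so all three end up in one group. A shared core can thereby tie together plasmids that carry different cargo genes. The flip side is that plain connected components can also chain unrelated plasmids through a small shared element, which is one reason a real method needs a more careful partitioning step.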

Data availability

Reproducible analyses of reference plasmids and chromosomes are available at https://doi.org/10.5281/zenodo.5732024. The PlasX model as well as our analyses of known and predicted plasmids are available at https://doi.org/10.5281/zenodo.5843600. For all metagenomes, we have compiled the contigs, taxonomic abundances and PlasX scores at https://doi.org/10.5281/zenodo.8175278, gene calls at https://doi.org/10.5281/zenodo.5730987 and gene annotations at https://doi.org/10.5281/zenodo.5731658. We have deposited long and short sequencing reads from B. fragilis isolates into the NCBI Sequence Read Archive (PRJNA782184).

We obtained a list of 16,168 plasmids from the 2019_03_05 version of PLSDB. We also downloaded the entire collection of 13,471 complete bacterial genome assemblies from NCBI RefSeq (accessed 26 October 2019), using instructions at https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#allcomplete. We also downloaded the more recent 2021_06_23_v2 version of PLSDB, which contains 34,513 plasmid sequences. We downloaded the collection of all integrative and conjugative element (ICE) sequences (n = 552) from ICEberg 2.0 (https://db-mml.sjtu.edu.cn/ICEberg/; accessed 30 September 2022). We also downloaded 455 prophage sequences from the NCBI Virus data portal (https://www.ncbi.nlm.nih.gov/labs/virus; accessed 30 September 2022). We downloaded fastq files for 1,782 short-read, paired-end metagenomes from the NCBI Sequence Read Archive using the program 'fastq-dump'. The metagenomes and original studies are listed in Supplementary Table 7.

We annotated antibiotic-resistance genes using two databases. First, we searched against a database of resistance protein family HMMs from Resfams (v1.2, dated 27 January 2015; 'Core' database at http://www.dantaslab.org/resfams). Second, we ran rgi (v5.2.0; https://github.com/arpcard/rgi) to search for similarity in the CARD database of resistance genes.

We have released two open-source software packages, PlasX (ref. 100) and MobMess (ref. 101), along with detailed installation and usage instructions. We used the program anvi-run-workflow with --workflow contigs, implemented (ref. 70) in anvi'o (ref. 71) v7.1, which uses Snakemake (ref. 72) to execute previously defined steps (https://merenlab.org/anvio-workflows/) and to generate anvi'o contigs-db files (https://anvio.org/m/contigs-db). These steps include first running Prodigal (ref. 73) to call genes and then running DIAMOND (ref. 74) v2.0 and HMMER (ref. 75) v3.3 on amino acid sequences to determine gene functions against the Clusters of Orthologous Groups (COGs; ref. 30) and Protein Family Database (Pfam; ref. 31) v32.0 models, respectively. We clustered genes using MMseqs2 (ref. 76; v10.6d92c); identified sequence subtypes using mash (ref. 79; v2.2.2); analysed plasmids using PlasClass (ref. 20; v0.1.0-2-gb80a4f4), PPR-Meta (ref. 32), Platon (ref. 33), Deeplasmid (ref. 27; Docker image sha256:10809927e2c8a14cf86231801b804b0bd4bddf600821d17fd8b7e41a15c562c0) and MOB-suite (ref. 25; v3.0.1); visualized networks using Cytoscape (ref. 94; v3.8); performed taxonomic assignment using Kraken 2 (ref. 87; v2.1.2) and Bracken (ref. 88; v2.5); and ran correlation analysis using FastSpar (ref. 89; v1.0.0).
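The methods list mash (v2.2.2) for identifying sequence subtypes. For readers unfamiliar with it, the Python sketch below computes the distance formula Mash reports, D = -(1/k) ln(2J / (1 + J)), where J is the Jaccard index of two sequences' canonical k-mer sets. Unlike Mash, which estimates J from small MinHash sketches, this toy computes J exactly from all k-mers, so it is only practical for short sequences. It assumes uppercase ACGT input, and returning 1.0 when no k-mers are shared is a sentinel choice made here (the formula is undefined at J = 0). How the authors turned distances into subtypes is not described in this record and is not reproduced.

```python
from __future__ import annotations

import math

# Maps each base to its Watson-Crick complement (uppercase ACGT only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Set of canonical k-mers: for each k-mer, keep the lexicographically
    smaller of the k-mer and its reverse complement, so a sequence and its
    reverse complement yield the same set."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError(f"sequence shorter than k={k}")
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def mash_distance(seq1: str, seq2: str, k: int = 21) -> float:
    """Mash-style distance from the exact (not MinHash-estimated) Jaccard index."""
    a, b = canonical_kmers(seq1, k), canonical_kmers(seq2, k)
    j = len(a & b) / len(a | b)  # union is non-empty because both sets are
    if j == 0.0:
        return 1.0  # sentinel: no shared k-mers
    # max() turns the -0.0 produced for identical sets into 0.0.
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


if __name__ == "__main__":
    # Shared canonical 3-mers are AAA and CCC (GGG folds onto CCC), so J = 2/6.
    print(mash_distance("AAAAACCCCC", "AAAAAGGGGG", k=3))  # ln(2)/3 ≈ 0.231
```

Here k defaults to 21, which is also Mash's default k-mer size, but the value used in the study is not stated in this record.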

Files

Diverse-plasmid-systems-and-their-ecology-across-human-gut-metagenomes-revealed-by-PlasX-and-MobMess.pdf

Additional details

Identifiers

DOI
10.1038/s41564-024-01610-3
Other
oai:uchicago.tind.io:11291

Funding

NIDDK
RC2 DK122394
Simons Foundation
687269
Sloan Foundation

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Medicine, Microbiology
Center(s) or Institute(s)
Marine Biological Laboratory