Diverse plasmid systems and their ecology across human gut metagenomes revealed by PlasX and MobMess
- 1. Toyota Technological Institute at Chicago
- 2. University of Chicago
Description
Data availability
Reproducible analyses of reference plasmids and chromosomes are available at https://doi.org/10.5281/zenodo.5732024. The PlasX model as well as our analyses of known and predicted plasmids are available at https://doi.org/10.5281/zenodo.5843600. For all metagenomes, we have compiled the contigs, taxonomic abundances and PlasX scores at https://doi.org/10.5281/zenodo.8175278, gene calls at https://doi.org/10.5281/zenodo.5730987 and gene annotations at https://doi.org/10.5281/zenodo.5731658. We have deposited long and short sequencing reads from B. fragilis isolates into the NCBI Sequence Read Archive (PRJNA782184).

We obtained a list of 16,168 plasmids from the 2019_03_05 version of PLSDB. We also downloaded the entire collection of 13,471 complete bacterial genome assemblies from NCBI RefSeq (accessed 26 October 2019), using instructions at https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#allcomplete. We also downloaded the more recent 2021_06_23_v2 version of PLSDB, which contains 34,513 plasmid sequences. We downloaded the collection of all ICE sequences (n = 552) from ICEberg 2.0 (https://db-mml.sjtu.edu.cn/ICEberg/; accessed 30 September 2022). We also downloaded 455 prophage sequences from the NCBI Virus data portal (https://www.ncbi.nlm.nih.gov/labs/virus; accessed 30 September 2022).

We downloaded fastq files for 1,782 short-read and paired-end metagenomes from the NCBI Sequence Read Archive using the program 'fastq-dump'. The metagenomes and original studies are listed in Supplementary Table 7.

We annotated antibiotic-resistance genes using two databases. First, we searched against a database of resistance protein family HMMs from Resfams (v1.2, dated 27 January 2015; 'Core' database at http://www.dantaslab.org/resfams). Second, we ran rgi (v5.2.0; https://github.com/arpcard/rgi) to search for similarity in the CARD database of resistance genes.
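As a rough illustration of the metagenome download step, the sketch below builds one `fastq-dump` command per run accession. The description names only the program, so the accession IDs, the output directory, and the `--split-files` and `--gzip` options are assumptions made for illustration and not the authors' recorded settings.

```python
from pathlib import Path
import subprocess


def fastq_dump_command(accession: str, outdir: str = "fastq") -> list[str]:
    """Build the argument list for downloading one paired-end run."""
    return [
        "fastq-dump",
        "--split-files",  # assumption: write each mate to its own file
        "--gzip",         # assumption: compress the output
        "--outdir", outdir,
        accession,
    ]


def download_runs(accessions: list[str], outdir: str = "fastq",
                  dry_run: bool = True) -> list[list[str]]:
    """Return the commands; execute them only when dry_run is False."""
    commands = [fastq_dump_command(acc, outdir) for acc in accessions]
    if not dry_run:
        Path(outdir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder accessions; the actual runs are listed in Supplementary Table 7.
    for cmd in download_runs(["SRR0000001", "SRR0000002"]):
        print(" ".join(cmd))
```

With `dry_run=True` (the default) nothing is downloaded, which makes it easy to inspect the commands before running them against the full accession list.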
We have released two open-source software packages, PlasX (ref. 100) and MobMess (ref. 101), along with detailed installation and usage instructions. We used the program anvi-run-workflow with --workflow contigs implemented (ref. 70) in anvi'o (ref. 71) v7.1, which uses Snakemake (ref. 72) to execute previously defined steps (https://merenlab.org/anvio-workflows/) and to generate anvi'o contigs-db files (https://anvio.org/m/contigs-db). These steps include first running Prodigal (ref. 73) to call genes and then running DIAMOND (ref. 74) v2.0 and HMMER (ref. 75) v3.3 on amino acid sequences to determine gene functions against the Clusters of Orthologous Groups (COGs; ref. 30) and Protein Family Database (ref. 31) models (Pfams) v32.0, respectively. We clustered genes using MMSeqs2 (ref. 76; v10.6d92c); identified sequence subtypes using mash (ref. 79; v2.2.2); analysed plasmids using PlasClass (ref. 20; v0.1.0-2-gb80a4f4), PPR-Meta (ref. 32), Platon (ref. 33), Deeplasmid (ref. 27; Docker image sha256:10809927e2c8a14cf86231801b804b0bd4bddf600821d17fd8b7e41a15c562c0) and MOB-suite (ref. 25; v3.0.1); visualized networks using Cytoscape (ref. 94; v3.8); performed taxonomic assignment using Kraken 2 (ref. 87; v2.1.2) and Bracken (ref. 88; v2.5); and ran correlation analysis using FastSpar (ref. 89; v1.0.0).
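The anvi'o step mentioned above could be driven from a script along the following lines. This is a minimal sketch: the config file name is a placeholder, and the description does not say how the authors configured the workflow, so only the `--workflow contigs` argument comes from the text. The `--get-default-config` and `--config-file` options are the standard way to create and supply a workflow config in anvi'o.

```python
import subprocess


def default_config_command(config_path: str) -> list[str]:
    """Command that asks anvi'o to write a default config for the contigs workflow."""
    return ["anvi-run-workflow", "--workflow", "contigs",
            "--get-default-config", config_path]


def contigs_workflow_command(config_path: str) -> list[str]:
    """Command that runs the contigs workflow with an (edited) config file."""
    return ["anvi-run-workflow", "--workflow", "contigs",
            "--config-file", config_path]


def run(cmd: list[str]) -> None:
    """Execute one command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    config = "config.json"  # hypothetical file name
    print(" ".join(default_config_command(config)))
    print(" ".join(contigs_workflow_command(config)))
```

The functions only build argument lists; calling `run` on them requires an anvi'o installation and a config file edited to point at the input FASTA files.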
Files
Diverse-plasmid-systems-and-their-ecology-across-human-gut-metagenomes-revealed-by-PlasX-and-MobMess.pdf
Total size: 42.0 MB

| Name | Checksum | Size |
|---|---|---|
| Article | md5:bb40c7bde5926a9433f8171bad66b5bc | 6.2 MB |
| | md5:25fa4de4ed3d6114bbf5cb23817a9426 | 35.8 MB |
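The md5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the local file path is whatever name the file was saved under, and the helper names are illustrative only.

```python
import hashlib

# Checksums as listed for the record's files.
LISTED_MD5 = {
    "bb40c7bde5926a9433f8171bad66b5bc",
    "25fa4de4ed3d6114bbf5cb23817a9426",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str) -> bool:
    """True if the file's md5 equals one of the checksums listed above."""
    return md5_of_file(path) in LISTED_MD5
```

Reading in chunks keeps memory use low for the larger (35.8 MB) file; a mismatch usually means the download was truncated or corrupted and should be repeated.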
Additional details
Identifiers
- DOI: 10.1038/s41564-024-01610-3
- Other: oai:uchicago.tind.io:11291
Funding
- NIDDK: RC2 DK122394
- Simons Foundation: 687269
- Sloan Foundation