Published November 18, 2022 | Version v1
Journal article Open

A comprehensive update to the Mycobacterium tuberculosis H37Rv reference genome

  • 1. Rutgers University
  • 2. University of Chicago
  • 3. Boston University
  • 4. Helmholtz Institute for Functional Marine Biodiversity
  • 5. Université Paris Cité

Description

H37Rv is the most widely used Mycobacterium tuberculosis strain, and its genome is used globally as the M. tuberculosis reference sequence. Here, we present Bact-Builder, a pipeline that uses consensus building to generate complete and accurate bacterial genome sequences, and we apply it to three independently cultured and sequenced H37Rv aliquots of a single laboratory stock. Two of the three 4,417,942-base-pair H37Rv assemblies are 100% identical, and the third differs by a single nucleotide. Compared to the existing H37Rv reference, the new sequence contains ~6.4 kb of additional sequence, encoding ten new regions that include insertions in PE/PPE genes and new paralogs of esxN and esxJ, which are differentially expressed compared to the reference genes. New sequencing and de novo assemblies with Bact-Builder confirm that all ten regions, plus small additional polymorphisms, are also present in the commonly used H37Rv strains NR123, TMC102, and H37Rv1998. Thus, Bact-Builder shows promise as an improved method to perform accurate and reproducible de novo assemblies of bacterial genomes, and our work provides important updates to the primary M. tuberculosis reference genome.
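The comparison reported in the description (two assemblies identical, the third differing at one position) can be illustrated with a minimal sketch. This is not Bact-Builder's method, which is not detailed on this page. The function names and the toy sequences are hypothetical, and the code assumes the sequences are already aligned and of equal length; real assemblies can differ by insertions and deletions and would need a proper alignment first.

```python
from collections import Counter


def count_differences(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (align them first)")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def majority_consensus(sequences):
    """Per-position majority vote across equal-length sequences (toy consensus)."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must have equal length (align them first)")
    # zip(*sequences) yields one column of bases per position
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


# Hypothetical toy data standing in for three assemblies
assemblies = ["ACGTACGT", "ACGTACGT", "ACGTACTT"]
pairwise = {
    (i, j): count_differences(assemblies[i], assemblies[j])
    for i in range(len(assemblies))
    for j in range(i + 1, len(assemblies))
}
consensus = majority_consensus(assemblies)
```

With three inputs, a per-position majority vote resolves any single-assembly discrepancy in favor of the two that agree, which is why the toy consensus equals the two identical sequences.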

Files

Comprehensive-update-to-the-Mycobacterium-tuberculosis-H37Rv-reference-genome.pdf

Files (14.2 MB)

Checksum                              Size      Name
md5:e1fee117d0ca38c05f729bd10fda9822  6.8 MB    Supplementary information
md5:f19b43bd1822ab46cabd713a31aa570b  414.8 kB  Peer review file
md5:2d45d7ff535fc103fe2be41d60723f51  84.2 kB   Description of additional supplementary files
md5:b5866c4f44da7efb4360395f3fd0dc27  16.3 kB   Supplementary dataset 1
md5:22257185decc4541247960723baf9bcb  1.3 MB    Supplementary dataset 2
md5:12326a21a961dc0ea38aae1e07e68817  3.0 MB    Supplementary dataset 3
md5:8b1e895adc86615bbaf8f7a13d6d9d9d  9.3 kB
md5:d46d58965a0e79cc713aef233ca43ec0  221.6 kB  Reporting summary
md5:037cd2fa56fe08a113fb2159cbdcfee6  2.4 MB    Article
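The md5 values listed with each file can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The function names are hypothetical, and the local file path in the usage note is an assumption, since this page does not state the names under which the files are saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("article.pdf", "md5:037cd2fa56fe08a113fb2159cbdcfee6")` would return True if a local copy of the Article file (saved here under the assumed name article.pdf) matches the listed checksum. MD5 is suitable for detecting accidental corruption, not deliberate tampering.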

Additional details

Identifiers

DOI
10.1038/s41467-022-34853-x
Other
oai:uchicago.tind.io:5078

Funding

National Institute of Allergy and Infectious Diseases
U19AI11276
National Institute of Allergy and Infectious Diseases
U19AI162598
National Institutes of Health
R00-GM118907
National Institutes of Health
R01-AI146198
National Institutes of Health
Agilent Early Career Professor Award

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Medicine, Microbiology