Published January 7, 2013 | Version v1
Journal article Open

Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads: Evaluation of Effective Study Designs

  • 1. University of Chicago

Description

Massively parallel high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies pose a predicament, however: although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain a reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study, we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ∼8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short-read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.
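To make the idea of naïve Bayesian classification of short reads concrete, the following is a minimal sketch, not the configuration evaluated in the article. It assumes a word-based (k-mer) model in which each reference taxon is scored by the smoothed frequency of the read's k-mers among that taxon's training sequences. The function names (`kmers`, `train`, `classify`), the default word size, the smoothing constants, and the toy sequences are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

K = 8  # word size; an assumed default for this sketch only


def kmers(seq, k=K):
    """Return the set of overlapping k-mers found in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=K):
    """Build a model from (taxon, sequence) pairs.

    Returns per-taxon k-mer presence counts and per-taxon sequence counts.
    """
    word_counts = defaultdict(lambda: defaultdict(int))
    seq_counts = defaultdict(int)
    for taxon, seq in references:
        seq_counts[taxon] += 1
        for word in kmers(seq, k):
            word_counts[taxon][word] += 1
    return word_counts, seq_counts


def classify(read, model, k=K):
    """Return (best_taxon, log_score) for a read under the naive Bayes model."""
    word_counts, seq_counts = model
    words = kmers(read, k)
    best_taxon, best_score = None, -math.inf
    for taxon, n_seqs in seq_counts.items():
        counts = word_counts[taxon]
        # Smoothed estimate of P(word | taxon); the 0.5 and 1 are assumed constants.
        score = sum(
            math.log((counts.get(word, 0) + 0.5) / (n_seqs + 1))
            for word in words
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score


# Toy usage with made-up sequences and a short word size.
toy_refs = [
    ("GenusA", "ACGTACGTACGTACGTTTGACA"),
    ("GenusB", "GGCCTTAAGGCCTTAAGGCCAATT"),
]
toy_model = train(toy_refs, k=4)
print(classify("ACGTACGTACG", toy_model, k=4)[0])
```

In a sketch like this, the training set would be restricted to the same 16S region as the reads, which mirrors the abstract's point that the sequenced region and the training set should be chosen together.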

Files

journal.pone.0053608.pdf

Files (8.2 MB)

Name Size
Article (md5:86cd2135062ec48b1510c8aa201c1a5a) 877.0 kB
Supporting information (md5:de6d58aa9f7b4a56a11393b75f353e4f) 7.3 MB

Additional details

Identifiers

DOI
10.1371/journal.pone.0053608
Other
oai:uchicago.tind.io:10590

Funding

National Institutes of Health
HL092206
Digestive Diseases Research Core Center
pilot and feasibility grant

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Human Genetics