Published May 27, 2016 | Version v1
Journal article Open

Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

Description

Accurate annotation of protein-coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed a negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.

Data availability

The following data sets were generated:
Wang SH, Raj A, Shim H, Gilad Y, Pritchard JK, Stephens M, Engelmann B, Li YI, Harpak A (2015) Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE75290). http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75290

The following previously published data sets were used:
Khan Z (2015) Mass-spectrometry measurements in 60 human LCLs. Publicly available at ProteomeXchange (accession no: PXD001406). http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD001406
Battle A, Khan Z, Wang SH, Mitrano A, Ford MJ, Pritchard JK, Gilad Y (2015) Impact of regulatory variation from RNA to protein. Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE61742). http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61742
Lappalainen T, Dermitzakis E (2013) RNA-seq measurements in 86 human LCLs. Publicly available at EMBL European Bioinformatics Institute (accession no: E-GEUV-1). http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/

Files

Thousands-of-novel-translated-open-reading-frames-in-humans-inferred-by-ribosome-footprint-profiling.pdf

Files (2.5 MB)

Supplementary file, 65.2 kB, md5:8923d450a765575a11abf445f5d2e5c4
Article, 2.4 MB, md5:b564605fdce0912107a80e28a0846988

Additional details

Identifiers

DOI
10.7554/eLife.13328
Other
oai:uchicago.tind.io:5748

Funding

National Institutes of Health
HG007036
National Institutes of Health
MH084703
National Institutes of Health
HG02585
Howard Hughes Medical Institute

UChicago Information

Division(s)
Biological Sciences Division, Physical Sciences Division
Department(s)
Human Genetics, Statistics