Published July 30, 2019 | Version v1
Journal article Open

Detailed modeling of positive selection improves detection of cancer driver genes

Description

Identifying driver genes from somatic mutations is a central problem in cancer biology. Existing methods, however, either lack explicit statistical models, or use models based on simplistic assumptions. Here, we present driverMAPS (Model-based Analysis of Positive Selection), a model-based approach to driver gene identification. This method explicitly models positive selection at the single-base level, as well as highly heterogeneous background mutational processes. In particular, the selection model captures elevated mutation rates in functionally important sites using multiple external annotations, and spatial clustering of mutations. Simulations under realistic evolutionary models demonstrate the increased power of driverMAPS over current approaches. Applying driverMAPS to TCGA data of 20 tumor types, we identified 159 new potential driver genes, including the mRNA methyltransferase METTL3-METTL14. We experimentally validated METTL3 as a tumor suppressor gene in bladder cancer, providing support to the important role mRNA modification plays in tumorigenesis.

Data availability

The simulated dataset may be downloaded from Zenodo [https://doi.org/10.5281/zenodo.2932987], and the filtered somatic mutation lists from 20 tumor types that were used as input files for driverMAPS may also be downloaded from Zenodo [https://doi.org/10.5281/zenodo.1209412]. The original, unfiltered somatic mutation list files were downloaded from the TCGA GDAC website (version: analyses__2016_01_28) [https://gdac.broadinstitute.org/]. RNA sequencing and CNV data for the 20 tumor types were downloaded from cBioPortal [https://www.cbioportal.org/]. Gene annotation data were downloaded from GENCODE (version 19, Feb 2014) [https://www.gencodegenes.org/]. The source data underlying Figs. 6C and 6D and Supplementary Figs. 6 and 7 are provided as a Source Data file.

The driverMAPS software is available from the driverMAPS website [https://szhao06.bitbucket.io/driverMAPS-documentation/docs/download.html]. The source code for driverMAPS is available from the Bitbucket repository [https://bitbucket.org/szhao06/maps]. Other software used in this study includes TCGA GDAC firehose_get (version 0.4.6) [http://gdac.broadinstitute.org/] and ANNOVAR (version 2016Feb01) [http://annovar.openbioinformatics.org/en/latest/].

Files

Detailed-modeling-of-positive-selection-improves-detection-of-cancer-driver-genes.pdf

Files (6.7 MB)

Source data (219.3 kB), md5:6245cfcd4434a32b22d163a4e96f216a
Article (2.1 MB), md5:9fa4389ab99babeb95866aa18be10dcb
Supplementary information (4.3 MB), md5:e5ebc507c6ef3ae6afa155c24be36f2e

Additional details

Identifiers

DOI
10.1038/s41467-019-11284-9
Other
oai:uchicago.tind.io:5736

Funding

National Institutes of Health
MH110531
National Institutes of Health
HG002585

UChicago Information

Division(s)
Biological Sciences Division, Physical Sciences Division
Department(s)
Computer Science, Human Genetics, Statistics