Published February 12, 2024 | Version v1
Journal article Open

Accounting for isoform expression increases power to identify genetic regulation of gene expression

  • 1. University of Chicago
  • 2. University of California, Los Angeles

Description

A core problem in genetics is molecular quantitative trait locus (QTL) mapping, in which genetic variants associated with changes in molecular phenotypes are identified. One of the most-studied molecular QTL mapping problems is expression QTL (eQTL) mapping, in which the molecular phenotype is gene expression. It is common in eQTL mapping to compute gene expression by aggregating the expression levels of individual isoforms from the same gene and then performing linear regression between SNPs and this aggregated gene expression level. However, SNPs may regulate isoforms from the same gene in different directions due to alternative splicing, or regulate the expression level of only one isoform, causing this approach to lose power. Here, we examine a broader question: which genes have at least one isoform whose expression level is regulated by genetic variants? We propose and evaluate several approaches to answering this question, demonstrating that "isoform-aware" methods (those that account for the expression levels of individual isoforms) have substantially greater power than standard "gene-level" eQTL mapping methods. We identify settings in which different approaches yield an inflated number of false discoveries or lose power. In particular, we show that calling a gene an eGene if there is a significant association between a SNP and any of its isoforms fails to control the false discovery rate, even when applying standard false discovery rate correction. We show that similar trends are observed in real data from the GEUVADIS and GTEx studies, suggesting that similar effects may be present in these consortia.
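
The power loss described above can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the method from the paper or from the isoQTL repository: the sample size, allele frequency, effect size, and the helper `slope_t_stat` are all hypothetical choices made for illustration. A single SNP raises the expression of one isoform and lowers the other by the same amount, so the summed gene-level expression carries essentially no signal while each isoform-level regression does.

```python
import numpy as np


def slope_t_stat(x, y):
    """Return the t-statistic of the slope from a simple linear regression of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    sxx = np.sum(x * x)
    beta = np.sum(x * y) / sxx              # ordinary least squares slope
    resid = y - beta * x
    dof = len(x) - 2                        # intercept and slope are estimated
    se = np.sqrt(np.sum(resid ** 2) / dof / sxx)
    return beta / se


# Hypothetical simulation settings (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n_individuals = 500
minor_allele_freq = 0.3
effect = 1.0

# Genotype coded as the number of minor alleles (0, 1, or 2).
genotype = rng.binomial(2, minor_allele_freq, size=n_individuals).astype(float)

# The SNP regulates the two isoforms in opposite directions.
iso_a = effect * genotype + rng.normal(size=n_individuals)
iso_b = -effect * genotype + rng.normal(size=n_individuals)

# Gene-level expression is the sum of isoform expression, so the effects cancel.
gene = iso_a + iso_b

t_gene = slope_t_stat(genotype, gene)
t_iso_a = slope_t_stat(genotype, iso_a)
t_iso_b = slope_t_stat(genotype, iso_b)

print(f"gene-level t-statistic: {t_gene:.2f}")
print(f"isoform A t-statistic:  {t_iso_a:.2f}")
print(f"isoform B t-statistic:  {t_iso_b:.2f}")
```

In this setup the gene-level statistic behaves like a draw from the null distribution, while both isoform-level statistics are large in magnitude with opposite signs. Exact cancellation is the extreme case; partially opposing effects, or an effect on only one of several isoforms, would dilute the gene-level signal rather than remove it entirely.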

Data availability

The code used for simulations, preprocessing, running methods, and evaluating results can be found at https://github.com/nlapier2/isoQTL. All data used in this manuscript is publicly available with the exception of the GTEx genotypes, which must be obtained by request as described in https://gtexportal.org/home/protectedDataAccess. The GTEx expression data, covariates, and population information can be obtained from https://gtexportal.org/home/datasets. The GEUVADIS genotype data can be obtained online from https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/files/ and the expression data can be obtained online from https://www.internationalgenome.org/data-portal/data-collection/geuvadis. Gene annotations were obtained via the Ensembl GRCh38 build 88 GTF file, which can be obtained from http://ftp.ensembl.org/pub/release-88/gtf/homo_sapiens/.

Files

journal.pcbi.1011857.pdf

Files (2.3 MB)

Name Size
Article
md5:274130b5fe9682eebcef4c223df27462
1.8 MB
Supporting information
md5:b00844d148dc050fa04796353eb2e4a2
501.8 kB

Additional details

Identifiers

DOI
10.1371/journal.pcbi.1011857
Other
oai:uchicago.tind.io:11253

Funding

Hanna H Gray
fellows program
Sloan
fellows program
National Science Foundation
2106908
National Institutes of Health
U01HG011715
National Institutes of Health
R56HG010812

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Human Genetics