Published September 2, 2024 | Version v1
Journal article Open

Global impact of unproductive splicing on human gene expression

Description

Alternative splicing (AS) in human genes is widely viewed as a mechanism for enhancing proteomic diversity. AS can also impact gene expression levels without increasing protein diversity by producing 'unproductive' transcripts that are targeted for rapid degradation by nonsense-mediated decay (NMD). However, the relative importance of this regulatory mechanism remains underexplored. To better understand the impact of AS–NMD relative to other regulatory mechanisms, we analyzed population-scale genomic data across eight molecular assays, covering various stages from transcription to cytoplasmic decay. We report threefold more unproductive splicing compared with prior estimates using steady-state RNA. This unproductive splicing compounds across multi-intronic genes, resulting in 15% of transcript molecules from protein-coding genes being unproductive. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are as often associated with NMD-induced expression level differences as with differences in protein isoform usage. Our findings suggest that much of the impact of AS is mediated by NMD-induced changes in gene expression rather than diversification of the proteome.
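The statement that unproductive splicing compounds across multi-intronic genes can be illustrated with a short calculation. The sketch below is not the authors' method: it assumes every intron is spliced unproductively with the same probability and that introns behave independently, and the per-intron rate of 0.02 is a hypothetical value chosen only for illustration. Under those assumptions, the fraction of transcripts carrying at least one unproductive event is 1 - (1 - p)^n for a gene with n introns.

```python
def unproductive_fraction(per_intron_rate: float, n_introns: int) -> float:
    """Fraction of transcripts with at least one unproductive splicing event.

    Assumes (for illustration only) that each intron is spliced
    unproductively with the same probability and independently of the others.
    """
    if not 0.0 <= per_intron_rate <= 1.0:
        raise ValueError("per_intron_rate must lie in [0, 1]")
    if n_introns < 0:
        raise ValueError("n_introns must be non-negative")
    # A transcript is productive only if every intron is spliced productively.
    return 1.0 - (1.0 - per_intron_rate) ** n_introns


if __name__ == "__main__":
    hypothetical_rate = 0.02  # assumed value, not taken from the article
    for n in (1, 5, 10, 20):
        print(n, round(unproductive_fraction(hypothetical_rate, n), 3))
```

Even a small per-intron rate yields a noticeably larger per-transcript fraction as the intron count grows, which is the compounding effect the description refers to.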

Data availability

Sequence data generated as part of this study (naRNA-seq and H3K36me3 CUT&Tag) are publicly available and have been deposited in the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/) under accession GSE252006. Other public data utilized in this study included genotypes downloaded from the 1000 Genomes Project (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/). FASTQ files of short-read RNA-seq data from shRNA double knockdown (dKD) of SMG6 and SMG7 in HeLa cells, and from shRNA controls, were generated in a previous study (SRA accession SRP083135) and are available at https://www.ncbi.nlm.nih.gov/sra. Other publicly available short-read sequencing data are described in Supplementary Fig. 1. For these datasets, we obtained FASTQ files of standard short-read sequencing data from the following accession numbers: ENA project accession PRJEB3365 (steady-state RNA-seq produced by the GEUVADIS consortium), SRA project accession PRJNA268086 (H3K4me1, H3K4me3 and H3K27ac ChIP-seq) and SRA project accession PRJNA302818 (4sU RNA-seq). The 3′ sequencing APA data were obtained as a sample-by-peak expression matrix from the authors of a previous study (SRA accession number SRP223759). The data were aligned to GRCh38 using release v34 transcript annotations from GENCODE (https://www.gencodegenes.org/human/). Some analyses (Supplementary Methods) also utilized v37 annotations.

Pipelines and all original code are available at Zenodo via https://doi.org/10.5281/zenodo.12571961 (ref. 85) and at GitHub via https://github.com/bfairkun/ChromatinSplicingQTLs/.

Files

Global-impact-of-unproductive-splicing-on-human-gene-expression.pdf

Files (29.1 MB)

Name Size
Article
md5:82cf558aa3a2e5a30cf24cf4ec096462 16.3 MB
md5:ef3979c29e605318126991a0831af25f 12.7 MB
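
The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the local path you pass in is whatever name you saved the download under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed: str) -> bool:
    """Compare a file's digest with a listed value such as 'md5:82cf...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("article.pdf", "md5:82cf558aa3a2e5a30cf24cf4ec096462")` would return `True` if a local file saved as `article.pdf` (an assumed name) is byte-identical to the first file in the listing.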

Additional details

Identifiers

DOI
10.1038/s41588-024-01872-x
Other
oai:uchicago.tind.io:13326

Funding

National Institutes of Health
R01GM130738
National Institutes of Health
R01HG011067
National Institutes of Health
R35GM147498
GREGoR Consortium
W. M. Keck Foundation

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Human Genetics, Medicine