Global impact of unproductive splicing on human gene expression

Fair, Benjamin; Buen Abad Najar, Carlos F.; Zhao, Junxing; Lozano, Stephanie; Reilly, Austin; Mossian, Gabriela; Staley, Jonathan P.; Wang, Jingxin; Li, Yang I.

doi:10.6082/314g5-etm33

Published September 2, 2024 | Version v1

Journal article Open

1. University of Chicago

Alternative splicing (AS) in human genes is widely viewed as a mechanism for enhancing proteomic diversity. AS can also impact gene expression levels without increasing protein diversity by producing 'unproductive' transcripts that are targeted for rapid degradation by nonsense-mediated decay (NMD). However, the relative importance of this regulatory mechanism remains underexplored. To better understand the impact of AS–NMD relative to other regulatory mechanisms, we analyzed population-scale genomic data across eight molecular assays, covering various stages from transcription to cytoplasmic decay. We report threefold more unproductive splicing compared with prior estimates using steady-state RNA. This unproductive splicing compounds across multi-intronic genes, resulting in 15% of transcript molecules from protein-coding genes being unproductive. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are as often associated with NMD-induced expression level differences as with differences in protein isoform usage. Our findings suggest that much of the impact of AS is mediated by NMD-induced changes in gene expression rather than diversification of the proteome.
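The compounding effect mentioned above can be illustrated with a small calculation. This is a minimal sketch, not the authors' method: it assumes every intron is spliced independently with the same unproductive-splicing rate, and both the function name `unproductive_fraction` and the 2% per-intron rate are hypothetical values chosen for illustration, not estimates taken from the study.

```python
def unproductive_fraction(per_intron_rate: float, n_introns: int) -> float:
    """Fraction of transcripts carrying at least one unproductive splice event.

    Assumption (illustrative only): each intron is spliced independently and
    with the same probability of an unproductive outcome.
    """
    if not 0.0 <= per_intron_rate <= 1.0:
        raise ValueError("per_intron_rate must be between 0 and 1")
    if n_introns < 0:
        raise ValueError("n_introns must be non-negative")
    # A transcript is productive only if every intron is spliced productively.
    productive = (1.0 - per_intron_rate) ** n_introns
    return 1.0 - productive


if __name__ == "__main__":
    # Hypothetical per-intron rate of 2%; watch the fraction grow with intron count.
    for n in (1, 4, 8, 16):
        print(n, round(unproductive_fraction(0.02, n), 3))
```

Under these assumptions, a modest per-intron rate translates into a much larger fraction of unproductive molecules for genes with many introns, which is the qualitative point of the compounding claim.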

Data availability

Sequence data generated as part of this study (naRNA-seq and H3K36me3 CUT&Tag) are publicly available and have been deposited in the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/) under accession GSE252006. Other publicly available data utilized in this study included genotypes downloaded from the 1000 Genomes Project (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/). FASTQ files of short-read RNA-seq data from shRNA double knockdown (dKD) of SMG6 and SMG7 in HeLa cells, and from shRNA controls, come from a previous study (SRA accession SRP083135) and are available at https://www.ncbi.nlm.nih.gov/sra. Other publicly available short-read sequencing data are described in Supplementary Fig. 1. For these datasets, we obtained FASTQ files of short-read sequencing data from the following accession numbers: ENA project accession PRJEB3365 (steady-state RNA-seq produced by the GEUVADIS consortium), SRA project accession PRJNA268086 (H3K4me1, H3K4me3 and H3K27ac ChIP-seq) and SRA project accession PRJNA302818 (4sU RNA-seq). The 3′ sequencing APA data were obtained as a sample-by-peak expression matrix from the authors of a previous study (SRA accession number SRP223759). The data were aligned to GRCh38 using release v34 transcript annotations from GENCODE (https://www.gencodegenes.org/human/). Some analyses (Supplementary Methods) also utilized v37 annotations.

Pipelines and all original code are available at Zenodo via https://doi.org/10.5281/zenodo.12571961 (ref. 85) and at GitHub via https://github.com/bfairkun/ChromatinSplicingQTLs/.
