Exploring and mitigating shortcomings in single-cell differential expression analysis with a new statistical paradigm
Creators
- 1. University of Chicago
- 2. University of Michigan
Description
Background: Differential expression analysis is pivotal in single-cell transcriptomics for unraveling cell-type–specific responses to stimuli. While numerous methods are available to identify differentially expressed genes in single-cell data, recent evaluations of both single-cell–specific methods and methods adapted from bulk studies have revealed significant shortcomings in performance. In this paper, we dissect the four major challenges in single-cell differential expression analysis: excessive zeros, normalization, donor effects, and cumulative biases. These "curses" underscore the limitations and conceptual pitfalls in existing workflows.
Results: To address the limitations of current single-cell differential expression analysis methods, we propose GLIMES, a statistical framework that leverages UMI counts and zero proportions within a generalized Poisson/Binomial mixed-effects model to account for batch effects and within-sample variation. We rigorously benchmarked GLIMES against six existing differential expression methods using three case studies and simulations across different experimental scenarios, including comparisons across cell types, tissue regions, and cell states. Our results demonstrate that GLIMES is more adaptable to diverse experimental designs in single-cell studies and effectively mitigates key shortcomings of current approaches, particularly those related to normalization procedures. By preserving biologically meaningful signals, GLIMES offers improved performance in detecting differentially expressed genes.
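As a rough orientation for readers, a generic mixed-effects formulation of the kind described above can be written as follows. The notation is illustrative and is an assumption rather than the paper's own: $Y_{gc}$ is the UMI count of gene $g$ in cell $c$, $x_c$ is the group indicator, $d(c)$ is the donor (or batch) of cell $c$, and $Z_{gc}$ is taken here to be the indicator of a nonzero count. The exact GLIMES specification is given in the article and the package source.

```latex
% Poisson mixed-effects model on UMI counts (generic sketch, hypothetical notation)
Y_{gc} \mid b_{g,d(c)} \sim \mathrm{Poisson}(\lambda_{gc}), \qquad
\log \lambda_{gc} = \beta_{g0} + \beta_{g1} x_c + b_{g,d(c)}, \qquad
b_{g,d} \sim \mathcal{N}(0, \sigma_g^2)

% Binomial mixed-effects model on zero/nonzero indicators (generic sketch)
Z_{gc} = \mathbf{1}[Y_{gc} > 0], \qquad
Z_{gc} \mid u_{g,d(c)} \sim \mathrm{Bernoulli}(p_{gc}), \qquad
\operatorname{logit} p_{gc} = \gamma_{g0} + \gamma_{g1} x_c + u_{g,d(c)}, \qquad
u_{g,d} \sim \mathcal{N}(0, \tau_g^2)
```

In this sketch, a test of $\beta_{g1} = 0$ (or $\gamma_{g1} = 0$) corresponds to asking whether gene $g$ differs between groups after allowing each donor its own baseline.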
Conclusions: By using absolute RNA expression rather than relative abundance, GLIMES improves sensitivity, reduces false discoveries, and enhances biological interpretability. This paradigm shift challenges existing workflows and highlights the need for careful consideration of normalization strategies, ultimately paving the way for more accurate and robust single-cell transcriptomic analyses.
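The distinction between absolute expression and relative abundance can be illustrated with a toy calculation. The sketch below uses made-up counts (not data from the study) and a simple counts-per-million scaling as a stand-in for library-size normalization; the gene names and the `cpm` helper are hypothetical.

```python
# Toy illustration: library-size normalization can turn an unchanged gene
# into an apparent down-regulation when total RNA content differs.
# All numbers are invented for illustration only.

def cpm(counts):
    """Scale counts to counts-per-million of the cell's total."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

# Hypothetical UMI counts for one cell per condition.
control = {"geneA": 100, "geneB": 100, "geneC": 800}    # total 1000
treated = {"geneA": 100, "geneB": 100, "geneC": 1800}   # total 2000

control_cpm = cpm(control)
treated_cpm = cpm(treated)

# Fold change on absolute counts versus on normalized (relative) values.
abs_fold = {g: treated[g] / control[g] for g in control}
rel_fold = {g: treated_cpm[g] / control_cpm[g] for g in control}

for g in control:
    print(f"{g}: absolute fold = {abs_fold[g]:.3f}, relative fold = {rel_fold[g]:.3f}")
```

Here geneA has the same absolute count in both conditions, yet its normalized value halves because geneC raises the total. This is the kind of normalization-driven shift that motivates working with UMI counts directly.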
Data availability
All scRNA-seq datasets used in this study are publicly available. The processed and de-identified human scRNA-seq dataset of post-menopausal fallopian tubes has been deposited at Cellxgene at the following URL: https://cellxgene.cziscience.com/collections/d36ca85c-3e8b-444c-ba3e-a645040c6185. The raw human spinal cord scRNA-seq dataset used in case study 2 is available at the following URL: https://als-st.nygenome.org/. The droplet scRNA-seq data used in case study 3 are deposited in the Gene Expression Omnibus under accession number GSE96583. The dataset is also available in R through the Bioconductor ExperimentHub package. We provide an R package, GLIMES, implementing the Poisson-glmm and Binomial-glmm methods for DE analysis discussed in this study. The GLIMES package is available from GitHub (https://github.com/C-HW/GLIMES) under the BSD 3-Clause license. In addition, the R source code to reproduce all data analyses in the study is available on Zenodo at https://zenodo.org/records/14279028.
Exploring-and-mitigating-shortcomings-in-single-cell-differential-expression-analysis-with-a-new-statistical-paradigm.pdf
Files
(44.0 MB)
| Name | Size |
|---|---|
| Article (md5:a73da2ba92c84365ddd8e21b87be0c4d) | 4.9 MB |
| md5:7642a074762655f6074e77118493eb62 | 39.1 MB |
Additional details
Identifiers
- DOI
- 10.1186/s13059-025-03525-6
- Other
- oai:uchicago.tind.io:14813
Related works
- Cites
- https://doi.org/10.5281/zenodo.14279028 (URL)
Funding
- National Institutes of Health
- R01 GM126553
- National Institutes of Health
- R01 HG011883
- National Institutes of Health
- HG012927
- National Science Foundation
- 2016307
- Unknown funder
- Sloan Research Fellowship