Published June 19, 2024 | Version v1
Journal article | Open access

A robust model for cell type-specific interindividual variation in single-cell RNA sequencing data

1. University of Chicago

Description

Single-cell RNA sequencing (scRNA-seq) has been widely used to characterize cell types based on their average gene expression profiles. However, most studies do not consider cell type-specific variation across donors. Modelling this cell type-specific interindividual variation could help elucidate cell type-specific biology and identify the genes and cell types underlying complex traits. We therefore develop CTMM (Cell Type-specific linear Mixed Model), a new model to detect and quantify cell type-specific variation across individuals. Using extensive simulations, we show that CTMM is powerful and unbiased in realistic settings. We also derive calibrated tests for cell type-specific interindividual variation, a task that is challenging given the modest sample sizes typical of scRNA-seq studies. We apply CTMM to scRNA-seq data from human induced pluripotent stem cells (iPSCs) to characterize transcriptomic variation across donors as cells differentiate into endoderm. We find that almost 100% of transcriptome-wide variability between donors is differentiation stage-specific. CTMM also identifies individual genes with statistically significant stage-specific variability across samples, including 85 genes that do not have significant stage-specific mean expression. Finally, we extend CTMM to partition interindividual covariance between stages, which recapitulates the overall differentiation trajectory. Overall, CTMM is a powerful tool for illuminating cell type-specific biology in scRNA-seq data.
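To make "cell type-specific interindividual variation" more concrete, here is a minimal NumPy sketch built on a hypothetical linear mixed model in the spirit of the one described above. It is not CTMM's exact specification. Each donor's pseudobulk expression of one gene is modelled as a proportion-weighted sum of cell type means (`beta`), plus a donor effect shared by all cell types (variance `sigma2_hom`), donor effects specific to each cell type (variances `v`), and residual noise. The variance components are then recovered with a simple method-of-moments regression of squared residuals on squared cell type proportions. The model form, the function names `simulate_pseudobulk` and `estimate_variance_components`, and the estimator are all illustrative assumptions and are not the CTMM package's API or its fitting procedure. For the real implementation, consult the GitHub repository listed under Data availability.

```python
import numpy as np


def simulate_pseudobulk(n_donors, beta, v, sigma2_hom, sigma2_e, rng):
    """Simulate donor-level pseudobulk expression for one gene.

    Hypothetical model (illustrative assumption, not CTMM's exact spec):
        y_i = pi_i @ beta + h_i + pi_i @ u_i + e_i
    h_i  ~ N(0, sigma2_hom)  donor effect shared by all cell types
    u_ic ~ N(0, v_c)         donor effect specific to cell type c
    e_i  ~ N(0, sigma2_e)    residual noise
    """
    beta = np.asarray(beta, dtype=float)
    v = np.asarray(v, dtype=float)
    n_types = beta.size
    # Cell type proportions per donor; each row sums to 1.
    P = rng.dirichlet(np.ones(n_types), size=n_donors)
    h = rng.normal(0.0, np.sqrt(sigma2_hom), size=n_donors)
    # Scale broadcasts across columns, so column c has variance v[c].
    u = rng.normal(0.0, np.sqrt(v), size=(n_donors, n_types))
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=n_donors)
    y = P @ beta + h + np.sum(P * u, axis=1) + e
    return P, y


def estimate_variance_components(P, y):
    """Method-of-moments fit of the hypothetical model above.

    Under that model, E[(y_i - pi_i @ beta)^2]
        = (sigma2_hom + sigma2_e) + sum_c pi_ic^2 * v_c,
    so regressing squared residuals on [1, pi_i1^2, ..., pi_iC^2]
    estimates the shared variance (intercept) and each v_c (slopes).
    """
    # Step 1: cell type means by least squares. No intercept is needed
    # because the proportions already sum to 1.
    beta_hat, *_ = np.linalg.lstsq(P, y, rcond=None)
    r2 = (y - P @ beta_hat) ** 2
    # Step 2: regress squared residuals on an intercept and squared proportions.
    X = np.column_stack([np.ones(len(y)), P ** 2])
    theta, *_ = np.linalg.lstsq(X, r2, rcond=None)
    return beta_hat, theta[0], theta[1:]


rng = np.random.default_rng(0)
P, y = simulate_pseudobulk(
    200_000,
    beta=[1.0, 2.0, 3.0],
    v=[0.5, 1.0, 2.0],
    sigma2_hom=0.3,
    sigma2_e=0.2,
    rng=rng,
)
beta_hat, shared_hat, v_hat = estimate_variance_components(P, y)
print("cell type means:", beta_hat)
print("shared variance (hom + noise):", shared_hat)
print("cell type-specific variances:", v_hat)
```

The very large number of simulated donors is chosen only so that the estimates visibly converge. With the modest donor counts typical of scRNA-seq, a naive estimator like this one is much noisier, and calibrated tests become important in exactly that regime.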

Data availability

Processed single-cell count data from iPSCs are publicly available from Zenodo: https://zenodo.org/record/3625024#.Xil-0y2cZ0s. OneK1K single-cell gene expression data are publicly available via Gene Expression Omnibus (GSE196830 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE196830]). The simulated datasets and imputed pseudobulk data are fully reproducible using the code provided with this study together with the publicly available iPSC and OneK1K data. All data generated during this study are included in this published article and its supplementary information files.

The CTMM Python package, along with Python (version 3.11.5) and R (version 4.3.1) code used for all analyses in this paper, is available at: https://github.com/Minhui-Chen/CTMM.

Files

Robust-model-for-cell-type-specific-interindividual-variation-in-single-cell-RNA-sequencing-data.pdf

Files (13.1 MB)

Article: 1.4 MB (md5:09db564be30e46900641f5ceb1633405)
Supplementary data files: 77.9 kB (md5:278494d02b3298080c827fbc330da23d)
Supplementary information files: 11.6 MB (md5:9e9a19921b0403adcfafa8c7ae577740)
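The MD5 checksums listed for each file can be used to confirm that a download is intact. Below is a minimal sketch that uses Python's standard-library `hashlib`, shown for the article PDF because that is the only file whose name appears on this record. The helper name `md5sum` is illustrative, and the script assumes the PDF was saved to the current directory under its original filename.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum listed for the article PDF on this record.
EXPECTED_ARTICLE_MD5 = "09db564be30e46900641f5ceb1633405"
# Assumption: the PDF was downloaded into the current directory unchanged.
ARTICLE_PDF = Path(
    "Robust-model-for-cell-type-specific-interindividual-variation-"
    "in-single-cell-RNA-sequencing-data.pdf"
)

if ARTICLE_PDF.exists():
    ok = md5sum(ARTICLE_PDF) == EXPECTED_ARTICLE_MD5
    print("checksum OK" if ok else "checksum MISMATCH")
```

Reading in chunks keeps memory use flat for larger files such as the 11.6 MB supplementary archive.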

Additional details

Identifiers

DOI
10.1038/s41467-024-49242-9
Other
oai:uchicago.tind.io:12674

Funding

Unknown funder
K25HL157603

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Medicine