Published October 19, 2023 | Version v1
Journal article Open

GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership

Description

Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure in single-cell sequencing data sets, in particular structure that is not captured as well by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership in multiple groups. We call this approach grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
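To make the idea of partial membership concrete, the following is a minimal sketch for a single gene. It is not the fastTopics implementation. It assumes a Poisson model in which the expected count for cell i is s[i] * sum_k L[i, k] * p[k], where L holds the membership proportions, s holds per-cell size factors, and p[k] is the expression level in topic k. The function names `fit_gom_poisson` and `log2_fold_change`, the EM-style multiplicative update, and the toy data are all illustrative assumptions and do not come from the article.

```python
import numpy as np


def fit_gom_poisson(x, s, L, n_iter=500):
    """Estimate per-topic expression levels p for ONE gene (illustrative).

    Assumed model: x[i] ~ Poisson(s[i] * sum_k L[i, k] * p[k]), where
    each row of L gives one cell's membership proportions.
    Assumes s > 0 and that every topic has some membership (no all-zero
    column in L).
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    L = np.asarray(L, dtype=float)
    B = s[:, None] * L                  # n x K design matrix
    col_sums = B.sum(axis=0)            # normalizer for each topic
    # Start from the pooled rate; the small offset keeps p strictly positive.
    p = np.full(L.shape[1], x.sum() / s.sum() + 1e-8)
    for _ in range(n_iter):
        mu = np.maximum(B @ p, 1e-300)  # expected counts, guarded against 0
        # EM-style multiplicative update for the Poisson likelihood.
        p = p * (B.T @ (x / mu)) / col_sums
    return p


def log2_fold_change(p, k, ref):
    """Log2 ratio of expression in topic k relative to topic ref."""
    return float(np.log2(p[k] / p[ref]))


# Toy data (made up): four cells, two topics.
x = [2, 4, 10, 20]
s = [1, 1, 2, 2]

# Hard memberships: each cell belongs to exactly one group.
L_hard = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
p_hard = fit_gom_poisson(x, s, L_hard)

# Partial memberships: each cell is a mixture of the two topics.
L_soft = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])
p_soft = fit_gom_poisson(x, s, L_soft)

print("hard:", p_hard, log2_fold_change(p_hard, 1, 0))
print("soft:", p_soft, log2_fold_change(p_soft, 1, 0))
```

With hard (0/1) memberships, the estimate for each group reduces to total counts divided by total size factor, which is the usual two-group comparison. Partial memberships generalize this by letting each cell contribute to several topics at once.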

Data availability

The fastTopics R package is available on GitHub (https://github.com/stephenslab/fastTopics) and CRAN (https://cran.r-project.org/package=fastTopics). A Seurat wrapper for fastTopics is available from https://github.com/stephenslab/seurat-wrappers. The data sets supporting the conclusions of this article are available in Zenodo repositories (https://doi.org/10.5281/zenodo.7962782; https://doi.org/10.5281/zenodo.7962831). These Zenodo repositories also include the source code implementing the analyses and workflowr websites (https://doi.org/10.12688/f1000research.20843.1) for browsing the code and results. Permission to use the source code in these repositories is granted under the MIT license. Numerical implementations of the contributed statistical methods, including tools for visualizing the results generated by these methods, are available from the fastTopics R package (http://arxiv.org/abs/2105.13440; https://www.github.com/stephenslab/fastTopics) under the MIT license. The gene sets used in the GSEA were compiled into an R package (https://github.com/stephenslab/pathways), also distributed under the MIT license. All data sets used in the study were obtained from public sources (https://www.10xgenomics.com/support/single-cell-gene-expression; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103354; https://shendurelab.github.io/mouse-atac/; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96772). A description of how these data sets were used is provided in the "Methods" section.

Files

GoM-DE.pdf

Files (67.8 MB)

Name (Size)
Article (5.8 MB)
md5:43aadb43e64835ed1fb8bc345981b8a6
Additional files (62.0 MB)
md5:8b90b6d9c8e0c9d18b28636673c090f7

Additional details

Identifiers

DOI
10.1186/s13059-023-03067-9
Other
oai:uchicago.tind.io:10049

Funding

National Human Genome Research Institute
5R01HG002585

UChicago Information

Division(s)
Biological Sciences Division, Physical Sciences Division
Department(s)
Genetics, Genomics, and Systems Biology; Human Genetics; Medicine; Statistics