Published August 28, 2023 | Version v1
Journal article Open

SFyNCS detects oncogenic fusions involving non-coding sequences in cancer

Description

Fusion genes are well-known cancer drivers. However, most known oncogenic fusions are protein-coding, and very few involve non-coding sequences due to the lack of suitable detection tools. We develop SFyNCS to detect fusions of both protein-coding genes and non-coding sequences from transcriptomic sequencing data. The main advantage of this study is that we use somatic structural variations detected from genomic data to validate fusions detected from transcriptomic data. This allows us to comprehensively evaluate various fusion detection and filtering strategies and parameters. We show that SFyNCS has superior sensitivity and specificity compared with existing algorithms through extensive benchmarking in cancer cell lines and patient samples. We then apply SFyNCS to 9565 tumor samples across 33 tumor types in The Cancer Genome Atlas cohort and detect a total of 165,139 fusions. Among them, 72% involve non-coding sequences. We find that a long non-coding RNA recurrently fuses with various oncogenes in 3% of prostate cancers. In addition, we discover fusions involving two non-coding RNAs in 32% of dedifferentiated liposarcomas and experimentally validate their oncogenic functions in a mouse model.
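The idea of using somatic structural variations (SVs) to validate RNA-level fusions can be illustrated with a minimal sketch. This is not the SFyNCS implementation: the matching rule (both fusion breakpoints lie within a fixed window of the two SV breakpoints, in either order), the 100 kb default window, and all names below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Breakpoint:
    chrom: str
    pos: int


@dataclass(frozen=True)
class Junction:
    """A pair of breakpoints: an RNA-level fusion or a DNA-level SV."""
    a: Breakpoint
    b: Breakpoint


def near(p: Breakpoint, q: Breakpoint, window: int) -> bool:
    """True if both breakpoints are on the same chromosome within `window` bp."""
    return p.chrom == q.chrom and abs(p.pos - q.pos) <= window


def is_sv_supported(fusion: Junction, svs: Sequence[Junction],
                    window: int = 100_000) -> bool:
    """Hypothetical rule: a fusion is supported if some SV has both of its
    breakpoints near the fusion's two breakpoints, in either order."""
    for sv in svs:
        same_order = near(fusion.a, sv.a, window) and near(fusion.b, sv.b, window)
        swapped = near(fusion.a, sv.b, window) and near(fusion.b, sv.a, window)
        if same_order or swapped:
            return True
    return False


# Made-up coordinates, for illustration only.
fusion = Junction(Breakpoint("chr1", 1_000_000), Breakpoint("chr2", 5_000_000))
svs = [Junction(Breakpoint("chr2", 5_020_000), Breakpoint("chr1", 990_000))]
print(is_sv_supported(fusion, svs))
```

The window size trades sensitivity against specificity: a wider window tolerates breakpoints that fall in introns far from the spliced junction, but also admits more coincidental matches.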

Data availability

RNA-Seq data for 9565 tumor and 715 normal samples from TCGA (Supplementary Table S5) were downloaded from Genomic Data Commons (https://portal.gdc.cancer.gov/). RNA-Seq data for the MCF7, HCT116 and K562 cell lines were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accessions SRX5414642 (MCF7, CCLE), SRX159831 (MCF7, ENCODE), SRX6378523 (MCF7, Weber et al.), SRX6378524 (MCF7, Weber et al.), SRX5414471 (HCT116, CCLE), SRX159835 (HCT116, ENCODE), SRX5414683 (K562, CCLE), SRX1603406 (K562, ENCODE) and SRX1603407 (K562, ENCODE). RNA-Seq data for two normal adipose tissue samples (SRX636240, SRX640265) from Genotype-Tissue Expression (GTEx) were downloaded from NCBI SRA. The H3K27ac ChIP-Seq signals for the PC-3 cell line (ENCFF224GSO) and prostate gland (ENCFF143LGC) were downloaded from the ENCODE portal (https://www.encodeproject.org/). The GTEx RNA-Seq read coverage in the region of NONHSAG108579.1 was downloaded from UCSC (https://genome.ucsc.edu/).

Somatic SVs in TCGA samples were obtained from a recent Pan-cancer Analysis of Whole Genomes (PCAWG) study (26). Somatic SVs in MCF7 were downloaded from the Dependency Map (DepMap) portal (https://depmap.org/portal/). Fusions in TCGA samples identified by Arriba, DEEPEST and STAR-Fusion were downloaded from the corresponding publications (3,12,16). Fusions in MCF7 identified by FusionCatcher (v1.0), InFusion (v0.8), MapSplice2 (v2.2.1), SOAPfuse (v1.2.7) and STAR-Fusion (v1.5.0) were downloaded from a previous study (19). Fusions in MCF7 identified by EasyFuse (v1.3.0) were provided by Dr. Ugur Sahin. The subtypes of sarcomas were obtained from a previous study (33).

All coordinates were based on the hg38 reference genome. GENCODE v29 was used for gene annotation. NONCODE v6 and lncRNAKB v7 were used to annotate non-coding genes that are not annotated by GENCODE.

The SFyNCS package is available at https://github.com/yanglab-computationalgenomics/SFyNCS (permanent DOI 10.5281/zenodo.8222797).

Files

SFyNCS-detects-oncogenic-fusions-involving-non-coding-sequences-in-cancer.pdf

Files (47.1 MB)

Name Size
Graphical abstract
md5:d57e091881e50e6c32a59c67cbed2afb
153.7 kB
Supplementary data
md5:a94641b6ccbac9a0e88b1a7b875ae692
45.0 MB
Article
md5:734216d2c3251a6783ef5b448e0b88b9
2.0 MB
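
The md5 checksums in the listing above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the dictionary labels and the helper names `md5_of_file` and `matches_listing` are hypothetical, and the local path of each downloaded file is left to the reader.

```python
import hashlib

# Checksums copied from the file listing; the keys are hypothetical labels.
EXPECTED_MD5 = {
    "graphical_abstract": "d57e091881e50e6c32a59c67cbed2afb",
    "supplementary_data": "a94641b6ccbac9a0e88b1a7b875ae692",
    "article": "734216d2c3251a6783ef5b448e0b88b9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, label):
    """True if the file at `path` has the md5 listed for `label`."""
    return md5_of_file(path) == EXPECTED_MD5[label]
```

Reading in chunks keeps memory use flat for the 45.0 MB supplementary file. A call such as `matches_listing(local_path, "article")` returns `False` for a truncated or corrupted download.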

Additional details

Identifiers

DOI
10.1093/nar/gkad705
Other
oai:uchicago.tind.io:7725

Funding

National Institutes of Health
R01CA269977
University of Chicago
Comprehensive Cancer Center

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Ben May Department for Cancer Research, Human Genetics
Center(s) or Institute(s)
Comprehensive Cancer Center