Gene Fusion Discovery with INTEGRATE
Zhang, Jin; Maher, Christopher A
Methods in molecular biology (Clifton, N.J.), 2020, Volume: 2079
Journal Article
Next-generation sequencing (NGS) has become the primary technology for discovering gene fusions. Decreasing NGS costs have resulted in a growing number of patients with whole transcriptome sequencing (RNA-seq) and whole genome sequencing (WGS) data. We developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read alignment. INTEGRATE has become widely adopted by the larger cancer research community to discover biologically and clinically relevant gene fusions. Here we explain the rationale driving the development of the INTEGRATE tool and describe the detailed practical procedures for applying INTEGRATE to discover gene fusions using NGS data. INTEGRATE can be applied to both combined data and RNA-seq-only data.
Recurrent gene fusions, typically associated with haematological malignancies and rare bone and soft-tissue tumours, have recently been described in common solid tumours. Here we use an integrative analysis of high-throughput long- and short-read transcriptome sequencing of cancer cells to discover novel gene fusions. As a proof of concept, we successfully used integrative transcriptome sequencing to 're-discover' the BCR-ABL1 (ref. 10) gene fusion in a chronic myelogenous leukaemia cell line and the TMPRSS2-ERG gene fusion in a prostate cancer cell line and tissues. Additionally, we nominated, and experimentally validated, novel gene fusions resulting in chimaeric transcripts in cancer cell lines and tumours. Taken together, this study establishes a robust pipeline for the discovery of novel gene chimaeras using high-throughput sequencing, opening up an important class of cancer-related mutations for comprehensive characterization.
Somatic mutations within non-coding regions and even exons may have unidentified regulatory consequences that are often overlooked in analysis workflows. Here we present RegTools (www.regtools.org), a computationally efficient, free, and open-source software package designed to integrate somatic variants from genomic data with splice junctions from bulk or single cell transcriptomic data to identify variants that may cause aberrant splicing. We apply RegTools to over 9000 tumor samples with both tumor DNA and RNA sequence data. RegTools discovers 235,778 events where a splice-associated variant significantly increases the splicing of a particular junction, across 158,200 unique variants and 131,212 unique junctions. To characterize these somatic variants and their associated splice isoforms, we annotate them with the Variant Effect Predictor, SpliceAI, and Genotype-Tissue Expression junction counts and compare our results to other tools that integrate genomic and transcriptomic data. While many events are corroborated by the aforementioned tools, the flexibility of RegTools also allows us to identify splice-associated variants in known cancer drivers, such as TP53, CDKN2A, and B2M, and other genes.
Tumors are typically sequenced to depths of 75x–100x (exome) or 30x–50x (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid, or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ∼312x) whole genome sequencing and exome capture (up to ∼433x) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ∼200,000 putative SNVs by sequencing them to depths of ∼1,000x. Additional targeted sequencing provided over 10,000x coverage and ddPCR assays provided up to ∼250,000x sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP: phs000159).
• Current sequencing strategies are inadequate given the complexity of most tumors
• Current analysis strategies perform poorly, missing rare clinically relevant variants
• A comprehensive strategy allows for a more definitive model of tumor clonal architecture
• We present a comprehensively sequenced and validated case as a community resource
By sequencing a tumor to exceptionally high depth and breadth using multiple platforms, we demonstrate the inability of standard 30x–50x WGS sequencing to capture both low-frequency variants and clonal complexity. We evaluate current state-of-the-art sequencing and analytic techniques and offer a dataset that will serve as a valuable resource for developing the next generation of genomic analysis tools.
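To give a rough sense of why 30x–50x coverage can miss low-frequency variants, the following minimal sketch computes the chance of observing enough variant-supporting reads at a site. It is not the authors' analysis: it assumes idealized binomial sampling with no sequencing error, and the function name, the 2% variant allele fraction, and the three-read threshold are illustrative assumptions.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=3):
    """Probability of seeing at least `min_alt_reads` variant reads.

    Assumes the number of variant-supporting reads at a site follows
    Binomial(depth, vaf). Sequencing error, mapping bias, and uneven
    coverage are ignored (illustrative simplification).
    """
    # Probability of observing fewer than the required number of variant reads.
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical comparison: a variant present in 2% of reads at several depths.
for depth in (30, 50, 312, 1000):
    print(depth, round(detection_probability(depth, vaf=0.02), 3))
```

Under these assumptions the detection probability rises steeply with depth, which is consistent with the abstract's point that rare subclonal variants are likely to be missed at standard whole-genome coverage.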
Cadmium is a transition metal ion that is highly toxic in biological systems. Although relatively rare in the Earth's crust, anthropogenic release of cadmium since industrialization has increased biogeochemical cycling and the abundance of the ion in the biosphere. Despite this, the molecular basis of its toxicity remains unclear. Here we combine metal-accumulation assays, high-resolution structural data and biochemical analyses to show that cadmium toxicity, in Streptococcus pneumoniae, occurs via perturbation of first row transition metal ion homeostasis. We show that cadmium uptake reduces the millimolar cellular accumulation of manganese and zinc, and thereby increases sensitivity to oxidative stress. Despite this, high cellular concentrations of cadmium (~17 mM) are tolerated, with negligible impact on growth or sensitivity to oxidative stress, when manganese and glutathione are abundant. Collectively, this work provides insight into the molecular basis of cadmium toxicity in prokaryotes, and the connection between cadmium accumulation and oxidative stress.
Colorectal cancer (CRC) is the most common gastrointestinal malignancy in the U.S.A. and approximately 50% of patients develop metastatic disease (mCRC). Despite our understanding of long non-coding RNAs (lncRNAs) in primary colon cancer, their role in mCRC and treatment resistance remains poorly characterized. Therefore, through transcriptome sequencing of normal, primary, and distant mCRC tissues we find 148 differentially expressed RNAs Associated with Metastasis (RAMS). We prioritize RAMS11 due to its association with poor disease-free survival and promotion of aggressive phenotypes in vitro and in vivo. An FDA-approved drug high-throughput viability assay shows that elevated RAMS11 expression increases resistance to topoisomerase inhibitors. Subsequent experiments demonstrate RAMS11-dependent recruitment of Chromobox protein 4 (CBX4) transcriptionally activates Topoisomerase II alpha (TOP2α). Overall, recent clinical trials using topoisomerase inhibitors coupled with our findings of RAMS11-dependent regulation of TOP2α support the potential use of RAMS11 as a biomarker and therapeutic target for mCRC.
Despite the increasing number of tools for accurately predicting gene fusion candidates from sequencing data, we are still faced with the critical challenge of visualizing the corresponding gene fusion products to infer their biological consequence (e.g. novel protein and increased gene expression). This is currently accomplished by manually inspecting and inferring the biological consequence of top scoring gene fusion candidates. This labor-intensive process could be made easier by automating the annotation of gene fusion products and generating easily interpretable visualizations. We developed a gene fusion visualization tool, called INTEGRATE-Vis, that generates comprehensive, highly customizable, publication-quality graphics focused on annotating each gene fusion at the transcript- and protein-level and assessing expression within an individual sample or across a patient cohort. INTEGRATE-Vis is the first comprehensive gene fusion visualization tool to help a user infer the potential consequence of a gene fusion event. It has potential utility in both research and clinical settings. INTEGRATE-Vis is available at https://github.com/ChrisMaherLab/INTEGRATE-Vis.
Recurrent gene fusions are a prevalent class of mutations arising from the juxtaposition of 2 distinct regions, which can generate novel functional transcripts that could serve as valuable therapeutic targets in cancer. Therefore, we aim to establish a sensitive, high-throughput methodology to comprehensively catalog functional gene fusions in cancer by evaluating a paired-end transcriptome sequencing strategy. Not only did a paired-end approach provide a greater dynamic range in comparison with single read based approaches, but it clearly distinguished the high-level "driving" gene fusions, such as BCR-ABL1 and TMPRSS2-ERG, from potential lower level "passenger" gene fusions. Also, the comprehensiveness of a paired-end approach enabled the discovery of 12 previously undescribed gene fusions in 4 commonly used cell lines that eluded previous approaches. Using the paired-end transcriptome sequencing approach, we observed read-through mRNA chimeras, tissue-type restricted chimeras, converging transcripts, diverging transcripts, and overlapping mRNA transcripts. Last, we successfully used paired-end transcriptome sequencing to detect previously undescribed ETS gene fusions in prostate tumors. Together, this study establishes a highly specific and sensitive approach for accurately and comprehensively cataloguing chimeras within a sample using paired-end transcriptome sequencing.
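The core idea behind paired-end fusion nomination, that read pairs whose two mates map to different genes support a candidate chimera, can be illustrated with a toy sketch. This is not the pipeline described in the abstract: the input format (one gene label per mate, already assigned by some upstream aligner and annotation step), the function name, and the `min_pairs` threshold are all hypothetical.

```python
from collections import Counter


def count_fusion_candidates(read_pairs, min_pairs=2):
    """Count read pairs whose mates map to two different genes.

    `read_pairs` is an iterable of (gene_of_mate1, gene_of_mate2) tuples;
    None marks a mate that did not map to any annotated gene.
    Gene pairs are sorted, so 5'/3' partner orientation is NOT preserved
    (a simplification made for this illustration).
    """
    counts = Counter()
    for gene_a, gene_b in read_pairs:
        # Skip unmapped mates and concordant pairs within one gene.
        if gene_a is None or gene_b is None or gene_a == gene_b:
            continue
        counts[tuple(sorted((gene_a, gene_b)))] += 1
    # Keep only gene pairs supported by enough discordant read pairs.
    return {pair: n for pair, n in counts.items() if n >= min_pairs}


# Hypothetical mate-to-gene assignments for five read pairs.
example_pairs = [
    ("BCR", "ABL1"),
    ("ABL1", "BCR"),
    ("BCR", "BCR"),
    ("TMPRSS2", "ERG"),
    ("GAPDH", None),
]
candidates = count_fusion_candidates(example_pairs, min_pairs=2)
print(candidates)
```

A real workflow would additionally need to handle read-through transcripts, overlapping genes, and mapping artifacts, which this sketch ignores.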
While miRNAs are increasingly linked to various immune responses, whether they can be targeted for regulating in vivo inflammatory processes such as endotoxin-induced Gram-negative sepsis is not known. Production of cytokines by the dendritic cells (DCs) plays a critical role in response to endotoxin, lipopolysaccharide (LPS). We profiled the miRNA and mRNA of CD11c+ DCs in an unbiased manner and found that at baseline, miR-142-3p was among the most highly expressed endogenous miRs while IL-6 was among the most highly expressed mRNA after LPS stimulation. Multiple computational algorithms predicted the IL-6 3′ untranslated region (UTR) to be a target of miR-142-3p. Studies using luciferase reporters carrying wild-type (WT) and mutant IL-6 3′UTR confirmed IL-6 as a target for miR-142-3p. In vitro knockdown and overexpression studies demonstrated a critical and specific role for miR-142-3p in regulating IL-6 production by the DCs after LPS stimulation. Importantly, treatment of only WT but not the IL-6–deficient (IL-6−/−) mice with locked nucleic acid (LNA)–modified phosphorothioate oligonucleotide complementary to miR-142-3p reduced endotoxin-induced mortality. These results demonstrate a critical role for miR-142-3p in regulating DC responses to LPS and provide proof of concept for targeting miRs as a novel strategy for treatment of endotoxin-induced mortality.
Transcription occurs across more than 70% of the human genome and more than half of currently annotated genes produce functional noncoding RNAs. Of these transcripts, the majority, long noncoding RNAs (lncRNAs), are greater than 200 nucleotides in length and are necessary for various roles in the cell. It is increasingly appreciated that these lncRNAs are relevant in both health and disease states, with the brain expressing the largest number of lncRNAs compared to other organs. Glioblastoma (GBM) is an aggressive, fatal brain tumor that demonstrates remarkable intratumoral heterogeneity, which has made the development of effective therapies challenging. The cooperation between genetic and epigenetic alterations drives rapid adaptation that allows therapeutic evasion and recurrence. Given the large repertoire of lncRNAs in normal brain tissue and the well-described roles of lncRNAs in molecular and cellular processes, these transcripts are important to consider in the context of GBM heterogeneity and treatment resistance. Herein, we review the general mechanisms and biological roles of lncRNAs, with a focus on GBM, as well as RNA-based therapeutics currently in development.