Stored biological samples with pathology information and medical records are invaluable resources for translational medical research. However, RNAs extracted from the archived clinical tissues are often substantially degraded. RNA degradation distorts the RNA-seq read coverage in a gene-specific manner, and has profound influences on whole-genome gene expression profiling.
We developed the transcript integrity number (TIN) to measure RNA degradation. When applied to 3 independent RNA-seq datasets, TIN proved to be a reliable and sensitive measure of RNA degradation at both the transcript and sample levels. By comparing 10 prostate cancer clinical samples with lower RNA integrity to 10 samples with higher RNA quality, we demonstrated that calibrating gene expression counts with TIN scores could effectively neutralize RNA degradation effects by reducing false positives and recovering biologically meaningful pathways. When we further evaluated the performance of TIN correction using spike-in transcripts in RNA-seq data generated by the Sequencing Quality Control consortium, we found that TIN adjustment gave better control of false positives and false negatives (sensitivity = 0.89, specificity = 0.91, accuracy = 0.90) than gene expression analysis without TIN correction (sensitivity = 0.98, specificity = 0.50, accuracy = 0.86).
TIN is a reliable measurement of RNA integrity and a valuable approach for neutralizing in vitro RNA degradation effects and improving differential gene expression analysis.
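For readers comparing the sensitivity, specificity, and accuracy figures quoted above, the following is a minimal sketch of how these three metrics are conventionally derived from a confusion matrix. The function name and the counts in the usage example are hypothetical and are not taken from the study; the abstract does not report the underlying counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: truly differential transcripts called / missed.
    tn/fp: truly unchanged transcripts left uncalled / wrongly called.
    """
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    specificity = tn / (tn + fp)   # fraction of true negatives left uncalled
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not data from the study).
sens, spec, acc = classification_metrics(tp=8, fp=1, tn=9, fn=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Because accuracy weights sensitivity and specificity by the proportion of true positives and true negatives in the test set, a method with low specificity can still show high accuracy when positives dominate, which is why the three values are best read together.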
We used single cell RNA-Seq to examine molecular heterogeneity in multiple myeloma (MM) in 597 CD138 positive cells from bone marrow aspirates of 15 patients at different stages of disease progression. We selected 790 genes by the coefficient of variation (CV) method and used them to organize the cells into four groups (L1-L4) by unsupervised clustering. Plasma cells from each patient clustered into at least two groups based on gene expression signature. The L1 group contained cells from all MGUS patients having the lowest expression of genes involved in the oxidative phosphorylation, Myc targets, and mTORC1 signaling pathways (p < 1.2 × 10
). In contrast, the expression level of these pathway genes increased progressively and was highest in the L4 group, which contained only cells from MM patients with t(4;14) translocations. A 44-gene signature of consistently overexpressed genes among the four groups was associated with poorer overall survival in MM patients (APEX trial, p < 0.0001; HR, 1.83; 95% CI, 1.33-2.52), particularly those treated with bortezomib (p < 0.0001; HR, 2.00; 95% CI, 1.39-2.89). Our study, using single cell RNA-Seq, identified the most significantly affected molecular pathways during MM progression and provided a novel signature predictive of patient prognosis and treatment stratification.
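The coefficient of variation (CV) gene selection mentioned in this abstract can be sketched as follows. This is an illustrative version only: the abstract does not state the normalization, the CV threshold, or the exact ranking rule used, so the function name, the genes-by-cells matrix layout, and the "top n by CV" rule are assumptions.

```python
import numpy as np


def top_genes_by_cv(expr, n_top):
    """Rank genes by coefficient of variation (std / mean) across cells.

    expr: 2-D array-like, rows = genes, columns = cells (assumed layout).
    Returns (indices of the n_top most variable genes, CV of every gene).
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    std = expr.std(axis=1, ddof=1)        # sample standard deviation per gene
    cv = np.full(mean.shape, -np.inf)     # genes with zero mean rank last
    expressed = mean > 0
    cv[expressed] = std[expressed] / mean[expressed]
    order = np.argsort(cv)[::-1]          # highest CV first
    return order[:n_top], cv


# Toy matrix: 4 genes x 4 cells (hypothetical values).
toy = [
    [10, 10, 10, 10],   # constant expression, CV = 0
    [1, 20, 1, 20],     # highly variable
    [5, 6, 5, 6],       # mildly variable
    [0, 0, 0, 0],       # not expressed
]
selected, cv = top_genes_by_cv(toy, n_top=2)
print(selected)
```

The selected rows would then be passed to an unsupervised clustering method of choice; that step is omitted here because the abstract does not name the algorithm.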
Single cell RNA sequencing is a technology that provides the capability of analyzing the transcriptome of a single cell from a population. So far, single cell RNA sequencing has been focused mostly on human cells due to the larger starting amount of RNA template for subsequent amplification. One of the major challenges of applying single cell RNA sequencing to microbial cells is to amplify the femtograms of the RNA template to obtain sufficient material for downstream sequencing with minimal contamination. To achieve this goal, efforts have focused on multiround RNA amplification, which can introduce additional contamination and bias. In this work, we for the first time coupled a microfluidic platform with multiple displacement amplification technology to perform single cell whole transcriptome amplification and sequencing of Porphyromonas somerae, a microbe of interest in endometrial cancer, as a proof-of-concept demonstration of using single cell RNA sequencing as a tool to unveil gene expression heterogeneity in single microbial cells. Our results show that bacterial single-cell gene expression regulation is distinct across different cells, supporting widespread heterogeneity.
Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.
Tobacco smoking is responsible for over 90% of lung cancer cases, and yet the precise molecular alterations induced by smoking in lung that develop into cancer and impact survival have remained obscure.
We performed gene expression analysis using HG-U133A Affymetrix chips on 135 fresh frozen tissue samples of adenocarcinoma and paired noninvolved lung tissue from current, former and never smokers, with biochemically validated smoking information. ANOVA analysis adjusted for potential confounders, multiple testing procedure, Gene Set Enrichment Analysis, and GO-functional classification were conducted for gene selection. Results were confirmed in independent adenocarcinoma and non-tumor tissues from two studies. We identified a gene expression signature characteristic of smoking that includes cell cycle genes, particularly those involved in the mitotic spindle formation (e.g., NEK2, TTK, PRC1). Expression of these genes strongly differentiated both smokers from non-smokers in lung tumors and early stage tumor tissue from non-tumor tissue (p<0.001 and fold-change >1.5, for each comparison), consistent with an important role for this pathway in lung carcinogenesis induced by smoking. These changes persisted many years after smoking cessation. NEK2 (p<0.001) and TTK (p = 0.002) expression in the noninvolved lung tissue was also associated with a 3-fold increased risk of mortality from lung adenocarcinoma in smokers.
Our work provides insight into the smoking-related mechanisms of lung neoplasia, and shows that the very mitotic genes known to be involved in cancer development are induced by smoking and affect survival. These genes are candidate targets for chemoprevention and treatment of lung cancer in smokers.
Next generation sequencing (NGS) assays with large targeted gene panels can comprehensively profile cancer somatic mutations in a tumor sample. Given the rapid adoption of such assays for circulating tumor DNA (ctDNA) analysis in clinical oncology, it is essential for the community to understand their analytical performance in liquid biopsy settings. Here, we directly compared five ctDNA NGS assays, most of which have a panel of 400 or more genes, with simulated samples harboring mutations relevant to solid tumors or myeloid malignancy. Our results indicate that the detection sensitivity and reproducibility of all five assays were 90% or higher when the mutations were at 0.5% or 1.0% allele frequency and the optimal DNA input of 30 ng or 50 ng per vendor's protocol was used. Performance decreased and varied dramatically when mutations were at a 0.1% allele frequency and/or when a lower genomic input of 10 ng DNA was used. Interestingly, one of the assays repeatedly showed a higher false-positive rate than the others across two different sample sets. Multiple intrinsic technical factors pertaining to the NGS assays were further investigated. Notable differences among the assays were seen for depth of coverage and background noise, which profoundly impacted assay performance. The results derived from this study are highly informative and provide a framework to assess and select suitable assays for specific application in cancer monitoring and potential clinical use.
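One plausible contributor to the drop in performance at 0.1% allele frequency and 10 ng input is simple template scarcity. The back-of-the-envelope sketch below estimates how many mutant genome copies are present in a sample. It assumes roughly 3.3 pg of DNA per haploid human genome (a commonly used approximation) and ignores losses during library preparation; the constant and function name are illustrative and are not taken from the study.

```python
HAPLOID_GENOME_PG = 3.3  # assumption: approximate mass of one haploid human genome, in picograms


def expected_mutant_copies(input_ng, allele_fraction):
    """Estimate the number of mutant haploid genome copies in a DNA input.

    input_ng: DNA input in nanograms.
    allele_fraction: variant allele frequency as a fraction (0.001 = 0.1%).
    """
    total_copies = input_ng * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg -> genome copies
    return total_copies * allele_fraction


# The input amounts and allele frequencies below are the ones named in the abstract.
for ng, af in [(50, 0.01), (30, 0.005), (10, 0.001)]:
    print(f"{ng} ng at {af:.1%}: ~{expected_mutant_copies(ng, af):.0f} mutant copies")
```

Under these assumptions, 10 ng at 0.1% corresponds to only about three mutant copies, so sampling noise alone can cause a variant to be missed before any assay-specific factors come into play.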
Gastroblastoma is a rare distinctive biphasic tumor of the stomach. The molecular biology of gastroblastoma has not been studied, and no affirmative diagnostic markers have been developed. We retrieved two gastroblastomas from the consultation practices of the authors and performed transcriptome sequencing on formalin-fixed paraffin-embedded tissue. Recurrent predicted fusion genes were validated at genomic and RNA levels. The presence of the fusion gene was confirmed on two additional paraffin-embedded cases of gastroblastoma. Control cases of histologic mimics (biphasic synovial sarcoma, leiomyoma, leiomyosarcoma, desmoid-type fibromatosis, EWSR1-FLI1-positive Ewing sarcoma, Wilms' tumor, gastrointestinal stromal tumor, plexiform fibromyxoma, Sonic hedgehog-type medulloblastomas, and normal gastric mucosa and muscularis propria) were also analyzed. The gastroblastomas affected two males and two females aged 9-56 years. Transcriptome sequencing identified recurrent somatic MALAT1-GLI1 fusion genes, which were predicted to retain the key domains of GLI1. The MALAT1-GLI1 fusion gene was validated by break-apart and dual-fusion FISH and RT-PCR. The additional two gastroblastomas were also positive for the MALAT1-GLI1 fusion gene. None of the other control cases harbored MALAT1-GLI1. Overexpression of GLI1 in the cases of gastroblastomas was confirmed at RNA and protein levels. Pathway analysis revealed activation of the Sonic hedgehog pathway in gastroblastoma and gene expression profiling showed that gastroblastomas grouped together and were most similar to Sonic hedgehog-type medulloblastomas. In summary, we have identified an oncogenic MALAT1-GLI1 fusion gene in all cases of gastroblastoma that may serve as a diagnostic biomarker. The fusion gene is predicted to encode a protein that includes the zinc finger domains of GLI1 and results in overexpression of GLI1 protein and activation of the Sonic hedgehog pathway.
Fifteen percent of lung cancer cases occur in never-smokers and show characteristics that are molecularly and clinically distinct from those in smokers. Epidermal growth factor receptor (EGFR) gene mutations, which are correlated with sensitivity to EGFR-tyrosine kinase inhibitors (EGFR-TKIs), are more frequent in never-smoker lung cancers. In this study, microRNA (miRNA) expression profiling of 28 cases of never-smoker lung cancer identified aberrantly expressed miRNAs, which were much fewer than in lung cancers of smokers and included miRNAs previously identified (e.g., up-regulated miR-21) and unidentified (e.g., down-regulated miR-138) in those smoker cases. The changes in expression of some of these miRNAs, including miR-21, were more remarkable in cases with EGFR mutations than in those without these mutations. A significant correlation between phosphorylated-EGFR (p-EGFR) and miR-21 levels in lung carcinoma cell lines and the suppression of miR-21 by an EGFR-TKI, AG1478, suggest that EGFR signaling is a pathway positively regulating miR-21 expression. In the never-smoker-derived lung adenocarcinoma cell line H3255 with mutant EGFR and high levels of p-EGFR and miR-21, antisense inhibition of miR-21 enhanced AG1478-induced apoptosis. In a never-smoker-derived adenocarcinoma cell line H441 with wild-type EGFR, the antisense miR-21 not only showed an additive effect with AG1478 but also induced apoptosis by itself. These results suggest that aberrantly increased expression of miR-21, which is enhanced further by the activated EGFR signaling pathway, plays a significant role in lung carcinogenesis in never-smokers, as well as in smokers, and is a potential therapeutic target in both EGFR-mutant and wild-type cases.
MicroRNAs play a role in regulating diverse biological processes and have considerable utility as molecular markers for diagnosis and monitoring of human disease. Several technologies are available commercially for measuring microRNA expression. However, cross-platform comparisons do not necessarily correlate well, making it difficult to determine which platform most closely represents the true microRNA expression level in a tissue. To address this issue, we have analyzed RNA derived from cell lines, as well as fresh frozen and formalin-fixed paraffin embedded tissues, using Affymetrix, Agilent, and Illumina microRNA arrays, NanoString counting, and Illumina Next Generation Sequencing. We compared the performance within and between the different platforms, and then verified these results against quantitative PCR data. Our results demonstrate that the within-platform reproducibility for each method is consistently high and, although the gene expression profiles from each platform show unique traits, comparison of genes that were commonly detectable showed that detection of microRNA transcripts was similar across multiple platforms.