Many methods have been used to determine differential gene expression from single-cell RNA (scRNA)-seq data. We evaluated 36 approaches using experimental and synthetic data and found considerable differences in the number and characteristics of the genes that are called differentially expressed. Prefiltering of lowly expressed genes has important effects, particularly for some of the methods developed for bulk RNA-seq data analysis. However, we found that bulk RNA-seq analysis methods do not generally perform worse than those developed specifically for scRNA-seq. We also present conquer, a repository of consistently processed, analysis-ready public scRNA-seq data sets that is aimed at simplifying method evaluation and reanalysis of published results. Each data set provides abundance estimates for both genes and transcripts, as well as quality control and exploratory analysis reports.
The fine detail provided by sequencing-based transcriptome surveys suggests that RNA-seq is likely to become the platform of choice for interrogating steady state RNA. In order to discover biologically important changes in expression, we show that normalization continues to be an essential step in the analysis. We outline a simple and effective method for performing normalization and show dramatically improved results for inferring differential expression in simulated and publicly available data sets.
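To illustrate why normalization matters here, the following is a minimal sketch of a trimmed-mean-of-log-ratios scaling factor. The abstract does not spell out the method, so the function name, the `trim` parameter, and the simplifications (no trimming on absolute expression, no precision weights) are assumptions made for illustration only.

```python
import math

def trimmed_mean_log_ratio_factor(sample, reference, trim=0.3):
    """Hypothetical helper: scaling factor of `sample` relative to `reference`.

    Simplified illustration only: log ratios of library-size-scaled counts
    are sorted, a fraction `trim` is dropped from each end, and the mean of
    the remainder is converted back from the log2 scale.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    n_s, n_r = sum(sample), sum(reference)
    log_ratios = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0  # genes with a zero count have no finite log ratio
    )
    if not log_ratios:
        raise ValueError("no genes with positive counts in both samples")
    k = int(len(log_ratios) * trim)
    kept = log_ratios[k:len(log_ratios) - k]
    return 2 ** (sum(kept) / len(kept))

# Nine genes are unchanged; one gene is tenfold higher in the sample.
reference = [100] * 10
sample = [100] * 9 + [1000]
factor = trimmed_mean_log_ratio_factor(sample, reference)
effective_library_size = sum(sample) * factor
```

In this toy case plain library-size scaling would make the nine unchanged genes look down-regulated, because the single highly expressed gene inflates the total. The trimmed factor rescales the sample's library size so that the unchanged genes compare as equal.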
A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of 'sharing of information' across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/.
It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org). Contact: mrobinson@wehi.edu.au
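As a rough illustration of what "overdispersed Poisson" means in practice, the sketch below uses the common negative binomial parameterisation, in which the variance grows quadratically with the mean. The abstract does not state the parameterisation, so this form and the function name are assumptions.

```python
import math

def nb_variance(mean, dispersion):
    """Variance of a count under an assumed negative binomial model.

    The first term is the Poisson (technical) variance; the second term,
    scaled by `dispersion`, represents the extra variability between
    replicates.
    """
    return mean + dispersion * mean ** 2

mean_count = 100
poisson_var = nb_variance(mean_count, 0.0)        # dispersion 0 is plain Poisson
overdispersed_var = nb_variance(mean_count, 0.04)
biological_cv = math.sqrt(0.04)                   # square root of the dispersion
```

With a dispersion of 0.04 the variance at a mean of 100 is five times the Poisson variance, which is why a Poisson test applied to such data would overstate significance.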
Immune-checkpoint blockade has revolutionized cancer therapy. In particular, inhibition of programmed cell death protein 1 (PD-1) has been found to be effective for the treatment of metastatic melanoma and other cancers. Despite a dramatic increase in progression-free survival, a large proportion of patients do not show durable responses. Therefore, predictive biomarkers of a clinical response are urgently needed. Here we used high-dimensional single-cell mass cytometry and a bioinformatics pipeline for the in-depth characterization of the immune cell subsets in the peripheral blood of patients with stage IV melanoma before and after 12 weeks of anti-PD-1 immunotherapy. During therapy, we observed a clear response to immunotherapy in the T cell compartment. However, before commencing therapy, a strong predictor of progression-free and overall survival in response to anti-PD-1 immunotherapy was the frequency of CD14+CD16−HLA-DRhi monocytes. We confirmed this by conventional flow cytometry in an independent, blinded validation cohort, and we propose that the frequency of monocytes in PBMCs may serve in clinical decision support.
High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
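The offset idea can be sketched with a toy gene that has two isoforms of different length. The code below is plain Python written for illustration, not the tximport API; the function name and the use of an abundance-weighted average transcript length as the basis for the offset are assumptions.

```python
import math

def average_transcript_length(abundances, lengths):
    """Hypothetical helper: abundance-weighted mean isoform length of a gene."""
    total = sum(abundances)
    if total == 0:
        # Assumed fallback for an unexpressed gene: unweighted mean length.
        return sum(lengths) / len(lengths)
    return sum(a * l for a, l in zip(abundances, lengths)) / total

isoform_lengths = [1000, 3000]

# Same gene-level abundance in both conditions, but different isoform usage.
length_a = average_transcript_length([10, 0], isoform_lengths)  # short isoform only
length_b = average_transcript_length([0, 10], isoform_lengths)  # long isoform only

# Log-scale offsets that a count model could use to absorb the length change.
offset_a = math.log(length_a)
offset_b = math.log(length_b)
```

Because read counts scale with transcript length, condition B would yield about three times as many reads for this gene as condition A at identical abundance. An offset based on the average length is one way to keep that switch from being reported as differential gene expression.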
A platform for highly parallel direct sequencing of native RNA strands was recently described by Oxford Nanopore Technologies, but despite initial efforts it remains crucial to further investigate the technology for quantification of complex transcriptomes. Here we undertake native RNA sequencing of polyA+ RNA from two human cell lines, analysing ~5.2 million aligned native RNA reads. To enable informative comparisons, we also perform relevant ONT direct cDNA- and Illumina-sequencing. We find that while native RNA sequencing does enable some of the anticipated advantages, key unexpected aspects currently hamper its performance, most notably the quite frequent inability to obtain full-length transcripts from single reads, as well as difficulties to unambiguously infer their true transcript of origin. While characterising issues that need to be addressed when investigating more complex transcriptomes, our study highlights that with some defined improvements, native RNA sequencing could be an important addition to the mammalian transcriptomics toolbox.
CRISPR-Cas-based genome editing holds great promise for targeting genetic disorders, including inborn errors of hepatocyte metabolism. Precise correction of disease-causing mutations in adult tissues in vivo, however, is challenging. It requires repair of Cas9-induced double-stranded DNA (dsDNA) breaks by homology-directed mechanisms, which are highly inefficient in nondividing cells. Here we corrected the disease phenotype of adult phenylalanine hydroxylase (Pah)^enu2 mice, a model for the human autosomal recessive liver disease phenylketonuria (PKU), using recently developed CRISPR-Cas-associated base editors. These systems enable conversion of C∙G to T∙A base pairs and vice versa, independent of dsDNA break formation and homology-directed repair (HDR). We engineered and validated an intein-split base editor, which allows splitting of the fusion protein into two parts, thereby circumventing the limited cargo capacity of adeno-associated virus (AAV) vectors. Intravenous injection of AAV-base editor systems resulted in Pah^enu2 gene correction rates that restored physiological blood phenylalanine (L-Phe) levels below 120 µmol/l. We observed mRNA correction rates up to 63%, restoration of phenylalanine hydroxylase (PAH) enzyme activity, and reversion of the light fur phenotype in Pah^enu2 mice. Our findings suggest that targeting genetic diseases in vivo using AAV-mediated delivery of base-editing agents is feasible, demonstrating potential for therapeutic application.
The edgeR package, an R-based tool within the Bioconductor project, offers a flexible statistical framework for detection of changes in abundance based on counts. In this chapter, we illustrate the use of edgeR on a human embryonic stem cell dataset, in particular for RNA-seq and ChIP-seq data. We focus on a step-by-step statistical analysis of differential expression, going from raw data to a list of putative differentially expressed genes and give examples of integrative analysis using the ChIP-seq data. We emphasize data quality spot checks and the use of positive controls throughout the process and give practical recommendations for reproducible research.