Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Single cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create 'pseudo cells' from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. We compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. Our data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.
The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis, with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package, which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
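The voom method mentioned above works on log2 counts-per-million (log-CPM) rather than raw counts. The sketch below shows that transformation in plain Python, assuming the offsets described for voom (0.5 added to each count, 1 added to each library size) and optional, already-computed normalisation factors such as those from TMM. The function name log_cpm is hypothetical and this is not the limma or edgeR API; voom's mean-variance weighting is not included.

```python
import math
from typing import List, Optional, Sequence


def log_cpm(
    counts: Sequence[Sequence[int]],
    norm_factors: Optional[Sequence[float]] = None,
    prior_count: float = 0.5,
) -> List[List[float]]:
    """Return log2 counts-per-million for a genes x samples count table."""
    n_samples = len(counts[0])
    # Library size = total reads per sample (column sums of the table).
    lib_sizes = [float(sum(row[s] for row in counts)) for s in range(n_samples)]
    if norm_factors is not None:
        # Effective library size, assuming TMM-style scaling factors are supplied.
        lib_sizes = [lib * f for lib, f in zip(lib_sizes, norm_factors)]
    # The offsets (+prior_count on counts, +1 on library size) keep zero counts finite.
    return [
        [
            math.log2((row[s] + prior_count) / (lib_sizes[s] + 1.0) * 1e6)
            for s in range(n_samples)
        ]
        for row in counts
    ]
```

For example, log_cpm([[0, 10], [100, 90]]) gives a finite value for the zero count in the first gene. Plain nested lists keep the sketch dependency-free; in practice the R packages named above handle this step.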
Summary graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration.
The open-source Glimma package creates interactive graphics for exploring gene expression analysis results with a few simple R commands. It extends popular plots found in the limma package, such as multi-dimensional scaling plots and mean-difference plots, to allow individual data points to be queried and additional annotation information to be displayed upon hovering over or selecting particular points. It also offers links between plots so that more information can be revealed on demand. Glimma is widely applicable, supporting data analyses from a number of well-established Bioconductor workflows (limma, edgeR and DESeq2), and uses D3/JavaScript to produce HTML pages with interactive displays that enable more effective data exploration by end-users. Results from Glimma can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility.
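A mean-difference (MD) plot places each gene at an average log-expression (A) on the x-axis and a log fold change (M) on the y-axis. The sketch below computes those coordinates and bundles them with an annotation field into JSON-ready records of the kind an interactive page could show on hover. It assumes log-expression values are already computed, takes M as the difference between two group means and A as the mean over all samples; the function md_points and the record fields are hypothetical and do not reflect Glimma's internal data format.

```python
import json
from statistics import mean
from typing import Dict, List, Sequence


def md_points(
    log_expr: Dict[str, Sequence[float]],
    groups: Sequence[str],
    ref: str,
    test: str,
    annotation: Dict[str, str],
) -> List[dict]:
    """Build one record per gene holding its MD-plot coordinates and a label."""
    ref_idx = [i for i, g in enumerate(groups) if g == ref]
    test_idx = [i for i, g in enumerate(groups) if g == test]
    points = []
    for gene, values in log_expr.items():
        # A: average log-expression across all samples.
        a = mean(values)
        # M: log fold change of the test group relative to the reference group.
        m = mean(values[i] for i in test_idx) - mean(values[i] for i in ref_idx)
        points.append({"gene": gene, "A": a, "M": m, "symbol": annotation.get(gene, "")})
    return points


payload = json.dumps(
    md_points({"g1": [1.0, 1.0, 3.0, 3.0]}, ["a", "a", "b", "b"], "a", "b", {"g1": "Actb"})
)
```

Serialising to JSON mirrors the general idea of handing per-point data to a browser-side plotting layer, which is what lets extra annotation appear on demand.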
The Glimma R package is available from http://bioconductor.org/packages/Glimma/ .
Contact: su.s@wehi.edu.au, law@wehi.edu.au or mritchie@wehi.edu.au.
Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed and random barcodes have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. There is a need for new tools that can handle the various barcoding strategies used by different protocols, exploit this information for quality assessment at the sample level and provide effective visualization of these results in preparation for higher-level analyses. To this end, we developed scPipe, an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols that include CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a count matrix that is essential for downstream analysis, along with an HTML report that summarises data quality. These results can be used as input for downstream analyses including normalization, visualization and statistical testing. scPipe performs this processing in a few simple R commands, promoting reproducible analysis of single-cell data that is compatible with the emerging suite of open-source scRNA-seq analysis tools available in R/Bioconductor and beyond. The scPipe R package is available for download from https://www.bioconductor.org/packages/scPipe.
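The combination of barcode demultiplexing and UMI-aware gene-level quantification described above can be illustrated with a small sketch. It assumes reads have already been aligned and assigned to genes, and represents each read as a hypothetical (cell barcode, gene, UMI) tuple. Reads from unknown barcodes are dropped, and reads sharing all three values are treated as copies of one molecule. The function name umi_gene_counts is illustrative, not part of scPipe, and unlike real pipelines this sketch does not correct UMI sequencing errors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Read = Tuple[str, str, str]  # (cell_barcode, gene_id, umi)


def umi_gene_counts(
    reads: Iterable[Read],
    allowed_barcodes: Optional[Set[str]] = None,
) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell barcode, gene) pair."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for barcode, gene, umi in reads:
        # Demultiplexing: skip reads whose barcode is not a known cell.
        if allowed_barcodes is not None and barcode not in allowed_barcodes:
            continue
        # Reads with the same barcode, gene and UMI are PCR duplicates of one molecule.
        umis[(barcode, gene)].add(umi)
    return {key: len(molecules) for key, molecules in umis.items()}
```

Counting molecules rather than reads is what makes the quantification UMI-aware: amplification inflates read counts but not the number of distinct UMIs.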
Archetypal human pluripotent stem cells (hPSC) are widely considered to be equivalent in developmental status to mouse epiblast stem cells, which correspond to pluripotent cells at a late post-implantation stage of embryogenesis. Heterogeneity within hPSC cultures complicates this interspecies comparison. Here we show that a subpopulation of archetypal hPSC enriched for high self-renewal capacity (ESR) has distinct properties relative to the bulk of the population, including a cell cycle with a very low G1 fraction and a metabolomic profile that reflects a combination of oxidative phosphorylation and glycolysis. ESR cells are pluripotent and capable of differentiation into primordial germ cell-like cells. Global DNA methylation levels in the ESR subpopulation are lower than those in mouse epiblast stem cells. Chromatin accessibility analysis revealed a unique set of open chromatin sites in ESR cells. RNA-seq at the subpopulation and single cell levels shows that, unlike mouse epiblast stem cells, the ESR subset of hPSC displays no lineage priming, and that it can be clearly distinguished from gastrulating and extraembryonic cell populations in the primate embryo. ESR hPSC correspond to an earlier stage of post-implantation development than mouse epiblast stem cells.
A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for the effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated from a range of different methylation callers and manages large datasets using a compressed data format. To fully explore the methylation patterns in a dataset, NanoMethViz allows plotting of data at various resolutions. At the sample level, we use dimensionality reduction to look at the relationships between methylation profiles in an unsupervised way. We visualize methylation profiles of classes of features such as genes or CpG islands by scaling them to relative positions and aggregating their profiles. At the finest resolution, we visualize methylation patterns across individual reads along the genome using spaghetti plots and heatmaps, allowing users to explore particular genes or genomic regions of interest. In summary, our software makes the handling of methylation signal more convenient, expands upon the visualization options for nanopore data and works seamlessly with existing methylation analysis tools available in the Bioconductor project. Our software is available at https://bioconductor.org/packages/NanoMethViz.
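Scaling features such as genes to relative positions before aggregating, as described above, can be sketched as follows. The layout is an assumption chosen for illustration: an upstream flank of fixed length maps to [-1, 0), the feature body to [0, 1] and a downstream flank to (1, 2], with minus-strand features mirrored so that 0 is always the 5' end. Per-site methylation probabilities are then averaged in equal-width bins. The names relative_position and aggregate_profile are hypothetical and do not describe NanoMethViz's implementation.

```python
from typing import List, Optional, Sequence, Tuple

Call = Tuple[str, int, float]          # (chrom, position, methylation probability)
Feature = Tuple[str, int, int, str]    # (chrom, start, end, strand)


def relative_position(pos: int, start: int, end: int, strand: str, flank: int) -> Optional[float]:
    """Map a genomic position onto [-1, 2] relative to a feature, or None if outside."""
    if strand == "-":
        # Mirror the coordinate so relative 0 is the feature's 5' end.
        pos = start + end - pos
    if start <= pos <= end:
        return (pos - start) / (end - start) if end > start else 0.0
    if start - flank <= pos < start:
        return (pos - start) / flank          # upstream flank: [-1, 0)
    if end < pos <= end + flank:
        return 1.0 + (pos - end) / flank      # downstream flank: (1, 2]
    return None


def aggregate_profile(
    calls: Sequence[Call], features: Sequence[Feature], flank: int, n_bins: int
) -> List[Optional[float]]:
    """Average methylation probability per relative-position bin across features."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for f_chrom, start, end, strand in features:
        for chrom, pos, prob in calls:
            if chrom != f_chrom:
                continue
            rel = relative_position(pos, start, end, strand, flank)
            if rel is None:
                continue
            # Shift [-1, 2] to [0, 1], then clamp the right edge into the last bin.
            b = min(int((rel + 1.0) / 3.0 * n_bins), n_bins - 1)
            sums[b] += prob
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Rescaling each feature to a common coordinate system is what allows features of different lengths to be overlaid and averaged into one profile.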
Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets of varying biological complexity generated by the CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis.
In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
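Agreement between a clustering and known cell type labels, as used for the ground truth above, is commonly scored with the adjusted Rand index (ARI), which is 1 for identical partitions and near 0 for chance-level agreement. The abstract does not name its metric, so treating ARI as the measure here is an assumption; the sketch computes it from the contingency table of label pairs using the standard formula, with a hypothetical function name.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two partitions of the same cells."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as perfect agreement
    # Pairs of cells placed together in both partitions.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together within each partition separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are all singletons
    return (sum_cells - expected) / (max_index - expected)
```

Because ARI depends only on which cells are grouped together, relabelled but identical clusterings still score 1.0.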
Innate lymphoid cells (ILCs) are enriched at mucosal surfaces, where they provide immune surveillance. All ILC subsets develop from a common progenitor that gives rise to pre-committed progenitors for each of the ILC lineages. Currently, the temporal control of gene expression that guides the emergence of these progenitors is poorly understood. We used global transcriptional mapping to analyze gene expression in different ILC progenitors. We identified PD-1 to be specifically expressed in PLZF+ ILCp and revealed that the timing and order of expression of the transcription factors NFIL3, ID2, and TCF-1 was critical. Importantly, induction of ILC lineage commitment required only transient expression of NFIL3 prior to ID2 and TCF-1 expression. These findings highlight the importance of the temporal program that permits commitment of progenitors to the ILC lineage, and they expand our understanding of the core transcriptional program by identifying potential regulators of ILC development.
• ILCp transcriptomics define the blueprint for hierarchical ILC development
• PD-1 identifies the PLZF-expressing ILC precursor in the bone marrow
• Transient NFIL3 expression prior to ID2 expression is required for ILC development
• ID2 and TCF-1 are required to extinguish stem cell and B and T cell gene programs
Seillet et al. define the hierarchical blueprint for ILC development using global transcriptomic analyses of ILC progenitors. This revealed that PD-1 is a key marker of ILCp and uncovered a regulatory circuit governed by NFIL3 in regulating ID2 and TCF-1 essential for ILC differentiation.
Tumors are composed of phenotypically heterogeneous cancer cells that often resemble various differentiation states of their lineage of origin. Within this hierarchy, it is thought that an immature subpopulation of tumor-propagating cancer stem cells (CSCs) differentiates into non-tumorigenic progeny, providing a rationale for therapeutic strategies that specifically eradicate CSCs or induce their differentiation. The clinical success of these approaches depends on CSC differentiation being unidirectional rather than reversible, yet this question remains unresolved even in prototypically hierarchical malignancies, such as acute myeloid leukemia (AML). Here, we show in murine and human models of AML that, upon perturbation of endogenous expression of the lineage-determining transcription factor PU.1 or withdrawal of established differentiation therapies, some mature leukemia cells can de-differentiate and reacquire clonogenic and leukemogenic properties. Our results reveal plasticity of CSC maturation in AML, highlighting the need to therapeutically eradicate cancer cells across a range of differentiation states.
• Reversible PU.1 knockdown provides a genetic model of AML differentiation therapy
• Mature AML-derived cells can revert to a leukemogenic state upon PU.1 suppression
• Mouse and human APL cells can regain clonogenicity after ATRA-induced differentiation
Intratumoral phenotypic heterogeneity in acute myeloid leukemia (AML) and many other cancers is thought to follow a hierarchical cancer stem cell model. Dickins and colleagues show here that mature, non-leukemogenic AML cells can reacquire leukemia-initiating activity and promote disease progression through de-differentiation.