Abstract
Summary
Data sparsity in single-cell experiments prevents an accurate assessment of gene expression when visualized in a low-dimensional space. Here, we introduce Nebulosa, an R package that uses weighted kernel density estimation to recover signals lost through drop-out or low expression.
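The idea of weighting a kernel density estimate by expression can be illustrated with a minimal sketch. This is not Nebulosa's implementation (Nebulosa is an R package); the function name `weighted_kde`, the fixed isotropic Gaussian kernel, and the toy coordinates are all assumptions made for illustration.

```python
import math

def weighted_kde(points, weights, bandwidth):
    """Evaluate a Gaussian weighted kernel density estimate at every input point.

    points    -- list of (x, y) embedding coordinates, one per cell
    weights   -- non-negative expression values of one gene, one per cell
    bandwidth -- kernel bandwidth h (assumed fixed and isotropic in this sketch)
    """
    total = sum(weights)
    if total <= 0:
        # No cell expresses the gene: the density is zero everywhere.
        return [0.0] * len(points)
    # Normalising constant of a 2D Gaussian kernel, times the total weight.
    norm = 2.0 * math.pi * bandwidth ** 2 * total
    densities = []
    for xi, yi in points:
        acc = 0.0
        for (xj, yj), w in zip(points, weights):
            if w == 0:
                continue  # cells with zero expression contribute nothing
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            acc += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(acc / norm)
    return densities

# Hypothetical toy embedding: three cells in one cluster, two in another.
# The second cell has zero counts (a possible drop-out) but sits among expressing cells.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
expression = [3.0, 0.0, 2.0, 0.0, 0.0]
density = weighted_kde(points, expression, bandwidth=0.5)
```

In this toy case the zero-count cell next to expressing neighbours receives a high density, while the zero-count cells in the distant cluster stay near zero, which is the sense in which a smoothed density can recover signal that raw counts hide.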
Availability and implementation
Nebulosa can be easily installed from www.github.com/powellgenomicslab/Nebulosa.
Supplementary information
Supplementary data are available at Bioinformatics online.
Single-cell RNA sequencing has enabled the characterization of highly specific cell types in many tissues, as well as both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In theory, this information can be used to classify an individual cell based on its transcriptional profile. Here, we present scPred, a new generalizable method that is able to provide highly accurate classification of single cells, using a combination of unbiased feature selection from a reduced-dimension space and a machine-learning probability-based prediction method. We apply scPred to scRNA-seq data from pancreatic tissue, mononuclear cells, colorectal tumor biopsies, and circulating dendritic cells and show that scPred is able to classify individual cells with high accuracy. The generalized method is available at https://github.com/powellgenomicslab/scPred/.
The human immune system displays substantial variation between individuals, leading to differences in susceptibility to autoimmune disease. We present single-cell RNA sequencing (scRNA-seq) data from 1,267,758 peripheral blood mononuclear cells from 982 healthy human subjects. For 14 cell types, we identified 26,597 independent cis-expression quantitative trait loci (eQTLs) and 990 trans-eQTLs, with most showing cell type-specific effects on gene expression. We subsequently show how eQTLs have dynamic allelic effects in B cells that are transitioning from naïve to memory states and demonstrate how commonly segregating alleles lead to interindividual variation in immune function. Finally, using a Mendelian randomization approach, we identify the causal route by which 305 risk loci contribute to autoimmune disease at the cellular level. This work brings together genetic epidemiology with scRNA-seq to uncover drivers of interindividual variation in the immune system.
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale, and the choice of normalization has a dramatic impact on some, but not all, methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods, whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
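A pseudo-bulk mixture with known composition can be built by summing the counts of sampled single cells, as in the minimal sketch below. The abstract does not describe the exact sampling procedure used in the study, so the function name `make_pseudo_bulk`, sampling without replacement, and the toy counts are assumptions for illustration only.

```python
import random

def make_pseudo_bulk(cells, n_cells, rng):
    """Sum raw counts of randomly sampled cells into one pseudo-bulk profile.

    cells   -- list of (cell_type, counts) pairs; counts is a list, one value per gene
    n_cells -- number of cells to draw (without replacement, an assumption here)
    rng     -- random.Random instance, passed in for reproducibility
    Returns (bulk_counts, true_proportions).
    """
    sampled = rng.sample(cells, n_cells)
    n_genes = len(sampled[0][1])
    bulk = [0] * n_genes
    type_counts = {}
    for cell_type, counts in sampled:
        # Counts are added in linear scale, before any log transformation.
        for g, c in enumerate(counts):
            bulk[g] += c
        type_counts[cell_type] = type_counts.get(cell_type, 0) + 1
    proportions = {t: k / n_cells for t, k in type_counts.items()}
    return bulk, proportions

# Hypothetical toy data: six cells, three genes, three cell types.
cells = [
    ("alpha", [5, 0, 1]), ("alpha", [4, 1, 0]),
    ("beta", [0, 7, 2]), ("beta", [1, 6, 3]),
    ("delta", [0, 0, 9]), ("delta", [1, 0, 8]),
]
bulk, proportions = make_pseudo_bulk(cells, n_cells=4, rng=random.Random(0))
```

The returned proportions serve as the ground truth against which a deconvolution method's estimates for the mixture can be scored.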
Recent developments in single-cell RNA sequencing (scRNA-seq) platforms have vastly increased the number of cells typically assayed in an experiment. Analysis of scRNA-seq data is multidisciplinary in nature, requiring careful consideration of the application of statistical methods with respect to the underlying biology. Few analysis packages exist that are at once robust, are computationally fast, and allow flexible integration with other bioinformatics tools and methods.
ascend is an R package comprising tools designed to simplify and streamline the preliminary analysis of scRNA-seq data, while addressing the statistical challenges of scRNA-seq analysis and enabling flexible integration with genomics packages and native R functions, including fast parallel computation and efficient memory management. The package incorporates both novel and established methods to provide a framework to perform cell and gene filtering, quality control, normalization, dimension reduction, clustering, differential expression, and a wide range of visualization functions.
ascend is designed to work with scRNA-seq data generated by any high-throughput platform and includes functions to convert data objects between software packages. The ascend workflow is simple and interactive, as well as suitable for implementation by a broad range of users, including those with little programming experience.
While single-cell transcriptional profiling has greatly increased our capacity to interrogate biology, accurate cell classification within and between datasets is a key challenge. This is particularly so in pluripotent stem cell-derived organoids which represent a model of a developmental system. Here, clustering algorithms and selected marker genes can fail to accurately classify cellular identity while variation in analyses makes it difficult to meaningfully compare datasets. Kidney organoids provide a valuable resource to understand kidney development and disease. However, direct comparison of relative cellular composition between protocols has proved challenging. Hence, an unbiased approach for classifying cell identity is required.
The R package, scPred, was trained on multiple single cell RNA-seq datasets of human fetal kidney. A hierarchical model classified cellular subtypes into nephron, stroma and ureteric epithelial elements. This model, provided in the R package DevKidCC (github.com/KidneyRegeneration/DevKidCC), was then used to predict relative cell identity within published kidney organoid datasets generated using distinct cell lines and differentiation protocols, interrogating the impact of such variations. The package contains custom functions for the display of differential gene expression within cellular subtypes.
DevKidCC was used to directly compare between distinct kidney organoid protocols, identifying differences in relative proportions of cell types at all hierarchical levels of the model and highlighting variations in stromal and unassigned cell types, nephron progenitor prevalence and relative maturation of individual epithelial segments. Of note, DevKidCC was able to distinguish distal nephron from ureteric epithelium, cell types with overlapping profiles that have previously confounded analyses. When applied to a variation in protocol via the addition of retinoic acid, DevKidCC identified a consequential depletion of nephron progenitors.
The application of DevKidCC to kidney organoids reproducibly classifies component cellular identity within distinct single-cell datasets. The application of the tool is summarised in an interactive Shiny application, as are examples of the utility of in-built functions for data presentation. This tool will enable the consistent and rapid comparison of kidney organoid protocols, driving improvements in patterning to kidney endpoints and validating new approaches.
Gene annotations, such as those in GENCODE, are derived primarily from alignments of spliced cDNA sequences and protein sequences. The impact of RNA-seq data on annotation has been confined to major projects like ENCODE and Illumina Body Map 2.0.
We aligned 21,504 Illumina-sequenced human RNA-seq samples from the Sequence Read Archive (SRA) to the human genome and compared detected exon-exon junctions with junctions in several recent gene annotations. We found 56,861 junctions (18.6%) in at least 1000 samples that were not annotated, and their expression was associated with tissue type. Junctions well expressed in individual samples tended to be annotated. Newer samples contributed few novel well-supported junctions, with the vast majority of detected junctions present in samples before 2013. We compiled junction data into a resource called intropolis, available at http://intropolis.rail.bio. We used this resource to search for a recently validated isoform of the ALK gene and characterized the potential functional implications of unannotated junctions with publicly available TRAP-seq data.
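The comparison of detected junctions against annotation can be sketched as a set operation over per-sample junction calls. This is an illustrative sketch, not the pipeline used in the study: the function name `unannotated_recurrent_junctions`, the tuple representation of a junction, and the choice to report the unannotated fraction relative to all recurrent junctions (one reading of the 18.6% figure) are assumptions.

```python
from collections import Counter

def unannotated_recurrent_junctions(sample_junctions, annotated, min_samples):
    """Find junctions seen in at least min_samples samples that are absent from annotation.

    sample_junctions -- iterable of per-sample junction lists; each junction is a
                        hashable tuple such as (chrom, start, end, strand)
    annotated        -- set of junction tuples present in the gene annotation
    min_samples      -- minimum number of samples that must contain a junction
    Returns (novel_junctions, fraction_of_recurrent_junctions_that_are_novel).
    """
    support = Counter()
    for junctions in sample_junctions:
        # Count each junction once per sample, regardless of read depth.
        support.update(set(junctions))
    recurrent = {j for j, n in support.items() if n >= min_samples}
    novel = recurrent - annotated
    fraction = len(novel) / len(recurrent) if recurrent else 0.0
    return novel, fraction

# Hypothetical toy input: three samples, one annotated junction.
samples = [
    [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")],
    [("chr1", 100, 200, "+"), ("chr2", 50, 90, "-")],
    [("chr1", 100, 200, "+"), ("chr2", 50, 90, "-"), ("chr1", 300, 400, "+")],
]
annotation = {("chr1", 100, 200, "+")}
novel, fraction = unannotated_recurrent_junctions(samples, annotation, min_samples=2)
```

With real data the threshold would be set to the sample support of interest, for example the 1000 samples mentioned above.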
Considering only the variation contained in annotation may suffice if an investigator is interested only in well-expressed transcript isoforms. However, genes that are not generally well expressed and nonetheless present in a small but significant number of samples in the SRA are likelier to be incompletely annotated. The rate at which evidence for novel junctions has been added to the SRA has tapered dramatically, even to the point of an asymptote. Now is perhaps an appropriate time to update incomplete annotations to include splicing present in the now-stable snapshot provided by the SRA.
Objectives
A recent single‐cell RNA sequencing study by Wilk et al. suggested that plasmablasts can transdifferentiate into ‘developing neutrophils’ in patients with severe COVID‐19 disease. We explore the evidence for this.
Methods
We downloaded the original data and code used by the authors in their study to replicate their findings and explore the possibility that regressing out variables may have led the authors to overfit their data.
Results
The lineage relationship between plasmablasts and developing neutrophils breaks down when key features are not regressed out, and the data are not overfitted during the analysis.
Conclusion
Plasmablasts do not transdifferentiate into developing neutrophils. Single‐cell RNA sequencing is a powerful technique for biological discovery and hypothesis generation. However, caution should be exercised in the bioinformatic analysis and interpretation of the data, and findings should be cross‐validated by orthogonal techniques.
Single cell RNA sequencing can generate unexpected hypotheses. However, these should be rigorously tested and cross‐validated by orthogonal techniques. In this case, there is no evidence that plasmablasts differentiate into neutrophils in severe COVID‐19 disease.
Children infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) develop less severe coronavirus disease 2019 (COVID-19) than adults. The mechanisms for the age-specific differences and the implications for infection-induced immunity are beginning to be uncovered. We show by longitudinal multimodal analysis that SARS-CoV-2 leaves a small footprint in the circulating T cell compartment in children with mild/asymptomatic COVID-19 compared to adult household contacts with the same disease severity who had more evidence of systemic T cell interferon activation, cytotoxicity and exhaustion. Children harbored diverse polyclonal SARS-CoV-2-specific naïve T cells whereas adults harbored clonally expanded SARS-CoV-2-specific memory T cells. A novel population of naïve interferon-activated T cells is expanded in acute COVID-19 and is recruited into the memory compartment during convalescence in adults but not children. This was associated with the development of robust CD4+ memory T cell responses in adults but not children. These data suggest that rapid clearance of SARS-CoV-2 in children may compromise their cellular immunity and ability to resist reinfection.
• Children have diverse polyclonal SARS-CoV-2-specific naïve T cells.
• Adults have clonally expanded exhausted SARS-CoV-2-specific memory T cells.
• Interferon-activated naïve T cells differentiate into memory T cells in adults but not children.
• Adults but not children develop robust memory T cell responses to SARS-CoV-2.
Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.