Abstract
Summary
Data sparsity in single-cell experiments prevents an accurate assessment of gene expression when visualized in a low-dimensional space. Here, we introduce Nebulosa, an R package that uses weighted kernel density estimation to recover signals lost through drop-out or low expression.
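The following Python sketch illustrates weighted kernel density estimation. It evaluates a Gaussian kernel density at each cell's position in a two-dimensional embedding and uses each cell's expression value as its kernel weight. It is not Nebulosa's implementation. The function name `weighted_kde_2d`, the single isotropic bandwidth and the normalization by total weight are our own assumptions.

```python
import math
from typing import List, Sequence, Tuple

def weighted_kde_2d(cells: Sequence[Tuple[float, float]],
                    expression: Sequence[float],
                    bandwidth: float = 1.0) -> List[float]:
    """Weighted Gaussian KDE evaluated at every cell's embedding coordinates.

    Hypothetical helper: each cell contributes a Gaussian kernel scaled by its
    expression value, so a cell with zero counts still receives density from
    expressing neighbors. A single isotropic bandwidth is an assumption.
    """
    if len(cells) != len(expression):
        raise ValueError("cells and expression must have the same length")
    total = float(sum(expression))
    if total <= 0.0:
        return [0.0] * len(cells)  # no expressing cells: flat zero signal
    two_h2 = 2.0 * bandwidth ** 2
    norm = 1.0 / (math.pi * two_h2)  # 2D Gaussian constant 1 / (2*pi*h^2)
    densities = []
    for x, y in cells:
        acc = 0.0
        for (xi, yi), w in zip(cells, expression):
            acc += w * math.exp(-((x - xi) ** 2 + (y - yi) ** 2) / two_h2)
        # Dividing by the total weight makes the density integrate to 1.
        densities.append(norm * acc / total)
    return densities

cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]
expression = [5.0, 0.0, 4.0, 0.0]  # second and fourth cells are drop-outs
print(weighted_kde_2d(cells, expression, bandwidth=0.5))
```

In this toy example, the second cell has zero counts but sits among expressing neighbors, so it receives a high density. The distant fourth cell receives almost none. This is the sense in which smoothing can recover signal lost to drop-out. The double loop is quadratic in the number of cells, so a practical tool would need a faster approximation for large datasets.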
Availability and implementation
Nebulosa can be easily installed from www.github.com/powellgenomicslab/Nebulosa.
Supplementary information
Supplementary data are available at Bioinformatics online.
Single-cell RNA sequencing has enabled the characterization of highly specific cell types in many tissues, as well as both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In theory, this information can be used to classify an individual cell based on its transcriptional profile. Here, we present scPred, a new generalizable method that provides highly accurate classification of single cells using a combination of unbiased feature selection from a reduced-dimension space and a machine-learning, probability-based prediction method. We apply scPred to scRNA-seq data from pancreatic tissue, mononuclear cells, colorectal tumor biopsies, and circulating dendritic cells and show that scPred is able to classify individual cells with high accuracy. The generalized method is available at https://github.com/powellgenomicslab/scPred/.
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human complex traits. However, the genes or functional DNA elements through which these variants exert their effects on the traits are often unknown. We propose a method (called SMR) that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. We apply the method to five human complex traits using GWAS data on up to 339,224 individuals and eQTL data on 5,311 individuals, and we prioritize 126 genes (for example, TRAF1 and ANKRD55 for rheumatoid arthritis and SNX19 and NMRAL1 for schizophrenia), of which 25 genes are new candidates; 77 genes are not the nearest annotated gene to the top associated GWAS SNP. These genes provide important leads to design future functional studies to understand the mechanism whereby DNA variation leads to complex trait variation.
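The single-SNP core of an SMR analysis can be sketched from summary statistics for one cis-eQTL SNP. The effect of expression on the trait is estimated as the ratio of the SNP's trait effect to its expression effect, and a test statistic combines the two z-scores. The sketch below assumes the approximate statistic T_SMR = z_zy² z_zx² / (z_zy² + z_zx²), referred to a χ² distribution with one degree of freedom, as we understand it from the SMR paper. The function name `smr_test`, its arguments and the input numbers are hypothetical and illustrative only.

```python
import math
from typing import Tuple

def smr_test(b_zx: float, se_zx: float,
             b_zy: float, se_zy: float) -> Tuple[float, float, float]:
    """Single-SNP SMR estimate and approximate test (hypothetical helper).

    b_zx, se_zx: SNP effect on gene expression (eQTL summary data)
    b_zy, se_zy: SNP effect on the trait (GWAS summary data)
    Returns (b_xy, t_smr, p_value).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    # Ratio estimate of the effect of expression (x) on the trait (y).
    b_xy = b_zy / b_zx
    # Approximate test statistic, treated as chi-squared with 1 df.
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # Upper tail of a 1-df chi-squared: P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value

# Illustrative numbers only, not data from the study.
b_xy, t_smr, p = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.02, se_zy=0.005)
print(f"b_xy={b_xy:.3f}  T_SMR={t_smr:.2f}  P={p:.2e}")
```

T_SMR has the form ab/(a + b) for the two squared z-scores a and b, so it is always smaller than either of them. A gene can therefore reach significance only when the SNP has strong associations with both expression and the trait.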
Advances in droplet-based single-cell RNA-sequencing (scRNA-seq) have dramatically increased throughput, allowing tens of thousands of cells to be routinely sequenced in a single experiment. In addition to cells, droplets capture cell-free "ambient" RNA, released predominantly by the lysis of cells during sample preparation. Samples with high ambient RNA concentrations make it difficult to accurately distinguish cell-containing droplets from droplets containing only ambient RNA. Current methods to separate these groups often retain a significant number of droplets that do not contain cells (empty droplets). Additionally, there are currently no methods available to detect droplets containing damaged cells, i.e. partially lysed cells, which are the original source of the ambient RNA.
Here, we describe DropletQC, a new method that is able to detect empty droplets, damaged cells, and intact cells, and to accurately distinguish them from one another. This approach is based on a novel quality control metric, the nuclear fraction, which quantifies for each droplet the fraction of RNA originating from unspliced, nuclear pre-mRNA. We demonstrate how DropletQC provides a powerful extension to existing computational methods for identifying empty droplets, such as EmptyDrops.
We implement DropletQC as an R package, which can be easily integrated into existing single-cell analysis workflows.
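The nuclear fraction itself is a simple per-droplet ratio. The Python sketch below assumes each aligned read has already been assigned to a cell barcode and labeled as intronic or exonic. Intronic reads serve here as a proxy for unspliced, nuclear pre-mRNA. The function name and input format are hypothetical and do not reflect DropletQC's actual R interface.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def nuclear_fraction(reads: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of intronic reads per cell barcode (hypothetical helper).

    reads: (cell_barcode, region) pairs, region being "intronic" or "exonic".
    Reads with any other label (e.g. intergenic) are skipped in this sketch.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for barcode, region in reads:
        if region in ("intronic", "exonic"):
            counts[barcode][region] += 1
    # Every barcode kept here has at least one counted read, so no zero division.
    return {
        barcode: c["intronic"] / (c["intronic"] + c["exonic"])
        for barcode, c in counts.items()
    }

reads = [("AAA", "exonic"), ("AAA", "exonic"), ("AAA", "intronic"),
         ("CCC", "intronic"), ("CCC", "intergenic")]
print(nuclear_fraction(reads))
```

In this toy input, barcode AAA has one intronic read out of three counted reads. Barcode CCC has only an intronic read once its intergenic read is excluded, giving a nuclear fraction of 1.0.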
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methods and those that use scRNA-seq data as a reference perform best when applied to data on a linear scale, and the choice of normalization has a dramatic impact on some, but not all, methods. Overall, methods that use scRNA-seq data have performance comparable to the best-performing bulk methods, whereas semi-supervised approaches show higher error values. Moreover, failing to include in the reference the cell types that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
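Pseudo-bulk mixtures of known composition can be built by summing raw counts over single cells drawn in chosen proportions. This keeps the mixture on a linear scale and gives a ground truth against which deconvolution estimates can be scored. The sketch below shows one simple way to do this. Sampling with replacement, the function name `make_pseudo_bulk` and its arguments are illustrative assumptions, not the procedure used in the study.

```python
import random
from typing import Dict, List, Sequence

def make_pseudo_bulk(cells: Sequence[Sequence[int]], labels: Sequence[str],
                     proportions: Dict[str, float], n_cells: int,
                     seed: int = 0) -> List[int]:
    """Sum raw counts over cells sampled in the given cell type proportions.

    Hypothetical helper.
    cells: per-gene raw counts for each cell (same gene order for all cells)
    labels: cell type label of each cell
    proportions: target fraction of each cell type in the mixture
    """
    rng = random.Random(seed)  # fixed seed for reproducible mixtures
    by_type: Dict[str, List[int]] = {}
    for idx, label in enumerate(labels):
        by_type.setdefault(label, []).append(idx)
    bulk = [0] * len(cells[0])
    for cell_type, fraction in proportions.items():
        # Rounding means the realized total can differ slightly from n_cells.
        k = round(fraction * n_cells)
        # Sample cells of this type with replacement and add their raw counts.
        for idx in rng.choices(by_type[cell_type], k=k):
            for gene, count in enumerate(cells[idx]):
                bulk[gene] += count
    return bulk

cells = [[1, 0], [1, 0], [0, 2]]  # three cells, two genes
labels = ["A", "A", "B"]
print(make_pseudo_bulk(cells, labels, {"A": 0.5, "B": 0.5}, n_cells=4))
```

Because the counts are summed before any log transformation, the mixture's expression profile is a linear combination of the cell type profiles. The input proportions then serve as the known answer.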
The human immune system displays substantial variation between individuals, leading to differences in susceptibility to autoimmune disease. We present single-cell RNA sequencing (scRNA-seq) data from 1,267,758 peripheral blood mononuclear cells from 982 healthy human subjects. For 14 cell types, we identified 26,597 independent cis-expression quantitative trait loci (eQTLs) and 990 trans-eQTLs, with most showing cell type-specific effects on gene expression. We subsequently show how eQTLs have dynamic allelic effects in B cells that are transitioning from naïve to memory states and demonstrate how commonly segregating alleles lead to interindividual variation in immune function. Finally, using a Mendelian randomization approach, we identify the causal route by which 305 risk loci contribute to autoimmune disease at the cellular level. This work brings together genetic epidemiology with scRNA-seq to uncover drivers of interindividual variation in the immune system.
Identity by descent (IBD) is a fundamental concept in genetics and refers to alleles that are descended from a common ancestor in a base population. Identity by state (IBS) simply refers to alleles that are the same, irrespective of whether they are inherited from a recent ancestor. In modern applications, IBD relationships are estimated from genetic markers in individuals without any known relationship. This can lead to erroneous inference because a consistent base population is not used. We argue that the purpose of most IBD calculations is to predict IBS at unobserved loci. Recognizing this aim leads to better methods for estimating IBD, with benefits in mapping genes, estimating genetic variance and predicting inbreeding depression.
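The contrast between IBS and a frequency-based IBD estimate can be illustrated with genotypes coded as 0, 1 or 2 copies of a counted allele. The first function counts alleles shared identical by state and needs no population information. The second is a genomic-relationship estimator of the kind used in GCTA-style analyses. This choice is our assumption and not necessarily the estimator the authors advocate. It depends on the allele frequencies p of whichever population is treated as the base. Both function names are hypothetical.

```python
from typing import Sequence

def ibs_proportion(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Proportion of alleles shared identical by state (genotypes coded 0/1/2)."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    # At one locus, two individuals share 2 - |g1 - g2| alleles by state.
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))

def genomic_relationship(g1: Sequence[int], g2: Sequence[int],
                         freqs: Sequence[float]) -> float:
    """Frequency-standardized relationship estimate relative to a base population.

    freqs: frequency p of the counted allele in the assumed base population.
    """
    if not (len(g1) == len(g2) == len(freqs)) or not freqs:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, b, p in zip(g1, g2, freqs):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
        total += (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
    return total / len(freqs)

g1, g2 = [0, 1, 2, 2], [0, 1, 0, 1]
print(ibs_proportion(g1, g2))                    # 0.625
print(genomic_relationship(g1, g2, [0.5] * 4))   # 0.0
print(genomic_relationship(g1, g2, [0.25] * 4))  # about 0.333
```

The IBS proportion for the same pair of genotype vectors does not change with the population. The relationship estimate, however, moves from 0 to about 1/3 when the assumed base-population allele frequencies change. This illustrates why a consistent base population matters.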
Cardiac differentiation of human pluripotent stem cells (hPSCs) requires orchestration of dynamic gene regulatory networks during stepwise fate transitions but often generates immature cell types that do not fully recapitulate properties of their adult counterparts, suggesting incomplete activation of key transcriptional networks. We performed extensive single-cell transcriptomic analyses to map fate choices and gene expression programs during cardiac differentiation of hPSCs and identified strategies to improve in vitro cardiomyocyte differentiation. Utilizing genetic gain- and loss-of-function approaches, we found that hypertrophic signaling is not effectively activated during monolayer-based cardiac differentiation, thereby preventing expression of HOPX and its activation of downstream genes that govern late stages of cardiomyocyte maturation. This study therefore provides a key transcriptional roadmap of in vitro cardiac differentiation at single-cell resolution, revealing fundamental mechanisms underlying heart development and differentiation of hPSC-derived cardiomyocytes.
• Single-cell RNA-seq during cardiac hPSC differentiation reveals cellular heterogeneity
• A key cardiac regulatory gene, HOPX, is rarely expressed during in vitro differentiation
• HOPX is a key in vitro regulator of cardiomyocyte hypertrophy and maturation
Friedman et al. performed single-cell transcriptional analysis over a time course of in vitro cardiac differentiation from human pluripotent stem cells. They utilized these data to identify the requirement of hypertrophic stimuli for expression of a cardiac regulatory gene, HOPX, to generate cardiomyocytes more accurately reflecting in vivo heart development.
We develop a Bayesian mixed linear model that simultaneously estimates single-nucleotide polymorphism (SNP)-based heritability, polygenicity (proportion of SNPs with nonzero effects), and the relationship between SNP effect size and minor allele frequency for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752) and show that on average, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (P < 0.05/28) signatures of natural selection in the genetic architecture of 23 traits, including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. The significant estimates of the relationship between effect size and minor allele frequency in complex traits are consistent with a model of negative (or purifying) selection, as confirmed by forward simulation. We conclude that negative selection acts pervasively on the genetic variants associated with human complex traits.
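The three quantities named above can be combined in a single prior on SNP effects. What follows is our own sketch of such a model, not a reproduction of the paper's full specification. The notation is ours: β_j is the effect of SNP j, p_j its allele frequency, π the proportion of SNPs with nonzero effects, S the frequency-dependence parameter, σ_β² a common variance scale, δ_0 a point mass at zero, x_j the genotype vector of SNP j and y the phenotype.

```latex
\beta_j \sim \pi\,\mathcal{N}\!\left(0,\; \left[2p_j(1-p_j)\right]^{S}\sigma_\beta^{2}\right) + (1-\pi)\,\delta_0,
\qquad
h^2_{\mathrm{SNP}} = \frac{\operatorname{Var}\!\left(\sum_j x_j \beta_j\right)}{\operatorname{Var}(y)}
```

The term 2p_j(1−p_j) is smaller for rarer variants. A negative S therefore gives rare variants larger expected effect sizes, which is the pattern the abstract interprets as a signature of negative (purifying) selection.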
Multiple system atrophy is a sporadic alpha-synucleinopathy that typically affects patients in their sixth decade of life and beyond. The defining clinical features of the disease include progressive autonomic failure, parkinsonism, and cerebellar ataxia leading to significant disability. Pathologically, multiple system atrophy is characterized by glial cytoplasmic inclusions containing filamentous alpha-synuclein. Neuronal inclusions also have been reported but remain less well defined. This study aimed to further define the spectrum of neuronal pathology in 35 patients with multiple system atrophy (20 male, 15 female; mean age at death 64.7 years; median disease duration 6.5 years, range 2.2 to 15.6 years). The morphologic type, topography, and frequencies of neuronal inclusions, including globular cytoplasmic (Lewy body-like) neuronal inclusions, were determined across a wide spectrum of brain regions. A correlation matrix of pathologic severity also was calculated between distinct anatomic regions of involvement (striatum, substantia nigra, olivary and pontine nuclei, hippocampus, forebrain and thalamus, anterior cingulate and neocortex, and white matter of cerebrum, cerebellum, and corpus callosum). The major finding was the identification of widespread neuronal inclusions in the majority of patients, not only in typical disease-associated regions (striatum, substantia nigra), but also within anterior cingulate cortex, amygdala, entorhinal cortex, basal forebrain and hypothalamus. Neuronal inclusion pathology appeared to follow a hierarchy of region-specific susceptibility, independent of the clinical phenotype, and the severity of pathology was duration-dependent. Neuronal inclusions also were identified in regions not previously implicated in the disease, such as within cerebellar roof nuclei.
Lewy body-like inclusions in multiple system atrophy followed the stepwise anatomic progression of Lewy body-spectrum disease inclusion pathology in 25.7% of patients with multiple system atrophy, including a patient with visual hallucinations. Further, the presence of Lewy body-like inclusions in neocortex, but not hippocampal alpha-synuclein pathology, was associated with cognitive impairment (P = 0.002). However, several cases had isolated Lewy body-like inclusions at sites that are not typical for Lewy body-spectrum disease (e.g. thalamus, deep cerebellar nuclei). Finally, interregional correlations (rho ≥ 0.6) in pathologic glial and neuronal lesion burden suggest shared mechanisms of disease progression between both discrete anatomic regions (e.g. basal forebrain and hippocampus) and cell types (neuronal and glial inclusions in frontal cortex and white matter, respectively). These findings suggest that, in addition to glial inclusions, neuronal pathology plays an important role in the development and progression of multiple system atrophy.