Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and nonlinear gene-gene dependencies are captured. Our method scales linearly with the number of cells and can, therefore, be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
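To make the noise model concrete, the following is a minimal sketch of a zero-inflated negative binomial (ZINB) log-likelihood, assuming the common mean/dispersion parameterization. The function names (`nb_log_pmf`, `zinb_log_pmf`, `zinb_nll`) and the parameter names `mu`, `theta` and `pi` are illustrative assumptions, not DCA's actual API, and the autoencoder that would predict these parameters per gene and cell is not shown.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi in [0, 1) is the assumed probability of an extra ("dropout") zero;
    pi = 0 recovers the plain negative binomial.
    """
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def zinb_nll(counts, mus, thetas, pis):
    """Mean negative log-likelihood over paired counts and parameters,
    the kind of quantity a denoising autoencoder could minimize."""
    terms = [
        -zinb_log_pmf(x, mu, theta, pi)
        for x, mu, theta, pi in zip(counts, mus, thetas, pis)
    ]
    return sum(terms) / len(terms)
```

In such a setup, the denoised value for a gene in a cell would typically be read off from the fitted mean `mu`, while `theta` and `pi` absorb overdispersion and excess zeros respectively.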
Spatial and molecular characteristics determine tissue function, yet high-resolution methods to capture both concurrently are lacking. Here, we developed high-definition spatial transcriptomics, which captures RNA from histological tissue sections on a dense, spatially barcoded bead array. Each experiment recovers several hundred thousand transcript-coupled spatial barcodes at 2-μm resolution, as demonstrated in mouse brain and primary breast cancer. This opens the way to high-resolution spatial analysis of cells and tissues.
Multimodal measurements of single-cell profiles are proving increasingly useful for characterizing cell states and regulatory mechanisms. In the present study, we developed PHAGE-ATAC (Assay for Transposase-Accessible Chromatin), a massively parallel droplet-based method that uses phage displaying, engineered, camelid single-domain antibodies ('nanobodies') for simultaneous single-cell measurements of protein levels and chromatin accessibility profiles, and mitochondrial DNA-based clonal tracing. We use PHAGE-ATAC for multimodal analysis in primary human immune cells, sample multiplexing, intracellular protein analysis and the detection of SARS-CoV-2 spike protein in human cell populations. Finally, we construct a synthetic high-complexity phage library for selection of antigen-specific nanobodies that bind cells of particular molecular profiles, opening an avenue for protein detection, cell characterization and screening with single-cell genomics.
Understanding gene function and regulation in homeostasis and disease requires knowledge of the cellular and tissue contexts in which genes are expressed. Here, we applied four single-nucleus RNA sequencing methods to eight diverse, archived, frozen tissue types from 16 donors and 25 samples, generating a cross-tissue atlas of 209,126 nuclei profiles, which we integrated across tissues, donors, and laboratory methods with a conditional variational autoencoder. Using the resulting cross-tissue atlas, we highlight shared and tissue-specific features of tissue-resident cell populations; identify cell types that might contribute to neuromuscular, metabolic, and immune components of monogenic diseases and the biological processes involved in their pathology; and determine cell types and gene modules that might underlie disease mechanisms for complex traits analyzed by genome-wide association studies.
As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses. However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models. By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing.
The critical functions of the human liver are coordinated through the interactions of hepatic parenchymal and non‐parenchymal cells. Recent advances in single‐cell transcriptional approaches have enabled an examination of the human liver with unprecedented resolution. However, dissociation‐related cell perturbation can limit the ability to fully capture the human liver’s parenchymal cell fraction, which limits the ability to comprehensively profile this organ. Here, we report the transcriptional landscape of 73,295 cells from the human liver using matched single‐cell RNA sequencing (scRNA‐seq) and single‐nucleus RNA sequencing (snRNA‐seq). The addition of snRNA‐seq enabled the characterization of interzonal hepatocytes at a single‐cell resolution, revealed the presence of rare subtypes of liver mesenchymal cells, and facilitated the detection of cholangiocyte progenitors that had only been observed during in vitro differentiation experiments. However, T and B lymphocytes and natural killer cells were only distinguishable using scRNA‐seq, highlighting the importance of applying both technologies to obtain a complete map of tissue‐resident cell types. We validated the distinct spatial distribution of the hepatocyte, cholangiocyte, and mesenchymal cell populations by an independent spatial transcriptomics data set and immunohistochemistry. Conclusion: Our study provides a systematic comparison of the transcriptomes captured by scRNA‐seq and snRNA‐seq and delivers a high‐resolution map of the parenchymal cell populations in the healthy human liver.
Genome-wide association studies (GWAS) identify genetic variants associated with traits or diseases. GWAS never directly link variants to regulatory mechanisms. Instead, the functional annotation of variants is typically inferred by post hoc analyses. A specific class of deep learning-based methods allows for the prediction of regulatory effects per variant on several cell type-specific chromatin features. We here describe "DeepWAS", a new approach that integrates these regulatory effect predictions of single variants into a multivariate GWAS setting. Thereby, single variants associated with a trait or disease are directly coupled to their impact on a chromatin feature in a cell type. Up to 61 regulatory SNPs, called dSNPs, were associated with multiple sclerosis (MS, 4,888 cases and 10,395 controls), major depressive disorder (MDD, 1,475 cases and 2,144 controls), and height (5,974 individuals). These variants were mainly non-coding and reached at least nominal significance in classical GWAS. The prediction accuracy was higher for DeepWAS than for classical GWAS models for 91% of the genome-wide significant, MS-specific dSNPs. dSNPs were enriched in public or cohort-matched expression and methylation quantitative trait loci, and we demonstrated the potential of DeepWAS to generate testable functional hypotheses based on genotype data alone. DeepWAS is available at https://github.com/cellmapslab/DeepWAS.
Abstract
Motivation
Research on epigenetic modifications and other chromatin features at genomic regulatory elements elucidates essential biological mechanisms including the regulation of gene expression. Despite the growing number of epigenetic datasets, new tools are still needed to discover novel distinctive patterns of heterogeneous epigenetic signals at regulatory elements.
Results
We introduce ChromDMM, a product Dirichlet-multinomial mixture model for clustering genomic regions that are characterized by multiple chromatin features. ChromDMM extends the mixture model framework by profile shifting and flipping that can probabilistically account for inaccuracies in the position and strand-orientation of the genomic regions. Owing to hyper-parameter optimization, ChromDMM can also regularize the smoothness of the epigenetic profiles across the consecutive genomic regions. With simulated data, we demonstrate that ChromDMM clusters, shifts and strand-orients the profiles more accurately than previous methods. With ENCODE data, we show that the clustering of enhancer regions in the human genome reveals distinct patterns in several chromatin features. We further validate the enhancer clusters by their enrichment for transcriptional regulatory factor binding sites.
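The core of a product Dirichlet-multinomial mixture can be sketched in a few lines. The example below is an illustrative assumption rather than ChromDMM's implementation (which is an R package): it omits profile shifting, flipping and hyper-parameter optimization, and the names `dm_log_pmf`, `product_dm_log_lik` and `responsibilities` are hypothetical. It shows the Dirichlet-multinomial log-probability of one binned profile, the product over chromatin features (a sum in log space), and the posterior cluster probabilities for one genomic region.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial
    with concentration parameters alpha (all entries > 0)."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out


def product_dm_log_lik(profiles, alphas):
    """Log-likelihood of one region: profiles[f] is the binned count
    profile of chromatin feature f, alphas[f] its cluster parameters.
    Features are treated as independent given the cluster."""
    return sum(dm_log_pmf(c, a) for c, a in zip(profiles, alphas))


def responsibilities(profiles, weights, cluster_alphas):
    """Posterior probability of each cluster for one region, given
    mixture weights and per-cluster, per-feature alpha vectors."""
    logs = [
        math.log(w) + product_dm_log_lik(profiles, alphas)
        for w, alphas in zip(weights, cluster_alphas)
    ]
    m = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]
```

In an expectation-maximization style fit, these responsibilities would be the E-step; a model that also handles shifts and strand flips would additionally sum over candidate shift and orientation states for each region.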
Availability and implementation
ChromDMM is implemented as an R package and is available at https://github.com/MariaOsmala/ChromDMM.
Supplementary information
Supplementary data are available at Bioinformatics online.
Understanding the genetic and molecular drivers of phenotypic heterogeneity across individuals is central to biology. As new technologies enable fine-grained and spatially resolved molecular profiling, we need new computational approaches to integrate data from the same organ across different individuals into a consistent reference and to construct maps of molecular and cellular organization at histological and anatomical scales. Here, we review previous efforts and discuss challenges involved in establishing such a common coordinate framework, the underlying map of tissues and organs. We focus on strategies to handle anatomical variation across individuals and highlight the need for new technologies and analytical methods spanning multiple hierarchical scales of spatial resolution.
Satija, Regev, Marioni, and colleagues recommend approaches to create a reference map of the human body down to the single-cell level—a task made challenging by the diverse human form.