Abstract
Motivation
Gene-gene co-expression networks (GCNs) are of biological interest because they provide useful information for understanding gene-gene interactions. The advent of single-cell RNA sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single-cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner.
Results
Therefore, focusing on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression while preserving the strengths of available simulators, which perform well at simulating the marginal expression of each gene. Using ESCO, we assess the performance of imputation methods on GCN recovery. We find that imputation generally helps GCN recovery when the data are not too sparse, and that the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregation methods are a better choice. These findings are further verified with mouse and human brain cell data.
Availability and implementation
The ESCO implementation is available as the R package ESCO. Users can either download the development version via GitHub (https://github.com/JINJINT/ESCO) or the archived version via Zenodo (https://zenodo.org/record/4455890).
Supplementary information
Supplementary data are available at Bioinformatics online.
Clinical diagnosis of Alzheimer's disease (AD) at its early stage remains difficult. Advanced imaging technologies and laboratory assays that detect the Aβ peptides Aβ42 and Aβ40 and total and phosphorylated tau in cerebrospinal fluid (CSF) provide a set of biomarkers of developing AD brain pathology and facilitate the diagnostic process. The search for biofluid biomarkers other than CSF, and the development of biomarker assays, have accelerated significantly and now represent the fastest-growing field in AD research. The goal of this study was to determine the differential enrichment of noncoding RNAs (ncRNAs) in plasma-derived extracellular vesicles (EVs) of AD patients and cognitively normal controls (NC). Using RNA-seq, we profiled four major classes of ncRNAs: miRNAs, snoRNAs, tRNAs, and piRNAs. We report significant enrichment of SNORDs, a group of snoRNAs, in AD samples compared to NC. To verify the differential enrichment of two SNORD clusters, SNORD115 and SNORD116, located on human chromosome 15q11-q13, we used plasma samples from an independent group of AD patients and NC. Using droplet digital PCR (ddPCR), we found that SNORD115 and SNORD116 have high discriminatory power to differentiate AD samples from NC. Our results provide evidence that AD is associated with changes in the enrichment of SNORDs, transcribed from imprinted genomic loci, in plasma EVs, and provide a rationale to further explore the validity of these SNORDs as plasma biomarkers of AD.
• SNORD115 and SNORD116 are enriched in plasma EVs of AD patients compared to controls.
• SNORD transcripts in brain are differentially expressed between AD-E4+ and AD-E3/3.
• SNORD signatures have a high discriminatory power to differentiate AD from controls.
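"Discriminatory power" for a single marker is often summarized as the area under the ROC curve (AUC). The abstract does not say which metric was used, so treat the sketch below as an assumption. It computes AUC through its Mann-Whitney interpretation: the probability that a randomly chosen AD sample has a higher marker level than a randomly chosen NC sample, with ties counted as one half. The marker levels are made-up numbers.

```python
def auc_mann_whitney(cases, controls):
    """AUC as P(case > control) + 0.5 * P(case == control) over all case-control pairs."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(cases) * len(controls))


# Hypothetical marker levels in arbitrary units (not data from the study).
ad_levels = [12.1, 15.4, 9.8, 14.2]
nc_levels = [7.5, 10.3, 6.9, 8.8]
print(auc_mann_whitney(ad_levels, nc_levels))  # 0.9375 (15 of 16 pairs ordered correctly)
```

An AUC of 0.5 means no discrimination and 1.0 means perfect separation. The pairwise loop is fine for small cohorts, and a rank-based formula gives the same value faster for larger ones.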
The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, which helps extract cell-type-specific information from extensive bulk data. However, these advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues such as brain that lack cell-type references. Due to technical limitations, current scDNAm sequencing covers only a small proportion of the genome in each cell, and the detected regions differ across cells. This makes scDNAm data ultra-high-dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework that reliably estimates cell-type fractions from tissue-level DNAm data. To analyze large-scale, complex scDNAm data, scMD uses a statistical approach to aggregate scDNAm data at the cell-cluster level, identify cell-type marker DNAm sites, and create precise cell-type signature matrices that surpass state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking on several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. Using scMD-estimated cellular fractions, we identify cell-type fractions and cell-type-specific differentially methylated cytosines associated with Alzheimer's disease.
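Each cell covers a different small subset of CpG sites. One simple way to aggregate at the cell-cluster level is to pool reads across the cells of a cluster, so that every site gets a cluster-level methylation estimate from whichever cells happen to cover it. The sketch below illustrates only that pooling idea and is not scMD's statistical procedure. The data layout (`(cell, site) -> (methylated_reads, total_reads)`), the function name, and the `min_coverage` threshold are hypothetical.

```python
from collections import defaultdict


def aggregate_by_cluster(calls, cluster_of, min_coverage=1):
    """Pool sparse per-cell methylation reads into cluster-level beta values.

    calls: dict mapping (cell, site) -> (methylated_reads, total_reads); sites a
           cell does not cover are simply absent (ultra-sparse input).
    cluster_of: dict mapping cell -> cluster label.
    Returns dict mapping (cluster, site) -> methylated / total, keeping only
    cluster-site pairs with at least min_coverage pooled reads.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for (cell, site), (m, t) in calls.items():
        key = (cluster_of[cell], site)
        meth[key] += m
        total[key] += t
    return {
        key: meth[key] / total[key]
        for key in total
        if total[key] >= min_coverage and total[key] > 0
    }


# Hypothetical toy input: three cells, two CpG sites, two clusters.
calls = {
    ("c1", "cpg1"): (1, 1),
    ("c2", "cpg1"): (0, 1),
    ("c2", "cpg2"): (2, 2),
    ("c3", "cpg2"): (0, 3),
}
clusters = {"c1": "neuron", "c2": "neuron", "c3": "glia"}
print(aggregate_by_cluster(calls, clusters))
# {('neuron', 'cpg1'): 0.5, ('neuron', 'cpg2'): 1.0, ('glia', 'cpg2'): 0.0}
```

Pooling trades single-cell resolution for coverage. Raising `min_coverage` discards cluster-site pairs supported by too few reads.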
Obsessive-compulsive disorder (OCD) is a chronic and severe psychiatric disorder for which effective treatment options are limited. Structural and functional neuroimaging studies have consistently implicated the orbitofrontal cortex (OFC) and striatum in the pathophysiology of the disorder. Recent genetic evidence points to the involvement of components of the excitatory synapse in the etiology of OCD. However, the transcriptional alterations that could link genetic risk to known structural and functional abnormalities remain mostly unknown. To assess potential transcriptional changes in the OFC and two striatal regions (caudate nucleus and nucleus accumbens) of OCD subjects relative to unaffected comparison subjects, we sequenced messenger RNA transcripts from these brain regions. In a joint analysis of all three regions, 904 transcripts were differentially expressed between 7 OCD subjects and 8 unaffected comparison subjects. Region-specific analyses highlighted a smaller number of differences, which were concentrated in the caudate nucleus and nucleus accumbens. Pathway analyses of the 904 differentially expressed transcripts showed enrichment for genes involved in synaptic signaling, with these synapse-associated genes displaying lower expression in OCD subjects relative to unaffected comparison subjects. Finally, we estimated that the fraction of medium spiny neurons was lower, whereas the fractions of vascular cells and astrocytes were higher, in tissue from OCD subjects. Together, these data provide the first unbiased examination of differentially expressed transcripts in both the OFC and striatum of OCD subjects. These transcripts encoded synaptic proteins more often than expected by chance, and thus implicate the synapse as a vulnerable molecular compartment for OCD.
Whole-exome sequencing studies have been useful for identifying genes that, when mutated, affect risk for autism spectrum disorder (ASD). Nonetheless, the association signal arises primarily from de novo protein-truncating variants, as opposed to the more common missense variants. Although missense variants are common in humans, determining which of them affect phenotypes, and how, remains a challenge. We investigate the functional relevance of de novo missense variants, specifically whether they are likely to disrupt protein interactions, and nominate novel genes involved in ASD risk through integrated genomic, transcriptomic, and proteomic analyses.
Utilizing our previous interactome perturbation predictor, we identify a set of missense variants that are likely disruptive to protein-protein interactions. For genes encoding the disrupted interactions, we evaluate their expression patterns across developing brains and within specific cell types, using both bulk and inferred cell-type-specific brain transcriptomes. Connecting all disrupted pairs of proteins, we construct an "ASD disrupted network." Finally, we integrate protein interactions and cell-type-specific co-expression networks together with published association data to implicate novel genes in ASD risk in a cell-type-specific manner.
Extending earlier work, we show that de novo missense variants that disrupt protein interactions are enriched in individuals with ASD, often affecting hub proteins and disrupting hub interactions. Genes encoding disrupted complementary interactors tend to be risk genes, and an interaction network built from these proteins is enriched for ASD proteins. Consistent with other studies, genes identified by disrupted protein interactions are expressed early in development and in excitatory and inhibitory neuronal lineages. Using inferred gene co-expression for three neuronal cell types (excitatory, inhibitory, and neural progenitor), we implicate several hundred genes in risk (FDR ≤ 0.05), ~60% of them novel, with characteristics of genuine ASD genes. Across cell types, these genes affect neuronal morphogenesis and neuronal communication, while genes implicated in neural progenitor cells show strong enrichment for development of the limbic system.
Some analyses use the imperfect guilt-by-association principle; results are statistical, not functional.
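The guilt-by-association principle can be sketched as neighbor voting: each candidate gene is scored by the fraction of its network neighbors that are already known risk genes. This simple illustration is not the integration method used in the study. The toy edge list, the gene labels, and the function name `gba_scores` are hypothetical.

```python
from collections import defaultdict


def gba_scores(edges, known_risk):
    """Neighbor-voting score: share of a gene's neighbors that are known risk genes."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {}
    for gene, nbrs in neighbors.items():
        if gene in known_risk:
            continue  # only score candidates, not genes already labeled as risk genes
        scores[gene] = len(nbrs & known_risk) / len(nbrs)
    return scores


# Hypothetical network mixing disrupted-interaction and co-expression edges.
edges = [("A", "R1"), ("A", "R2"), ("A", "B"), ("B", "C"), ("C", "R1")]
known = {"R1", "R2"}
for gene, score in sorted(gba_scores(edges, known).items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 2))
```

A high score only says that a gene sits near known risk genes in the network. As the limitation above notes, this is statistical evidence and not a functional result.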
Disrupted protein interactions identify gene sets involved in risk for ASD. Their gene expression during brain development and within cell types highlights how they relate to ASD.
Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales. However, because of the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, multimodal data are often only partially available, with some modalities missing for some samples, which complicates the development of these predictive models.
To address these challenges, we developed DeepGAMI, an interpretable neural network model that improves genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation, which imputes latent features of missing modalities and thus allows phenotypes to be predicted from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes.
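Integrated gradients attributes a prediction to each input feature. It averages the model's gradient along a straight path from a baseline x' to the input x, then multiplies by (x - x'). The sketch below applies a midpoint Riemann approximation to a toy logistic model whose gradient is known in closed form. DeepGAMI applies the idea to a trained neural network. The model, weights, baseline, and step count here are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic 'phenotype' model f(x) = sigmoid(w.x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of integrated gradients along baseline -> x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bi) * g for xi, bi, g in zip(x, baseline, avg_grad)]


w, b = [1.5, -0.5, 0.0], -0.2          # hypothetical weights for three input features
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline, w, b)
# Completeness check: attributions should sum to roughly f(x) - f(baseline).
print([round(a, 4) for a in attr], round(sum(attr), 4), round(model(x, w, b) - model(baseline, w, b), 4))
```

The completeness property is a useful sanity check on the step count. A feature with zero weight receives zero attribution.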
We applied DeepGAMI to several multimodal datasets, including genotype together with bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data from mouse neuronal cells. In cross-validation and independent validation, DeepGAMI outperformed existing methods in classifying disease types and cellular and clinical phenotypes, even when using single modalities (e.g., AUC scores of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease).
We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets from complex brains and brain diseases. DeepGAMI also prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
DNA methylation (DNAm), the addition of a methyl group to a cytosine in DNA, plays an important role in the regulation of gene expression. Single-nucleotide polymorphisms (SNPs) associated with schizophrenia (SZ) by genome-wide association studies (GWAS) often influence local DNAm levels. Thus, DNAm alterations, acting through effects on gene expression, represent one potential mechanism by which SZ-associated SNPs confer risk. In this study, we investigated genome-wide DNAm in postmortem superior temporal gyrus from 44 subjects with SZ and 44 non-psychiatric comparison subjects using Illumina Infinium MethylationEPIC BeadChip microarrays. We then extracted cell-type-specific methylation signals by applying tensor composition analysis. We identified SZ-associated differential methylation at 242 sites and in 44 regions containing two or more sites (FDR cutoff q = 0.1), and determined that a subset of these were cell-type specific. We found that mitotic arrest deficient 1-like 1 (MAD1L1), a gene within an established GWAS risk locus, harbored robust SZ-associated differential methylation. We investigated the potential role of MAD1L1 DNAm in conferring SZ risk by testing for colocalization among quantitative trait loci for methylation and gene transcripts (mQTLs and tQTLs) in brain tissue and the GWAS signal at the locus, using multiple-trait colocalization analysis. We found that the mQTLs and tQTLs colocalized with the GWAS signal (posterior probability > 0.8). Our findings suggest that alterations in MAD1L1 methylation and transcription may mediate risk for SZ at the MAD1L1-containing locus. Future studies are needed to identify how SZ-associated differential methylation affects MAD1L1 biological function.
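An "FDR cutoff of q = 0.1" refers to q-values from a false-discovery-rate adjustment of per-site p-values. A common choice is the Benjamini-Hochberg procedure, sketched below. The abstract does not state which FDR method was used, the p-values are made up, and the inclusive `<=` comparison is an assumption.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


pvals = [0.001, 0.008, 0.039, 0.041, 0.6]   # hypothetical per-CpG p-values
qvals = bh_qvalues(pvals)
significant = [i for i, qv in enumerate(qvals) if qv <= 0.1]  # assumed inclusive cutoff
print(significant)  # [0, 1, 2, 3]
```

The third site's raw adjusted value (0.065) is lowered to the fourth site's (0.05125) by the monotonicity step. This is how Benjamini-Hochberg can admit sites whose own p-value scaled by m/rank would otherwise be larger.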
Understanding lung immunity requires unbiased profiling of tissue-resident T cells at their precise anatomical locations within the lung, but such information has not been characterized in the immunized mouse model. In this pilot study, using the 10x Genomics Chromium and Visium platforms, we performed an integrative analysis of the spatial transcriptome with single-cell RNA-seq and single-cell ATAC-seq on lung cells from mice after immunization, using a well-established Klebsiella pneumoniae infection model. We built an optimized deconvolution pipeline to accurately decipher specific cell-type compositions by anatomical location. We found that combining scATAC-seq and scRNA-seq data may provide more robust cell-type identification, especially for lineage-specific T helper cells. Combining all three modalities, we observed a dynamic change in the location of T helper cells as well as their corresponding chemokines. In summary, our proof-of-principle study demonstrated the power and potential of single-cell multi-omics analysis to uncover spatial- and cell-type-dependent mechanisms of lung immunity.
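A common core of spot-level deconvolution is to model each spot's expression profile as a non-negative mixture of cell-type signature profiles, then rescale the fitted weights to proportions. The sketch below solves that non-negative least-squares problem by projected gradient descent. It is a generic illustration and not the study's optimized pipeline. The signature matrix, spot values, step-size rule, and iteration count are hypothetical.

```python
def deconvolve_spot(signature, spot, iters=2000):
    """Estimate cell-type proportions for one spot by non-negative least squares.

    signature: list of gene rows, each row holding one value per cell type (S).
    spot: observed expression of the same genes in the spot (y).
    Minimizes 0.5 * ||S w - y||^2 subject to w >= 0 with projected gradient
    descent, then rescales w to sum to one.
    """
    n_types = len(signature[0])
    # Step 1 / ||S||_F^2 is safe: ||S||_F^2 bounds the largest eigenvalue of S^T S.
    step = 1.0 / sum(v * v for row in signature for v in row)
    w = [1.0 / n_types] * n_types
    for _ in range(iters):
        residual = [sum(s * wj for s, wj in zip(row, w)) - y for row, y in zip(signature, spot)]
        grad = [sum(row[j] * r for row, r in zip(signature, residual)) for j in range(n_types)]
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]  # project onto w >= 0
    total = sum(w)
    return [wj / total for wj in w] if total > 0 else w


# Hypothetical signatures: genes 1-3 mark cell types A-C, gene 4 is shared.
signature = [
    [10.0, 0.5, 0.5],
    [0.5, 8.0, 0.5],
    [0.5, 0.5, 6.0],
    [2.0, 2.0, 2.0],
]
spot = [5.25, 2.75, 1.6, 2.0]  # exactly S @ [0.5, 0.3, 0.2]
print([round(p, 3) for p in deconvolve_spot(signature, spot)])  # [0.5, 0.3, 0.2]
```

Normalizing after the fit, rather than constraining the sum during fitting, is one simple option. It also removes any overall scale difference between a spot and the signatures.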
• A deconvolution workflow was verified for studying lung immunity using spatial transcriptomics (ST)
• 15 lung cell types were identified by integrating scRNA-seq and scATAC-seq data
• Th17 cells were found closer to airways than Th1 cells upon Klebsiella pneumoniae re-challenge
• Massive immune responses were activated in airways upon K. pneumoniae re-challenge
Biological sciences; Immunology; Omics; Transcriptomics
When assessed over a large number of samples, bulk RNA sequencing provides reliable data for gene expression at the tissue level. Single-cell RNA sequencing (scRNA-seq) deepens those analyses by evaluating gene expression at the cellular level. Both data types lend insights into disease etiology. With current technologies, scRNA-seq data are known to be noisy. Constrained by costs, scRNA-seq data are typically generated from a relatively small number of subjects, which limits their utility for some analyses, such as identification of gene expression quantitative trait loci (eQTLs). To address these issues while maintaining the unique advantages of each data type, we develop a Bayesian method (bMIND) to integrate bulk and scRNA-seq data. Using a prior derived from scRNA-seq data, bMIND estimates sample-level cell-type-specific (CTS) expression from bulk expression data. The CTS expression enables large-scale, sample-level downstream analyses, such as detection of CTS differentially expressed genes (DEGs) and eQTLs. Through simulations, we show that bMIND improves the accuracy of sample-level CTS expression estimates and increases the power to discover CTS DEGs compared with existing methods. To further our understanding of two complex phenotypes, autism spectrum disorder and Alzheimer's disease, we apply bMIND to gene expression data from relevant brain tissue to identify CTS DEGs. Our results complement findings for CTS DEGs obtained from snRNA-seq studies, replicating certain DEGs in specific cell types while nominating other novel genes for those cell types. Finally, we calculate CTS eQTLs for 11 brain regions by analyzing Genotype-Tissue Expression Project data, creating a new resource for biological insights.
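The core idea of combining an scRNA-seq-derived prior with bulk data can be illustrated with a conjugate Gaussian update for one gene in one sample. The model has the following parts:

* The bulk value is modeled as y = w'x + e.
* w holds the sample's cell-type fractions.
* x is the unknown CTS expression, with prior N(mu, Sigma) (for example, estimated from scRNA-seq).
* e ~ N(0, noise_var) is measurement noise.

Standard Gaussian conditioning gives the posterior mean mu + Sigma w (y - w'mu) / (w' Sigma w + noise_var). This is a deliberately simplified stand-in for intuition, not bMIND, which fits a richer Bayesian model. The function name and all numbers are hypothetical.

```python
def posterior_cts_mean(y, w, mu, sigma, noise_var):
    """Posterior mean of CTS expression x given bulk y = w.x + noise.

    Prior: x ~ N(mu, sigma), with sigma a K x K covariance given as a list of lists.
    Likelihood: y ~ N(w.x, noise_var).
    E[x | y] = mu + sigma w (y - w.mu) / (w' sigma w + noise_var).
    """
    k = len(w)
    sigma_w = [sum(sigma[i][j] * w[j] for j in range(k)) for i in range(k)]
    denom = sum(w[i] * sigma_w[i] for i in range(k)) + noise_var
    if denom <= 0:
        raise ValueError("need positive prior variance along w or positive noise variance")
    gain = (y - sum(wi * mi for wi, mi in zip(w, mu))) / denom
    return [mi + swi * gain for mi, swi in zip(mu, sigma_w)]


w = [0.6, 0.3, 0.1]            # hypothetical cell-type fractions for this sample
mu = [5.0, 2.0, 1.0]           # hypothetical prior means (e.g., from scRNA-seq)
sigma = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0],
         [0.0, 0.0, 0.2]]      # hypothetical prior covariance
x_hat = posterior_cts_mean(y=4.5, w=w, mu=mu, sigma=sigma, noise_var=0.1)
print([round(v, 3) for v in x_hat])
```

With a diagonal prior, each cell type absorbs the bulk discrepancy (y - w'mu) in proportion to its fraction times its prior variance. Abundant, more variable cell types therefore move the most away from the prior.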
Little is known about the financial well-being of families raising children with autism spectrum disorders (ASD). Family financial well-being has important impacts on the development of children with ASD. This study uses a 2019 survey of Chinese families raising a child with ASD (N = 3064) to examine their financial well-being and its association with health expenditures for children. Extensive control variables (i.e., demographic and socioeconomic characteristics of children, respondents, and their families) are adjusted for in the analyses. Findings suggest that the amount of health expenditures is negatively associated with respondents' perception of their financial status. The significance of health expenditures disappears after adjusting for household material hardship. Health expenditures affect financial well-being mainly by competing with other family needs for resources.