We report a multi-omic resource generated by applying quantitative trait locus (xQTL) analyses to RNA sequence, DNA methylation and histone acetylation data from the dorsolateral prefrontal cortex of 411 older adults who have all three data types. We identify SNPs significantly associated with gene expression, DNA methylation and histone modification levels. Many of these SNPs influence multiple molecular features, and we demonstrate that SNP effects on RNA expression are fully mediated by epigenetic features in 9% of these loci. Further, we illustrate the utility of our new resource, xQTL Serve, by using it to prioritize the cell type(s) most affected by an xQTL. We also reanalyze published genome-wide association studies using an xQTL-weighted analysis approach and identify 18 new schizophrenia and 2 new bipolar susceptibility variants, which is more than double the number of loci that can be discovered with a larger blood-based eQTL resource.
Risk variants for schizophrenia affect more than 100 genomic loci, yet cell- and tissue-specific roles underlying disease liability remain poorly characterized. For two cortical areas implicated in psychosis, the dorsolateral prefrontal cortex and anterior cingulate cortex, we have generated 157 reference maps from neuronal, neuron-depleted and bulk tissue chromatin for two histone marks associated with active promoters and enhancers, H3-trimethyl-Lys4 (H3K4me3) and H3-acetyl-Lys27 (H3K27ac). Differences between neuronal and neuron-depleted chromatin states were the major axis of variation in histone modification profiles, followed by substantial variability across subjects and cortical areas. Thousands of significant histone quantitative trait loci were identified in neuronal and neuron-depleted samples. Risk variants for schizophrenia, depressive symptoms and neuroticism were significantly over-represented in neuronal H3K4me3 and H3K27ac landscapes. Our resource, sponsored by PsychENCODE and CommonMind, highlights the critical role of cell-type-specific signatures at regulatory and disease-associated noncoding sequences in the human frontal lobe.
A novel estimator for the two-way partial AUC
Chaibub Neto, Elias; Yadav, Vijay; Sieberts, Solveig K ...
BMC Medical Informatics and Decision Making, 02/2024, Volume 24, Issue 1
Journal Article. Peer reviewed. Open access.
The two-way partial AUC has been recently proposed as a way to directly quantify partial area under the ROC curve with simultaneous restrictions on the sensitivity and specificity ranges of diagnostic tests or classifiers. The metric, as originally implemented in the tpAUC R package, is estimated using a nonparametric estimator based on a trimmed Mann-Whitney U-statistic, which becomes computationally expensive in large sample sizes. (Its computational complexity is of order $O(n_X n_Y)$, where $n_X$ and $n_Y$ represent the number of positive and negative cases, respectively.) This is problematic since the statistical methodology for comparing estimates generated from alternative diagnostic tests/classifiers relies on bootstrap resampling and requires repeated computations of the estimator on a large number of bootstrap samples.
By leveraging the graphical and probabilistic representations of the AUC, partial AUCs, and two-way partial AUC, we derive a novel estimator for the two-way partial AUC, which can be directly computed from the output of any software able to compute AUC and partial AUCs. We implemented our estimator using the computationally efficient pROC R package, which leverages a nonparametric approach using the trapezoidal rule for the computation of AUC and partial AUC scores. (Its computational complexity is of order $O(n \log n)$, where $n = n_X + n_Y$.) We compare the empirical bias and computation time of the proposed estimator against the original estimator provided in the tpAUC package in a series of simulation studies and on two real datasets.
Our estimator tended to be less biased than the original estimator based on the trimmed Mann-Whitney U-statistic across all experiments (and showed considerably less bias in the experiments based on small sample sizes). But, most importantly, because the computational complexity of the proposed estimator is of order $O(n \log n)$, rather than $O(n_X n_Y)$, it is much faster to compute when sample sizes are large.
The proposed estimator provides an improvement for the computation of two-way partial AUC, and allows the comparison of diagnostic tests/machine learning classifiers in large datasets where repeated computations of the original estimator on bootstrap samples become too expensive to compute.
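The cost difference described in this abstract can be illustrated on the ordinary (full) AUC. The sketch below is not the authors' two-way partial AUC estimator and does not reproduce the tpAUC or pROC packages; it only contrasts a pairwise Mann-Whitney computation, which visits every positive/negative pair, with a rank-based computation that needs a single sort. The function names `auc_pairwise` and `auc_rank` are hypothetical.

```python
def auc_pairwise(pos, neg):
    """Mann-Whitney AUC by comparing every positive/negative pair.

    Cost grows with len(pos) * len(neg).
    """
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5  # a tie counts as half a concordant pair
    return total / (len(pos) * len(neg))


def auc_rank(pos, neg):
    """Same AUC from one sort plus midranks.

    Cost is dominated by sorting all len(pos) + len(neg) scores.
    """
    labelled = sorted(
        [(s, 1) for s in pos] + [(s, 0) for s in neg], key=lambda t: t[0]
    )
    n = len(labelled)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_pos += midrank * sum(label for _, label in labelled[i:j])
        i = j
    n_pos, n_neg = len(pos), len(neg)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)
```

Both functions return the same value on the same scores, so under bootstrapping the only practical difference is how the run time scales with sample size.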
The temporal molecular changes that lead to disease onset and progression in Alzheimer's disease (AD) are still unknown. Here we develop a temporal model for these unobserved molecular changes with a manifold learning method applied to RNA-Seq data from human postmortem brain samples collected within the ROS/MAP and Mayo Clinic RNA-Seq studies. We define an ordering across samples based on their similarity in gene expression and use this ordering to estimate the molecular disease stage, or disease pseudotime, for each sample. Disease pseudotime is strongly concordant with the burden of tau (Braak score, P = 1.0 × 10^[…]), Aβ (CERAD score, P = 1.8 × 10^[…]), and cognitive diagnosis (P = 3.5 × 10^[…]) of late-onset (LO) AD. Early-stage disease pseudotime samples are enriched for controls and show changes in basic cellular functions. Late-stage disease pseudotime samples are enriched for late-stage AD cases and show changes in neuroinflammation and amyloid pathologic processes. We also identify a set of late-stage pseudotime samples that are controls and show changes in genes enriched for protein trafficking, splicing, regulation of apoptosis, and prevention of amyloid cleavage pathways. In summary, we present a method for ordering patients along a trajectory of LOAD disease progression from brain transcriptomic data.
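The idea of ordering samples by expression similarity can be sketched with a deliberately simple stand-in. The code below is not the manifold learning method used in the study: it projects samples onto the first principal component (found by power iteration) and rescales the projection to [0, 1]. The function name, the `root` parameter (a sample assumed to sit at the early end of the trajectory) and the toy input layout are all assumptions made for illustration, and the sketch assumes the input is not constant across samples.

```python
import random


def pseudotime_by_first_pc(expr, root=0, n_iter=200, seed=0):
    """Order samples (rows of expr) along the first principal component.

    expr: list of samples, each a list of gene expression values.
    root: hypothetical index of a sample assumed to be at the early end.
    Returns one pseudotime value per sample, scaled to [0, 1].
    """
    n, p = len(expr), len(expr[0])
    means = [sum(row[g] for row in expr) / n for g in range(p)]
    centered = [[row[g] - means[g] for g in range(p)] for row in expr]

    # Power iteration on X^T X to approximate the leading gene loading vector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(scores[i] * centered[i][g] for i in range(n)) for g in range(p)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]

    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    # The sign of a principal component is arbitrary; orient it so that
    # the root sample falls on the early side.
    if scores[root] > 0:
        scores = [-s for s in scores]
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]
```

A linear projection like this cannot capture branching or curved trajectories, which is one reason a manifold learning method would be preferred on real transcriptomic data.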
While schizophrenia differs between males and females in the age of onset, symptomatology, and disease course, the molecular mechanisms underlying these differences remain uncharacterized.
To address questions about the sex-specific effects of schizophrenia, we performed a large-scale transcriptome analysis of RNA sequencing data from 437 controls and 341 cases from two distinct cohorts from the CommonMind Consortium.
Analysis across the cohorts identified a reproducible gene expression signature of schizophrenia that was highly concordant with previous work. Differential expression across sex was reproducible across cohorts and identified X- and Y-linked genes, as well as those involved in dosage compensation. Intriguingly, the sex expression signature was also enriched for genes involved in neurexin family protein binding and synaptic organization. Differential expression analysis testing a sex-by-diagnosis interaction effect did not identify any genome-wide signature after multiple testing corrections. Gene coexpression network analysis was performed to reduce dimensionality from thousands of genes to dozens of modules and elucidate interactions among genes. We found enrichment of coexpression modules for sex-by-diagnosis differential expression signatures, which were highly reproducible across the two cohorts and involved a number of diverse pathways, including neural nucleus development, neuron projection morphogenesis, and regulation of neural precursor cell proliferation.
Overall, our results indicate that the effect size of sex differences in schizophrenia gene expression signatures is small and underscore the challenge of identifying robust sex-by-diagnosis signatures, which will require future analyses in larger cohorts.
To map the genetics of gene expression in metabolically relevant tissues and investigate the diversity of expression SNPs (eSNPs) in multiple tissues from the same individual, we collected four tissues from approximately 1000 patients undergoing Roux-en-Y gastric bypass (RYGB), along with clinical traits associated with their weight loss and co-morbidities. We then performed high-throughput genotyping and gene expression profiling and carried out genome-wide association analyses for more than 100,000 gene expression traits representing four metabolically relevant tissues: liver, omental adipose, subcutaneous adipose, and stomach. We successfully identified 24,531 eSNPs corresponding to about 10,000 distinct genes. To our knowledge, this is the greatest number of eSNPs identified by any study to date, and this is the first study to identify eSNPs from stomach tissue. We then demonstrate how these eSNPs provide a high-quality disease map for each tissue in morbidly obese patients that informs genetic associations identified not only in this cohort but also in previously published genome-wide association studies. These data can aid in elucidating the key networks associated with morbid obesity, response to RYGB, and disease as a whole.
A key goal of biomedical research is to elucidate the complex network of gene interactions underlying complex traits such as common human diseases. Here we detail a multistep procedure for identifying potential key drivers of complex traits that integrates DNA-variation and gene-expression data with other complex trait data in segregating mouse populations. Ordering gene expression traits relative to one another and relative to other complex traits is achieved by systematically testing whether variations in DNA that lead to variations in relative transcript abundances statistically support an independent, causative or reactive function relative to the complex traits under consideration. We show that this approach can predict transcriptional responses to single gene-perturbation experiments using gene-expression data in the context of a segregating mouse population. We also demonstrate the utility of this approach by identifying and experimentally validating the involvement of three new genes in susceptibility to obesity.
Alzheimer's disease (AD) is an incurable neurodegenerative disease currently affecting 1.75% of the US population, with projected growth to 3.46% by 2050. Identifying common genetic variants driving differences in transcript expression that confer AD risk is necessary to elucidate AD mechanisms and develop therapeutic interventions. We modify the FUSION transcriptome-wide association study (TWAS) pipeline to ingest gene expression values from multiple neocortical regions.
A combined dataset of 2003 genotypes clustered to 1000 Genomes individuals from Utah with Northern and Western European ancestry (CEU) was used to construct a training set of 790 genotypes paired to 888 RNASeq profiles from temporal cortex (TCX = 248), prefrontal cortex (FP = 50), inferior frontal gyrus (IFG = 41), superior temporal gyrus (STG = 34), parahippocampal cortex (PHG = 34), and dorsolateral prefrontal cortex (DLPFC = 461). Following within-tissue normalization and covariate adjustment, predictive weights to impute expression components based on a gene's surrounding cis-variants were trained. The FUSION pipeline was modified to support input of pre-scaled expression values and support cross validation with a repeated measure design arising from the presence of multiple transcriptome samples from the same individual across different tissues.
Cis-variant architecture alone was informative to train weights and impute expression for 6780 (49.67%) autosomal genes, the majority of which significantly correlated with gene expression; FDR < 5%: N = 6775 (99.92%), Bonferroni: N = 6716 (99.06%). Validation of weights in 515 matched genotype to RNASeq profiles from the CommonMind Consortium (CMC) was (72.14%) in DLPFC profiles. Association of imputed expression components from all 2003 genotype profiles yielded 8 genes significantly associated with AD (FDR < 0.05): APOC1, EED, CD2AP, CEACAM19, CLPTM1, MTCH2, TREM2, and KNOP1.
We provide evidence of cis-genetic variation conferring AD risk through 8 genes across six distinct genomic loci. Moreover, we provide expression weights for 6780 genes as a valuable resource to the community, which can be abstracted across the neocortex and a wide range of neuronal phenotypes.
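As a rough illustration of how per-gene expression weights of this kind are typically used, the sketch below shows two generic TWAS operations: imputing the genetic component of expression as a weighted sum of cis-variant dosages, and combining GWAS z-scores with the same weights and an LD matrix into a gene-level z-score. This is a minimal sketch and not the modified FUSION pipeline described above; the function names and the toy inputs are assumptions, and the z-score formula is the summary-statistic form commonly used in TWAS.

```python
def impute_expression(dosages, weights):
    """Genetically predicted expression for one individual and one gene.

    dosages: allele dosages (0..2) at the gene's cis-variants.
    weights: trained per-variant expression weights, in the same order.
    """
    return sum(w * d for w, d in zip(weights, dosages))


def twas_z(weights, gwas_z, ld):
    """Gene-level z-score from GWAS summary statistics: w'z / sqrt(w' LD w).

    gwas_z: GWAS z-scores for the same cis-variants.
    ld: LD (correlation) matrix between those variants, as a list of rows.
    """
    k = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k)
    )
    return numerator / variance ** 0.5
```

For example, `impute_expression([2, 1], [0.5, -0.2])` combines two variants into one predicted expression value, and `twas_z` would then test whether that predicted component tracks disease risk without needing individual-level GWAS genotypes.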
Schizophrenia and bipolar disorder are serious mental illnesses that affect more than 2% of adults. While large-scale genetics studies have identified genomic regions associated with disease risk, less is known about the molecular mechanisms by which risk alleles with small effects lead to schizophrenia and bipolar disorder. In order to fill this gap between genetics and disease phenotype, we have undertaken a multi-cohort genomics study of postmortem brains from controls and from individuals with schizophrenia or bipolar disorder. Here we present a public resource of functional genomic data from the dorsolateral prefrontal cortex (DLPFC; Brodmann areas 9 and 46) of 986 individuals from 4 separate brain banks, including 353 diagnosed with schizophrenia and 120 with bipolar disorder. The genomic data include RNA-seq and SNP genotypes on 980 individuals, and ATAC-seq on 269 individuals, of which 264 are a subset of individuals with RNA-seq. We have performed extensive preprocessing and quality control on these data so that the research community can take advantage of this public resource available on the Synapse platform at http://CommonMind.org.
Identifying variations in DNA that increase susceptibility to disease is one of the primary aims of genetic studies using a forward genetics approach. However, identification of disease-susceptibility genes by means of such studies provides limited functional information on how genes lead to disease. In fact, in most cases there is an absence of functional information altogether, preventing a definitive identification of the susceptibility gene or genes. Here we develop an alternative to the classic forward genetics approach for dissecting complex disease traits where, instead of identifying susceptibility genes directly affected by variations in DNA, we identify gene networks that are perturbed by susceptibility loci and that in turn lead to disease. Application of this method to liver and adipose gene expression data generated from a segregating mouse population results in the identification of a macrophage-enriched network supported as having a causal relationship with disease traits associated with metabolic syndrome. Three genes in this network, lipoprotein lipase (Lpl), lactamase beta (Lactb) and protein phosphatase 1-like (Ppm1l), are validated as previously unknown obesity genes, strengthening the association between this network and metabolic disease traits. Our analysis provides direct experimental support that complex traits such as obesity are emergent properties of molecular networks that are modulated by complex genetic loci and environmental factors.