Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini-Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wide association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.
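The plug-in step described above can be illustrated with a minimal sketch. It assumes the per-test null proportions have already been estimated by some regression on covariates. The function names `bh_adjust` and `plugin_fdr` and all input values are hypothetical and chosen only for illustration; this is not the swfdr interface.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def plugin_fdr(pvalues, pi0):
    """Multiply each BH-adjusted p-value by its estimated null proportion.

    pi0[i] is an assumed, externally supplied estimate of the probability
    that hypothesis i is null given its covariates.
    """
    return [min(1.0, p0 * q) for p0, q in zip(pi0, bh_adjust(pvalues))]


if __name__ == "__main__":
    pvals = [0.001, 0.02, 0.03, 0.5]   # made-up p-values
    pi0_hat = [0.5, 0.9, 1.0, 1.0]     # made-up covariate-dependent estimates
    print(plugin_fdr(pvals, pi0_hat))
```

When every entry of `pi0` equals 1, the result reduces to the ordinary Benjamini-Hochberg adjusted p-values, which is why smaller estimated null proportions make the estimator less conservative.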
Glioblastoma (GBM) is the most aggressive nervous system cancer. Understanding its molecular pathogenesis is crucial to improving diagnosis and treatment. Integrated analysis of genomic, proteomic, post-translational modification and metabolomic data on 99 treatment-naive GBMs provides insights to GBM biology. We identify key phosphorylation events (e.g., phosphorylated PTPN11 and PLCG1) as potential switches mediating oncogenic pathway activation, as well as potential targets for EGFR-, TP53-, and RB1-altered tumors. Immune subtypes with distinct immune cell types are discovered using bulk omics methodologies, validated by snRNA-seq, and correlated with specific expression and histone acetylation patterns. Histone H2B acetylation in classical-like and immune-low GBM is driven largely by BRDs, CREBBP, and EP300. Integrated metabolomic and proteomic data identify specific lipid distributions across subtypes and distinct global metabolic changes in IDH-mutated tumors. This work highlights biological relationships that could contribute to stratification of GBM patients for more effective treatment.
• Phosphorylated PTPN11 and PLCG1 represent a signaling hub in RTK-altered tumors
• Four immune GBM subtypes exist, characterized by distinct immune cell populations
• Mesenchymal subtype EMT signature is specific to tumor cells but not to stroma
• Histone H2B acetylation is enriched in classical GBMs with low macrophage content
Wang et al. perform integrated proteogenomic analysis of adult glioblastoma (GBM), including metabolomics, lipidomics, and single nuclei RNA-Seq, revealing insights into the immune landscape of GBM, cell-specific nature of EMT signatures, histone acetylation in classical GBM, and the existence of signaling hubs which could provide therapeutic vulnerabilities.
Within the basal ganglia circuit, the external globus pallidus (GPe) is critically involved in motor control. Aside from Foxp2+ neurons and ChAT+ neurons that have been established as unique neuron types, there is little consensus on the classification of GPe neurons. Properties of the remaining neuron types are poorly defined. In this study, we leverage new mouse lines, viral tools, and molecular markers to better define GPe neuron subtypes. We found that Sox6 represents a novel, defining marker for GPe neuron subtypes. Lhx6+ neurons that lack the expression of Sox6 were devoid of both parvalbumin and Npas1. This result confirms previous assertions of the existence of a unique Lhx6+ population. Neurons that arise from the Dbx1+ lineage were similarly abundant in the GPe and displayed a heterogeneous makeup. Importantly, tracing experiments revealed that Npas1+-Nkx2.1+ neurons represent the principal noncholinergic, cortically-projecting neurons. In other words, they form the pallido-cortical arm of the cortico-pallido-cortical loop. Our data further show that pyramidal-tract neurons in the cortex collateralized within the GPe, forming a closed-loop system between the two brain structures. Overall, our findings reconcile some of the discrepancies that arose from differences in techniques or the reliance on preexisting tools. Although spatial distribution and electrophysiological properties of GPe neurons reaffirm the diversification of GPe subtypes, statistical analyses strongly support the notion that these neuron subtypes can be categorized under the two principal neuron classes: PV+ neurons and Npas1+ neurons.
The poor understanding of the neuronal composition in the external globus pallidus (GPe) undermines our ability to interrogate its precise behavioral and disease involvements. In this study, 12 different genetic crosses were used, hundreds of neurons were electrophysiologically characterized, and >100,000 neurons were histologically- and/or anatomically-profiled. Our current study further establishes the segregation of GPe neuron classes and illustrates the complexity of GPe neurons in adult mice. Our results support the idea that Npas1+-Nkx2.1+ neurons are a distinct GPe neuron subclass. By providing a detailed analysis of the organization of the cortico-pallidal-cortical projection, our findings establish the cellular and circuit substrates that can be important for motor function and dysfunction.
Human cancer is caused by the accumulation of mutations in oncogenes and tumor suppressor genes. To catalog the genetic changes that occur during tumorigenesis, we isolated DNA from 11 breast and 11 colorectal tumors and determined the sequences of the genes in the Reference Sequence database in these samples. Based on analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene "mountains" and a much larger number of gene "hills" that are mutated at low frequency. We describe statistical and bioinformatic tools that may help identify mutations with a role in tumorigenesis. These results have implications for understanding the nature and heterogeneity of human cancers and for using personal genomics for tumor diagnosis and therapy.
Abstract
Motivation
The biological pathways linking exposures and disease risk are often poorly understood. To gain insight into these pathways, studies may try to identify biomarkers that mediate the exposure/disease relationship. Such studies often simultaneously test hundreds or thousands of biomarkers.
Results
We consider a set of m biomarkers and a corresponding set of null hypotheses, where the jth null hypothesis states that biomarker j does not mediate the exposure/disease relationship. We propose a Multiple Comparison Procedure (MCP) that rejects a set of null hypotheses or, equivalently, identifies a set of mediators, while asymptotically controlling the Family-Wise Error Rate (FWER) or False Discovery Rate (FDR). We use simulations to show that, compared to currently available methods, our proposed method has higher statistical power to detect true mediators. We then apply our method to a breast cancer study and identify nine metabolites that may mediate the known relationship between an increased BMI and an increased risk of breast cancer.
Availability and implementation
R package MultiMed on https://github.com/SiminaB/MultiMed.
Supplementary information
Supplementary data are available at Bioinformatics online.
In this study, we consider admixed populations through their expected heterozygosity, a measure of genetic diversity. A population is termed admixed if its members possess recent ancestry from two or more separate sources. As a result of the fusion of source populations with different genetic variants, admixed populations can exhibit high levels of genetic diversity, reflecting contributions of their multiple ancestral groups. For a model of an admixed population derived from K source populations, we obtain a relationship between its heterozygosity and its proportions of admixture from the various source populations. We show that the heterozygosity of the admixed population is at least as great as that of the least heterozygous source population, and that it potentially exceeds the heterozygosities of all of the source populations. The admixture proportions that maximize the heterozygosity possible for an admixed population formed from a specified set of source populations are also obtained under specific conditions. We examine the special case of K = 2 source populations in detail, characterizing the maximal admixture in terms of the heterozygosities of the two source populations and the value of F_ST between them. In this case, the heterozygosity of the admixed population exceeds the maximal heterozygosity of the source groups if the divergence between them, measured by F_ST, is large enough, namely above a certain bound that is a function of the heterozygosities of the source groups. We present applications to simulated data as well as to data from human admixture scenarios, providing results useful for interpreting the properties of genetic variability in admixed populations.
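The relationship between admixture proportions and heterozygosity can be made concrete with a small sketch. It assumes a single locus, expected heterozygosity defined as one minus the sum of squared allele frequencies, and admixed allele frequencies equal to the proportion-weighted average of the source frequencies. These modeling choices, the function names, and the allele frequencies are assumptions made for illustration and may differ from the model analyzed in the study.

```python
def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def admixed_frequencies(sources, proportions):
    """Allele frequencies of the admixed population.

    sources: one allele-frequency list per source population (same length).
    proportions: admixture proportions, one per source, summing to 1.
    """
    n_alleles = len(sources[0])
    return [
        sum(a * src[i] for a, src in zip(proportions, sources))
        for i in range(n_alleles)
    ]


if __name__ == "__main__":
    # Two strongly diverged, made-up source populations (K = 2).
    source_1 = [0.9, 0.1]
    source_2 = [0.1, 0.9]
    for a in (0.0, 0.25, 0.5, 0.75, 1.0):
        mixed = admixed_frequencies([source_1, source_2], [a, 1.0 - a])
        print(a, heterozygosity(mixed))
```

Under these assumptions the heterozygosity function is concave in the allele frequencies, so the admixed value can never fall below that of the least heterozygous source. With the diverged sources above, intermediate proportions give a value larger than either source.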
Life expectancy can be estimated accurately from a cohort of individuals born in the same year and followed from birth to death. However, due to the resource-consuming nature of following a cohort prospectively, life expectancy is often assessed based upon retrospective death record reviews. This conventional approach may lead to potentially biased estimates, in particular when estimating life expectancy of rare diseases such as Morquio syndrome A. We investigated the accuracy of life expectancy estimation using death records by simulating the survival of individuals with Morquio syndrome A under four different scenarios.
When life expectancy was constant during the entire period, using death data did not result in a biased estimate. However, when life expectancy increased over time, as is often expected to be the case in rare diseases, using only death data led to a substantial underestimation of life expectancy. We emphasize that it is therefore crucial to understand how estimates of life expectancy are obtained, to interpret them in an appropriate context, and to assess estimation methods within a sensitivity analysis framework, similar to the simulations performed herein.
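The contrast between constant and increasing life expectancy can be reproduced qualitatively with a toy simulation. The sketch below is not the simulation performed in the study: the birth years, the observation window, the uniform lifespan distribution, and the two scenario functions are all hypothetical choices made to keep the example short.

```python
import random


def death_record_estimate(life_expectancy_at_birth, births_per_year=200,
                          first_birth=1950, last_birth=2019,
                          window=(2000, 2020), seed=0):
    """Mean age at death among deaths recorded inside the review window.

    life_expectancy_at_birth: function mapping a birth year to the true
    mean lifespan of that birth cohort (an assumed input).
    """
    rng = random.Random(seed)
    ages_at_death = []
    for birth_year in range(first_birth, last_birth + 1):
        mean_life = life_expectancy_at_birth(birth_year)
        for _ in range(births_per_year):
            # Toy lifespan model: uniform on [0, 2 * mean], so the mean is mean_life.
            lifespan = rng.uniform(0.0, 2.0 * mean_life)
            death_time = birth_year + lifespan
            # Only deaths that fall inside the review window are observed.
            if window[0] <= death_time < window[1]:
                ages_at_death.append(lifespan)
    return sum(ages_at_death) / len(ages_at_death)


def constant_scenario(birth_year):
    """Life expectancy of 25 years for every birth cohort."""
    return 25.0


def increasing_scenario(birth_year):
    """Life expectancy rising linearly from 20 (born 1950) to 40 (born 2019)."""
    return 20.0 + 20.0 * (birth_year - 1950) / 69.0


if __name__ == "__main__":
    print("constant:", death_record_estimate(constant_scenario))
    print("increasing:", death_record_estimate(increasing_scenario))
    print("true value, cohort born 2000:", increasing_scenario(2000))
```

In the increasing scenario, the deaths recorded in the window come mostly from older cohorts with shorter lives and from early deaths in recent cohorts, whose long-lived members have not yet died. That is the mechanism behind the underestimation described above.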
The external globus pallidus (GPe) is a critical node within the basal ganglia circuit. Phasic changes in the activity of GPe neurons during movement and their alterations in Parkinson's disease (PD) argue that the GPe is important in motor control. Parvalbumin-positive (PV+) neurons and Npas1+ neurons are the two principal neuron classes in the GPe. The distinct electrophysiological properties and axonal projection patterns argue that these two neuron classes serve different roles in regulating motor output. However, the causal relationship between GPe neuron classes and movement remains to be established. Here, by using optogenetic approaches in mice (both males and females), we showed that PV+ neurons and Npas1+ neurons promoted and suppressed locomotion, respectively. Moreover, PV+ neurons and Npas1+ neurons are under different synaptic influences from the subthalamic nucleus (STN). Additionally, we found a selective weakening of STN inputs to PV+ neurons in the chronic 6-hydroxydopamine lesion model of PD. This finding reinforces the idea that the reciprocally connected GPe-STN network plays a key role in disease symptomatology and thus provides the basis for future circuit-based therapies.
The external pallidum is a key, yet understudied, component of the basal ganglia. Neural activity in the pallidum goes awry in neurologic diseases, such as Parkinson's disease. While this strongly argues that the pallidum plays a critical role in motor control, it has been difficult to establish the causal relationship between pallidal activity and motor function/dysfunction. This was in part because of the cellular complexity of the pallidum. Here, we showed that the two principal neuron types in the pallidum have opposing roles in motor control. In addition, we described the differences in their synaptic influence. Importantly, our research provides new insights into the cellular and circuit mechanisms that explain the hypokinetic features of Parkinson's disease.
Modern biomedical and epidemiological studies often measure hundreds or thousands of biomarkers, such as gene expression or metabolite levels. Although there is an extensive statistical literature on adjusting for 'multiple comparisons' when testing whether these biomarkers are directly associated with a disease, testing whether they are biological mediators between a known risk factor and a disease requires a more complex null hypothesis, thus offering additional methodological challenges.
We propose a permutation approach that tests multiple putative mediators and controls the family-wise error rate. We demonstrate that, unlike when testing direct associations, replacing the Bonferroni correction with a permutation approach that focuses on the maximum of the test statistics can significantly improve the power to detect mediators even when all biomarkers are independent. Through simulations, we show that the power of our method is 2-5× larger than the power achieved by Bonferroni correction. Finally, we apply our permutation test to a case-control study of dietary risk factors and colorectal adenoma to show that, of 149 test metabolites, docosahexaenoate is a possible mediator between fish consumption and decreased colorectal adenoma risk.
R-package included in online Supplementary Material.