A network approach to finding disease modules
Shared genes represent a powerful but limited representation of the mechanistic relationship between two diseases. However, the analysis of protein-protein interactions has been hampered by the incompleteness of interactome maps. Menche et al. formulated the mathematical conditions needed to allow a disease module (a localized region of connections between disease-related proteins) to be observed. Only diseases with data coverage that exceeds a specific threshold have identifiable disease modules. The network-based distance between two disease modules revealed that disease pairs that are predicted to have overlapping modules had statistically significant molecular similarity. These similarities encompassed their protein components, gene expression, symptoms, and morbidity. Molecular-level links between diseases lacking shared disease genes could also be identified.
Science, this issue, 10.1126/science.1257601
Incomplete networks of protein-protein interactions help explain disease relationships, even in the absence of shared genes.
INTRODUCTION
A disease is rarely a straightforward consequence of an abnormality in a single gene, but rather reflects the interplay of multiple molecular processes. The relationships among these processes are encoded in the interactome, a network that integrates all physical interactions within a cell, from protein-protein to regulatory protein–DNA and metabolic interactions. The documented propensity of disease-associated proteins to interact with each other suggests that they tend to cluster in the same neighborhood of the interactome, forming a disease module, a connected subgraph that contains all molecular determinants of a disease. The accurate identification of the corresponding disease module represents the first step toward a systematic understanding of the molecular mechanisms underlying a complex disease. Here, we present a network-based framework to identify the location of disease modules within the interactome and use the overlap between the modules to predict disease-disease relationships.
RATIONALE
Despite impressive advances in high-throughput interactome mapping and disease gene identification, both the interactome and our knowledge of disease-associated genes remain incomplete. This incompleteness prompts us to ask to what extent the current data are sufficient to map out the disease modules, the first step toward an integrated approach to human disease. To make progress, we must formulate mathematically the impact of network incompleteness on the identifiability of disease modules, quantifying the predictive power and the limitations of the current interactome.
RESULTS
Using the tools of network science, we show that we can only uncover disease modules for diseases whose number of associated genes exceeds a critical threshold determined by the network incompleteness. We find that disease proteins associated with 226 diseases are clustered in the same network neighborhood, displaying a statistically significant tendency to form identifiable disease modules. The higher the degree of agglomeration of the disease proteins within the interactome, the higher the biological and functional similarity of the corresponding genes. These findings indicate that many local neighborhoods of the interactome represent the observable part of the true, larger and denser disease modules.
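The clustering test described above can be illustrated with a minimal sketch. It assumes that "forming an identifiable module" is measured by the size of the largest connected component of the disease proteins, compared against randomly drawn protein sets of the same size. The function names, the toy network, and the uniform random sampling are all assumptions made for illustration; the text does not specify the null model.

```python
import random
from collections import deque


def largest_component_size(adjacency, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes) & set(adjacency)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adjacency[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def module_z_score(adjacency, disease_proteins, n_random=1000, seed=0):
    """Compare the observed component size with uniformly random protein sets.

    Uniform sampling is a simplifying assumption of this sketch.
    """
    rng = random.Random(seed)
    population = sorted(adjacency)
    k = len(set(disease_proteins) & set(adjacency))
    observed = largest_component_size(adjacency, disease_proteins)
    sizes = [
        largest_component_size(adjacency, rng.sample(population, k))
        for _ in range(n_random)
    ]
    mean = sum(sizes) / n_random
    std = (sum((s - mean) ** 2 for s in sizes) / n_random) ** 0.5
    if std == 0:
        z = float("inf") if observed > mean else 0.0
    else:
        z = (observed - mean) / std
    return observed, mean, z


# Hypothetical toy interactome: a chain of 30 proteins labelled 0..29.
toy_network = {i: [j for j in (i - 1, i + 1) if 0 <= j < 30] for i in range(30)}

# Four adjacent proteins form one component, far larger than expected by chance.
observed, mean_random, z = module_z_score(toy_network, {0, 1, 2, 3}, n_random=200)
```

A large positive z-score indicates that the disease proteins agglomerate more than random sets do. A real analysis would use the full interactome and a null model chosen to match the study design.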
If two disease modules overlap, local perturbations causing one disease can disrupt pathways of the other disease module as well, resulting in shared clinical and pathobiological characteristics. To test this hypothesis, we measure the network-based separation of each disease pair, observing a direct relation between the pathobiological similarity of diseases and their relative distance in the interactome. We find that disease pairs with overlapping disease modules display significant molecular similarity, elevated coexpression of their associated genes, similar symptoms, and high comorbidity. At the same time, non-overlapping disease pairs lack any detectable pathobiological relationships. The proposed network-based distance allows us to predict the pathobiological relationship even for diseases that do not share genes.
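The network-based separation of a disease pair can be sketched as follows. The formula is not stated in the text above; this sketch assumes one formulation, s_AB = ⟨d_AB⟩ − (⟨d_AA⟩ + ⟨d_BB⟩)/2. Here ⟨d_AA⟩ is the mean shortest distance from each protein of disease A to the nearest other protein of A, and ⟨d_AB⟩ is the mean shortest distance from each protein of either disease to the nearest protein of the other. Negative values then suggest overlapping modules and positive values suggest separated ones. The function names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def nearest_distances(adjacency, sources, targets, exclude_self=False):
    """For each source, the distance to its closest reachable target."""
    result = []
    for s in sources:
        dist = bfs_distances(adjacency, s)
        reachable = [
            dist[t] for t in targets
            if t in dist and not (exclude_self and t == s)
        ]
        if reachable:
            result.append(min(reachable))
    return result


def mean(values):
    if not values:
        raise ValueError("no reachable protein pairs")
    return sum(values) / len(values)


def separation(adjacency, a, b):
    """Assumed separation measure: s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2."""
    a, b = set(a), set(b)
    d_aa = mean(nearest_distances(adjacency, a, a, exclude_self=True))
    d_bb = mean(nearest_distances(adjacency, b, b, exclude_self=True))
    d_ab = mean(nearest_distances(adjacency, a, b) + nearest_distances(adjacency, b, a))
    return d_ab - (d_aa + d_bb) / 2


# Hypothetical toy interactome: a chain of 10 proteins labelled 0..9.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

s_separated = separation(chain, {0, 1, 2}, {7, 8, 9})    # modules at opposite ends
s_overlapping = separation(chain, {0, 1, 2}, {1, 2, 3})  # modules sharing two proteins
```

In this toy chain the two distant modules give a positive separation and the two modules that share proteins give a negative one, which matches the qualitative behaviour described above.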
CONCLUSION
Despite its incompleteness, the interactome has reached sufficient coverage to allow the systematic investigation of disease mechanisms and to help uncover the molecular origins of the pathobiological relationships between diseases. The introduced network-based framework can be extended to address numerous questions at the forefront of network medicine, from interpreting genome-wide association study data to drug target identification and repurposing.
Diseases within the interactome.
The interactome collects all physical interactions between a cell’s molecular components. Proteins associated with the same disease form connected subgraphs, called disease modules, shown for multiple sclerosis (MS), peroxisomal disorders (PD), and rheumatoid arthritis (RA). Disease pairs with overlapping modules (MS and RA) have some phenotypic similarities and high comorbidity. Non-overlapping diseases, like MS and PD, lack detectable clinical relationships.
According to the disease module hypothesis, the cellular components associated with a disease segregate in the same neighborhood of the human interactome, the map of biologically relevant molecular interactions. Yet, given the incompleteness of the interactome and the limited knowledge of disease-associated genes, it is not obvious if the available data have sufficient coverage to map out modules associated with each disease. Here we derive mathematical conditions for the identifiability of disease modules and show that the network-based location of each disease module determines its pathobiological relationship to other diseases. For example, diseases with overlapping network modules show significant coexpression patterns, symptom similarity, and comorbidity, whereas diseases residing in separated network neighborhoods are phenotypically distinct. These tools represent an interactome-based platform to predict molecular commonalities between phenotypically related diseases, even if they do not share primary disease genes.
Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Marrying the two, network medicine with biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we discuss a broad overview of philosophies under which various network methods work. We also provide a few examples in each paradigm as a test case of its successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a lexicon that helps researchers from biological sciences and network theory get on the same page when working on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.
The protein–protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as “indispensable,” “neutral,” or “dispensable,” according to whether removing the protein increases, does not change, or decreases the number of driver nodes in the network. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network’s control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
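The three-way classification can be illustrated with a small sketch. It assumes the standard structural-controllability result that the minimum number of driver nodes equals the number of nodes left unmatched by a maximum matching of the directed network, with a floor of one. The abstract does not state the algorithm used, so the matching routine (a simple augmenting-path search), the function names, and the toy network are illustrative assumptions.

```python
def max_matching_size(nodes, edges):
    """Maximum matching of a directed graph, viewed as a bipartite graph
    between the source side and the target side of each edge."""
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_source = {}  # target node -> source node whose edge matches it

    def try_augment(u, visited):
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in matched_source or try_augment(matched_source[v], visited):
                matched_source[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in nodes)


def driver_node_count(nodes, edges):
    """Assumed rule: driver nodes = unmatched nodes, and at least one."""
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify(nodes, edges, node):
    """Label a node by how its removal changes the number of driver nodes."""
    before = driver_node_count(nodes, edges)
    remaining_nodes = [n for n in nodes if n != node]
    remaining_edges = [(u, v) for u, v in edges if node not in (u, v)]
    after = driver_node_count(remaining_nodes, remaining_edges)
    if after > before:
        return "indispensable"
    if after < before:
        return "dispensable"
    return "neutral"


# Hypothetical toy network: a directed chain a -> b -> c -> d with a side branch b -> e.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]

labels = {n: classify(toy_nodes, toy_edges, n) for n in toy_nodes}
```

In the toy network, removing the hub `b` fragments the chain and raises the driver-node count, removing the side branch `e` lowers it, and removing `a` leaves it unchanged. The recursive search is adequate for small examples only; a network of thousands of proteins would call for an iterative matching algorithm.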
Robustness is a prominent feature of most biological systems. Most previous related studies have been focused on homogeneous molecular networks. Here we propose a comprehensive framework for understanding how the interactions between genes, proteins and metabolites contribute to the determinants of robustness in a heterogeneous biological network. We integrate heterogeneous sources of data to construct a multilayer interaction network composed of a gene regulatory layer, a protein-protein interaction layer, and a metabolic layer. We design a simulated perturbation process to characterize the contribution of each gene to the overall system's robustness, and find that influential genes are enriched in essential and cancer genes. We show that the proposed mechanism predicts a higher vulnerability of the metabolic layer to perturbations applied to genes associated with metabolic diseases. Furthermore, we find that the real network is comparably or more robust than expected in multiple random realizations. Finally, we analytically derive the expected robustness of multilayer biological networks starting from the degree distributions within and between layers. These results provide insights into the non-trivial dynamics occurring in the cell after a genetic perturbation is applied, confirming the importance of including the coupling between different layers of interaction in models of complex biological systems.
No pharmacological therapy exists for calcific aortic valve disease (CAVD), which confers a dismal prognosis without invasive valve replacement. The search for therapeutics and early diagnostics is challenging because CAVD presents in multiple pathological stages. Moreover, it occurs in the context of a complex, multi-layered tissue architecture; a rich and abundant extracellular matrix phenotype; and a unique, highly plastic, and multipotent resident cell population.
A total of 25 human stenotic aortic valves obtained from valve replacement surgeries were analyzed by multiple modalities, including transcriptomics and global unlabeled and label-based tandem-mass-tagged proteomics. Segmentation of valves into disease stage-specific samples was guided by near-infrared molecular imaging, and anatomic layer-specificity was facilitated by laser capture microdissection. Side-specific cell cultures were subjected to multiple calcifying stimuli, and their calcification potential and basal/stimulated proteomes were evaluated. Molecular (protein-protein) interaction networks were built, and their central proteins and disease associations were identified.
Global transcriptional and protein expression signatures differed between the nondiseased, fibrotic, and calcific stages of CAVD. Anatomic aortic valve microlayers exhibited unique proteome profiles that were maintained throughout disease progression and identified glial fibrillary acidic protein as a specific marker of valvular interstitial cells from the spongiosa layer. CAVD disease progression was marked by an emergence of smooth muscle cell activation, inflammation, and calcification-related pathways. Proteins overrepresented in the disease-prone fibrosa are functionally annotated to fibrosis and calcification pathways, and we found that in vitro, fibrosa-derived valvular interstitial cells demonstrated greater calcification potential than those from the ventricularis. These studies confirmed that the microlayer-specific proteome was preserved in cultured valvular interstitial cells, and that valvular interstitial cells exposed to alkaline phosphatase-dependent and alkaline phosphatase-independent calcifying stimuli had distinct proteome profiles, both of which overlapped with that of the whole tissue. Analysis of protein-protein interaction networks found a significant closeness to multiple inflammatory and fibrotic diseases.
A spatially and temporally resolved multi-omics, and network and systems biology strategy identifies the first molecular regulatory networks in CAVD, a cardiac condition without a pharmacological cure, and describes a novel means of systematic disease ontology that is broadly applicable to comprehensive omics studies of cardiovascular diseases.
Despite the global impact of macrophage activation in vascular disease, the underlying mechanisms remain obscure. Here we show, with global proteomic analysis of macrophage cell lines treated with either IFNγ or IL-4, that PARP9 and PARP14 regulate macrophage activation. In primary macrophages, PARP9 and PARP14 have opposing roles in macrophage activation. PARP14 silencing induces pro-inflammatory genes and STAT1 phosphorylation in M(IFNγ) cells, whereas it suppresses anti-inflammatory gene expression and STAT6 phosphorylation in M(IL-4) cells. PARP9 silencing suppresses pro-inflammatory genes and STAT1 phosphorylation in M(IFNγ) cells. PARP14 induces ADP-ribosylation of STAT1, which is suppressed by PARP9. Mutations at these ADP-ribosylation sites lead to increased phosphorylation. Network analysis links PARP9-PARP14 with human coronary artery disease. PARP14 deficiency in haematopoietic cells accelerates the development and inflammatory burden of acute and chronic arterial lesions in mice. These findings suggest that PARP9 and PARP14 cross-regulate macrophage activation.
Genes carrying mutations associated with genetic diseases are present in all human cells; yet, clinical manifestations of genetic diseases are usually highly tissue-specific. Although some disease genes are expressed only in selected tissues, the expression patterns of disease genes alone cannot explain the observed tissue specificity of human diseases. Here we hypothesize that for a disease to manifest itself in a particular tissue, a whole functional subnetwork of genes (disease module) needs to be expressed in that tissue. Driven by this hypothesis, we conducted a systematic study of the expression patterns of disease genes within the human interactome. We find that genes expressed in a specific tissue tend to be localized in the same neighborhood of the interactome. By contrast, genes expressed in different tissues are segregated in distinct network neighborhoods. Most important, we show that it is the integrity and the completeness of the expression of the disease module that determines disease manifestation in selected tissues. This approach allows us to construct a disease-tissue network that confirms known and predicts unexpected disease-tissue associations.
Low vitamin D status in pregnancy was proposed as a risk factor of preeclampsia.
We assessed the effect of vitamin D supplementation (4,400 vs. 400 IU/day), initiated early in pregnancy (10-18 weeks), on the development of preeclampsia. The effects of serum vitamin D (25-hydroxyvitamin D [25OHD]) levels on preeclampsia incidence at trial entry and in the third trimester (32-38 weeks) were studied. We also conducted a nested case-control study of 157 women to investigate peripheral blood vitamin D-associated gene expression profiles at 10 to 18 weeks in 47 participants who developed preeclampsia.
Of 881 women randomized, outcome data were available for 816, with 67 (8.2%) developing preeclampsia. There was no significant difference between treatment (N = 408) or control (N = 408) groups in the incidence of preeclampsia (8.08% vs. 8.33%, respectively; relative risk: 0.97; 95% CI, 0.61-1.53). However, in a cohort analysis and after adjustment for confounders, a significant effect of sufficient vitamin D status (25OHD ≥30 ng/ml) was observed in both early and late pregnancy compared with insufficient levels (25OHD <30 ng/ml) (adjusted odds ratio, 0.28; 95% CI, 0.10-0.96). Differential expression of 348 vitamin D-associated genes (158 upregulated) was found in peripheral blood of women who developed preeclampsia (FDR <0.05 in the Vitamin D Antenatal Asthma Reduction Trial [VDAART]; P < 0.05 in a replication cohort). Functional enrichment and network analyses of this vitamin D-associated gene set suggest several highly functional modules related to systematic inflammatory and immune responses, including some nodes with a high degree of connectivity.
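The reported relative risk can be checked with a short calculation. The event counts used below (33 of 408 and 34 of 408) are not stated in the abstract; they are inferred from the reported percentages and the total of 67 cases, and should be read as an assumption. The interval is a standard Wald interval on the log scale, which may differ slightly from the method the authors used.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale Wald interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Assumed counts: 33/408 (about 8.08%) in the treatment group,
# 34/408 (about 8.33%) in the control group.
rr, lower, upper = relative_risk(33, 408, 34, 408)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumed counts the point estimate rounds to the reported 0.97 and the interval comes out close to the reported 0.61-1.53. It spans 1, which is consistent with the reported absence of a significant difference between groups.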
Vitamin D supplementation initiated in weeks 10-18 of pregnancy did not reduce preeclampsia incidence in the intention-to-treat paradigm. However, vitamin D levels of 30 ng/ml or higher at trial entry and in late pregnancy were associated with a lower risk of preeclampsia. Differentially expressed vitamin D-associated transcriptomes implicated the emergence of an early pregnancy, distinctive immune response in women who went on to develop preeclampsia.
ClinicalTrials.gov NCT00920621.
Quebec Breast Cancer Foundation and Genome Canada Innovation Network. This trial was funded by the National Heart, Lung, and Blood Institute. For details see Acknowledgments.
Summary Background Comparison of patients with coronary heart disease and controls in genome-wide association studies has revealed several single nucleotide polymorphisms (SNPs) associated with coronary heart disease. We aimed to establish the external validity of these findings and to obtain more precise risk estimates using a prospective cohort design. Methods We tested 13 recently discovered SNPs for association with coronary heart disease in a case-control design including participants differing from those in the discovery samples (3829 participants with prevalent coronary heart disease and 48 897 controls free of the disease) and a prospective cohort design including 30 725 participants free of cardiovascular disease from Finland and Sweden. We modelled the 13 SNPs as a multilocus genetic risk score and used Cox proportional hazards models to estimate the association of genetic risk score with incident coronary heart disease. For case-control analyses we analysed associations between individual SNPs and quintiles of genetic risk score using logistic regression. Findings In prospective cohort analyses, 1264 participants had a first coronary heart disease event during a median 10·7 years' follow-up (IQR 6·7–13·6). Genetic risk score was associated with a first coronary heart disease event. When compared with the bottom quintile of genetic risk score, participants in the top quintile were at 1·66-times increased risk of coronary heart disease in a model adjusting for traditional risk factors (95% CI 1·35–2·04, p value for linear trend=7·3×10⁻¹⁰). Adjustment for family history did not change these estimates. Genetic risk score did not improve C index over traditional risk factors and family history (p=0·19), nor did it have a significant effect on net reclassification improvement (2·2%, p=0·18); however, it did have a small effect on integrated discrimination index (0·004, p=0·0006).
Results of the case-control analyses were similar to those of the prospective cohort analyses. Interpretation Using a genetic risk score based on 13 SNPs associated with coronary heart disease, we can identify the 20% of individuals of European ancestry who are at roughly 70% increased risk of a first coronary heart disease event. The potential clinical use of this panel of SNPs remains to be defined. Funding The Wellcome Trust; Academy of Finland Center of Excellence for Complex Disease Genetics; US National Institutes of Health; the Donovan Family Foundation.
Recently, long-non-coding RNAs (lncRNAs) have attracted attention because of their emerging role in many important biological mechanisms. The accumulating evidence indicates that the dysregulation of lncRNAs is associated with complex diseases. However, only a few lncRNA-disease associations have been experimentally validated and therefore, predicting potential lncRNAs that are associated with diseases becomes an important task. Current computational approaches often use known lncRNA-disease associations to predict potential lncRNA-disease links. In this work, we exploited the topology of multi-level networks to propose the LncRNA rankIng by NetwOrk DiffusioN (LION) approach to identify lncRNA-disease associations. The multi-level complex network consisted of lncRNA-protein, protein-protein interactions, and protein-disease associations. We applied the network diffusion algorithm of LION to predict the lncRNA-disease associations within the multi-level network. LION achieved an AUC value of 96.8% for cardiovascular diseases, 91.9% for cancer, and 90.2% for neurological diseases by using experimentally verified lncRNAs associated with diseases. Furthermore, compared to a similar approach (TPGLDA), LION performed better for cardiovascular diseases and cancer. Given the versatile role played by lncRNAs in different biological mechanisms that are perturbed in diseases, LION's accurate prediction of lncRNA-disease associations helps in ranking lncRNAs that could function as potential biomarkers and potential drug targets.
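The idea of ranking lncRNAs by diffusion over a multi-level network can be illustrated with a generic random walk with restart. This is not LION's algorithm, whose details are not given in the abstract. It is a common diffusion scheme used here as a stand-in, and the node names, the restart probability, and the toy network are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that returns to the
    seed nodes with probability `restart` at every step.

    Assumes an undirected network in which every node has a neighbour.
    """
    nodes = sorted(adjacency)
    if any(not adjacency[n] for n in nodes):
        raise ValueError("every node needs at least one neighbour")
    start = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            share = (1 - restart) * current[u] / len(adjacency[u])
            for v in adjacency[u]:
                updated[v] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical multi-level toy network: lncRNA-protein, protein-protein,
# and protein-disease links stored in one undirected adjacency map.
toy_links = [
    ("lncRNA_1", "protein_A"),
    ("protein_A", "disease_X"),
    ("protein_A", "protein_B"),
    ("protein_B", "protein_C"),
    ("protein_C", "lncRNA_2"),
]
toy_multilevel = {}
for u, v in toy_links:
    toy_multilevel.setdefault(u, []).append(v)
    toy_multilevel.setdefault(v, []).append(u)

scores = random_walk_with_restart(toy_multilevel, seeds={"disease_X"})
ranked_lncrnas = sorted(
    (n for n in scores if n.startswith("lncRNA")),
    key=lambda n: scores[n],
    reverse=True,
)
```

Seeding the walk at a disease node and sorting the lncRNA nodes by score ranks `lncRNA_1`, which sits two links from the disease, above the more distant `lncRNA_2`.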