Information about the genetic basis of human diseases lies at the heart of precision medicine and drug discovery. However, to realize its full potential to support these goals, several problems, such as fragmentation, heterogeneity, availability and different conceptualization of the data, must be overcome. To provide the community with a resource free of these hurdles, we have developed DisGeNET (http://www.disgenet.org), one of the largest available collections of genes and variants involved in human diseases. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype-phenotype relationships. The information is accessible through a web interface, a Cytoscape App, an RDF SPARQL endpoint, scripts in several programming languages and an R package. DisGeNET is a versatile platform that can be used for different research purposes including the investigation of the molecular underpinnings of specific human diseases and their comorbidities, the analysis of the properties of disease genes, the generation of hypotheses on drug therapeutic action and drug adverse effects, the validation of computationally predicted disease genes and the evaluation of the performance of text-mining methods.
Time is a crucial parameter in the assessment of comorbidities in population-based studies, as it allows the identification of more complex disease patterns beyond pairwise disease associations. So far, it has either been completely ignored or only taken into account by assessing the temporal directionality of identified comorbidity pairs. In this work, a novel time-analysis framework is presented for large-scale comorbidity studies. The disease-history vectors of patients of a regional Spanish health dataset are represented as time sequences of ordered disease diagnoses. Statistically significant pairwise disease associations are identified and their temporal directionality is assessed. Subsequently, an unsupervised clustering algorithm, based on Dynamic Time Warping, is applied to the common disease trajectories in order to group them according to the temporal patterns that they share. The proposed methodology for the temporal assessment of such trajectories could serve as the preliminary basis of a disease prediction system.
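As an illustration of the Dynamic Time Warping step, the following is a minimal sketch of a DTW distance between two ordered diagnosis sequences. The abstract does not state which local cost the framework uses, so the 0/1 mismatch cost is an assumption, and the function name and diagnosis labels are hypothetical. A clustering algorithm would then operate on the pairwise distances this produces.

```python
def dtw_distance(seq_a, seq_b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Dynamic Time Warping distance between two ordered sequences.

    The default 0/1 mismatch cost is an assumption made for this sketch;
    any non-negative dissimilarity between diagnoses could be supplied.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # d[i][j] = cost of the best alignment of seq_a[:i] with seq_b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical trajectories: the second repeats one diagnosis, which DTW
# absorbs by warping, so the two sequences align at zero cost.
trajectory_1 = ["hypertension", "diabetes", "renal_failure"]
trajectory_2 = ["hypertension", "diabetes", "diabetes", "renal_failure"]
print(dtw_distance(trajectory_1, trajectory_2))
```

Because DTW tolerates repeated or stretched segments, trajectories that follow the same order of diagnoses at different paces end up close together, which is the property a trajectory clustering step relies on.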
PsyGeNET (Psychiatric disorders and Genes association NETwork) is a knowledge platform for the exploratory analysis of psychiatric diseases and their associated genes. PsyGeNET is composed of a database and a web interface supporting data search, visualization, filtering and sharing. PsyGeNET integrates information from DisGeNET and data extracted from the literature by text mining, which has been curated by domain experts. It currently contains 2642 associations between 1271 genes and 37 psychiatric disease concepts. In its first release, PsyGeNET is focused on three psychiatric disorders: major depression, alcohol and cocaine use disorders. PsyGeNET represents a comprehensive, open access resource for the analysis of the molecular mechanisms underpinning psychiatric disorders and their comorbidities.
The PsyGeNET platform is freely available at http://www.psygenet.org/. The PsyGeNET database is made available under the Open Database License (http://opendatacommons.org/licenses/odbl/1.0/).
lfurlong@imim.es
Supplementary data are available at Bioinformatics online.
Abstract
Motivation
The study of comorbidities is a major priority due to their impact on life expectancy, quality of life and healthcare cost. The availability of electronic health records (EHRs) for data mining offers the opportunity to discover disease associations and comorbidity patterns from the clinical history of patients gathered during routine medical care. This creates a need for analytical tools for the detection of disease comorbidities, including the investigation of their underlying genetic basis.
Results
We present comoRbidity, an R package aimed at providing a systematic and comprehensive analysis of disease comorbidities from both the clinical and molecular perspectives. comoRbidity leverages (i) user provided clinical data from EHR databases (the clinical comorbidity analysis) and (ii) genotype-phenotype information of the diseases under study (the molecular comorbidity analysis). The clinical comorbidity analysis enables identifying significant disease comorbidities from clinical data, including sex and age stratification and temporal directionality analyses, while the molecular comorbidity analysis supports the generation of hypotheses on the underlying mechanisms of the disease comorbidities by exploring shared genes among disorders. The open-source comoRbidity package is a software tool aimed at expediting the integrative analysis of disease comorbidities by incorporating several analytical and visualization functions.
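To make the idea of a statistically significant disease pair concrete, here is a minimal Python sketch of a one-sided Fisher exact test on patient counts. It is not the comoRbidity implementation, and the abstract does not specify which test the package applies; the function name and the counts in the example are hypothetical.

```python
from math import comb


def comorbidity_p_value(n_total, n_a, n_b, n_both):
    """One-sided Fisher exact p-value for the co-occurrence of two diseases.

    Returns the probability of observing at least `n_both` patients with both
    diseases if disease A (n_a patients) and disease B (n_b patients) were
    distributed independently among n_total patients (hypergeometric tail).
    """
    denominator = comb(n_total, n_b)
    upper = min(n_a, n_b)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing
    numerator = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 1000 patients, 100 with disease A, 80 with disease B,
# 25 with both (8 would be expected under independence).
print(comorbidity_p_value(1000, 100, 80, 25))
```

In a study covering many disease pairs, p-values obtained this way would normally be corrected for multiple testing before any pair is called significant.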
Availability and implementation
https://bitbucket.org/ibi_group/comorbidity
Supplementary information
Supplementary data are available at Bioinformatics online.
Abstract
Objective
To identify differences related to sex and define female-enriched autism spectrum disorder (ASD) comorbidities through a comprehensive multi-PheWAS intersection approach on big, real-world data. Although sex difference is a consistent and recognized feature of ASD, additional clinical correlates could help to identify potential disease subgroups based on sex and age.
Materials and Methods
We performed a systematic comorbidity analysis on 1860 groups of comorbidities spanning the full spectrum of known diseases, in 59 140 individuals (11 440 females) with ASD from 4 age groups. We explored ASD sex differences in 2 independent real-world datasets, across all potential comorbidities, by comparing (1) females with ASD vs males with ASD and (2) females with ASD vs females without ASD.
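A group comparison such as females with ASD vs males with ASD is commonly summarized, for each comorbidity, as an odds ratio from a 2x2 table. The sketch below shows that calculation with a Wald confidence interval. It assumes no adjustment or regression model, which the study's PheWAS may well include; the function name and the counts are invented for illustration and are not study data.

```python
from math import exp, log, sqrt


def odds_ratio(group1_with, group1_without, group2_with, group2_without, z=1.96):
    """Odds ratio of a comorbidity in group 1 relative to group 2.

    Returns (odds_ratio, ci_lower, ci_upper) using the Wald interval on the
    log scale; z=1.96 gives an approximate 95% interval. All four counts
    must be positive.
    """
    a, b, c, d = group1_with, group1_without, group2_with, group2_without
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se)
    upper = exp(log(estimate) + z * se)
    return estimate, lower, upper


# Invented counts: 20 of 100 individuals in group 1 and 10 of 100 in group 2
# have the comorbidity.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```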
Results
We identified 27 different comorbidities that appeared significantly more frequently in females with ASD. The comorbidities were mostly neurological (eg, epilepsy, odds ratio OR > 1.8, 3-18 years of age), congenital (eg, chromosomal anomalies, OR > 2, 3-18 years of age), and mental disorders (eg, intellectual disability, OR > 1.7, 6-18 years of age). Novel comorbidities included endocrine metabolic diseases (eg, failure to thrive, OR = 2.5, ages 0-2), digestive disorders (gastroesophageal reflux disease: OR = 1.7, 6-11 years of age; and constipation: OR > 1.6, 3-11 years of age), and sense organs (strabismus: OR > 1.8, 3-18 years of age).
Discussion
A multi-PheWAS intersection approach on real-world data, as presented in this study, uniquely contributes to the growing body of research on sex-based comorbidity analysis in the ASD population.
Conclusions
Our findings provide insights into female-enriched ASD comorbidities that are potentially important in diagnosis, as well as the identification of distinct comorbidity patterns influencing anticipatory treatment or referrals. The code is publicly available (https://github.com/hms-dbmi/sexDifferenceInASD).
Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies.
We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for analysis and exploration of genomic variant datasets.
Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics.
We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
Abstract
Motivation
In the era of big data and precision medicine, the number of databases containing clinical, environmental, self-reported and biochemical variables is increasing exponentially. Enabling experts to focus on their research questions rather than on computational data management, access and analysis is one of today's most significant challenges.
Results
We present Rcupcake, an R package that contains a variety of functions for leveraging different databases through the BD2K PIC-SURE RESTful API and facilitating its query, analysis and interpretation. The package offers a variety of analysis and visualization tools, including the study of the phenotype co-occurrence and prevalence, according to multiple layers of data, such as phenome, exposome or genome.
Availability and implementation
The package is implemented in R and is available under Mozilla v2 license from GitHub (https://github.com/hms-dbmi/Rcupcake). Two reproducible case studies are also available (https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/SSCcaseStudy_v01.ipynb, https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/NHANEScaseStudy_v01.ipynb).
Supplementary information
Supplementary data are available at Bioinformatics online.
Psychiatric disorders have a great impact on morbidity and mortality. Genotype-phenotype resources for psychiatric diseases are key to enabling the translation of research findings into better patient care. PsyGeNET is a knowledge resource on psychiatric diseases and their genes, developed by text mining and curated by domain experts.
We present psygenet2r, an R package that contains a variety of functions for leveraging the PsyGeNET database and facilitating its analysis and interpretation. The package offers different types of queries to the database along with a variety of analysis and visualization tools, including the study of the anatomical structures in which the genes are expressed and insight into the genes' molecular functions. Psygenet2r is especially suited for network medicine analysis of psychiatric disorders.
The package is implemented in R and is available under the MIT license from Bioconductor (http://bioconductor.org/packages/release/bioc/html/psygenet2r.html).
juanr.gonzalez@isglobal.org or laura.furlong@upf.edu.
Supplementary data are available at Bioinformatics online.
Contraceptive method choice is often strongly influenced by the experiences and opinions of one’s social network. Although social media, including Twitter, increasingly influences reproductive-age individuals, discussion of contraception in this setting has yet to be characterized. Natural language processing, a type of machine learning in which computers analyze natural language data, enables this analysis.
This study aimed to illuminate temporal trends in attitudes toward long- and short-acting reversible contraceptive methods in tweets between 2006 and 2019 and establish social media platforms as alternate data sources for large-scale sentiment analysis on contraception.
We studied English-language tweets mentioning reversible prescription contraceptive methods between March 2006 (founding of Twitter) and December 2019. Tweets mentioning contraception were extracted using search terms, including generic or brand names, colloquial names, and abbreviations. We characterized and performed sentiment analysis on tweets. We used Mann-Kendall nonparametric tests to assess temporal trends in the overall number and the number of positive, negative, and neutral tweets referring to each method. The code to reproduce this analysis is available at https://github.com/hms-dbmi/contraceptionOnTwitter.
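For readers unfamiliar with the Mann-Kendall test, the following is a minimal standard-library sketch of the two-sided test with the usual tie correction and normal approximation. It is not the code from the study's repository, and the function name and the yearly counts in the example are made up for illustration.

```python
from collections import Counter
from math import erfc, sqrt


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation).

    Returns (s, z, p_value), where s is the Mann-Kendall statistic and z is
    the continuity-corrected standard score.
    """
    n = len(values)
    # S counts later-minus-earlier pairs that increase minus those that decrease
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced for groups of tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return s, z, p_value


# Made-up yearly tweet counts, for illustration only
yearly_counts = [120, 340, 560, 910, 1500, 2100, 3900]
print(mann_kendall(yearly_counts))
```

The test depends only on the ordering of the values, not their magnitudes, which makes it suitable for count series that grow unevenly over time.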
We extracted 838,739 tweets mentioning at least 1 contraceptive method. The annual number of contraception-related tweets increased considerably over the study period. The intrauterine device was the most commonly referenced method (45.9%). Long-acting methods were mentioned more often than short-acting ones (58% vs 42%), and the annual proportion of long-acting reversible contraception-related tweets increased over time. In sentiment analysis of tweets mentioning a single contraceptive method (n=665,064), the greatest proportion of all tweets was negative (65,339 of 160,713 tweets with at least 95% confident sentiment, or 40.66%). Tweets mentioning long-acting methods were nearly twice as likely to be positive compared with tweets mentioning short-acting methods (19.65% vs 10.21%; P<.002).
Recognizing the influence of social networks on contraceptive decision making, social media platforms may be useful in the collection and dissemination of information about contraception.
The frequent occurrence of comorbidities in patients with chronic obstructive pulmonary disease (COPD) suggests that they may share pathobiological processes and/or risk factors. To explore these possibilities we compared the clinical diseasome and the molecular diseasome of 5447 COPD patients hospitalised because of an exacerbation of the disease. The clinical diseasome is a network representation of the relationships between diseases, in which diseases are connected if they co-occur more than expected at random; in the molecular diseasome, diseases are linked if they share associated genes or interaction between proteins. The results showed that about half of the disease pairs identified in the clinical diseasome had a biological counterpart in the molecular diseasome, particularly those related to inflammation and vascular tone regulation. Interestingly, the clinical diseasome of these patients appears independent of age, cumulative smoking exposure or severity of airflow limitation. These results support the existence of shared molecular mechanisms among comorbidities in COPD.
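The shared-gene criterion of the molecular diseasome can be sketched as follows. This is an illustrative simplification: it covers only shared associated genes, not protein interactions, and weighting each link by the Jaccard index is an assumption rather than something the abstract states. The function name, disease labels and gene identifiers are placeholders.

```python
from itertools import combinations


def shared_gene_links(disease_genes):
    """Link every pair of diseases that shares at least one associated gene.

    `disease_genes` maps a disease name to a set of gene identifiers.
    Returns {(disease_1, disease_2): jaccard_index} for linked pairs only,
    with each pair in alphabetical order.
    """
    links = {}
    for first, second in combinations(sorted(disease_genes), 2):
        shared = disease_genes[first] & disease_genes[second]
        if shared:
            union = disease_genes[first] | disease_genes[second]
            links[(first, second)] = len(shared) / len(union)
    return links


# Placeholder gene sets, not real gene-disease associations
example = {
    "disease_A": {"GENE1", "GENE2"},
    "disease_B": {"GENE2", "GENE3"},
    "disease_C": {"GENE4"},
}
print(shared_gene_links(example))
```

Comparing the resulting pairs with those found in the clinical diseasome is then a set intersection over disease pairs.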