The purpose of this investigation is to develop and evaluate a new Bayesian network (BN)-based method for predicting patient survivorship. The central hypothesis is that the method predicts patient survivorship well, can handle high-dimensional data, and can be incorporated into a clinical decision support system (CDSS). We have developed EBMC_Survivorship (EBMC_S), which predicts survivorship for each year individually. EBMC_S is based on the EBMC BN algorithm, which has been shown to handle high-dimensional data, and BNs have an architecture well suited to decision support systems. In this study, we evaluate EBMC_S using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which concerns breast tumors. A 5-fold cross-validation study indicates that EBMC_S performs better than the Cox proportional hazards model and is comparable to the random survival forest method. We show that EBMC_S provides additional information, such as sensitivity analyses, the covariates that are predictive for each year, and yearly areas under the ROC curve (AUROCs). We conclude that our investigation supports the central hypothesis.
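The idea of predicting survivorship "for each year individually" can be illustrated with a minimal sketch of how right-censored follow-up data might be turned into one binary label per year. This is not the EBMC_S implementation; the function name `yearly_labels`, the label encoding, and the toy cohort are assumptions made only for illustration.

```python
from typing import List, Optional


def yearly_labels(follow_up_years: float, died: bool, horizon: int) -> List[Optional[int]]:
    """Return one survival label per year 1..horizon (hypothetical encoding).

    1    = the patient is known to have survived at least that many years
    0    = the patient is known to have died before that year ended
    None = unknown, because follow-up ended (censoring) before that year ended
    """
    labels: List[Optional[int]] = []
    for year in range(1, horizon + 1):
        if follow_up_years >= year:
            labels.append(1)       # observed alive through this year
        elif died:
            labels.append(0)       # death was observed before this year ended
        else:
            labels.append(None)    # censored: status at this year is unknown
    return labels


# Toy cohort (invented values): (years of follow-up, death observed?)
cohort = [(6.2, False), (2.5, True), (3.0, False)]
for follow_up, died in cohort:
    print(yearly_labels(follow_up, died, horizon=5))
```

Under this encoding, a separate classifier could be trained for each year on the cases whose label for that year is not `None`, which is also what makes a per-year AUROC well defined.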
This paper concerns a study indicating that the expression levels of genes in signaling pathways can be modeled using a causal Bayesian network (BN) that is altered in tumorous tissue. These results open up promising areas of future research that can help identify driver genes and therapeutic targets, so the paper is most relevant to the cancer informatics community. Our central hypothesis is that the expression levels of genes that code for proteins on a signal transduction pathway (STP) are causally related and that this causal structure is altered when the STP is involved in cancer. To test this hypothesis, we analyzed 5 STPs associated with breast cancer, 7 STPs associated with other cancers, and 10 randomly chosen pathways, using a breast cancer gene expression level dataset containing 529 cases and 61 controls. We identified all the genes related to each of the 22 pathways and developed a separate gene expression dataset for each pathway. We obtained significant results indicating that the causal structure of the expression levels of genes coding for proteins on STPs believed to be implicated in breast cancer or in other cancers is more altered in the cases relative to the controls than the causal structure of the randomly chosen pathways.
KEYWORDS: signal transduction pathway, Bayesian network, gene expression level, breast cancer, causal structure
Medical diagnosis is the process of determining the nature of a disease and distinguishing it from other similar diseases. A diagnostic error happens when a diagnosis is missed, inappropriately delayed, or inaccurate. Diagnostic error accounts for the most severe patient harm, the largest fraction of claims, and the highest total penalty payouts. One way to reduce diagnostic error is to use a computer-aided diagnostic (CAD) system to augment doctors' diagnostic abilities. A growing number of machine learning algorithms have been applied to medical diagnosis and achieve good performance. However, because most of the models are very complicated and their diagnostic process differs from physicians' workflow, physicians usually do not trust those models.
My dissertation investigates how to combine electronic health record (EHR) data with medical knowledge to generate a sequential diagnostic system with clinical alignment, meaning that the system's diagnostic process is in line with physicians' diagnostic process. The new system has two main characteristics: (1) it is data-driven, so that EHR data and machine learning algorithms can be used to develop a multi-label classification system; and (2) it is clinical knowledge-driven, so that valuable clinical diagnostic knowledge can be integrated into the system.
I have developed (1) a framework that can integrate pre-defined medical knowledge with disease patterns in EHR data for sequential diagnosis and (2) an algorithm that generates medical diagnostic trees that recommend diagnostic actions by considering clinical workflow, diagnostic accuracy, and misdiagnosis costs. Experiments show that the learned model has better clinical alignment, higher diagnostic accuracy, and lower misdiagnosis costs than two baseline models, which were developed using a traditional multi-label classification tree algorithm (ML-C4.5) and a deep reinforcement learning algorithm (deep Q-learning), respectively.
Introduction:
Patient symptom data are recorded in free-text notes in electronic health records and are difficult to extract. Natural language processing (NLP) can be used to mine key patient symptoms, which can then be combined with risk stratification tools to improve patient outcomes.
Hypothesis:
Adding symptom data to the National Early Warning Score (NEWS) and the quick Sequential Organ Failure Assessment (qSOFA) score will improve their area under the receiver operating characteristic curve (AUC).
Methods:
NEWS and qSOFA scores were calculated, with assistance from an emergency medicine physician, for each patient with suspected acute coronary syndrome at initial emergency department (ED) assessment. Independent reviewers annotated admission to the intensive care unit as a binary outcome (yes/no). The Apache clinical Text Analysis and Knowledge Extraction System (cTAKES), an open access NLP platform, was used to preprocess and analyze the free-text ED history of present illness notes. Patient symptom data were extracted, used as patient features (input variables), and combined with the variables from the NEWS and qSOFA scores. A logistic regression model was used for prediction. We used DeLong's test to assess whether the scores' AUCs differed significantly.
Results:
Of 1195 patients, 70 had an intensive care admission. The AUCs for the NEWS and qSOFA scores were 0.75 and 0.67, respectively. After adding symptom features, the AUCs improved to 0.81 and 0.79, respectively. The differences between the AUCs were statistically significant.
Conclusion:
Adding patient symptom data improves performance of both the NEWS and qSOFA risk stratification scores. Enhanced risk stratification scores with symptom data could identify those at high risk for clinical deterioration and could help augment accurate decision making at ED disposition.
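The AUC comparison described in the Methods can be illustrated with a small sketch that computes the AUC as the probability that a randomly chosen admitted patient receives a higher score than a randomly chosen non-admitted patient (the Mann-Whitney formulation). The outcome and score values below are invented toy data, not study data, and the sketch does not implement DeLong's test or the logistic regression model.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative case count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = intensive care admission, 0 = no admission.
outcomes = [0, 0, 0, 0, 1, 1]
baseline_score = [1, 3, 2, 5, 4, 6]                # e.g. a risk score alone
augmented_score = [0.1, 0.3, 0.2, 0.4, 0.6, 0.9]   # e.g. score plus symptom features

print(auc(baseline_score, outcomes))   # 0.875
print(auc(augmented_score, outcomes))  # 1.0
```

A significance test on the difference between two such AUCs measured on the same patients needs to account for their correlation, which is the role DeLong's test plays in the study.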
The objective of this investigation is to evaluate binary prediction methods for predicting disease status using high-dimensional genomic data. The central hypothesis is that the Bayesian network (BN)-based method called efficient Bayesian multivariate classifier (EBMC) will do well at this task because EBMC builds on BN-based methods that have performed well at learning epistatic interactions.
We evaluate how well eight methods perform binary prediction using high-dimensional discrete genomic datasets containing epistatic interactions. The methods are as follows: naive Bayes (NB), model averaging NB (MANB), feature selection NB (FSNB), EBMC, logistic regression (LR), support vector machines (SVM), Lasso, and extreme learning machines (ELM). We use a hundred 1000-single nucleotide polymorphism (SNP) simulated datasets, ten 10,000-SNP datasets, six semi-synthetic sets, and two real genome-wide association studies (GWAS) datasets in our evaluation.
In fivefold cross-validation studies, the SVM performed best on the 1000-SNP dataset, while the BN-based methods performed best on the other datasets, with EBMC exhibiting the best overall performance. In-sample testing indicates that LR, SVM, Lasso, ELM, and NB tend to overfit the data.
EBMC performed better than NB when there were several strong predictors, whereas NB performed better when there were many weak predictors. Furthermore, for all BN-based methods, prediction capability did not degrade as the dimension increased.
Our results support the hypothesis that EBMC performs well at binary outcome prediction using high-dimensional discrete datasets containing epistatic-like interactions. Future research using more GWAS datasets is needed to further investigate the potential of EBMC.
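The fivefold cross-validation design mentioned above can be sketched as follows. This is a generic illustration of how samples might be partitioned into five train/test splits, not the evaluation code used in the study; the function name `k_fold_indices` and the fixed seed are assumptions.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices 0..n_samples-1 into k (train, test) pairs.

    Every sample appears in exactly one test fold, so each prediction is made
    by a model that never saw that sample during training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))   # 8 2 on every line
```

In-sample testing, by contrast, scores a model on the same samples it was trained on, which is why it can reveal overfitting when compared against the cross-validated results.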
• Proposed a method to automatically classify symptom severity in psychiatric reports.
• Question-answers from reports are the most important source of information.
• Best predictive models automatically selected features prevalent in literature.
In response to the challenges set forth by the CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing, we describe a framework to automatically classify initial psychiatric evaluation records into one of four positive valence system severities: absent, mild, moderate, or severe. We used a dataset provided by the event organizers to develop a framework composed of natural language processing (NLP) modules and the 3 predictive models used in the competition (two decision tree models and one Bayesian network model). We also developed two additional predictive models for comparison. To evaluate our framework, we employed a blind test dataset provided by the 2016 CEGS N-GRID organizers. The predictive scores of the two decision tree models and the Naïve Bayes model, measured by the macro-averaged inverse normalized mean absolute error score, were 82.56%, 82.18%, and 80.56%, respectively. The proposed framework can potentially be applied to other predictive tasks that process initial psychiatric evaluation records, such as predicting 30-day psychiatric readmissions.
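For readers unfamiliar with the evaluation measure, the sketch below shows one plausible reading of a macro-averaged, inverse normalized mean absolute error on an ordinal 0 to 3 severity scale. The abstract does not give the formula, so the normalization (dividing each class's error by the largest error possible for that class) and the averaging over classes present in the gold labels are assumptions, as is the function name.

```python
from typing import Sequence


def normalized_macro_mae(gold: Sequence[int], pred: Sequence[int], n_classes: int = 4) -> float:
    """Score in [0, 100]; 100 means every severity was predicted exactly.

    Assumed definition: per-class mean absolute error, divided by the largest
    possible error for that class, averaged over classes, then inverted.
    """
    classes = sorted(set(gold))
    total = 0.0
    for c in classes:
        errors = [abs(p - g) for g, p in zip(gold, pred) if g == c]
        max_error = max(c, n_classes - 1 - c)   # farthest label from class c
        total += sum(errors) / (len(errors) * max_error)
    return 100.0 * (1.0 - total / len(classes))


# Severity codes (assumed mapping): 0 absent, 1 mild, 2 moderate, 3 severe.
print(normalized_macro_mae([0, 1, 2, 3], [0, 1, 2, 3]))  # 100.0
print(normalized_macro_mae([0, 1, 2, 3], [3, 3, 0, 0]))  # 0.0
```

Macro-averaging gives each severity class equal weight regardless of how many records it contains, which matters when some severities are rare.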
Araliaceae species produce various classes of triterpene and triterpenoid saponins, such as the oleanane-type triterpenoids in Aralia species and dammarane-type saponins in Panax, valued for their medicinal properties. The lack of genome sequences of Panax relatives has hindered mechanistic insight into the divergence of triterpene saponins in Araliaceae. Here, we report a chromosome-level genome of Aralia elata with a total length of 1.05 Gb. The loss of 12 exons in the dammarenediol synthase (DDS)-encoding gene in A. elata after divergence from Panax might have caused the lack of dammarane-type saponin production, and a complementation assay shows that overexpression of the PgDDS gene from Panax ginseng in callus of A. elata recovers the accumulation of dammarane-type saponins. Tandem duplication events of triterpene biosynthetic genes are common in the A. elata genome, especially for AeCYP72As, AeCSLMs, and AeUGT73s, which function as tailoring enzymes of oleanane-type saponins and aralosides. More than 13 aralosides are de novo synthesized in Saccharomyces cerevisiae by overexpression of these genes in combination. This study sheds light on the diversity of saponin biosynthetic pathways in Araliaceae and will facilitate heterologous bioproduction of aralosides.
The early prediction of adolescent depression recurrence poses a significant challenge in the field. This study aims to investigate and compare the abilities of the general psychopathology factor (p) and the specific internalizing factor in predicting depression recurrence over a 2-year course, as well as in distinguishing remitted depressed adolescents from healthy adolescents. Longitudinal changes of these two factors in different trajectory groups were also tracked to examine their sensitivity to sustained remission and relapse.
We included 255 baseline-remitted depressed adolescents and a healthy control group (n = 255) matched in age, sex, and race, sourced from the Adolescent Brain Cognitive Development Study. A linear mixed model was employed for the statistical analysis.
The p factor not only effectively discriminated between remitted depressed adolescents and healthy controls but also robustly predicted depression recurrence over the subsequent 2-year course. The specific internalizing factor could only differentiate remitted depressed adolescents from healthy controls. Additionally, a noteworthy longitudinal decline of the p factor was observed in the sustained-remission group.
Psychopathology factors serve as inherent and enduring measures of long-term mental health aberrations. Longitudinal results indicate that the p factor is more sensitive to sustained remission than the internalizing factor. The ability of the overall p factor, unlike the specific internalizing factor, to anticipate depression relapse suggests that clinical interventions should monitor and mitigate co-occurring symptoms across all dimensions to preempt relapse of adolescent depression, rather than focusing exclusively on internalizing symptoms.