We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.
Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is characterized by respiratory distress, multiorgan dysfunction, and, in some cases, death. The pathological mechanisms underlying COVID-19 respiratory distress and the interplay with aggravating risk factors have not been fully defined. Lung autopsy samples from 18 patients with fatal COVID-19, with symptom onset-to-death times ranging from 3 to 47 days, and antemortem plasma samples from 6 of these cases were evaluated using deep sequencing of SARS-CoV-2 RNA, multiplex plasma protein measurements, and pulmonary gene expression and imaging analyses. Prominent histopathological features in this case series included progressive diffuse alveolar damage with excessive thrombosis and late-onset pulmonary tissue and vascular remodeling. Acute damage at the alveolar-capillary barrier was characterized by the loss of surfactant protein expression with injury to alveolar epithelial cells, endothelial cells, respiratory epithelial basal cells, and defective tissue repair processes. Other key findings included impaired clot fibrinolysis with increased concentrations of plasma and lung plasminogen activator inhibitor-1 and modulation of cellular senescence markers, including p21 and sirtuin-1, in both lung epithelial and endothelial cells. Together, these findings further define the molecular pathological features underlying the pulmonary response to SARS-CoV-2 infection and provide important insights into signaling pathways that may be amenable to therapeutic intervention.
Screening agrochemicals and pharmaceuticals for potential liver toxicity is required for regulatory approval and is an expensive and time-consuming process. The identification and utilization of early exposure gene signatures and robust predictive models in regulatory toxicity testing has the potential to reduce time and costs substantially. In this study, comparative supervised machine learning approaches were applied to the rat liver TG-GATEs dataset to develop feature selection and predictive testing. We identified ten gene biomarkers using three different feature selection methods that predicted liver necrosis with high specificity and selectivity in an independent validation dataset from the Microarray Quality Control (MAQC)-II study. Nine of the ten genes that were selected with the supervised methods are involved in metabolism and detoxification (Car3, Crat, Cyp39a1, Dcd, Lbp, Scly, Slc23a1, and Tkfc) and transcriptional regulation (Ablim3). Several of these genes are also implicated in liver carcinogenesis, including Crat, Car3 and Slc23a1. Our biomarker gene signature provides high statistical accuracy and a manageable number of genes to study as indicators to potentially accelerate toxicity testing based on their ability to induce liver necrosis and, eventually, liver cancer.
Epithelial ovarian cancer (OC) is the most deadly cancer of the female reproductive system. To date, there is no effective screening method for early detection of OC, and the current diagnostic armamentarium may include sonographic grading of the tumor and analyzing serum levels of the tumor markers Cancer Antigen 125 (CA-125) and Human epididymis protein 4 (HE4). Microorganisms (bacterial, archaeal, and fungal cells) residing in mucosal tissues including the gastrointestinal and urogenital tracts can be altered by different disease states, and these shifts in microbial dynamics may help to diagnose disease states. We hypothesized that the peritoneal microbial environment was altered in patients with OC and that inclusion of selected peritoneal microbial features with current clinical features into prediction analyses will improve detection accuracy of patients with OC. Blood and peritoneal fluid were collected from consented patients who had sonography-confirmed adnexal masses and were being seen at SIU School of Medicine Simmons Cancer Institute. Blood was processed and serum HE4 and CA-125 were measured. Peritoneal fluid was collected at the time of surgery and processed for Next Generation Sequencing (NGS) using 16S V4 exon bacterial primers and bioinformatics analyses. We found that patients with OC had a unique peritoneal microbial profile compared to patients with a benign mass. Using ensemble modeling and machine learning pathways, we identified 18 microbial features that were highly specific to OC pathology. Prediction analyses confirmed that inclusion of microbial features with serum tumor marker levels and control features (patient age and BMI) improved diagnostic accuracy compared to currently used models. We conclude that OC pathogenesis alters the peritoneal microbial environment and that these unique microbial features are important for accurate diagnosis of OC. Our study warrants further analyses of the importance of microbial features with regard to oncological diagnostics and possible prognostic and interventional medicine.
Accurate detection and risk stratification of latent tuberculosis infection (LTBI) remains a major clinical and public health problem. We hypothesize that multiparameter strategies that probe immune responses to Mycobacterium tuberculosis can provide new diagnostic insights into not only the status of LTBI, but also the risk of reactivation. After the initial proof-of-concept study, we developed a 13-plex immunoassay panel to profile cytokine release from peripheral blood mononuclear cells stimulated separately with Mtb-relevant and non-specific antigens to identify putative biomarker signatures. We sequentially enrolled 65 subjects with varying risk of TB exposure, including 32 subjects with a diagnosis of LTBI. Random Forest feature selection and statistical data reduction methods were applied to determine cytokine levels across different normalized stimulation conditions. Receiver Operator Characteristic (ROC) analysis for full and reduced feature sets revealed differences in biomarker signatures for LTBI status and reactivation risk designations. The reduced set for increased risk included IP-10, IL-2, IFN-γ, TNF-α, IL-15, IL-17, CCL3, and CCL8 under varying normalized stimulation conditions. ROC curves determined predictive accuracies of > 80% for both LTBI diagnosis and increased risk designations. Our study findings suggest that a multiparameter diagnostic approach to detect normalized cytokine biomarker signatures might improve risk stratification in LTBI.
Interstitial cystitis/bladder pain syndrome (IC) is associated with significant morbidity, yet underlying mechanisms and diagnostic biomarkers remain unknown. Pelvic organs exhibit neural crosstalk by convergence of visceral sensory pathways, and rodent studies demonstrate distinct bacterial pain phenotypes, suggesting that the microbiome modulates pelvic pain in IC. Stool samples were obtained from female IC patients and healthy controls, and symptom severity was determined by questionnaire. Operational taxonomic units (OTUs) were identified by 16S rDNA sequence analysis. Machine learning by Extended Random Forest (ERF) identified OTUs associated with symptom scores. Quantitative PCR of stool DNA with species-specific primer pairs demonstrated significantly reduced levels of E. sinensis, C. aerofaciens, F. prausnitzii, O. splanchnicus, and L. longoviformis in microbiota of IC patients. These species, deficient in IC pelvic pain (DIPP), were further evaluated by receiver operating characteristic (ROC) analyses, and DIPP species emerged as potential IC biomarkers. Stool metabolomic studies identified glyceraldehyde as significantly elevated in IC. Metabolomic pathway analysis identified lipid pathways, consistent with predicted metagenome functionality. Together, these findings suggest that DIPP species and metabolites may serve as candidates for novel IC biomarkers in stool. Functional changes in the IC microbiome may also serve as therapeutic targets for treating chronic pelvic pain.
Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches to enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion-dollar industry, and which play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high performance computing (HPC) to reduce time-to-insight, and to enable a systematic study of domain-inspired AI architectures and optimization schemes to enable data-driven discovery. In this article we present a summary of recent developments in this field, and describe specific advances that authors in this article are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.
Estimating the individualized treatment effect has become one of the most popular topics in the statistics and machine learning communities in recent years. Most existing methods focus on modeling the heterogeneous treatment effects for univariate outcomes. However, many biomedical studies are interested in studying multiple highly correlated endpoints at the same time. We propose a random forest model that simultaneously estimates individualized treatment effects of multivariate outcomes. We consider a popular study design where covariates and outcomes are measured both before and after the intervention. The proposed model uses oblique splitting rules to partition the population space into neighborhoods that experience distinct treatment effects. An extensive simulation study suggests that the proposed method outperforms existing methods in various nonlinear settings. We further apply the proposed method to two nutrition studies investigating the effects of food consumption on gastrointestinal microbiota composition and clinical biomarkers. The method has been implemented in a freely available R package MOTE.RF at https://github.com/boyiguo1/MOTE.RF.
In this study, we examined the relationships between anti-influenza virus serum antibody titers, clinical disease, and peripheral blood leukocyte (PBL) global gene expression during presymptomatic, acute, and convalescent illness in 83 participants infected with 2009 pandemic H1N1 virus in a human influenza challenge model. Using traditional statistical and logistic regression modeling approaches, profiles of differentially expressed genes that correlated with active viral shedding, predicted length of viral shedding, and predicted illness severity were identified. These analyses further demonstrated that challenge participants fell into three peripheral blood leukocyte gene expression phenotypes that significantly correlated with different clinical outcomes and prechallenge serum titers of antibodies specific for the viral neuraminidase, hemagglutinin head, and hemagglutinin stalk. Higher prechallenge serum antibody titers were inversely correlated with leukocyte responsiveness in participants with active disease and could mask expression of peripheral blood markers of clinical disease in some participants, including viral shedding and symptom severity. Consequently, preexisting anti-influenza antibodies may modulate PBL gene expression, and this must be taken into consideration in the development and interpretation of peripheral blood diagnostic and prognostic assays of influenza infection.
Influenza A viruses are significant human pathogens that caused 83,000 deaths in the United States during 2017 to 2018, and there is a need to understand the molecular correlates of illness and to identify prognostic markers of viral infection, symptom severity, and disease course. Preexisting antibodies against viral neuraminidase (NA) and hemagglutinin (HA) proteins play a critical role in lessening disease severity. We performed global gene expression profiling of peripheral blood leukocytes collected during acute and convalescent phases from a large cohort of people infected with A/H1N1pdm virus. Using statistical and machine-learning approaches, populations of genes were identified early in infection that correlated with active viral shedding, predicted length of shedding, or disease severity. Finally, these gene expression responses were differentially affected by increased levels of preexisting influenza antibodies, which could mask detection of these markers of contagiousness and disease severity in people with active clinical disease.
Pancreatic cancer is a devastating disease often detected at later stages, necessitating swift and effective chemotherapy treatment. However, chemoresistance is common and its mechanisms are poorly understood. Here, label-free multi-modal nonlinear optical microscopy was applied to study microstructural and functional features of pancreatic tumors in vivo to monitor inter- and intra-tumor heterogeneity and treatment response. Patient-derived xenografts with human pancreatic ductal adenocarcinoma were implanted into mice and characterized over five weeks of intraperitoneal chemotherapy (FIRINOX or Gem/NabP) with known responsiveness/resistance. Resistant and responsive tumors exhibited a similar initial metabolic response, but by week 5 the resistant tumor deviated significantly from the responsive tumor, indicating that a representative response may take up to five weeks to appear. This biphasic metabolic response in a chemoresistant tumor reveals the possibility of intra-tumor spatiotemporal heterogeneity of drug responsiveness. These results, though limited by small sample size, suggest the possibility for further work characterizing chemoresistance mechanisms using nonlinear optical microscopy.