We previously demonstrated the association between epithelial-to-mesenchymal transition (EMT) and drug response in lung cancer using an EMT signature derived in cancer cell lines. Given the contribution of tumor microenvironments to EMT, we extended our investigation of EMT to patient tumors from 11 cancer types to develop a pan-cancer EMT signature.
Using the pan-cancer EMT signature, we conducted an integrated, global analysis of genomic and proteomic profiles associated with EMT across 1,934 tumors including breast, lung, colon, ovarian, and bladder cancers. Differences in outcome and in vitro drug response corresponding to expression of the pan-cancer EMT signature were also investigated.
Compared with the lung cancer EMT signature, the patient-derived, pan-cancer EMT signature encompasses a set of core EMT genes that correlate even more strongly with known EMT markers across diverse tumor types and identifies differences in drug sensitivity and global molecular alterations at the DNA, RNA, and protein levels. Among those changes associated with EMT, pathway analysis revealed a strong correlation between EMT and immune activation. Further supervised analysis demonstrated high expression of immune checkpoints and other druggable immune targets, such as PD1, PD-L1, CTLA4, OX40L, and PD-L2, in tumors with the most mesenchymal EMT scores. Elevated PD-L1 protein expression in mesenchymal tumors was confirmed by IHC in an independent lung cancer cohort.
This new signature provides a novel, patient-based, histology-independent tool for the investigation of EMT and offers insights into potential novel therapeutic targets for mesenchymal tumors, independent of cancer type, including immune checkpoints.
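The "EMT score" used above to rank tumors from epithelial to mesenchymal can be illustrated with a common formulation for signatures of this kind: the mean expression of the signature's mesenchymal genes minus the mean expression of its epithelial genes. The Python sketch below assumes that definition and pre-normalized (e.g., z-scored) expression. The gene lists are hypothetical placeholders built from classic markers, not the published pan-cancer signature.

```python
from __future__ import annotations

from statistics import mean


def emt_score(expr: dict[str, float],
              epithelial: list[str],
              mesenchymal: list[str]) -> float:
    """Assumed EMT score: mean(mesenchymal genes) - mean(epithelial genes).

    `expr` maps gene symbol -> normalized expression for one tumor.
    Signature genes absent from `expr` are skipped.
    """
    mes = [expr[g] for g in mesenchymal if g in expr]
    epi = [expr[g] for g in epithelial if g in expr]
    if not mes or not epi:
        raise ValueError("need at least one measured gene on each side")
    return mean(mes) - mean(epi)


# Hypothetical marker lists for illustration only (not the published signature).
EPITHELIAL = ["CDH1", "EPCAM"]
MESENCHYMAL = ["VIM", "CDH2"]

tumor = {"CDH1": -1.2, "EPCAM": -0.8, "VIM": 1.5, "CDH2": 0.9}
print(round(emt_score(tumor, EPITHELIAL, MESENCHYMAL), 2))  # 2.2 (more mesenchymal)
```

Under this assumed definition, higher scores indicate a more mesenchymal profile, so tumors at the top of the ranking would be the ones described as having the most mesenchymal EMT scores.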
In the intensive care unit (ICU), delirium is a common, acute confusional state associated with a high risk of short- and long-term morbidity and mortality. Machine learning (ML) shows promise for addressing research priorities and improving delirium outcomes. However, because of clinical and billing conventions, delirium is often inconsistently or incompletely labeled in electronic health record (EHR) datasets. Here, we identify clinical actions, abstracted from clinical guidelines, in EHR data that indicate risk of delirium among ICU patients. We develop a novel prediction model to label patients with delirium based on a large data set and assess its performance.
EHR data on 48,451 admissions from 2001 to 2012, available through the Medical Information Mart for Intensive Care-III (MIMIC-III) database, were used to identify features for developing our prediction models. Five binary ML classification models (Logistic Regression; Classification and Regression Trees; Random Forests; Naïve Bayes; and Support Vector Machines) were fit and ranked by Area Under the Curve (AUC) scores. We compared our best model with two models previously proposed in the literature in terms of goodness of fit and precision, and through biological validation.
Our best-performing model with threshold reclassification for predicting delirium was a multiple logistic regression using the 31 clinical actions (AUC 0.83). Our model outperformed the other proposed models by biological validation on clinically meaningful, delirium-associated outcomes.
Hurdles in identifying accurate labels in large-scale datasets limit clinical applications of ML in delirium. We developed a novel labeling model for delirium in the ICU using a large, public data set. By using guideline-directed clinical actions independent from risk factors, treatments, and outcomes as model predictors, our classifier could be used as a delirium label for future clinically targeted models.
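Ranking classifiers by AUC and then choosing a decision threshold can be illustrated with a small, dependency-free Python sketch. AUC is computed here in its rank (Mann-Whitney) form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The abstract does not state how the reclassification threshold was chosen; this sketch assumes Youden's J statistic purely as an example, and the toy data are invented.

```python
from __future__ import annotations


def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores: list[float], labels: list[int]) -> float:
    """Cutoff maximizing sensitivity + specificity - 1 (predict delirium if score >= cutoff)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes")
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_j = t, j
    return best_t


# Toy predicted probabilities and delirium labels (invented for illustration).
probs = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
truth = [0, 0, 1, 1, 1, 0]
print(round(auc(probs, truth), 3))     # 0.889
print(youden_threshold(probs, truth))  # 0.35
```

In practice each of the five fitted models would supply its own `scores`, and the models would be compared by `auc` before a threshold is fixed for the chosen one.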
Human papillomavirus (HPV) is a necessary but insufficient cause of a subset of oral squamous cell carcinomas (OSCCs) that is increasing markedly in frequency. To identify contributory, secondary genetic alterations in these cancers, we used comprehensive genomics methods to compare 149 HPV-positive and 335 HPV-negative OSCC tumor/normal pairs. Different behavioral risk factors underlying the two OSCC types were reflected in distinctive genomic mutational signatures. In HPV-positive OSCCs, the signatures of APOBEC cytosine deaminase editing, associated with anti-viral immunity, were strongly linked to overall mutational burden. In contrast, in HPV-negative OSCCs, T>C substitutions in the sequence context 5'-ATN-3' correlated with tobacco exposure. Universal expression of HPV E6 and E7 oncogenes was a sine qua non of HPV-positive OSCCs. Significant enrichment of somatic mutations was confirmed or newly identified in […]. Of these, many affect host pathways already targeted by HPV oncoproteins, including the p53 and pRB pathways, or disrupt host defenses against viral infections, including interferon (IFN) and nuclear factor kappa B signaling. Frequent copy number changes were associated with concordant changes in gene expression. Chr 11q (including […]) and 14q (including […] and […]) were recurrently lost in HPV-positive OSCCs, in contrast to their gains in HPV-negative OSCCs. High-ranking variant allele fractions implicated […], […], and […] mutations as candidate driver events in HPV-positive cancers. We conclude that virus-host interactions cooperatively shape the unique genetic features of these cancers, distinguishing them from their HPV-negative counterparts.
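The two mutational contexts named above can be made concrete by classifying single-nucleotide variants (SNVs) by their trinucleotide context. The Python sketch below is illustrative only: it assumes each variant is given with its immediate 5' and 3' flanking bases, uses the commonly cited TCW (W = A or T) definition of APOBEC-type C>T and C>G changes, and does not reproduce the signature-extraction methods used in the study.

```python
from __future__ import annotations

from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_pyrimidine(five: str, ref: str, alt: str, three: str) -> tuple[str, str, str, str]:
    """Express an SNV on the strand where the reference base is C or T."""
    if ref in ("C", "T"):
        return five, ref, alt, three
    # Reverse complement: the flanks swap sides and every base is complemented.
    return (three.translate(_COMPLEMENT), ref.translate(_COMPLEMENT),
            alt.translate(_COMPLEMENT), five.translate(_COMPLEMENT))


def classify_snv(five: str, ref: str, alt: str, three: str) -> str:
    """Flag the two contexts named in the text; everything else is 'other'."""
    five, ref, alt, three = to_pyrimidine(five, ref, alt, three)
    # APOBEC-type: C>T or C>G in a 5'-TCW-3' context (W = A or T).
    if ref == "C" and alt in ("T", "G") and five == "T" and three in ("A", "T"):
        return "APOBEC"
    # T>C with an A immediately 5', i.e. sequence context 5'-ATN-3'.
    if ref == "T" and alt == "C" and five == "A":
        return "T>C_ATN"
    return "other"


# Each tuple: (5' base, reference base, alternate base, 3' base); invented examples.
snvs = [("T", "C", "T", "A"), ("A", "G", "A", "A"), ("A", "T", "C", "G"), ("G", "C", "A", "T")]
print(dict(Counter(classify_snv(*m) for m in snvs)))
# {'APOBEC': 2, 'T>C_ATN': 1, 'other': 1}
```

The second example shows why strand normalization matters: a G>A change in 5'-AGA-3' is the same event as a C>T change in 5'-TCT-3' on the opposite strand, so it is counted as APOBEC-type.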
Abstract
Summary
Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics.
Availability and implementation
Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
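For binary features such as the presence or absence of cytogenetic events, one commonly used distance is the Jaccard distance, which ignores positions where both samples are 0 (shared absences) and is therefore well suited to sparse data. The Python sketch below computes a pairwise Jaccard distance matrix. It is a generic illustration, not Mercator's R interface, and is not intended to reproduce any specific Mercator metric.

```python
from __future__ import annotations


def jaccard_distance(a: list[int], b: list[int]) -> float:
    """1 - |a AND b| / |a OR b| for 0/1 vectors; defined as 0 when both are all-zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(rows: list[list[int]]) -> list[list[float]]:
    """Symmetric matrix of pairwise Jaccard distances between samples."""
    return [[jaccard_distance(r, s) for s in rows] for r in rows]


# Toy binary samples (rows) over four features (columns); invented for illustration.
samples = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
for row in distance_matrix(samples):
    print([round(d, 2) for d in row])
# [0.0, 0.5, 1.0]
# [0.5, 0.0, 1.0]
# [1.0, 1.0, 0.0]
```

A matrix like this is the input that downstream clustering or visualization steps (for example hierarchical clustering or multidimensional scaling) would operate on.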
There are as yet no licensed therapeutics for the COVID-19 pandemic. The causal coronavirus (SARS-CoV-2) binds host cells via a trimeric spike whose receptor binding domain (RBD) recognizes angiotensin-converting enzyme 2, initiating conformational changes that drive membrane fusion. We find that the monoclonal antibody CR3022 binds the RBD tightly, neutralizing SARS-CoV-2, and report the crystal structure at 2.4 Å of the Fab/RBD complex. Some crystals are suitable for screening for entry-blocking inhibitors. The highly conserved, structure-stabilizing CR3022 epitope is inaccessible in the prefusion spike, suggesting that CR3022 binding facilitates conversion to the fusion-incompetent post-fusion state. Cryogenic electron microscopy (cryo-EM) analysis confirms that incubation of spike with CR3022 Fab leads to destruction of the prefusion trimer. Presentation of this cryptic epitope in an RBD-based vaccine might advantageously focus immune responses. Binders at this epitope could be useful therapeutically, possibly in synergy with an antibody that blocks receptor attachment.
• CR3022 binds the RBD of SARS-CoV-2 and shows strong neutralization
• Neutralization is by destroying the prefusion spike conformation
• CR3022 binds a highly conserved epitope that is inaccessible in prefusion spike protein
• CR3022 could have therapeutic potential alone or in synergy with a receptor blocker
Huo et al. find that the antibody CR3022 binds tightly to the receptor binding domain of the SARS-CoV-2 spike at a site different to that used by the receptor. CR3022 effectively neutralizes the virus, and cryo-EM reveals that it disrupts the spike. Such antibodies could have potential as COVID-19 therapeutics.
Cleft Lip and Palate Transmembrane Protein 1-Like (CLPTM1L) resides in a region of chromosome 5 for which copy number gain has been found to be the most frequent genetic event in the early stages of non-small cell lung cancer (NSCLC). Multiple genome-wide association studies have found this locus to be associated with lung cancer in both smokers and non-smokers. CLPTM1L has been identified as an overexpressed protein in human ovarian tumor cell lines that are resistant to cisplatin, which is the only insight thus far into the function of CLPTM1L. Here we find CLPTM1L expression to be increased in lung adenocarcinomas compared to matched normal lung tissues and in lung tumor cell lines, by mechanisms not exclusive to copy number gain. Upon loss of CLPTM1L accumulation in lung tumor cells, cisplatin- and camptothecin-induced apoptosis increased in direct proportion to the level of CLPTM1L knockdown. Bcl-xL accumulation was significantly decreased upon loss of CLPTM1L. Expression of exogenous Bcl-xL abolished the sensitization to apoptotic killing caused by CLPTM1L knockdown. These results demonstrate that CLPTM1L, an overexpressed protein in lung tumor cells, protects from genotoxic stress-induced apoptosis through regulation of Bcl-xL. Thus, this study implicates anti-apoptotic CLPTM1L function as a potential mechanism of susceptibility to lung tumorigenesis and resistance to chemotherapy.
Studying mechanisms of malignant transformation of human pre-B cells, we found that acute activation of oncogenes induced immediate cell death in the vast majority of cells. Few surviving pre-B cell clones had acquired permissiveness to oncogenic signaling by strong activation of negative feedback regulation of Erk signaling. Studying negative feedback regulation of Erk in genetic experiments at three different levels, we found that Spry2, Dusp6, and Etv5 were essential for oncogenic transformation in mouse models for pre-B acute lymphoblastic leukemia (ALL). Interestingly, a small molecule inhibitor of DUSP6 selectively induced cell death in patient-derived pre-B ALL cells and overcame conventional mechanisms of drug-resistance.
• Robust negative regulation of Erk enables transformation of pre-B cells
• High Erk feedback activity predicts poor clinical outcome of patients with ALL
• Deletion of Erk feedback genes protects against pre-B cell transformation
• Small molecule inhibition of the Erk-phosphatase DUSP6 kills patient ALL cells
Shojaee et al. show that successful transformation of pre-B cells to pre-B acute lymphoblastic leukemia (ALL) requires negative feedback regulation of Erk signaling and that inhibiting this feedback selectively kills pre-B ALL cells, suggesting negative feedback regulation of oncogenes as a vulnerability in human ALL.
Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed. Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
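The idea of using per-element silhouette widths to improve the mean silhouette width can be sketched as a greedy reassignment loop: compute every element's silhouette width, move the element with the most negative width to its nearest other cluster, and repeat until no width is negative. This is a hypothetical Python reading of the description above, written from scratch for illustration; it is not the code of the SillyPutty R package, whose exact update rule and stopping criteria may differ.

```python
from __future__ import annotations


def silhouettes(D: list[list[float]], labels: list[int]) -> tuple[list[float], list[int]]:
    """Per-element silhouette widths and each element's nearest other cluster."""
    n = len(labels)
    widths: list[float] = []
    nearest: list[int] = []
    for i in range(n):
        sums: dict[int, float] = {}
        counts: dict[int, int] = {}
        for j in range(n):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + D[i][j]
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        other_means = {c: sums[c] / counts[c] for c in sums if c != own}
        if not other_means:
            raise ValueError("need at least two clusters")
        near = min(other_means, key=lambda c: other_means[c])
        nearest.append(near)
        if own not in counts:  # singleton cluster: width is 0 by convention
            widths.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = other_means[near]        # mean distance to nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths, nearest


def silly_putty_sketch(D: list[list[float]], labels: list[int], max_iter: int = 100) -> list[int]:
    """Greedily move the worst-silhouette element until no width is negative."""
    labels = list(labels)
    for _ in range(max_iter):
        widths, nearest = silhouettes(D, labels)
        worst = min(range(len(widths)), key=lambda i: widths[i])
        if widths[worst] >= 0:
            break
        # Singletons have width 0, so a moved element never empties its cluster.
        labels[worst] = nearest[worst]
    return labels


# Toy 1-D data: the point at 2 starts in the wrong cluster.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(x - y) for y in points] for x in points]
print(silly_putty_sketch(D, [0, 0, 1, 1, 1, 1]))  # [0, 0, 0, 1, 1, 1]
```

Starting from the labels of a fast method such as hierarchical clustering, as the abstract reports doing, gives this kind of loop fewer misassigned elements to correct.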
We present a novel model of time‐series analysis to learn from electronic health record (EHR) data when infection occurred in the intensive care unit (ICU) by translating methods from proteomics and Bayesian statistics. Using 48,536 patients hospitalized in an ICU, we describe each hospital course as an ‘alphabet’ of 23 physician actions (‘events’) in temporal order. We analyze these as k‐mers of length 3–12 events and apply a Bayesian model of (cumulative) relative risk (RR). The log2‐transformed RR (median=0.248, mean=0.226) supported the conclusion that the events selected were individually associated with increased risk of infection. Selecting from all possible cutoffs of maximum gain (MG), MG>0.0244 predicts administration of antibiotics with PPV 82.0 %, NPV 44.4 %, and AUC 0.706. Our approach holds value for retrospective analysis of other clinical syndromes for which time‐of‐onset is critical to analysis but poorly marked in EHRs, including delirium and decompensation.
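The representation described above, in which each hospital course is an ordered sequence over an alphabet of events that is broken into k-mers of length 3 to 12, can be sketched in Python as follows. The relative-risk function is a deliberately simple frequentist stand-in: it adds a hypothetical pseudocount of 0.5 to each cell of the 2×2 table and is not the Bayesian (cumulative) relative-risk model used in the study. The single-letter event codes and toy data are invented.

```python
from __future__ import annotations

from math import log2


def kmers(events: list[str], k_min: int = 3, k_max: int = 12) -> list[tuple[str, ...]]:
    """All contiguous k-mers (as tuples) of length k_min..k_max from one ordered course."""
    out: list[tuple[str, ...]] = []
    for k in range(k_min, k_max + 1):
        for i in range(len(events) - k + 1):
            out.append(tuple(events[i:i + k]))
    return out


def log2_rr(kmer: tuple[str, ...], courses: list[list[str]], infected: list[int],
            pseudo: float = 0.5) -> float:
    """log2 relative risk of infection for courses containing `kmer` vs. courses without it.

    `pseudo` is a hypothetical smoothing constant added to every cell of the 2x2 table.
    """
    k = len(kmer)
    has = [kmer in set(kmers(course, k, k)) for course in courses]
    a = sum(1 for h, y in zip(has, infected) if h and y)          # contains k-mer, infected
    b = sum(1 for h, y in zip(has, infected) if h and not y)      # contains k-mer, not infected
    c = sum(1 for h, y in zip(has, infected) if not h and y)      # lacks k-mer, infected
    d = sum(1 for h, y in zip(has, infected) if not h and not y)  # lacks k-mer, not infected
    risk_with = (a + pseudo) / (a + b + 2 * pseudo)
    risk_without = (c + pseudo) / (c + d + 2 * pseudo)
    return log2(risk_with / risk_without)


# Toy courses over a hypothetical event alphabet, with infection labels.
courses = [list("ABCD"), list("ABCE"), list("XYZ"), list("XYZW")]
infected = [1, 1, 0, 0]
print(kmers(list("ABCD")))
# [('A', 'B', 'C'), ('B', 'C', 'D'), ('A', 'B', 'C', 'D')]
print(round(log2_rr(("A", "B", "C"), courses, infected), 2))  # 2.32
```

A positive log2 RR, as in this toy case, means the k-mer is more frequent among courses that end in infection; a k-mer seen in no course returns exactly 0.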
There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers.
In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database.
Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.