Heat maps and clustering are used frequently in expression analysis studies for data visualization and quality control. Simple clustering and heat maps can be produced from the “heatmap” function in R. However, the “heatmap” function lacks certain functionalities and customizability, preventing it from generating advanced heat maps and dendrograms. To tackle the limitations of the “heatmap” function, we have developed an R package “heatmap3” which significantly improves the original “heatmap” function by adding several more powerful and convenient features. The “heatmap3” package allows users to produce highly customizable state-of-the-art heat maps and dendrograms. The “heatmap3” package is developed based on the “heatmap” function in R, and it is completely compatible with it. The new features of “heatmap3” include highly customizable legends and side annotation, a wider range of color selections, new labeling features which allow users to define multiple layers of phenotype variables, and automatically conducted association tests based on the phenotypes provided. Additional features such as different agglomeration methods for estimating distance between two samples are also added for clustering.
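To make the role of the agglomeration method concrete, the following is a minimal sketch of agglomerative clustering with a selectable linkage rule. It is written in Python for illustration only and is not the “heatmap3” interface: the function names `euclidean` and `agglomerate`, the `method` argument, and the toy data are all assumptions introduced here.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(rows, n_clusters, method="average"):
    """Greedy agglomerative clustering (illustrative helper, not heatmap3).

    rows       -- list of samples, each a list of numbers
    n_clusters -- stop merging once this many clusters remain
    method     -- 'single', 'complete' or 'average' linkage
    Returns a list of clusters, each a sorted list of row indices.
    """
    reducers = {
        "single": min,                          # closest pair of members
        "complete": max,                        # farthest pair of members
        "average": lambda d: sum(d) / len(d),   # mean over all pairs
    }
    reduce_fn = reducers[method]
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dists = [euclidean(rows[i], rows[j])
                     for i in clusters[a] for j in clusters[b]]
            d = reduce_fn(dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # a < b is guaranteed by combinations, so deleting b keeps a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy expression matrix: two low-expression and two high-expression samples.
samples = [[1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
print(agglomerate(samples, 2, method="average"))
```

Changing `method` alters how the distance between two groups of samples is summarized, which in turn can change the order in which branches of the dendrogram are joined.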
Cisplatin-based adjuvant chemotherapy remains the standard of care for patients with resected stage II or III non-small-cell lung cancer. However, biomarker-informed clinical trials are starting to push the management of early-stage lung cancer beyond cytotoxic chemotherapy. This review explores recent and ongoing studies focused on improving cytotoxic chemotherapy and incorporating targeted and immunotherapies in the management of early-stage, resectable lung cancer. Adjuvant osimertinib for patients with EGFR-mutant tumors, preoperative chemoimmunotherapy, and adjuvant immunotherapy could improve outcomes for selected patients with resectable lung cancer, and ongoing or planned studies leveraging biomarkers, immunotherapy, and targeted therapy may further improve survival. We also discuss the unique barriers associated with clinical trials of early-stage lung cancer and the need for innovative trial designs to overcome these challenges.
Using an ORF kinome screen in MCF-7 cells treated with the CDK4/6 inhibitor ribociclib plus fulvestrant, we identified FGFR1 as a mechanism of drug resistance. FGFR1-amplified/ER+ breast cancer cells and MCF-7 cells transduced with FGFR1 were resistant to fulvestrant ± ribociclib or palbociclib. This resistance was abrogated by treatment with the FGFR tyrosine kinase inhibitor (TKI) lucitanib. Addition of the FGFR TKI erdafitinib to palbociclib/fulvestrant induced complete responses of FGFR1-amplified/ER+ patient-derived xenografts. Next-generation sequencing of circulating tumor DNA (ctDNA) in 34 patients after progression on CDK4/6 inhibitors identified FGFR1/2 amplification or activating mutations in 14/34 (41%) post-progression specimens. Finally, ctDNA from patients enrolled in MONALEESA-2, the registration trial of ribociclib, showed that patients with FGFR1 amplification exhibited a shorter progression-free survival compared to patients with wild-type FGFR1. Thus, we propose breast cancers with FGFR pathway alterations should be considered for trials using combinations of ER, CDK4/6 and FGFR antagonists.
Typical clustering methods for single-cell and spatial transcriptomics struggle to identify rare cell types, while approaches tailored to detect rare cell types gain this ability at the cost of poorer performance for grouping abundant ones. Here, we develop aKNNO to simultaneously identify abundant and rare cell types based on an adaptive k-nearest neighbor graph with optimization. Benchmarking on 38 simulated and 20 single-cell and spatial transcriptomics datasets demonstrates that aKNNO identifies both abundant and rare cell types more accurately than general and specialized methods. Using only gene expression, aKNNO maps abundant and rare cells more precisely than integrative approaches.
The k-means algorithm is the most commonly used simple clustering method. For large volumes of high-dimensional numerical data, it provides an efficient way to group similar data into the same cluster. In this study, a tri-level k-means algorithm and a bi-layer k-means algorithm are proposed. The k-means algorithm is vulnerable to outliers and noisy data, and is also sensitive to the choice of initial cluster centers. The tri-level k-means algorithm can overcome these drawbacks. Because the data in a dataset S often change, after a period of time the trained cluster centers can no longer precisely describe the data in each cluster, and the cluster centers therefore need to be updated. In this paper, an online machine learning based tri-level k-means algorithm is also provided to solve this problem. When the data in a cluster differ significantly, a single cluster center cannot precisely describe each datum in the cluster. Noisy data, outliers, and data with quite different values in the same cluster may decrease the performance of pattern matching systems. The bi-layer k-means algorithm can deal with these problems. In addition, a genetic-based algorithm is provided to derive the fittest parameters used in the tri-level and bi-layer k-means algorithms. Experimental results demonstrate that both algorithms can provide much better classification accuracy than the traditional k-means algorithm.
Triple-negative breast cancer (TNBC) is a heterogeneous disease that can be classified into distinct molecular subtypes by gene expression profiling. Although TNBC is considered a difficult-to-treat cancer, a fraction of TNBC patients benefit significantly from neoadjuvant chemotherapy and have far better overall survival. Outside of BRCA1/2 mutation status, biomarkers do not exist to identify patients most likely to respond to current chemotherapy; and, to date, no FDA-approved targeted therapies are available for TNBC patients. Previously, we developed an approach to identify six molecular subtypes of TNBC (TNBCtype), with each subtype displaying unique ontologies and differential response to standard-of-care chemotherapy. Given the complexity of the varying histological landscape of tumor specimens, we used histopathological quantification and laser-capture microdissection to determine that transcripts in the previously described immunomodulatory (IM) and mesenchymal stem-like (MSL) subtypes were contributed from infiltrating lymphocytes and tumor-associated stromal cells, respectively. Therefore, we refined TNBC molecular subtypes from six (TNBCtype) into four (TNBCtype-4) tumor-specific subtypes (BL1, BL2, M and LAR) and demonstrate differences in diagnosis age, grade, local and distant disease progression and histopathology. Using five publicly available, neoadjuvant chemotherapy breast cancer gene expression datasets, we retrospectively evaluated chemotherapy response of over 300 TNBC patients from pretreatment biopsies subtyped using either the intrinsic (PAM50) or TNBCtype approaches. Combined analysis of TNBC patients demonstrated that TNBC subtypes significantly differ in response to similar neoadjuvant chemotherapy, with 41% of BL1 patients achieving a pathological complete response compared to 18% for BL2 and 29% for LAR (95% confidence intervals [CIs]: 33-51, 9-28 and 17-41, respectively).
Collectively, we provide pre-clinical data that could inform clinical trials designed to test the hypothesis that improved outcomes can be achieved for TNBC patients, if selection and combination of existing chemotherapies is directed by knowledge of molecular TNBC subtypes.
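As an aid to reading the response rates and 95% confidence intervals quoted above, the sketch below shows one common way to put an interval around a proportion, the Wilson score interval. The abstract does not state which interval method was used or the per-subtype patient counts, so both the choice of method and the counts in the example call are assumptions for illustration only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes -- number of responders (hypothetical in the example below)
    total     -- number of patients evaluated
    z         -- normal quantile; 1.96 gives an approximate 95% interval
    Returns (lower, upper) as fractions between 0 and 1.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts: 41 responders out of 100 patients (not study data).
low, high = wilson_interval(41, 100)
print(f"{low:.3f} to {high:.3f}")
```

The interval narrows as `total` grows, which is why subtypes with fewer patients carry wider intervals around their response rates.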
Triple-negative breast cancer (TNBC) is a highly diverse group of cancers, and subtyping is necessary to better identify molecular-based therapies. In this study, we analyzed gene expression (GE) profiles from 21 breast cancer data sets and identified 587 TNBC cases. Cluster analysis identified 6 TNBC subtypes displaying unique GE and ontologies, including 2 basal-like (BL1 and BL2), an immunomodulatory (IM), a mesenchymal (M), a mesenchymal stem-like (MSL), and a luminal androgen receptor (LAR) subtype. Further, GE analysis allowed us to identify TNBC cell line models representative of these subtypes. Predicted "driver" signaling pathways were pharmacologically targeted in these cell line models as proof of concept that analysis of distinct GE signatures can inform therapy selection. BL1 and BL2 subtypes had higher expression of cell cycle and DNA damage response genes, and representative cell lines preferentially responded to cisplatin. M and MSL subtypes were enriched in GE for epithelial-mesenchymal transition and growth factor pathways, and cell models responded to NVP-BEZ235 (a PI3K/mTOR inhibitor) and dasatinib (an abl/src inhibitor). The LAR subtype includes patients with decreased relapse-free survival and was characterized by androgen receptor (AR) signaling. LAR cell lines were uniquely sensitive to bicalutamide (an AR antagonist). These data may be useful in biomarker selection, drug discovery, and clinical trial design that will enable alignment of TNBC patients to appropriate targeted therapies.
Identifying binding targets of RNA-binding proteins (RBPs) can greatly facilitate our understanding of their functional mechanisms. Most computational methods employ machine learning to train classifiers on either RBP-specific targets or pooled RBP-RNA interactions. The former strategy is more powerful, but it only applies to a few RBPs with a large number of known targets; conversely, the latter strategy sacrifices prediction accuracy for a wider application, since specific interaction features are inevitably obscured through pooling heterogeneous datasets. Here, we present beRBP, a dual approach to predict human RBP-RNA interaction given the position weight matrix (PWM) of an RBP and one RNA sequence. Based on Random Forests, beRBP not only builds a specific model for each RBP with a decent number of known targets, but also develops a general model for RBPs with few or no known targets. The specific and general models both compared well with existing methods on three benchmark datasets. Notably, the general model achieved a better performance than existing methods on most novel RBPs. Overall, as a composite solution spanning the RBP-specific and RBP-general strategies, beRBP is a promising tool for human RBP binding estimation with good prediction accuracy and a broad application scope.
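To illustrate what it means to pair a PWM with an RNA sequence, the sketch below scans a sequence for its best log-odds match to a PWM. This is a generic motif-scanning example, not the beRBP feature set or its Random Forest models. The function name `best_pwm_match`, the uniform background, the pseudocount, and the toy four-column matrix are all assumptions made for illustration.

```python
import math


def best_pwm_match(pwm, rna, background=0.25, pseudocount=1e-3):
    """Return (position, score) of the best PWM match in an RNA sequence.

    pwm         -- list of dicts, one per motif column, base -> probability
    rna         -- RNA sequence over the alphabet A, C, G, U
    background  -- assumed uniform background base probability
    pseudocount -- small value added so zero probabilities stay finite
    The score is the sum of per-column log2 odds against the background.
    """
    width = len(pwm)
    if len(rna) < width:
        raise ValueError("sequence is shorter than the motif")
    best_pos, best_score = -1, -math.inf
    for start in range(len(rna) - width + 1):
        window = rna[start:start + width]
        score = sum(
            math.log2((col.get(base, 0.0) + pseudocount) / background)
            for col, base in zip(pwm, window))
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


def column(consensus):
    """Toy PWM column: 0.85 for the consensus base, 0.05 for the others."""
    return {b: (0.85 if b == consensus else 0.05) for b in "ACGU"}


toy_pwm = [column(b) for b in "UGCA"]  # hypothetical motif, not a real RBP
print(best_pwm_match(toy_pwm, "AAUGCAGG"))
```

A score like this is one possible sequence-level input to a classifier. Which features the specific and general models actually use is not specified in the text above.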
• We develop a computer-aided system to detect abnormal mammograms.
• We extract only 5 features of intensity and gradient for mass detection.
• Principal component analysis is applied to determine the feature weights.
• The abnormality detection classifier by feature weight adjustments is proposed.
• We evaluate our method upon 2 different datasets.
This paper proposes a method for detecting abnormal mammograms in mammographic datasets, based on a novel abnormality detection classifier (ADC) that extracts a few discriminative features: first-order statistical intensities and gradients. As tumorous masses are often indistinguishable from the surrounding parenchyma, automatic mass detection in highly complex breast tissue has been a challenge. Moreover, most tumor detection methods require extraction of a large number of textural features for further computation. The study first investigates image preprocessing techniques for obtaining more accurate breast segmentation prior to mass detection, including global equalization transformation, denoising, binarization, breast orientation determination and pectoral muscle suppression. After gray level quantization of the segmented breast images, the presented feature difference matrices are created from five features extracted from a suspicious region of interest (ROI); subsequently, principal component analysis (PCA) is applied to aid the determination of feature weights. The experimental results show that applying the ADC algorithm together with the feature weight adjustments to detect abnormal mammograms yielded sensitivities of 88% and 86% on the two respective datasets. Compared with other automated mass detection systems, this study proposes a new method for developing a high-performance, computer-aided decision (CAD) system that can automatically detect abnormal mammograms in screening programs, especially when an entire database is tested.
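One way PCA can inform feature weights is to take the absolute loadings of the first principal component and normalize them to sum to one. The sketch below does this with power iteration on the covariance matrix. How the ADC actually converts PCA output into weights is not described here, so this mapping, the function name `first_pc_weights`, and the toy feature table are assumptions for illustration.

```python
import math


def first_pc_weights(rows, iterations=200):
    """Weights from the first principal component (illustrative only).

    rows -- list of feature vectors, one per region of interest
    Returns one non-negative weight per feature, summing to 1, taken as
    the normalized absolute loadings of the leading eigenvector of the
    sample covariance matrix (found by power iteration).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    # Asymmetric start vector, to avoid starting orthogonal to the
    # leading eigenvector in simple symmetric cases.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all features are constant; keep the current vector
        v = [x / norm for x in w]
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]


# Toy table: feature 0 varies widely, feature 1 slightly, feature 2 not at all.
features = [[1, 0.1, 5], [2, 0.0, 5], [3, 0.1, 5], [4, 0.0, 5], [5, 0.1, 5]]
print(first_pc_weights(features))
```

Features measured on different scales would usually be standardized before this step. The sketch skips standardization so that the effect of raw variance on the weights is visible.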
Recent studies have shown that disease-susceptibility variants frequently lie in cell-type-specific enhancer elements. To identify, interpret, and prioritize such risk variants, we must identify the enhancers active in disease-relevant cell types, their upstream transcription factor (TF) binding, and their downstream target genes. To address this need, we built HACER (http://bioinfo.vanderbilt.edu/AE/HACER/), an atlas of Human ACtive Enhancers to interpret Regulatory variants. The HACER atlas catalogues and annotates in-vivo transcribed cell-type-specific enhancers, as well as placing enhancers within transcriptional regulatory networks by integrating ENCODE TF ChIP-Seq and predicted/validated chromatin interaction data. We demonstrate the utility of HACER in (i) offering a mechanistic hypothesis to explain the association of SNP rs614367 with ER-positive breast cancer risk, (ii) exploring tumor-specific enhancers in selective MYC dysregulation and (iii) prioritizing/annotating non-coding regulatory regions targeting CCND1. HACER provides a valuable resource for studies of GWAS, non-coding variants, and enhancer-mediated regulation.