Purpose: Dysregulated microRNAs are implicated in the pathogenesis and aggressiveness of acute myeloid leukemia (AML). We describe the effect of miR-193b, a regulator of hematopoietic stem-cell self-renewal, on the progression and prognosis of AML.
Methods: We profiled miR-193b-5p/3p expression in cytogenetically and clinically characterized de novo pediatric AML (n = 161) via quantitative real-time polymerase chain reaction and validated our findings in an independent cohort of 187 adult patients. We investigated the tumor-suppressive function of miR-193b in vitro and in vivo in human AML blasts, patient-derived xenografts, and miR-193b knockout mice.
Results: miR-193b exerted important endogenous tumor-suppressive functions in the hematopoietic system. miR-193b-3p was downregulated in several cytogenetically defined subgroups of pediatric and adult AML, and low expression served as an independent indicator of poor prognosis in pediatric AML (risk ratio ± standard error, -0.56 ± 0.23; P = .016). miR-193b-3p expression improved the prognostic value of the European LeukemiaNet risk-group stratification and of a 17-gene leukemic stemness score. In knockout mice, loss of miR-193b cooperated with Hoxa9/Meis1 during leukemogenesis, whereas restoring miR-193b expression impaired leukemic engraftment. Similarly, expression of miR-193b in AML blasts from patients diminished leukemic growth in vitro and in mouse xenografts. Mechanistically, miR-193b induced apoptosis and a G1/S-phase block in various human AML subgroups by targeting multiple factors of the KIT-RAS-RAF-MEK-ERK (MAPK) signaling cascade and the downstream cell cycle regulator CCND1.
Conclusion: The tumor-suppressive function of miR-193b is independent of patient age or genetics; therefore, restoring miR-193b would assure high antileukemic efficacy by blocking the entire MAPK signaling cascade while preventing the emergence of resistance mechanisms.
IKAROS family zinc finger 1 (IKZF1) is a transcription factor important in lymphoid differentiation and a known tumor suppressor in acute lymphoid leukemia. Recent studies suggest that IKZF1 is also involved in myeloid differentiation. To investigate whether IKZF1 deletions also play a role in pediatric acute myeloid leukemia, we screened a panel of pediatric acute myeloid leukemia samples for deletions of the IKZF1 locus using multiplex ligation-dependent probe amplification and for mutations using direct sequencing. Three patients were identified with a single amino acid variant that did not change the length of IKZF1. No frame-shift mutations were found. Of 11 patients with an IKZF1 deletion, 8 had a complete loss of chromosome 7 and 3 had a focal deletion of 0.1-0.9 Mb. These focal deletions included the complete IKZF1 gene (n=2) or exons 1-4 (n=1), all leading to a loss of IKZF1 function. Interestingly, genes differentially expressed in monosomy 7 cases (n=8) compared with non-deleted samples (n=247) correlated significantly with gene expression changes in focal IKZF1-deleted cases (n=3). Genes with increased expression included genes involved in myeloid cell self-renewal and the cell cycle, as well as a significant proportion of GATA target genes and GATA factors. Together, these results suggest that loss of IKZF1 is recurrent in pediatric acute myeloid leukemia and might be a determinant of oncogenesis in acute myeloid leukemia with monosomy 7.
Combining clinical and molecular data types may improve the prediction accuracy of a classifier. However, there is currently a shortage of effective and efficient statistical and bioinformatic tools for truly integrative data analysis. Existing integrative classifiers have two main disadvantages. First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and cost-inefficient.
We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to the two data types independently, starting with the traditional clinical risk factors. We turn to the relatively expensive molecular data only when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that need to be reclassified using molecular data depends on how much we expect the predictive accuracy to increase when reclassifying those samples.
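The stepwise decision rule can be sketched in a few lines. This is a minimal illustration, not the stepwiseCM implementation: it assumes that each model returns a probability of class 1, defines uncertainty hypothetically as `1 - 2|p - 0.5|` (0 for a confident prediction, 1 for a coin flip), and uses the made-up names `stepwise_classify` and `molecular_predict`. Only samples whose clinical prediction is too uncertain are passed on to the molecular classifier.

```python
import numpy as np


def stepwise_classify(p_clinical, molecular_predict, max_uncertainty=0.3):
    """Classify with clinical probabilities first; re-classify only the
    uncertain samples with a (hypothetical) molecular model."""
    p = np.asarray(p_clinical, dtype=float).copy()
    # Hypothetical uncertainty measure: 0 when p is 0 or 1, 1 when p = 0.5.
    uncertainty = 1.0 - 2.0 * np.abs(p - 0.5)
    needs_molecular = np.flatnonzero(uncertainty > max_uncertainty)
    if needs_molecular.size > 0:
        # Only these samples would need the expensive molecular measurement.
        p[needs_molecular] = molecular_predict(needs_molecular)
    return (p >= 0.5).astype(int), needs_molecular


# Made-up probabilities for four samples.
p_clin = [0.95, 0.55, 0.10, 0.45]
mol_probs = np.array([0.90, 0.20, 0.30, 0.80])
labels, sent = stepwise_classify(p_clin, lambda idx: mol_probs[idx])
print(labels, sent)
```

Here only samples 1 and 3 (clinical probabilities close to 0.5) are sent for molecular testing. Raising `max_uncertainty` sends fewer samples, trading accuracy for cost.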
Our method yields a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence, our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be available quickly, which may reduce waiting times for diagnosis and hence lower the patients' distress. Stepwise classification is implemented in the R package stepwiseCM, available from Bioconductor.
• Principal component analysis (PCA) is a powerful dimension reduction technique widely used in data mining.
• Data are usually contaminated by noise.
• Noise in the data affects the computation of the principal components (PCs).
• We use a regularization method to filter out the diffusion of noise into the PCs.
• Experimental results show the power of the new approach.
Principal component analysis (PCA) is a powerful dimension reduction technique widely used in data mining. PCA projects the data into a lower-dimensional space while preserving as much of the intrinsic information hidden in the data as possible. A disadvantage of PCA is that the extracted principal components (PCs) are linear combinations of all features; hence, the PCs may still be contaminated with noise in the data. To address this problem, we propose a modified version of PCA called noise-free PCA (NFPCA), in which regularization is introduced during the PC extraction step to mitigate the effect of noise. The potential of the proposed method is assessed in two important applications of high-dimensional molecular data, classification and survival prediction, using multiple publicly available real-world data sets. Experimental results show that NFPCA produces more informative PCs than ordinary PCA. This is largely because NFPCA suppresses the effect of noise in the PCs more efficiently, with minimal loss of information. NFPCA is a promising alternative to existing PCA approaches, not only because its PCs are highly informative but also because of its relatively low computational cost.
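The abstract does not specify NFPCA's regularizer, so the sketch below shows a generic stand-in for regularized PC extraction: an L1-penalized rank-one decomposition that soft-thresholds the loadings (in the style of sparse PCA via regularized SVD). It is not NFPCA itself; the function names and the penalty `lam` are assumptions, and `lam` is on the scale of the projections `X.T @ u`, so it must be tuned to the data. Loadings of features that carry only noise are shrunk exactly to zero.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries smaller than lam become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_pc(X, lam, n_iter=100, tol=1e-8):
    """First L1-regularized loading vector of the column-centred data X."""
    X = X - X.mean(axis=0)
    # Start from the ordinary (unregularized) first PC.
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    u, v = u[:, 0], s[0] * vt[0]
    for _ in range(n_iter):
        v_new = soft_threshold(X.T @ u, lam)  # penalize small, noisy loadings
        if not np.any(v_new):
            return v_new  # penalty too strong: every loading was shrunk away
        u = X @ v_new
        u /= np.linalg.norm(u)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return v / np.linalg.norm(v)


# Made-up data: 3 features share a latent signal z, 7 features are pure noise.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
signal = z[:, None] + 0.1 * rng.normal(size=(200, 3))
noise = 0.1 * rng.normal(size=(200, 7))
X = np.hstack([signal, noise])
v = sparse_pc(X, lam=1.0)
print(np.round(v, 3))
```

With ordinary PCA every one of the ten loadings would be nonzero; with the penalty, only the three signal features keep nonzero weight.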
Hierarchical clustering (HC) is one of the most frequently used methods in computational biology for the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, the HC tree is cut at a fixed height, and each contiguous branch of data points below that height is considered a separate cluster. However, a fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, because related background information is not used in selecting the cutoff, the induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip, available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep in the tree. Along with the data set from which the HC tree is derived, the cluster extraction process uses commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that haunt high-dimensional genomics data. Because the clustering process is guided by the background information, the clusters are easy to interpret. Unlike existing packages, HCsnip places no constraint on the type of data on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package able to decompose the HC tree into clusters by piecewise snipping under the guidance of patient time-to-event information.
Our implementation of the semi-supervised HC tree snipping framework is generic, and can be combined with other algorithms that operate on detected clusters.
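The limitation of a single fixed-height cut on nested clusters can be illustrated with a small toy example. The sketch below uses SciPy's `linkage` and `fcluster` rather than HCsnip (which is an R package); the six one-dimensional points are made-up data with a loose group A and two tight groups B and C nested under a common branch.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up 1-D toy data: loose group A = {0, 2}; tight groups B = {10, 10.1}
# and C = {11, 11.1}, which are nested under a common branch.
X = np.array([[0.0], [2.0], [10.0], [10.1], [11.0], [11.1]])
Z = linkage(X, method="average")  # merge heights ~0.1, ~0.1, 1.0, 2.0, 9.55

n_clusters = {}
for t in (0.5, 1.5, 3.0):
    # Fixed-height cut: points whose cophenetic distance is <= t share a cluster.
    flat = fcluster(Z, t=t, criterion="distance")
    n_clusters[t] = len(np.unique(flat))
    print(f"cut at height {t}: {n_clusters[t]} clusters, labels {flat}")
```

A low cut (0.5) separates B from C but also splits the loose group A into singletons; a cut at 1.5 still splits A while merging B and C; a high cut (3.0) keeps A whole but merges B and C. No single height yields A, B, and C as three clusters, which requires snipping different branches at different heights.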
This paper presents the R/Bioconductor package stepwiseCM, which efficiently classifies cancer samples using two heterogeneous data sets. The algorithm captures the distinct classification power of two given data types without actually combining them. The package is suited to classification problems in which two different types of data are available for the same samples. The first data type is measured on all samples and is easy to collect and/or relatively cheap (eg, clinical covariates); the second is measured on only some samples and is more costly (high-dimensional data, eg, gene expression). stepwiseCM has also proven useful for combining two high-dimensional data types, eg, DNA copy number and mRNA expression. The package includes functions that project neighborhood information from one data space to the other to identify the samples that are likely to benefit most from measuring the second type of covariates. The two heterogeneous data spaces are connected by indirect mapping. The crucial difference between the stepwise classification strategy implemented in this package and existing packages is that our approach aims to be cost-efficient by avoiding the measurement of additional covariates, which might be expensive or patient-unfriendly, for a potentially large subgroup of individuals. Moreover, for these individuals, diagnostic test results would be available quickly, which may reduce waiting times and hence lower the patients’ distress. The improvement described remedies the key limitations of existing packages and facilitates the use of the stepwiseCM package in diverse applications.
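The idea of projecting neighborhood information from one data space to the other can be sketched as follows. This is a simplified, hypothetical scoring rule in the spirit of the description above, not stepwiseCM's actual function or formula: for each new sample, only the cheap clinical data are used to find its `k` nearest training samples (for which both data types are available), and the score is the average gain from switching to the molecular classifier among those neighbors (+1 if only the molecular prediction was correct, -1 if only the clinical one was). Samples with the highest scores would be the first candidates for molecular measurement.

```python
import numpy as np


def reclassification_scores(clin_train, clin_test, clin_correct, mol_correct, k=3):
    """Hypothetical score: average molecular-over-clinical gain among each
    test sample's k nearest training neighbours in clinical space."""
    clin_train = np.asarray(clin_train, dtype=float)
    clin_test = np.asarray(clin_test, dtype=float)
    clin_ok = np.asarray(clin_correct, dtype=bool)
    mol_ok = np.asarray(mol_correct, dtype=bool)
    # +1: only molecular correct, -1: only clinical correct, 0: otherwise.
    gain = (mol_ok & ~clin_ok).astype(float) - (clin_ok & ~mol_ok).astype(float)
    # Euclidean distances between every test and training sample (clinical data only).
    d = np.linalg.norm(clin_test[:, None, :] - clin_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest training samples per test sample
    return gain[nn].mean(axis=1)


# Made-up example: the clinical model fails for training samples near 10-12.
clin_train = [[0], [1], [2], [10], [11], [12]]
clin_correct = [1, 1, 1, 0, 0, 0]
mol_correct = [1, 1, 1, 1, 1, 1]
scores = reclassification_scores(clin_train, [[0.5], [11.5]], clin_correct, mol_correct)
print(scores)
```

The second test sample lies in a clinical region where molecular data helped its neighbors, so it receives the higher score and would be measured first.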
Acute megakaryoblastic leukemia (AMKL) is a subtype of acute myeloid leukemia (AML) in which cells morphologically resemble abnormal megakaryoblasts. While rare in adults, AMKL accounts for 4-15% of newly diagnosed childhood AML cases. AMKL in individuals without Down syndrome (non-DS-AMKL) is frequently associated with poor clinical outcomes. Previous efforts have identified chimeric oncogenes in a substantial number of non-DS-AMKL cases, including RBM15-MKL1, CBFA2T3-GLIS2, KMT2A gene rearrangements, and NUP98-KDM5A. However, the etiology of 30-40% of cases remains unknown. To better understand the genomic landscape of non-DS-AMKL, we performed RNA and exome sequencing on specimens from 99 patients (75 pediatric and 24 adult). We demonstrate that pediatric non-DS-AMKL is a heterogeneous malignancy that can be divided into seven subgroups with varying outcomes. These subgroups are characterized by chimeric oncogenes with cooperating mutations in epigenetic and kinase signaling genes. Overall, these data shed light on the etiology of AMKL and provide useful information for the tailoring of treatment.
Uveal melanoma (UM) is characterized by multiple chromosomal rearrangements and recurrently mutated genes. The aim of this study was to investigate whether copy number variations (CNV), alone and in combination with other genetic and clinico-histopathological variables, can be used to stratify enucleated patients with UM for disease-free survival (DFS).
We analyzed single nucleotide polymorphism (SNP) array data of primary tumors and other clinical variables of 214 UM patients from the Rotterdam Ocular Melanoma Study (ROMS) cohort. Nonweighted hierarchical clustering of SNP array data was used to identify molecular subclasses with distinct CNV patterns. The subclasses are associated with the mutational status of BAP1, SF3B1, or EIF1AX. Cox proportional hazards models were then used to study the predictive performance of SNP array cluster, mutation, and clinico-histopathological data, and of their combination, for the study endpoint risk.
Five clusters with distinct CNV patterns and concomitant mutations in BAP1, SF3B1, or EIF1AX were identified. A sample's cluster allocation added significantly to its mutational status in predicting the incidence of metastasis during a median follow-up of 45.6 months (interquartile range [IQR], 24.7-81.8) (P < 0.05), and vice versa. Furthermore, incorporating all data sources in one model yielded a C-score of 0.797 over 100 months of follow-up.
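The C-score reported above is presumably a concordance index for the model's risk predictions; the abstract does not give its exact definition. A minimal sketch of Harrell's C (an assumption about the statistic used) counts, over all comparable pairs, how often the subject whose event occurred earlier was assigned the higher predicted risk. It ignores tied event times and does not restrict follow-up to a fixed horizon such as the 100 months used above.

```python
import numpy as np


def harrell_c(time, event, risk):
    """Harrell's concordance index (no handling of tied times).

    A pair (i, j) is comparable when subject i had the event (event[i] == 1)
    and time[i] < time[j]; it is concordant when risk[i] > risk[j].
    Tied risks count as half-concordant."""
    time, event, risk = (np.asarray(a) for a in (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Made-up follow-up times, event indicators (1 = metastasis) and model risks.
print(harrell_c([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so 0.797 indicates that about four in five comparable patient pairs are ordered correctly by the combined model.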
UM has distinct CNV patterns that correspond to different mutated driver genes. Incorporating clinico-histopathological, cluster and mutation data in the analysis results in good performance for UM-related DFS prediction.
In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters but compute a hierarchical representation of the data set. Usually, the HC tree is cut at a fixed height, and each contiguous branch of samples below that height is considered a separate cluster. Because of this fixed-height cut, the resulting clusters may fail to reveal significant functional coherence hidden deeper in the tree. In addition, most existing approaches do not use available clinical information to guide cluster extraction from the HC tree. Thus, the identified subgroups may be difficult to interpret in relation to that information.
We develop a novel framework for decomposing the HC tree into clusters by semi-supervised piecewise snipping. The framework, called guided piecewise snipping, uses both molecular data and clinical information to decompose the HC tree into clusters. It cuts the given HC tree at variable heights to find a partition (a set of non-overlapping clusters) that not only represents a structure deemed to underlie the data from which the HC tree is derived but is also maximally consistent with the supplied clinical data. Moreover, the approach does not require the user to specify the number of clusters before the analysis. Extensive results on simulated and multiple medical data sets show that our approach consistently produces more meaningful clusters than the standard fixed-height cut and/or non-guided approaches.
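Guided piecewise snipping can be illustrated with a brute-force sketch under clearly stated simplifications. The sketch enumerates every partition obtainable by either keeping a branch whole or snipping it into its two children (so cuts can occur at different heights in different branches) and keeps the partition that agrees best with the clinical labels. The agreement score used here, the adjusted Rand index against categorical labels, is an assumption standing in for HCsnip's actual criteria (which also handle survival data); the function names are hypothetical, and exhaustive enumeration is feasible only for small trees.

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def snip_partitions(node):
    """Yield every partition reachable by snipping the subtree below `node`
    at variable heights; each partition is a list of leaf-index lists."""
    if node.is_leaf():
        yield [[node.id]]
        return
    yield [node.pre_order()]  # keep this whole branch as one cluster
    for left in snip_partitions(node.get_left()):
        for right in snip_partitions(node.get_right()):
            yield left + right  # or snip it and recurse into both children


def adjusted_rand(a, b):
    """Adjusted Rand index between two label vectors (1.0 = identical partitions)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)  # contingency table of the two labelings

    def pairs(counts):
        return sum(comb(int(c), 2) for c in counts)

    index = pairs(table.ravel())
    sum_a, sum_b = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


def guided_snip(X, clinical_labels, method="average"):
    """Return the snipped partition most consistent with the clinical labels."""
    root = to_tree(linkage(X, method=method))
    best_score, best_assign = -np.inf, None
    for partition in snip_partitions(root):
        assign = np.empty(len(X), dtype=int)
        for k, members in enumerate(partition):
            assign[members] = k
        score = adjusted_rand(clinical_labels, assign)
        if score > best_score:
            best_score, best_assign = score, assign
    return best_assign, best_score


# Made-up toy data: loose group A = {0, 2}, tight nested groups B and C.
X = np.array([[0.0], [2.0], [10.0], [10.1], [11.0], [11.1]])
clinical = np.array([0, 0, 1, 1, 2, 2])
assign, score = guided_snip(X, clinical)
print(assign, score)
```

On this toy tree no single fixed-height cut separates A, B, and C, but snipping the A branch high and the B/C branch low recovers all three groups, matching the labels exactly.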
The guided piecewise snipping approach features several novelties and advantages over existing approaches. The proposed algorithm is generic and can be combined with other algorithms that operate on detected clusters. The approach represents an advance in two regards: (1) a piecewise tree-snipping framework that efficiently extracts clusters by snipping the HC tree, possibly at variable heights, while preserving the HC tree structure; and (2) a flexible implementation that allows a variety of data types for both building and snipping the HC tree, including patient follow-up data such as survival as auxiliary information. The data sets and R code are provided as supplementary files. The proposed method is available from Bioconductor as the R package HCsnip.