We develop a general class of response-adaptive Bayesian designs using hierarchical models, and provide open-source software to implement them. Our work is motivated by recent master protocols in oncology, where several treatments are investigated simultaneously in one or multiple disease types, and treatment efficacy is expected to vary across biomarker-defined subpopulations. Adaptive trials such as I-SPY-2 (Barker et al., 2009) and BATTLE (Zhou et al., 2008) are special cases within our framework. We discuss the application of our adaptive scheme to two distinct research goals. The first is to identify a biomarker subpopulation for which a therapy shows evidence of treatment efficacy, and to exclude other subpopulations for which such evidence does not exist. This leads to a subpopulation-finding design. The second is to identify, within biomarker-defined subpopulations, a set of cancer types for which an experimental therapy is superior to the standard-of-care. This goal leads to a subpopulation-stratified design. Using simulations constructed to faithfully represent ongoing cancer sequencing projects, we quantify the potential gains of our proposed designs relative to conventional non-adaptive designs.
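As a rough illustration of response-adaptive randomization within one biomarker subpopulation, the sketch below computes allocation probabilities as the posterior probability that each arm has the highest response rate. It is a simplification, not the design described above: it assumes binary responses and independent Beta(1, 1) priors per arm rather than a hierarchical model, and the function name `allocation_probs` and the interim counts are hypothetical.

```python
import numpy as np

def allocation_probs(successes, failures, n_draws=4000, seed=0):
    """Posterior probability that each arm has the highest response rate.

    Assumption: independent Beta(1, 1) priors per arm (no hierarchical
    borrowing across arms or subpopulations).
    """
    s = np.asarray(successes, dtype=float)
    f = np.asarray(failures, dtype=float)
    rng = np.random.default_rng(seed)
    # Monte Carlo draws from each arm's Beta posterior: shape (n_draws, n_arms)
    draws = rng.beta(1.0 + s, 1.0 + f, size=(n_draws, s.size))
    # Count how often each arm has the largest sampled response rate
    wins = np.bincount(draws.argmax(axis=1), minlength=s.size)
    return wins / n_draws

# Hypothetical interim data for one subpopulation with three arms
probs = allocation_probs(successes=[3, 9, 5], failures=[12, 6, 10])
print(probs)  # the next patient would be randomized with these probabilities
```

In a hierarchical version, the per-arm posteriors would instead be obtained from a model that shares information across subpopulations or disease types; only the step that turns posterior draws into allocation probabilities would stay the same.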
Glioblastoma multiforme (GBM) is the most common and lethal type of brain cancer. To identify the genetic alterations in GBMs, we sequenced 20,661 protein coding genes, determined the presence of amplifications and deletions using high-density oligonucleotide arrays, and performed gene expression analyses using next-generation sequencing technologies in 22 human tumor samples. This comprehensive analysis led to the discovery of a variety of genes that were not known to be altered in GBMs. Most notably, we found recurrent mutations in the active site of isocitrate dehydrogenase 1 (IDH1) in 12% of GBM patients. Mutations in IDH1 occurred in a large fraction of young patients and in most patients with secondary GBMs and were associated with an increase in overall survival. These studies demonstrate the value of unbiased genomic analyses in the characterization of human brain cancer and identify a potentially useful genetic alteration for the classification and targeted therapy of GBMs.
Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context.
We develop and implement a systematic approach to 'cross-study validation', to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation.
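The pairwise scheme can be sketched as a matrix whose entry (i, j) is the C-index of a model trained on dataset i and validated on dataset j. The code below is a minimal sketch under stated assumptions, not the survHD implementation: `c_index` is a plain Harrell-type concordance, `toy_fit` and `toy_predict` are a deliberately crude hypothetical learner that ignores censoring, the simulated studies are invented, and the mean of the off-diagonal entries is just one possible summary.

```python
import numpy as np

def c_index(time, event, score):
    """Harrell-type C: among usable pairs, the share in which the subject
    with the higher risk score has the shorter survival time."""
    concordant, usable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is usable if i had an observed event before j's time
            if event[i] and time[i] < time[j]:
                usable += 1
                if score[i] > score[j]:
                    concordant += 1.0
                elif score[i] == score[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")

def cross_study_matrix(datasets, fit, predict):
    """Entry (i, j): train on dataset i, validate on dataset j (i != j)."""
    k = len(datasets)
    z = np.full((k, k), np.nan)  # diagonal left empty in this sketch
    for i, (x_tr, t_tr, e_tr) in enumerate(datasets):
        model = fit(x_tr, t_tr, e_tr)
        for j, (x_va, t_va, e_va) in enumerate(datasets):
            if i != j:
                z[i, j] = c_index(t_va, e_va, predict(model, x_va))
    return z

def toy_fit(x, time, event):
    # Hypothetical learner: weight each feature by minus the sign of its
    # correlation with log time (censoring is ignored, for brevity only)
    logt = np.log(time)
    return np.array([-np.sign(np.corrcoef(x[:, k], logt)[0, 1])
                     for k in range(x.shape[1])])

def toy_predict(weights, x):
    return x @ weights  # higher score = higher predicted risk

def simulate(rng, n=60, p=3):
    # Invented data: feature 0 shortens survival, about 70% events observed
    x = rng.normal(size=(n, p))
    time = np.exp(-x[:, 0] + 0.5 * rng.normal(size=n))
    event = rng.random(n) < 0.7
    return x, time, event

rng = np.random.default_rng(1)
studies = [simulate(rng) for _ in range(4)]
z = cross_study_matrix(studies, toy_fit, toy_predict)
summary = np.nanmean(z)  # one simple summary of the off-diagonal entries
```

The diagonal could alternatively hold within-study cross-validation estimates, which would allow a direct comparison between the two validation schemes in one matrix.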
Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation.
The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor.
We show that the times separating the birth of benign, invasive, and metastatic tumor cells can be determined by analysis of the mutations they have in common. When combined with prior clinical observations, these analyses suggest the following general conclusions about colorectal tumorigenesis: (i) It takes approximately 17 years for a large benign tumor to evolve into an advanced cancer but <2 years for cells within that cancer to acquire the ability to metastasize; (ii) it requires few, if any, selective events to transform a highly invasive cancer cell into one with the capacity to metastasize; (iii) the process of cell culture ex vivo does not introduce new clonal mutations into colorectal tumor cell populations; and (iv) the rates at which point mutations develop in advanced cancers are similar to those of normal cells. These results have important implications for understanding human tumor pathogenesis, particularly those associated with metastasis.
Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of this data can be affected by several factors and it is important to have simple and reliable approaches for monitoring it at the level of individual experiments.
We developed a fast, scalable and accurate approach to estimating error rates in short reads, which has the added advantage of not requiring a reference genome. We build on the fundamental observation that there is a linear relationship between the copy number for a given read and the number of erroneous reads that differ from the read of interest by one or two bases. The slope of this relationship can be transformed to give an estimate of the error rate, both by read and by position. We present simulation studies as well as analyses of real data sets illustrating the precision and accuracy of this method, and we show that it is more accurate than alternatives that count the difference between the sample of interest and a reference genome. We show how this methodology led to the detection of mutations in the genome of the PhiX strain used for calibration of Illumina data. The proposed method is implemented in an R package, which can be downloaded from http://bcb.dfci.harvard.edu/~vwang/shadowRegression.html.
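The regression idea can be sketched in a few lines. The code below is an illustrative Python sketch, not the R package: the function names are hypothetical, ordinary least squares stands in for whatever fitting procedure the package uses, and the slope-to-error-rate transformation e = s / (1 + s) is an assumption that follows from a simplified model in which a read with n true copies yields n(1 - e) error-free copies and n·e erroneous copies.

```python
from collections import Counter
import numpy as np

def hamming(a, b):
    """Number of positions at which two equal-length reads differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def shadow_table(reads, top=100, max_mismatch=2):
    """For the `top` most abundant reads, pair each copy number with the
    total count of reads that differ from it by 1..max_mismatch bases."""
    counts = Counter(reads)
    rows = []
    for seq, n in counts.most_common(top):
        shadows = sum(
            m for other, m in counts.items()
            if len(other) == len(seq) and 1 <= hamming(seq, other) <= max_mismatch
        )
        rows.append((n, shadows))
    return np.array(rows, dtype=float)

def per_read_error_rate(table):
    """Fit shadows ~ copy number and transform the slope.

    Assumption (simplified model): slope s = e / (1 - e), so e = s / (1 + s).
    """
    copy_number, shadows = table[:, 0], table[:, 1]
    slope = np.polyfit(copy_number, shadows, 1)[0]  # least-squares slope
    return slope / (1.0 + slope)

# Hypothetical table: 10% of copies of each read carry an error
table = np.array([[90.0, 10.0], [180.0, 20.0], [270.0, 30.0]])
rate = per_read_error_rate(table)
```

The pairwise Hamming comparison here is quadratic in the number of distinct reads and is meant only to make the definition concrete; a scalable implementation would need an indexed search for near-identical reads.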
The proposed method can be used to monitor the quality of sequencing pipelines at the level of individual experiments without the use of reference genomes. Furthermore, having an estimate of the error rates gives one the opportunity to improve analyses and inferences in many applications of next-generation sequencing data.
Multiple myeloma (MM) is accompanied by heterogeneous somatic alterations. The overall goal of this study was to describe the genomic landscape of myeloma using deep whole-genome sequencing (WGS) and develop a model that identifies patients with long survival.
We analyzed deep WGS data from 183 newly diagnosed patients with MM treated with lenalidomide, bortezomib, and dexamethasone (RVD) alone or RVD + autologous stem cell transplant (ASCT) in the IFM/DFCI 2009 study (ClinicalTrials.gov identifier: NCT01191060). We integrated genomic markers with clinical data.
We report significant variability in mutational load and processes within MM subgroups. The timeline of observed activation of mutational processes provides the basis for 2 distinct models of acquisition of mutational changes detected at the time of diagnosis of myeloma. Virtually all MM subgroups have activated DNA repair-associated signature as a prominent late mutational process, whereas APOBEC signature targeting C>G is activated in the intermediate phase of disease progression in high-risk MM. Importantly, we identify a genomically defined MM subgroup (17% of newly diagnosed patients) with low DNA damage (low genomic scar score with chromosome 9 gain) and a superior outcome (100% overall survival at 69 months), which was validated in a large independent cohort. This subgroup allowed us to distinguish patients with low- and high-risk hyperdiploid MM and identify patients with prolongation of progression-free survival. Genomic characteristics of this subgroup included lower mutational load with significant contribution from age-related mutations as well as frequent
mutation. Surprisingly, their overall survival was independent of International Staging System and minimal residual disease status.
This is a comprehensive study identifying genomic markers of a good-risk group with prolonged survival. Identification of this patient subgroup will affect future therapeutic algorithms and research planning.
The ribonuclease DIS3 is one of the most frequently mutated genes in the hematological cancer multiple myeloma, yet the basis of its tumor suppressor function in this disease remains unclear. Herein, exploiting the TCGA dataset, we found that DIS3 plays a prominent role in the DNA damage response. DIS3 inactivation causes genomic instability by increasing mutational load, and a pervasive accumulation of DNA:RNA hybrids that induces genomic DNA double‐strand breaks (DSBs). DNA:RNA hybrid accumulation also prevents binding of the homologous recombination (HR) machinery to double‐strand breaks, hampering DSB repair. DIS3‐inactivated cells become sensitive to PARP inhibitors, suggestive of a defect in homologous recombination repair. Accordingly, multiple myeloma patient cells mutated for DIS3 harbor an increased mutational burden and a pervasive overexpression of pro‐inflammatory interferon, correlating with the accumulation of DNA:RNA hybrids. We propose DIS3 loss in myeloma to be a driving force for tumorigenesis via DNA:RNA hybrid‐dependent enhanced genome instability and increased mutational rate. At the same time, DIS3 loss represents a liability that might be therapeutically exploited in patients whose cancer cells harbor DIS3 mutations.
Synopsis
The ribonuclease DIS3 is frequently mutated in the blood cancer multiple myeloma. Here, DIS3 inactivation is found to cause accumulation of DNA:RNA hybrids, as well as to increase interferon responses and reduce homologous recombination.
DIS3 loss triggers a genome‐wide increase in DNA:RNA hybrids, which in turn leads to DNA fragmentation and genomic instability.
Hybrid accumulation at the sites of DNA damage prevents BRCA1 binding to DNA, impairing homologous recombination‐based DNA repair.
DIS3 loss is associated with increased mutational rate both in vitro and in patient samples with DIS3 mutations.
Myeloma cells derived from patients presenting DIS3 mutations display an intense interferon response.
DIS3 mutation in hematological cancer causes reduced homologous recombination repair, increased mutational burden, and overactivation of inflammatory interferon responses.
The natural history of multiple myeloma is characterized by its localization to the bone marrow and its interaction with bone marrow stromal cells. The bone marrow stromal cells provide growth and survival signals, thereby promoting the development of drug resistance. Here, we show that the interaction between bone marrow stromal cells and myeloma cells (using human cell lines) induces chromatin remodeling of cis-regulatory elements and is associated with changes in the expression of genes involved in cell migration and cytokine signaling. The expression of genes involved in these stromal interactions is observed in extramedullary disease in patients with myeloma and provides the rationale for survival of myeloma cells outside of the bone marrow microenvironment. Expression of these stromal interaction genes is also observed in a subset of patients with newly diagnosed myeloma and is akin to the transcriptional program of extramedullary disease. The presence of such adverse stromal interactions in newly diagnosed myeloma is associated with accelerated disease dissemination, predicts the early development of therapeutic resistance, and is of independent prognostic significance. These stromal cell-induced transcriptomic and epigenomic changes both predict long-term outcomes and identify therapeutic targets in the tumor microenvironment for the development of novel therapeutic approaches.
The successful translation of genomic signatures into clinical settings relies on good discrimination between patient subgroups. Many sophisticated algorithms have been proposed in the statistics and machine learning literature, but in practice simpler algorithms are often used. However, few simple algorithms have been formally described or systematically investigated.
We give a precise definition of a popular simple method we refer to as más-o-menos, which calculates prognostic scores for discrimination by summing standardized predictors, weighted by the signs of their marginal associations with the outcome. We study its behavior theoretically, in simulations and in an extensive analysis of 27 independent gene expression studies of bladder, breast and ovarian cancer, altogether totaling 3833 patients with survival outcomes. We find that despite its simplicity, más-o-menos can achieve good discrimination performance. It performs no worse, and sometimes better, than popular and much more CPU-intensive methods for discrimination, including lasso and ridge regression.
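The scoring rule described above (standardize the predictors, then sum them with weights equal to the signs of their marginal associations with the outcome) can be sketched as follows. This is a minimal illustration, not the survHD implementation: the function names are hypothetical, and the marginal association is taken to be the sign of the covariance with a continuous outcome, standing in for the survival-based associations used in the study. Reusing the training means and standard deviations at prediction time is also a choice made here for simplicity.

```python
import numpy as np

def mas_o_menos_fit(x, y):
    """Learn per-feature signs from training data.

    Assumption: marginal association = sign of the covariance between each
    standardized predictor and a continuous outcome y. Features with zero
    variance are not handled in this sketch.
    """
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    z = (x - mean) / sd
    signs = np.sign(z.T @ (y - y.mean()))  # +1 or -1 per feature
    return mean, sd, signs

def mas_o_menos_score(model, x):
    """Score = sum of standardized predictors, each weighted by its sign."""
    mean, sd, signs = model
    return ((x - mean) / sd) @ signs

# Hypothetical data: feature 0 rises with the outcome, feature 1 falls
x = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
model = mas_o_menos_fit(x, y)
scores = mas_o_menos_score(model, x)
```

Because the only quantities estimated per feature are a mean, a standard deviation, and a sign, fitting requires a single pass over the predictors, which is consistent with the low computational cost noted above.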
Más-o-menos is implemented for survival analysis as an option in the survHD package, available from http://www.bitbucket.org/lwaldron/survhd and submitted to Bioconductor.