Although the majority of colorectal cancers exhibit chromosome instability (CIN), only a few genes that might cause this phenotype have been identified, and no general mechanism underlying their function has emerged. To systematically identify somatic mutations in potential CIN genes in colorectal cancers, we sequenced 102 human homologs of 96 yeast CIN genes known to function in various aspects of chromosome transmission fidelity. We identified 11 somatic mutations distributed among five genes in a panel that included 132 colorectal cancers. Remarkably, all but one of these 11 mutations were in homologs of yeast genes that regulate sister chromatid cohesion. We then demonstrated that down-regulation of these homologs resulted in chromosomal instability and chromatid cohesion defects in human cells. Finally, we showed that down-regulation or genetic disruption of the two major candidate CIN genes identified in previous studies (MRE11A and CDC4) also resulted in abnormal sister chromatid cohesion in human cells. These results suggest that defective sister chromatid cohesion resulting from somatic mutations may represent a major cause of chromosome instability in human cancers.
Identifying families with an underlying inherited cancer predisposition is a major goal of cancer prevention efforts. Mendelian risk models have been developed to better predict the risk of developing breast and ovarian cancer associated with pathogenic variants (BRCAPRO) and the risk of developing pancreatic cancer (PANCPRO). Because pathogenic variants in BRCA1 and BRCA2 predispose to all three of these cancers, we developed a joint risk model to capture this shared susceptibility.
We expanded the existing framework for PANCPRO and BRCAPRO to jointly model the risk of pancreatic, breast, and ovarian cancer, and we validated this new model, BRCAPANCPRO, on three data sets, each reflecting a common target population.
BRCAPANCPRO outperformed the prior BRCAPRO and PANCPRO models and discriminated well between BRCA1/BRCA2 carriers and non-carriers (AUCs 0.79, 95% CI: 0.73-0.84, and 0.70, 95% CI: 0.60-0.80) in families seen in high-risk clinics and in pancreatic cancer family registries, respectively. In addition, BRCAPANCPRO was reasonably well calibrated for predicting future risk of pancreatic cancer (observed-to-expected (O/E) ratio = 0.81, 95% CI: 0.69-0.94).
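The AUCs above summarize how well predicted carrier probabilities separate carriers from non-carriers. As a hedged illustration (not the validation code used for BRCAPANCPRO), the empirical AUC can be computed in its Mann-Whitney form: the fraction of (carrier, non-carrier) pairs in which the carrier receives the higher score, with ties counted as one half. The scores below are made up.

```python
def empirical_auc(carrier_scores, noncarrier_scores):
    """Empirical AUC in Mann-Whitney form: the fraction of (carrier,
    non-carrier) pairs in which the carrier scores higher, ties counted as 1/2."""
    wins = 0.0
    for c in carrier_scores:
        for n in noncarrier_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carrier_scores) * len(noncarrier_scores))


# Hypothetical predicted carrier probabilities, for illustration only
carriers = [0.62, 0.35, 0.80, 0.15]
noncarriers = [0.05, 0.20, 0.35, 0.10, 0.02]
print(empirical_auc(carriers, noncarriers))  # 0.875
```

The pairwise loop is quadratic in sample size; for large validation sets a rank-based computation gives the same value faster.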
The BRCAPANCPRO model provides improved risk assessment over our previous risk models, particularly for pedigrees with a co-occurrence of pancreatic cancer and breast and/or ovarian cancer.
Previous approaches to defining transcriptome-based subtypes of colorectal carcinoma (CRC) and other cancers have assumed that discrete subtypes exist. We analyze gene expression patterns of colorectal tumors from a large number of patients to test this assumption, and we propose an approach to identify a potential continuum of subtypes present across independent studies and cohorts.
We examine the assumption of discrete CRC subtypes by integrating 18 published gene expression datasets comprising more than 3,700 patients and, contrary to previous reports, find no evidence to support the existence of discrete transcriptional subtypes. Using a meta-analysis approach to identify co-expression patterns present in multiple datasets, we identify and define robust, continuously varying subtype scores to represent CRC transcriptomes. The subtype scores are consistent with established subtypes (including microsatellite instability and previously proposed discrete transcriptome subtypes), but represent overall transcriptional activity better than discrete subtypes do. The scores are also better predictors of tumor location, stage, grade, and disease-free survival time than discrete subtypes. Gene set enrichment analysis reveals that the subtype scores characterize T-cell function, inflammatory response, and cyclin-dependent kinase regulation of DNA replication.
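The abstract does not give the formula for the subtype scores. As a minimal sketch, assuming a score is the average of per-gene z-scores over a co-expressed gene set (a common generic construction, not necessarily the one used in this study), a continuous score can be computed for each sample. The gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene (row) to mean 0 and SD 1 across samples."""
    z = {}
    for gene, values in expr.items():
        m, s = mean(values), pstdev(values)
        # Constant genes carry no information; map them to 0 instead of dividing by 0
        z[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return z


def gene_set_score(expr, gene_set):
    """Continuous per-sample score: mean z-score over the genes in gene_set
    that are present in expr (assumes at least one such gene)."""
    z = zscore_rows(expr)
    genes = [g for g in gene_set if g in z]
    n_samples = len(next(iter(z.values())))
    return [mean(z[g][j] for g in genes) for j in range(n_samples)]


# Hypothetical expression matrix: genes x 3 samples
expr = {"GENE1": [1.0, 2.0, 3.0], "GENE2": [2.0, 4.0, 6.0], "GENE3": [5.0, 5.0, 5.0]}
scores = gene_set_score(expr, ["GENE1", "GENE2", "GENE_MISSING"])
print([round(s, 3) for s in scores])
```

Unlike a cluster label, each sample receives a real-valued position along the score, which is what lets such scores express continuous variation rather than membership in a discrete subtype.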
We find no evidence to support discrete subtypes of the CRC transcriptome and instead propose two validated scores to better characterize the continuous variation of CRC transcriptomes.
Published reports suggest that DNA microarrays identify clinically meaningful subtypes of lung adenocarcinoma that are not recognizable by other routine tests. This report investigates the reproducibility of the reported tumor subtypes.
Three independent cohorts of patients with lung cancer were evaluated using a variety of DNA microarray assays. Using the integrative correlations method, a subset of genes with acceptable reliability across the different DNA microarray platforms was selected. Tumor subtypes were identified using consensus clustering, and genes distinguishing subtypes were identified using the weighted difference statistic. Gene lists were compared across cohorts using centroids and gene set enrichment analysis.
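The integrative correlations idea can be sketched as follows, assuming its usual formulation: for a given gene, compute its correlation with every other gene measured in both studies, separately within each study, and then correlate those two correlation profiles. A gene whose co-expression pattern is preserved across platforms scores near 1. The studies, gene names, and values below are hypothetical, and this is not the original implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def integrative_correlation(study_a, study_b, gene):
    """Correlate the gene's correlation profile (against all other genes
    measured in both studies) between study_a and study_b."""
    others = [g for g in study_a if g != gene and g in study_b]
    profile_a = [pearson(study_a[gene], study_a[g]) for g in others]
    profile_b = [pearson(study_b[gene], study_b[g]) for g in others]
    return pearson(profile_a, profile_b)


# Hypothetical data: study B measures the same genes on a different scale
study_a = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 1, 4, 3, 5],
    "G3": [5, 4, 3, 2, 1],
    "G4": [1, 3, 2, 5, 4],
}
study_b = {g: [2 * v + 1 for v in vals] for g, vals in study_a.items()}
print(round(integrative_correlation(study_a, study_b, "G1"), 6))  # 1.0
```

Because correlations are computed within each study, the two studies only need to share genes, not samples, which is what makes the measure usable across independent cohorts and platforms.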
Cohorts of 31, 72, and 128 adenocarcinomas were generated for a total of 231 microarrays, each with 2,553 reliable genes. Three adenocarcinoma subtypes were identified in each cohort. These were named bronchioid, squamoid, and magnoid according to their respective correlations with gene expression patterns from histologically defined bronchioloalveolar carcinoma, squamous cell carcinoma, and large-cell carcinoma. Tumor subtypes were distinguishable by many hundreds of genes, and gene lists generated in one cohort were predictive of tumor subtypes in the other two cohorts. Tumor subtypes correlated with clinically relevant covariates, including stage-specific survival and metastatic pattern. Most notably, bronchioid tumors were associated with improved survival in early-stage disease, whereas squamoid tumors were associated with better survival in advanced disease.
DNA microarray analysis of lung adenocarcinomas identified reproducible tumor subtypes that differ significantly in clinically important behaviors such as stage-specific survival.
We have performed a genome-wide analysis of copy number changes in breast and colorectal tumors using approaches that can reliably detect homozygous deletions and amplifications. We found that the number of genes altered by major copy number changes (deletion of all copies, or amplification to at least 12 copies per cell) averaged 17 per tumor. We have integrated these data with previous mutation analyses of the Reference Sequence genes in the same tumor types and have identified genes and cellular pathways affected by both copy number changes and point alterations. Pathways enriched for genetic alterations included those controlling cell adhesion, intracellular signaling, DNA topological change, and cell cycle control. These analyses provide an integrated view of copy number and sequencing alterations on a genome-wide scale and identify genes and pathways that could prove useful for cancer diagnosis and therapy.
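The definition of a major copy number change translates directly into a per-gene classification. A minimal sketch, assuming integer per-gene copy-number calls are already available from an upstream pipeline (the gene names and calls below are hypothetical):

```python
def major_cn_changes(copy_numbers, amp_threshold=12):
    """Split genes into homozygous deletions (0 copies) and high-level
    amplifications (>= amp_threshold copies per cell)."""
    deleted = sorted(g for g, cn in copy_numbers.items() if cn == 0)
    amplified = sorted(g for g, cn in copy_numbers.items() if cn >= amp_threshold)
    return deleted, amplified


def mean_altered_per_tumor(tumors, amp_threshold=12):
    """Average number of genes with a major copy number change per tumor."""
    counts = []
    for cn in tumors:
        deleted, amplified = major_cn_changes(cn, amp_threshold)
        counts.append(len(deleted) + len(amplified))
    return sum(counts) / len(counts)


# Hypothetical copy-number calls for two tumors
tumors = [
    {"GENE_A": 2, "GENE_B": 0, "GENE_C": 14, "GENE_D": 3, "GENE_E": 12},
    {"GENE_A": 0, "GENE_B": 2, "GENE_C": 4},
]
print(major_cn_changes(tumors[0]))  # (['GENE_B'], ['GENE_C', 'GENE_E'])
print(mean_altered_per_tumor(tumors))  # 2.0
```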
Accurate risk stratification is key to reducing cancer morbidity through targeted screening and preventive interventions. Multiple breast cancer risk prediction models are used in clinical practice, and they often give different predictions for the same patient. Integrating information from different models may improve the accuracy of predictions, which would be valuable for both clinicians and patients. BRCAPRO is a widely used model that predicts breast cancer risk based on detailed family history information. A major limitation of this model is that it does not consider non-genetic risk factors. To address this limitation, we expand BRCAPRO by combining it with another popular existing model, BCRAT (the Gail model), which uses a largely complementary set of risk factors, most of them non-genetic. We consider two approaches for combining BRCAPRO and BCRAT: (1) modifying the penetrance (age-specific probability of developing cancer given genotype) functions in BRCAPRO using relative hazard estimates from BCRAT, and (2) training an ensemble model that takes BRCAPRO and BCRAT predictions as input. Using both simulated data and data from Newton-Wellesley Hospital and the Cancer Genetics Network, we show that the combination models achieve performance gains over both BRCAPRO and BCRAT. In the Cancer Genetics Network cohort, we show that the proposed BRCAPRO + BCRAT penetrance modification model performs comparably to IBIS, an existing model that combines detailed family history with non-genetic risk factors.
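The abstract does not spell out how BCRAT's relative hazards enter the BRCAPRO penetrances. One standard way to apply a relative hazard RR to a cumulative penetrance curve F(t), assuming proportional hazards and ignoring competing risks, uses the identity S_mod(t) = S(t)^RR, so that F_mod(t) = 1 - (1 - F(t))^RR. The sketch below applies it to a hypothetical penetrance curve; it illustrates approach (1) in general terms rather than reproducing the published model.

```python
def modify_penetrance(cum_penetrance, relative_hazard):
    """Rescale cumulative penetrance F(t) by a relative hazard RR under
    proportional hazards: F_mod(t) = 1 - (1 - F(t)) ** RR."""
    return [1.0 - (1.0 - f) ** relative_hazard for f in cum_penetrance]


# Hypothetical cumulative risks by ages 30, 40, 50, 60 for one genotype
baseline = [0.0, 0.01, 0.05, 0.12]
# Hypothetical relative hazard derived from non-genetic risk factors
modified = modify_penetrance(baseline, 1.5)
print([round(f, 4) for f in modified])
```

Scaling the hazard rather than the cumulative risk keeps every modified penetrance between 0 and 1, which a naive F(t) * RR would not guarantee.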
This article introduces a manually curated data collection for gene expression meta-analysis of patients with ovarian cancer and software for reproducible preparation of similar databases. This resource provides uniformly prepared microarray data for 2970 patients from 23 studies with curated and documented clinical metadata. It allows users to efficiently identify studies and patient subgroups of interest for analysis and to perform meta-analysis immediately without the challenges posed by harmonizing heterogeneous microarray technologies, study designs, expression data processing methods and clinical data formats. We confirm that the recently proposed biomarker CXCL12 is associated with patient survival, independently of stage and optimal surgical debulking, which was possible only through meta-analysis owing to insufficient sample sizes of the individual studies. The database is implemented as the curatedOvarianData Bioconductor package for the R statistical computing language, providing a comprehensive and flexible resource for clinically oriented investigation of the ovarian cancer transcriptome. The package and pipeline for producing it are available from http://bcb.dfci.harvard.edu/ovariancancer.
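The abstract does not state which meta-analysis model was used for CXCL12. As a hedged sketch, assuming fixed-effect, inverse-variance pooling of per-study log hazard ratios (a standard choice, written here in Python rather than R), the pooled estimate has a smaller standard error than any single study, which is why pooling can detect associations that individual studies are too small to show. The per-study estimates below are invented.

```python
import math


def fixed_effect_meta(log_hrs, std_errs, z=1.96):
    """Inverse-variance fixed-effect pooling of per-study log hazard ratios.
    Returns the pooled hazard ratio, its approximate 95% CI, and the pooled SE."""
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    ci = (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se))
    return math.exp(pooled), ci, pooled_se


# Hypothetical per-study log hazard ratios and standard errors
log_hrs = [0.25, 0.10, 0.30, 0.18]
std_errs = [0.20, 0.25, 0.22, 0.18]
hr, ci, se = fixed_effect_meta(log_hrs, std_errs)
print(round(hr, 3), [round(x, 3) for x in ci], round(se, 3))
```

A random-effects model would widen the interval when studies disagree; the fixed-effect version is shown only because it is the simplest instance of the pooling idea.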
The rapid fatality of pancreatic cancer is, in large part, the result of diagnosis at an advanced stage for the majority of patients. Identifying individuals at high risk of developing pancreatic cancer is a first step towards the early detection of this disease. Individuals who may harbor a mutation in a major pancreatic cancer susceptibility gene are one such high-risk group. The goal of this study was to develop and validate PancPRO, a Mendelian model for pancreatic cancer risk prediction in individuals with familial pancreatic cancer, to identify high-risk individuals.
PancPRO was built by extending the Bayesian modeling framework developed for BRCAPRO, trained using published data, and validated using independent prospective data on 961 families enrolled onto the National Familial Pancreas Tumor Registry, including 26 individuals who developed incident pancreatic cancer during follow-up.
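At its core, a Bayesian Mendelian framework such as the one behind BRCAPRO and PancPRO updates prior genotype probabilities with the likelihood of the observed cancer history under each genotype. The sketch below shows only that Bayes-rule step for a single individual with two hypothetical genotype categories; the actual models evaluate the likelihood over the whole pedigree and use published allele frequencies and penetrances, none of which are reproduced here.

```python
def genotype_posterior(priors, likelihoods):
    """Bayes' rule: P(genotype | phenotype) is proportional to
    P(genotype) * P(phenotype | genotype), normalized over genotypes."""
    joint = {g: priors[g] * likelihoods[g] for g in priors}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


# Hypothetical prior genotype probabilities (not PancPRO's allele frequencies)
priors = {"noncarrier": 0.99, "carrier": 0.01}
# Hypothetical probability of the observed phenotype under each genotype
likelihoods = {"noncarrier": 0.01, "carrier": 0.20}

posterior = genotype_posterior(priors, likelihoods)
print(round(posterior["carrier"], 3))  # 0.168
```

Absolute risk then follows by averaging genotype-specific penetrances over these posterior probabilities, which is how a carrier probability becomes a cancer risk estimate.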
We developed a risk prediction model, PancPRO, and free software for the estimation of pancreatic cancer susceptibility gene carrier probabilities and absolute pancreatic cancer risk. Model validation demonstrated an observed-to-predicted pancreatic cancer ratio of 0.83 (95% CI, 0.52 to 1.20) and high discriminatory ability, with an area under the receiver operating characteristic curve of 0.75 (95% CI, 0.68 to 0.81) for PancPRO.
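The observed-to-predicted ratio compares incident cases with the sum of model-predicted risks over follow-up. The abstract does not say how its confidence interval was computed; the sketch below assumes the observed count is Poisson and uses a simple log-scale normal approximation, so it will not necessarily reproduce the reported interval. The inputs are hypothetical.

```python
import math


def observed_expected_ratio(observed, predicted_risks, z=1.96):
    """O/E ratio with an approximate CI that treats the observed count as
    Poisson, so SE(log O/E) is about 1 / sqrt(observed). Requires observed > 0."""
    expected = sum(predicted_risks)
    ratio = observed / expected
    half_width = z / math.sqrt(observed)
    return ratio, (ratio * math.exp(-half_width), ratio * math.exp(half_width))


# Hypothetical cohort: 200 individuals with 4% predicted risk each, 6 cases observed
ratio, ci = observed_expected_ratio(6, [0.04] * 200)
print(round(ratio, 2), [round(x, 2) for x in ci])
```

A ratio below 1 means the model predicted more cases than were observed; with few incident cases the interval is wide, which is why calibration estimates from small prospective cohorts carry substantial uncertainty.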
PancPRO is the first risk prediction model for pancreatic cancer. When we validated our model using the largest registry of familial pancreatic cancer, our model provided accurate risk assessment. Our findings highlight the importance of detailed family history for clinical cancer risk assessment and demonstrate that accurate genetic risk assessment is possible even when the causative genes are not known.
This article considers the replicability of predictor performance across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicating prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.
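The abstract does not specify the combination criteria. As a hypothetical sketch of the general idea, one learner is fit per study, and each is weighted by how well it predicts the other studies (here, the inverse of its mean cross-study squared error), so learners whose performance replicates receive more weight. The one-variable least-squares learner and the toy studies are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on one study."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_study_ensemble(studies, eps=1e-12):
    """Fit one learner per study and weight it by the inverse of its average
    MSE on all other studies; returns a predict function and the weights."""
    models = [fit_line(xs, ys) for xs, ys in studies]
    raw = []
    for i, model in enumerate(models):
        errs = [mse(model, xs, ys) for j, (xs, ys) in enumerate(studies) if j != i]
        # eps avoids division by zero when a learner predicts other studies perfectly
        raw.append(1.0 / (sum(errs) / len(errs) + eps))
    total = sum(raw)
    weights = [w / total for w in raw]

    def predict(x):
        return sum(w * (a + b * x) for w, (a, b) in zip(weights, models))

    return predict, weights


# Hypothetical studies: the first two share a relationship, the third does not
studies = [
    ([0, 1, 2, 3], [1, 3, 5, 7]),
    ([1, 2, 3, 4], [3, 5, 7, 9]),
    ([0, 1, 2, 3], [10, 9, 8, 7]),
]
predict, weights = cross_study_ensemble(studies)
print([round(w, 3) for w in weights])  # [0.37, 0.37, 0.259]
```

The learner from the third study fits its own data perfectly but transfers poorly, so cross-study weighting down-weights it; within-study accuracy alone would not.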
Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction software with pre-processing methods that have become the standard for normalization of cDNA arrays. These include log transformation followed by loess normalization, with or without background subtraction, and often a between-array scale normalization procedure. The larger goal is to define the best study design and pre-processing practices for Agilent arrays, and we offer some suggestions.
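Loess normalization of two-channel arrays fits the intensity-dependent trend in the log-ratio M = log2(R/G) as a function of the average log-intensity A = (log2 R + log2 G)/2, and subtracts it. The sketch below is a basic, non-robust local linear loess with tricube weights, written from scratch for illustration and applied without background subtraction; established implementations (for example, limma in Bioconductor) add robustness iterations and other details that this sketch omits.

```python
import math


def loess_fit(x, y, span=0.4):
    """Local linear regression with tricube weights (basic, non-robust loess)."""
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))  # neighborhood size
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_normalize(red, green, span=0.4):
    """Loess-normalized log2 ratios from raw two-channel intensities
    (no background subtraction)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = loess_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)]


# Hypothetical intensities with an intensity-dependent dye bias
green = [2.0 ** (6 + 0.5 * i) for i in range(20)]
red = [g * 2.0 ** (0.02 * i) for i, g in enumerate(green)]
m_norm = loess_normalize(red, green)
print(max(abs(v) for v in m_norm) < 1e-6)  # True
```

Here the whole dye bias is a smooth trend in A, so normalization removes it entirely; on real arrays, genes that are truly differentially expressed remain as departures from the fitted trend.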
Simple loess normalization without background subtraction produced the lowest variability. However, without background subtraction, fold changes were biased towards zero, particularly at low intensities. ROC analysis of a spike-in experiment showed that differentially expressed genes are most reliably detected when background is not subtracted. Loess normalization without background subtraction yielded an AUC of 99.7%, compared with 88.8% for Agilent-processed fold changes. All methods performed well when error was taken into account by t- or z-statistics (AUCs ≥ 99.8%). A substantial proportion of genes, 43% (99% CI: 39%-47%), showed dye effects; however, these effects were generally small regardless of the pre-processing method.
Simple loess normalization without background subtraction resulted in low-variance fold changes that ranked genes by differential expression more reliably than the other methods. While t-statistics and other measures that take variation into account, including Agilent's z-statistic, can also be used to reliably select differentially expressed genes, fold changes are a standard measure of differential expression for exploratory work, cross-platform comparison, and biological interpretation, and cannot be entirely replaced. Although dye effects are small for most genes, many array features are affected. Therefore, an experimental design that incorporates dye swaps or a common reference could be valuable.