We propose an extension to quantile normalization that removes unwanted technical variation using control probes. We adapt our algorithm, functional normalization, to the Illumina 450k methylation array and address the open problem of normalizing methylation data with global epigenetic changes, such as human cancers. Using data sets from The Cancer Genome Atlas and a large case-control study, we show that our algorithm outperforms all existing normalization methods with respect to replication of results between experiments, and yields robust results even in the presence of batch effects. Functional normalization can be applied to any microarray platform, provided suitable control probes are available.
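For orientation, the baseline that functional normalization extends can be sketched as follows. This is a minimal illustration of plain quantile normalization only, assuming a probes-by-samples matrix with no missing values; the control-probe adjustment described above is not shown, and the function name is hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Plain quantile normalization of a probes-by-samples matrix.

    Every sample (column) is forced onto the same empirical distribution:
    the mean of the sorted columns. Ties are broken by order of appearance,
    which is a simplification made for this sketch.
    """
    x = np.asarray(x, dtype=float)
    # Per-sample ordering of the probes, from smallest to largest value.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Target distribution: average of the k-th smallest values across samples.
    target = sorted_x.mean(axis=1)
    # Write the target quantiles back at each probe's original position.
    out = np.empty_like(x)
    np.put_along_axis(out, order, target[:, None], axis=0)
    return out
```

After this step all samples share identical marginal distributions, which is the assumption that becomes problematic when samples differ globally, as in the cancer setting mentioned above.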
Chemotherapy resistance is a critical barrier in cancer treatment. Metabolic adaptations have been shown to fuel therapy resistance; however, little is known regarding the generality of these changes and whether specific therapies elicit unique metabolic alterations. Using a combination of metabolomics, transcriptomics, and functional genomics, we show that two anthracyclines, doxorubicin and epirubicin, elicit distinct primary metabolic vulnerabilities in human breast cancer cells. Doxorubicin-resistant cells rely on glutamine to drive oxidative phosphorylation and de novo glutathione synthesis, while epirubicin-resistant cells display markedly increased bioenergetic capacity and mitochondrial ATP production. The dependence on these distinct metabolic adaptations is revealed by the increased sensitivity of doxorubicin-resistant cells and tumor xenografts to buthionine sulfoximine (BSO), a drug that interferes with glutathione synthesis, compared with epirubicin-resistant counterparts that are more sensitive to the biguanide phenformin. Overall, our work reveals that metabolic adaptations can vary with therapeutics and that these metabolic dependencies can be exploited as a targeted approach to treat chemotherapy-resistant breast cancer.
Chromosomal breakage followed by faulty DNA repair leads to gene amplifications and deletions in cancers. However, the mere assessment of the extent of genomic changes, amplifications and deletions may reduce the complexity of genomic data observed by array comparative genomic hybridization (array CGH). We present here a novel approach to array CGH data analysis, which focuses on putative breakpoints responsible for rearrangements within the genome.
We performed array comparative genomic hybridization in 29 primary tumors from high-risk patients with breast cancer. The specimens were flow sorted according to ploidy to increase tumor cell purity prior to array CGH. We describe the number of chromosomal breaks as well as the patterns of breaks on individual chromosomes in each tumor. There were differences in chromosomal breakage patterns between the three clinical subtypes of breast cancers, although the highest density of breaks occurred at chromosome 17 in all subtypes, suggesting a particular proclivity of this chromosome for breaks. We also observed chromothripsis affecting various chromosomes in 41% of high-risk breast cancers.
Our results provide new insight into the genomic complexity of breast cancer. Genomic instability dependent on chromosomal breakage events is not stochastic, targeting some chromosomes clearly more than others. We report a much higher percentage of chromothripsis than described previously in other cancers, which suggests that massive genomic rearrangements occurring in a single catastrophic event may shape many breast cancer genomes.
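One way to make the notion of counting chromosomal breaks concrete is sketched below. It assumes the array CGH profile has already been segmented into runs of constant copy-number state, and it treats every boundary between adjacent segments with different states on the same chromosome as a putative breakpoint. The segment format, the state labels, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

def count_breakpoints(segments):
    """Count putative breakpoints per chromosome.

    segments: iterable of (chrom, start, end, state) tuples, where state is a
    copy-number call such as "gain", "loss" or "neutral" (assumed labels).
    A breakpoint is counted wherever two adjacent segments on the same
    chromosome carry different states.
    """
    by_chrom = {}
    for chrom, start, end, state in segments:
        by_chrom.setdefault(chrom, []).append((start, end, state))

    counts = Counter()
    for chrom, segs in by_chrom.items():
        segs.sort()  # order segments by genomic start position
        for (_, _, prev_state), (_, _, state) in zip(segs, segs[1:]):
            if state != prev_state:
                counts[chrom] += 1
    return counts

# Toy input, invented for illustration only.
example_segments = [
    ("chr17", 0, 10_000_000, "gain"),
    ("chr17", 10_000_000, 25_000_000, "loss"),
    ("chr17", 25_000_000, 40_000_000, "neutral"),
    ("chr17", 40_000_000, 81_000_000, "gain"),
    ("chr1", 0, 120_000_000, "neutral"),
    ("chr1", 120_000_000, 249_000_000, "neutral"),
]
breaks_per_chrom = count_breakpoints(example_segments)
```

Dividing each count by the chromosome length would give a break density comparable across chromosomes; the lengths would have to come from the reference assembly in use.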
The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e. conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
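The optimisation described above can be illustrated with a small sketch. It assumes the low-dimensional case (more samples than outcomes), splits the total outcome variance into a part explained by the covariates and a residual part using ordinary least squares, and finds the linear combination maximising the explained proportion through a generalised eigenproblem. The function name and return values are hypothetical, and the high-dimensional strategy and the testing procedures mentioned in the abstract are not shown.

```python
import numpy as np

def pcev(Y, X):
    """Sketch of a principal component of explained variance.

    Y: (n, p) outcome matrix; X: (n,) or (n, q) covariates; assumes n > p.
    Returns (w, h2, component), where w maximises
    w' V_M w / w' (V_M + V_R) w and h2 is the maximised ratio.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = Y.shape[0]

    # Least-squares fit of every outcome on the covariates (with intercept).
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    fitted = design @ coef
    resid = Y - fitted

    # Model (explained) and residual sums of squares and cross-products.
    fitted_c = fitted - fitted.mean(axis=0)
    V_M = fitted_c.T @ fitted_c
    V_R = resid.T @ resid
    V_T = V_M + V_R

    # Whiten by the total variance, then solve an ordinary symmetric
    # eigenproblem: A = L^{-1} V_M L^{-T} with V_T = L L'.
    L = np.linalg.cholesky(V_T)
    half = np.linalg.solve(L, V_M)
    A = np.linalg.solve(L, half.T).T
    A = (A + A.T) / 2  # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(A)  # eigenvalues in ascending order

    w = np.linalg.solve(L.T, evecs[:, -1])
    h2 = float(evals[-1])
    component = (Y - Y.mean(axis=0)) @ w
    return w, h2, component
```

Because any single outcome is itself a candidate linear combination, the maximised ratio is never smaller than the largest univariate proportion of variance explained, which is a useful sanity check on an implementation.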
Many different methods exist to adjust for variability in cell-type mixture proportions when analyzing DNA methylation studies. Here we present the result of an extensive simulation study, built on cell-separated DNA methylation profiles from Illumina Infinium 450K methylation data, to compare the performance of eight methods including the most commonly used approaches.
We designed a rich multi-layered simulation containing a set of probes with true associations with either binary or continuous phenotypes, confounding by cell type, variability in means and standard deviations for population parameters, additional variability at the level of an individual cell-type-specific sample, and variability in the mixture proportions across samples. Performance varied quite substantially across methods and simulations. In particular, the number of false positives was sometimes unrealistically high, indicating limited ability to discriminate the true signals from those appearing significant through confounding. Methods that filtered probes consequently had poor power. QQ plots of p values across all tested probes showed that adjustments did not always improve the distribution. The same methods were used to examine associations between smoking and methylation data from a case-control study of colorectal cancer, and we also explored the effect of cell-type adjustments on associations between rheumatoid arthritis cases and controls.
We recommend surrogate variable analysis for cell-type mixture adjustment since performance was stable under all our simulated scenarios.
Osteoporosis is a common disease diagnosed primarily by measurement of bone mineral density (BMD). We undertook a genome-wide association study (GWAS) in 142,487 individuals from the UK Biobank to identify loci associated with BMD as estimated by quantitative ultrasound of the heel. We identified 307 conditionally independent single-nucleotide polymorphisms (SNPs) that attained genome-wide significance at 203 loci, explaining approximately 12% of the phenotypic variance. These included 153 previously unreported loci, and several rare variants with large effect sizes. To investigate the underlying mechanisms, we undertook (1) bioinformatic, functional genomic annotation and human osteoblast expression studies; (2) gene-function prediction; (3) skeletal phenotyping of 120 knockout mice with deletions of genes adjacent to lead independent SNPs; and (4) analysis of gene expression in mouse osteoblasts, osteocytes and osteoclasts. The results implicate GPC6 as a novel determinant of BMD, and also identify abnormal skeletal phenotypes in knockout mice associated with a further 100 prioritized genes.
Deleterious copy number variants (CNVs) are identified in up to 20% of individuals with autism. However, levels of autism risk conferred by most rare CNVs remain unknown. The authors recently developed statistical models to estimate the effect size on IQ of all CNVs, including undocumented ones. In this study, the authors extended this model to autism susceptibility.
The authors identified CNVs in two autism populations (Simons Simplex Collection and MSSNG) and two unselected populations (IMAGEN and Saguenay Youth Study). Statistical models were used to test nine quantitative variables associated with genes encompassed in CNVs to explain their effects on IQ, autism susceptibility, and behavioral domains.
The "probability of being loss-of-function intolerant" (pLI) best explains the effect of CNVs on IQ and autism risk. Deleting 1 point of pLI decreases IQ by 2.6 points in autism and unselected populations. The effect of duplications on IQ is threefold smaller. Autism susceptibility increases when deleting or duplicating any point of pLI. This is true for individuals with high or low IQ and after removing de novo and known recurrent neuropsychiatric CNVs. When CNV effects on IQ are accounted for, autism susceptibility remains mostly unchanged for duplications but decreases for deletions. Model estimates for autism risk overlap with previously published observations. Deletions and duplications differentially affect social communication, behavior, and phonological memory, whereas both equally affect motor skills.
Autism risk conferred by duplications is less influenced by IQ compared with deletions. The model applied in this study, trained on CNVs encompassing >4,500 genes, suggests highly polygenic properties of gene dosage with respect to autism risk and IQ loss. These models will help to interpret CNVs identified in the clinic.
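The reported IQ effect sizes can be turned into a rough worked calculation. The sketch below assumes that effects are additive over the pLI scores of the genes in a CNV, takes the deletion estimate of 2.6 IQ points per point of pLI from the results above, and approximates the duplication effect as one third of that value, with the direction assumed to be a decrease. The function name and the additive form are illustrative assumptions, not the authors' fitted model.

```python
# Effect sizes in IQ points per point of pLI.
# 2.6 is the reported deletion estimate; the duplication value is an
# approximation derived from "threefold smaller" and is an assumption.
IQ_LOSS_PER_PLI_DELETION = 2.6
IQ_LOSS_PER_PLI_DUPLICATION = IQ_LOSS_PER_PLI_DELETION / 3

def predicted_iq_change(pli_scores, cnv_type):
    """Approximate IQ change for a CNV from the pLI of its genes.

    pli_scores: pLI values (each between 0 and 1) of the genes encompassed
    by the CNV; cnv_type: "deletion" or "duplication".
    Assumes the per-gene contributions simply add up.
    """
    per_point = {
        "deletion": IQ_LOSS_PER_PLI_DELETION,
        "duplication": IQ_LOSS_PER_PLI_DUPLICATION,
    }[cnv_type]
    return -per_point * sum(pli_scores)
```

Under these assumptions, a deletion covering genes whose pLI scores sum to 1 would be associated with a predicted change of -2.6 IQ points, and a duplication of the same genes with roughly a third of that.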