We propose an extension to quantile normalization that removes unwanted technical variation using control probes. We adapt our algorithm, functional normalization, to the Illumina 450k methylation array and address the open problem of normalizing methylation data with global epigenetic changes, such as human cancers. Using data sets from The Cancer Genome Atlas and a large case-control study, we show that our algorithm outperforms all existing normalization methods with respect to replication of results between experiments, and yields robust results even in the presence of batch effects. Functional normalization can be applied to any microarray platform, provided suitable control probes are available.
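For orientation, the baseline that functional normalization extends can be sketched in a few lines. The following is a minimal sketch of ordinary quantile normalization only; it does not use control probes and is not the authors' algorithm. The function name `quantile_normalize` and the list-of-lists input layout (one list of probe intensities per array) are assumptions made for illustration.

```python
def quantile_normalize(samples):
    """Plain quantile normalization of equal-length samples.

    `samples` is a list of lists, one list of probe intensities per array
    (a hypothetical layout chosen for this sketch).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # Reference distribution: the mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(col[k] for col in sorted_samples) / len(samples) for k in range(n)
    ]

    normalized = []
    for s in samples:
        # Rank each probe within its own sample; ties keep their input order.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Replace the value with the reference value of the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After this step every sample has the same empirical distribution, which is what makes the plain method unsuitable when samples differ globally, the situation the abstract calls out for cancer data.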
DNA methylation, an important type of epigenetic modification in humans, participates in crucial cellular processes, such as embryonic development, X-inactivation, genomic imprinting and chromosome stability. Several platforms have been developed to study genome-wide DNA methylation. Many investigators in the field have chosen the Illumina Infinium HumanMethylation microarray for its ability to reliably assess DNA methylation following sodium bisulfite conversion. Here, we analyzed methylation profiles of 489 adult males and 357 adult females generated by the Infinium HumanMethylation450 microarray. Among the autosomal CpG sites that displayed significant methylation differences between the two sexes, we observed a significant enrichment of cross-reactive probes co-hybridizing to the sex chromosomes with more than 94% sequence identity. This could lead investigators to mistakenly infer the existence of significant autosomal sex-associated methylation. Using sequence identity cutoffs derived from the sex methylation analysis, we concluded that 6% of the array probes can potentially generate spurious signals because of co-hybridization to alternate genomic sequences highly homologous to the intended targets. Additionally, we discovered probes targeting polymorphic CpGs that overlapped SNPs. The methylation levels detected by these probes simply reflect the underlying genetic polymorphisms but could be misinterpreted as true signals. The existence of probes that are cross-reactive or that target polymorphic CpGs in the Illumina HumanMethylation microarrays can confound data obtained from such microarrays. Therefore, investigators should exercise caution when significant biological associations are found using these array platforms. A list of all cross-reactive probes and polymorphic CpGs that we identified is annotated in this paper.
To evaluate the impact of complement factor H (CFH) and age-related maculopathy susceptibility 2 (ARMS2) risk alleles on the observed response to components of the Age-Related Eye Disease Study (AREDS) formulation.
Genetic and statistical subgroup analysis of a randomized, prospective clinical trial.
White patients from the AREDS with category 3 or 4 age-related macular degeneration (AMD) with available DNA (n = 989).
Four genotype groups based on CFH and ARMS2 risk allele number were defined. Progression to advanced AMD was analyzed by genotype and treatment using Cox proportional hazards estimates and 7-year events.
The effect of predefined genotype group on treatment-specific progression to advanced AMD.
Patients with 2 CFH risk alleles and no ARMS2 risk alleles progressed more with zinc-containing treatment compared with placebo, with a hazard ratio (HR) of 3.07 (P = 0.0196) for zinc and 2.73 (P = 0.0418) for AREDS formulation (AF). Seven-year treatment-specific progression rates were: placebo, 17.0%; zinc, 43.2% (P = 0.023); and AF, 40.2% (P = 0.039). Patients with 0 or 1 CFH risk alleles and 1 or 2 ARMS2 risk alleles benefited from zinc-containing treatment compared with placebo, with an HR of 0.514 for zinc (P = 0.012) and 0.569 for AF (P = 0.0254). Seven-year treatment-specific AMD progression rates were as follows: placebo, 43.3%; zinc, 25.2% (P = 0.020); and AF, 27.3% (P = 0.011). Zinc and AF treatment each interacted statistically with these 2 genotype groups under a Cox model, with P values of 0.000999 and 0.00366, respectively. For patients with 0 or 1 CFH risk alleles and no ARMS2 risk alleles, neither zinc-containing treatment altered progression compared with placebo, but treatment with antioxidants decreased progression (HR, 0.380; P = 0.034). Seven-year progression with placebo was 22.6% and with antioxidants was 9.17% (P = 0.033). For patients with 2 CFH risk alleles and 1 or 2 ARMS2 risk alleles, no treatment was better than placebo (48.4%).
The benefit of the AREDS formulation seems the result of a favorable response by patients in only 1 genotype group, balanced by neutral or unfavorable responses in 3 genotype groups.
The interplay between genetic and epigenetic variation is only partially understood. One form of epigenetic variation is methylation at CpG sites, which can be measured as methylation quantitative trait loci (meQTL). Here we report that in a panel of lymphocytes from 1,748 individuals, methylation levels at 1,919 CpG sites are correlated with at least one distal (trans) single-nucleotide polymorphism (SNP) (P<3.2 × 10(-13); FDR<5%). These trans-meQTLs include 1,657 SNP-CpG pairs from different chromosomes and 262 pairs from the same chromosome that are >1 Mb apart. Over 90% of these pairs are replicated (FDR<5%) in at least one of two independent data sets. Genomic loci harbouring trans-meQTLs are significantly enriched (P<0.001) for long non-coding transcripts (2.2-fold), known epigenetic regulators (2.3-fold), piwi-interacting RNA clusters (3.6-fold) and curated transcription factors (4.1-fold), including zinc-finger proteins (8.75-fold). Long-range epigenetic networks uncovered by this approach may be relevant to normal and disease states.
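The abstract above reports an FDR threshold of 5% but does not say which procedure was used to control it. As one common possibility, the Benjamini-Hochberg step-up procedure can be sketched as follows; the choice of procedure, the function name `benjamini_hochberg` and the example p-values in the usage note are all assumptions for illustration, not details taken from the study.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding test is
    declared significant at the requested false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k whose p-value satisfies p_(k) <= k/m * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is significant, even if its
    # own p-value missed its own threshold (the "step-up" property).
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant
```

With the hypothetical input `[0.02, 0.03, 0.04, 0.045]`, only the largest p-value meets its own rank threshold, yet all four tests are declared significant because the procedure accepts everything ranked at or below the last passing rank.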
In a genome-wide association study to identify loci associated with colorectal cancer (CRC) risk, we genotyped 555,510 SNPs in 1,012 early-onset Scottish CRC cases and 1,012 controls (phase 1). In phase 2, we genotyped the 15,008 highest-ranked SNPs in 2,057 Scottish cases and 2,111 controls. We then genotyped the five highest-ranked SNPs from the joint phase 1 and 2 analysis in 14,500 cases and 13,294 controls from seven populations, and identified a previously unreported association, rs3802842 on 11q23 (OR = 1.1; P = 5.8 × 10(-10)), showing population differences in risk. We also replicated and fine-mapped associations at 8q24 (rs7014346; OR = 1.19; P = 8.6 × 10(-26)) and 18q21 (rs4939827; OR = 1.2; P = 7.8 × 10(-28)). Risk was greater for rectal than for colon cancer for rs3802842 (P < 0.008) and rs4939827 (P < 0.009). Carrying all six possible risk alleles yielded OR = 2.6 (95% CI = 1.75-3.89) for CRC. These findings extend our understanding of the role of common genetic variation in CRC etiology.
Using a multistage genetic association approach comprising 7,480 affected individuals and 7,779 controls, we identified markers in chromosomal region 8q24 associated with colorectal cancer. In stage 1, we genotyped 99,632 SNPs in 1,257 affected individuals and 1,336 controls from Ontario. In stages 2-4, we performed serial replication studies using 4,024 affected individuals and 4,042 controls from Seattle, Newfoundland and Scotland. We identified one locus on chromosome 8q24 and another on 9p24 having combined odds ratios (OR) for stages 1-4 of 1.18 (trend; P = 1.41 × 10(-8)) and 1.14 (trend; P = 1.32 × 10(-5)), respectively. Additional analyses in 2,199 affected individuals and 2,401 controls from France and Europe supported the association at the 8q24 locus (OR = 1.16, trend; 95% confidence interval (c.i.): 1.07-1.26; P = 5.05 × 10(-4)). A summary across all seven studies at the 8q24 locus was highly significant (OR = 1.17, c.i.: 1.12-1.23; P = 3.16 × 10(-11)). This locus has also been implicated in prostate cancer.
We evaluated the influence of an antioxidant and zinc nutritional supplement, the Age-Related Eye Disease Study (AREDS) formulation, on delaying or preventing progression to neovascular AMD (NV) in persons with age-related macular degeneration (AMD). AREDS subjects (n = 802) with category 3 or 4 AMD at baseline who had been treated with placebo or the AREDS formulation were evaluated for differences in the risk of progression to NV as a function of complement factor H (CFH) and age-related maculopathy susceptibility 2 (ARMS2) genotype groups. We used published genetic grouping: a two-SNP haplotype risk-calling algorithm to assess CFH, and either the single SNP rs10490924 or 372_815del443ins54 to mark ARMS2 risk. Progression risk was determined using the Cox proportional hazard model. Genetics–treatment interaction on NV risk was assessed using a multi-iterative bootstrap validation analysis. We identified strong interaction of genetics with AREDS formulation treatment on the development of NV. Individuals with high CFH and no ARMS2 risk alleles and taking the AREDS formulation had increased progression to NV compared with placebo. Those with low CFH risk and high ARMS2 risk had decreased progression risk. Analysis of CFH and ARMS2 genotype groups from a validation dataset reinforces this conclusion. Bootstrapping analysis confirms the presence of a genetics–treatment interaction and suggests that individual treatment response to the AREDS formulation is largely determined by genetics. The AREDS formulation modifies the risk of progression to NV based on individual genetics. Its use should be based on patient-specific genotype.
Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation. We previously demonstrated that SNPs (rs1800734, rs749072, and rs13098279) in the MLH1 gene region are associated with MLH1 promoter island methylation, loss of MLH1 protein expression, and microsatellite instability (MSI) in colorectal cancer (CRC) patients. Recent studies have identified less CpG-dense "shore" regions flanking many CpG islands. These shores often exhibit distinct methylation profiles between different tissues and matched normal versus tumor cells of patients. To date, most epigenetic studies have focused on somatic methylation events occurring within solid tumors; less is known of the contributions of peripheral blood cell (PBC) methylation to processes such as aging and tumorigenesis. To address whether MLH1 methylation in PBCs is correlated with tumorigenesis we utilized the Illumina 450 K microarrays to measure methylation in PBC DNA of 846 healthy controls and 252 CRC patients from Ontario, Canada. Analysis of a region of chromosome 3p21 spanning the MLH1 locus in healthy controls revealed that a CpG island shore 1 kb upstream of the MLH1 gene exhibits different methylation profiles when stratified by SNP genotypes (rs1800734, rs749072, and rs13098279). Individuals with wild-type genotypes incur significantly higher PBC shore methylation than heterozygous or homozygous variant carriers (p<1.1×10(-6); ANOVA). This trend is also seen in CRC cases (p<0.096; ANOVA). Shore methylation also decreases significantly with increasing age in cases and controls. This is the first study of its kind to integrate PBC methylation at a CpG island shore with SNP genotype status in CRC cases and controls. These results indicate that CpG island shore methylation in PBCs may be influenced by genotype as well as the normal aging process.
Colorectal cancer is the second leading cause of cancer death in developed countries. Genome-wide association studies (GWAS) have successfully identified novel susceptibility loci for colorectal cancer. To follow up on these findings, and to try to identify novel colorectal cancer susceptibility loci, we present results for GWAS of colorectal cancer (2,906 cases, 3,416 controls) that have not previously published main associations. Specifically, we calculated odds ratios and 95% confidence intervals using log-additive models for each study. In order to improve our power to detect novel colorectal cancer susceptibility loci, we performed a meta-analysis combining the results across studies. We selected the most statistically significant single nucleotide polymorphisms (SNPs) for replication using ten independent studies (8,161 cases and 9,101 controls). We again used a meta-analysis to summarize results for the replication studies alone, and for a combined analysis of GWAS and replication studies. We measured ten SNPs previously identified in colorectal cancer susceptibility loci and found eight to be associated with colorectal cancer (p value range 0.02 to 1.8 × 10(-8)). When we excluded studies that have previously published on these SNPs, five SNPs remained significant at p < 0.05 in the combined analysis. No novel susceptibility loci were significant in the replication study after adjustment for multiple testing, and none reached genome-wide significance from a combined analysis of GWAS and replication. We observed marginally significant evidence for a second independent SNP in the BMP2 region at chromosomal location 20p12 (rs4813802; replication p value 0.03; combined p value 7.3 × 10(-5)). In a region on 5p15.33, which includes the coding regions of the TERT-CLPTM1L genes and has been identified in GWAS to be associated with susceptibility to at least seven other cancers, we observed a marginally significant association with rs2853668 (replication p value 0.03; combined p value 1.9 × 10(-4)). Our study suggests a complex nature of the contribution of common genetic variants to risk for colorectal cancer.
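The abstract above combines per-study odds ratios by meta-analysis without stating the weighting scheme. A minimal sketch of one standard choice, fixed-effect inverse-variance pooling on the log odds ratio scale, is given below. The fixed-effect assumption, the function name `fixed_effect_meta`, and the recovery of standard errors from reported 95% confidence intervals are illustrative assumptions, not details from the study.

```python
import math


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling of odds ratios.

    `studies` is a list of (odds_ratio, ci_lower, ci_upper) tuples with
    95% confidence limits. Returns the pooled (odds_ratio, ci_lower, ci_upper).
    """
    z = 1.959964  # two-sided 95% quantile of the standard normal
    weight_sum = 0.0
    weighted_beta = 0.0
    for odds_ratio, lower, upper in studies:
        beta = math.log(odds_ratio)
        # Standard error recovered from the CI width on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_beta += weight * beta

    pooled_beta = weighted_beta / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (
        math.exp(pooled_beta),
        math.exp(pooled_beta - z * pooled_se),
        math.exp(pooled_beta + z * pooled_se),
    )
```

Pooling two hypothetical studies that each report an odds ratio of 1.2 with limits 1.0 to 1.44 returns the same point estimate with a narrower interval, which is the gain in power the abstract refers to. A random-effects model would be the usual alternative when the studies are heterogeneous.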