Information on long-term alcohol consumption is relevant for medical and public health research, disease therapy, and other areas. Recently, DNA methylation-based inference of alcohol consumption from blood was reported with high accuracy, but these results were based on using the same dataset for model training and testing, which can lead to overestimation of accuracy. Moreover, only subsets of alcohol consumption categories were used, which makes it impossible to extrapolate such models to the general population. Using data from eight population-based European cohorts (N = 4677), we internally and externally validated the previously reported biomarkers and models for epigenetic inference of alcohol consumption from blood, and developed new models using data from all consumption categories.
Using data from six European cohorts (N = 2883), we empirically tested the reproducibility of the previously suggested biomarkers and prediction models via ten-fold internal cross-validation. In contrast to previous findings, all seven models based on 144 CpGs yielded lower mean AUCs than the models with fewer CpGs. For instance, the 144-CpG heavy versus non-drinkers model gave an AUC of 0.78 ± 0.06, while the 5-CpG and 23-CpG models each achieved 0.83 ± 0.05. The transportability of the models was empirically tested via external validation in three independent European cohorts (N = 1794), revealing high between-dataset AUC variance within models. For instance, the 144-CpG heavy versus non-drinkers model yielded AUCs ranging from 0.60 to 0.84 between datasets. The newly developed models that considered data from all categories showed low AUCs but also low AUC variation in the external validation. For instance, the 144-CpG heavy and at-risk versus light and non-drinkers model achieved an AUC of 0.67 ± 0.02 in the internal cross-validation and 0.61–0.66 in the external validation datasets.
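The ten-fold internal cross-validation described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the study's pipeline: `fit_mean_difference` is a hypothetical toy classifier (per-CpG weights equal to the difference in class means), and AUC is computed with the Mann-Whitney rank formulation. The point it shows is that every AUC is computed on a fold the model was not trained on, which avoids the optimistic bias of training and testing on the same samples.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluated set")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_mean_difference(X, y):
    """Hypothetical toy model: per-CpG weight = mean(cases) - mean(controls)."""
    cases = [row for row, lab in zip(X, y) if lab == 1]
    controls = [row for row, lab in zip(X, y) if lab == 0]
    return [
        statistics.fmean(r[f] for r in cases) - statistics.fmean(r[f] for r in controls)
        for f in range(len(X[0]))
    ]


def cross_validated_auc(X, y, k=10, seed=0):
    """Mean and SD of per-fold AUCs; each fold is scored by a model trained without it."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        weights = fit_mean_difference([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [sum(w * x for w, x in zip(weights, X[i])) for i in test_idx]
        aucs.append(auc(scores, [y[i] for i in test_idx]))
    return statistics.fmean(aucs), statistics.stdev(aucs)
```

With real data, `X` would hold per-sample methylation values for the selected CpGs and `y` a binary drinking category (for example heavy = 1, non-drinker = 0). Reporting the mean and SD over folds mirrors the "AUC ± spread" format used above; external validation would instead train on all internal data and score a separate cohort with `auc` directly.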
The outcomes of our internal and external validation demonstrate that the previously reported prediction models suffer from both overfitting and accuracy overestimation. Our results show that the previously proposed biomarkers are not yet sufficient for accurate and robust inference of alcohol consumption from blood. Overall, our findings imply that DNA methylation prediction biomarkers and models need to be improved considerably before epigenetic inference of alcohol consumption from blood can be considered for practical applications.
Biomarkers are of interest as potential diagnostic and predictive instruments in personalized medicine. We present the first urinary metabolomics biomarker study of childhood aggression. We aim to examine the association of urinary metabolites and neurotransmitter ratios involved in key metabolic and neurotransmitter pathways in a large cohort of twins (N = 1,347) and clinic-referred children (N = 183) with an average age of 9.7 years. This study is part of ACTION (Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies), in which we developed a standardized protocol for large-scale collection of urine samples in children. Our analytical design consisted of three phases: a discovery phase in twins scoring low or high on aggression (N = 783); a replication phase in twin pairs discordant for aggression (N = 378); and a validation phase in clinical cases and matched twin controls (N = 367). In the discovery phase, six biomarkers were significantly associated with childhood aggression, of which the associations of O-phosphoserine (β = 0.36; SE = 0.09; p = 0.004) and gamma-L-glutamyl-L-alanine (β = 0.32; SE = 0.09; p = 0.01) remained significant after correction for multiple testing. Although non-significant, the directions of effect were congruent between the discovery and replication analyses for six biomarkers and two neurotransmitter ratios, and the concentrations of six amines differed between low- and high-aggressive twins. In the validation analyses, the top biomarkers and neurotransmitter ratios with congruent directions of effect showed no significant associations with childhood aggression. We find suggestive evidence for associations of childhood aggression with metabolic dysregulation of neurotransmission, oxidative stress, and energy metabolism. Although replication is required, our findings provide starting points to investigate causal and pleiotropic effects of these dysregulations on childhood aggression.
DNA methylation alteration is extensively associated with smoking and is a plausible link between smoking and adverse health outcomes. We examined the association between epigenome-wide DNA methylation and serum cotinine levels as a proxy of nicotine exposure and smoking quantity, assessed the role of SNPs in these associations, and evaluated molecular mediation by methylation in a sample of biochemically verified current smokers (N = 310).
DNA methylation at 50 CpG sites was associated (FDR < 0.05) with cotinine levels, 17 of which are novel associations. As cotinine levels are influenced not only by nicotine intake but also by the CYP2A6-mediated nicotine metabolism rate, we performed secondary analyses adjusting for a genetic risk score of nicotine metabolism rate and identified five additional novel associations. We further assessed the potential role of genetic variants in the detected associations between methylation and cotinine levels, observing 124 cis and 3898 trans methylation quantitative trait loci (meQTLs). Nineteen of these SNPs were also associated with cotinine levels (FDR < 0.05). Further, at seven CpG sites, we observed a trend (P < 0.05) suggesting that altered DNA methylation mediates the effect of SNPs on nicotine exposure rather than being a direct consequence of smoking. Finally, we sought to replicate our findings in two independent cohorts of biochemically verified smokers (N = 450 and N = 79).
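Several of these abstracts report results at a false discovery rate threshold (FDR < 0.05 here, 10% in the pan-cancer scan). The abstracts do not name the procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up method, which converts raw p-values into adjusted values that can be compared directly against such a threshold.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(pvalues, alpha=0.05):
    """Indices of tests rejected by the BH procedure at FDR level alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= alpha]
```

For instance, raw p-values of 0.01, 0.04, 0.03 and 0.005 become 0.02, 0.04, 0.04 and 0.02, so all four pass at 0.05 but only the first and last pass at 0.03. In an epigenome-wide scan the same function would simply be applied to hundreds of thousands of per-CpG p-values.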
Using cotinine, a biomarker of nicotine exposure, we replicated and extended the identification of novel epigenetic associations in smoking-related genes. We also demonstrated that DNA methylation in some of the identified loci is driven by the underlying genotype and may mediate the causal effect of genotype on cotinine levels.
A key focus in cancer research is the discovery of biomarkers that accurately diagnose early lesions from non-invasively sampled tissues. Several studies have identified malignancy-associated DNA methylation changes in blood, yet no general cancer biomarker has been identified to date. Here, we explore the potential of blood DNA methylation as a biomarker of pan-cancer (cancers of multiple different origins) in 41 female cancer-discordant monozygotic (MZ) twin pairs sampled before or after diagnosis, using the Illumina HumanMethylation450 BeadChip.
We analysed epigenome-wide DNA methylation profiles in 41 cancer-discordant MZ twin pairs in which the affected twins were diagnosed with tumours at different single primary sites: the breast, cervix, colon, endometrium, thyroid gland, skin (melanoma), ovary, and pancreas. No significant global differences in whole blood DNA methylation profiles were observed. Epigenome-wide analyses identified one novel pan-cancer differentially methylated position at a false discovery rate (FDR) threshold of 10% (cg02444695, P = 1.8 × 10⁻⁷) in an intergenic region 70 kb upstream of the SASH1 tumour suppressor gene, and three suggestive signals in COL11A2, AXL, and LINC00340. Replication of the four top-ranked signals in an independent sample of nine cancer-discordant MZ twin pairs showed a similar direction of association at COL11A2, AXL, and LINC00340, and significantly greater methylation discordance at AXL compared to 480 healthy concordant MZ twin pairs. The effects at cg02444695 (near SASH1), COL11A2, and LINC00340 were the most promising as biomarkers because the DNA methylation differences were found to pre-exist in samples obtained prior to diagnosis and were limited to a 5-year period before diagnosis. Gene expression follow-up of the top-ranked signals in 283 healthy individuals showed a correlation between blood methylation and gene expression in lymphoblastoid cell lines at PRL, and in skin tissue at AXL. A significant enrichment of differential DNA methylation was observed in enhancer regions (P = 0.03).
We identified DNA methylation signatures in blood associated with pan-cancer, at or near SASH1, COL11A2, AXL, and LINC00340. Three of these signals were present up to 5 years prior to cancer diagnosis, highlighting the potential clinical utility of whole blood DNA methylation analysis in cancer surveillance.
Low birth weight (LBW) can have an impact on health outcomes in later life, especially in relation to predisposition to metabolic disease. Several studies suggest that LBW resulting from restricted intrauterine growth leaves a footprint on DNA methylation in utero, and this influence likely persists into adulthood. To investigate this further, we performed epigenome-wide association analyses of blood DNA methylation using Infinium HumanMethylation450 BeadChip profiles in 71 adult monozygotic (MZ) twin pairs who were extremely discordant for birth weight. A signal mapping to the IGF1R gene (cg12562232, p = 2.62 × 10⁻⁸) was significantly associated with birth weight discordance at a genome-wide false discovery rate (FDR) of 0.05. We pursued replication in three additional independent datasets of birth-weight-discordant MZ pairs and observed the same direction of association, but the results were not significant. However, a meta-analysis across the four independent samples, in total 216 birth-weight-discordant MZ twin pairs, showed a significant positive association between birth weight and DNA methylation differences at IGF1R (random-effects meta-analysis p = .04), and the effect was particularly pronounced in older twins (random-effects meta-analysis p = .008; 98 older birth-weight-discordant MZ twin pairs). The results suggest that severe intrauterine growth differences (birth weight discordance >20%) are associated with methylation changes in the IGF1R gene in adulthood, independent of genetic effects.
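The random-effects meta-analysis used to combine the four twin samples can be illustrated with the DerSimonian-Laird estimator. This is an assumption: the abstract does not state which random-effects estimator was used. The inputs are per-sample effect estimates (for example, hypothetical slopes of within-pair methylation difference on birth weight difference) with their standard errors.

```python
import math


def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-study effect estimates.

    Returns (pooled estimate, pooled standard error, tau^2, two-sided p-value).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies to estimate heterogeneity")
    w = [1.0 / se**2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    z = pooled / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under a standard normal
    return pooled, se_pooled, tau2, p
```

Two studies with estimates 0.0 and 1.0 (SE 0.1 each) are strongly heterogeneous: τ² = 0.49, and the pooled estimate 0.5 has SE 0.5 rather than about 0.07 under a fixed-effect model, giving p ≈ 0.32. This widening is what makes a random-effects p-value more conservative when samples disagree.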
Educational attainment is a key behavioural measure in studies of cognitive and physical health, and socioeconomic status. We measured DNA methylation at 410,746 CpGs (N = 4152) and identified 58 CpGs associated with educational attainment at loci characterized by pleiotropic functions shared with neuronal, immune and developmental processes. Associations overlapped with those for smoking behaviour, but remained after accounting for smoking at many CpGs: effect sizes were on average 28% smaller and genome-wide significant at 11 CpGs after adjusting for smoking, and were 62% smaller in never smokers. We examined sources and biological implications of education-related methylation differences, demonstrating correlations with maternal prenatal folate, smoking and air pollution signatures, and associations with gene expression in cis, dynamic methylation in foetal brain, and correlations between blood and brain. Our findings show that the methylome of lower-educated people resembles that of smokers beyond effects of their own smoking behaviour and shows traces of various other exposures.
Obese individuals are characterized by altered brain reward responses to food. Despite the recent discovery of obesity-associated genes, the contribution of environmental and genetic factors to brain reward responsiveness to food remains largely unclear. Sixteen female monozygotic twin pairs with a mean BMI discordance of 3.96 ± 2.1 kg/m² were selected from the Netherlands Twin Register to undergo functional MRI scanning while watching high- and low-calorie food and non-food pictures and during the anticipation and receipt of chocolate milk. In addition, appetite ratings, eating behavior and food intake were assessed using visual analog scales, validated questionnaires and an ad libitum lunch. In the overall group, visual and taste stimuli elicited significant activation in regions of interest (ROIs) implicated in reward, namely the amygdala, insula, striatum and orbitofrontal cortex. However, when comparing leaner and heavier co-twins, no statistically significant differences in ROI activations were observed after family-wise error correction. Heavier versus leaner co-twins reported stronger feelings of hunger (P = 0.02), more cravings for sweet food (P = 0.04), greater body dissatisfaction (P < 0.05) and a trend towards more emotional eating (P = 0.1), whereas caloric intake was not significantly different between groups (P = 0.3). Our results suggest that inherited rather than environmental factors are largely responsible for the obesity-related altered brain responsiveness to food. Future studies should elucidate the genetic variants underlying the susceptibility to reward dysfunction and obesity.
Clinical Trial Registration Number: NCT02025595.
Abstract
Spontaneous dizygotic (DZ) twins, i.e. twins conceived without the use of assisted reproductive technologies (ARTs), run in families and their prevalence varies widely around the globe. In contrast, monozygotic (MZ) twins occur at a constant rate across time and geographical regions and, with some rare exceptions, do not cluster in families. The leading hypothesis for MZ twins, which arise when a zygote splits during preimplantation stages of development, is random occurrence. We have found the first series of genes underlying the liability of being the mother of DZ twins and have shown that being an MZ twin is strongly associated with a stable DNA methylation signature in child and adult somatic tissues. Because identical twins keep this molecular signature across the lifespan, this discovery opens up completely new possibilities for the retrospective diagnosis of whether a person is an MZ twin whose co-twin may have vanished in the early stages of pregnancy. Here, we summarize the gene-finding results for mothers of DZ twins based on genetic association studies followed by meta-analysis, and further present the striking epigenetic results for MZ twins.
Graphical Abstract
Recent findings regarding the genetic susceptibility to being the mother of dizygotic (DZ) twins (top) and an epigenetic signature associated with being a monozygotic (MZ) twin (bottom). TGC (logo): Twinning Genetics Consortium; SNP: single-nucleotide polymorphism.
In current biomedical and complex trait research, increasing numbers of large molecular profiling (omics) data sets are being generated. At the same time, many studies fail to be reproduced (Baker 2016, Kim 2018). In order to improve study reproducibility and data reuse, including integration of data sets of different types and origins, it is imperative to work with omics data that is findable, accessible, interoperable, and reusable (FAIR, Wilkinson 2016) at the source. The data analysis, integration and stewardship pillar of the Netherlands X-omics Initiative aims to facilitate multi-omics research by providing tools to create, analyze and integrate FAIR omics data. We here report a joint activity of X-omics and the Netherlands Twin Register demonstrating the FAIRification of a multi-omics data set and the development of a FAIR multi-omics data analysis workflow.
The implementation of FAIR principles (Wilkinson 2016) can improve scientific transparency and facilitate data reuse. However, Kim (2018) showed in a case study that the availability of data and code is necessary but not sufficient to reproduce data analyses. They highlighted the importance of interoperable and open formats, and structured metadata. To increase reproducibility at the data analysis level, additional practices such as version control, code licensing, and documentation have been proposed. These include recommendations for FAIR software by the Netherlands eScience Center and the Dutch Data Archiving and Networked Services (DANS), and the FAIR principles for research software proposed by the Research Data Alliance (Chue Hong 2022). Data analysis in biomedical research usually comprises multiple steps, often resulting in complex workflows that require additional practices, such as containerization, to ensure transparency and reproducibility (Goble 2020, Stoudt 2021).
We apply these practices to a multi-omics data set that comprises genome-wide DNA methylation profiles, targeted metabolomics, and behavioral data of two cohorts that participated in the ACTION Biomarker Study (ACTION, Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies, see consortium members in Suppl. material 1) (Boomsma 2015, Bartels 2018, Hagenbeek 2020, van Dongen 2021, Hagenbeek 2022). The ACTION-NTR cohort consists of twins who are either longitudinally concordant or discordant for childhood aggression. The ACTION-Curium-LUMC cohort consists of children referred to the Dutch LUMC Curium academic center for child and youth psychiatry. With the joint analysis of multi-omics and behavioral data, we aim to identify substructures in the ACTION-NTR cohort and link them to aggressive behavior. First, the individuals are clustered using Similarity Network Fusion (SNF, Wang 2014), and latent feature dimensions are uncovered using different unsupervised methods, including Multi-Omics Factor Analysis (MOFA, Argelaguet 2018) and Multiple Correspondence Analysis (MCA, Lê 2008, Husson 2017). In a second step, we determine correlations between omics and phenotype dimensions and use them to explain the subgroups of individuals from the ACTION-NTR cohort. To validate the results, we project data of the ACTION-Curium-LUMC cohort onto the latent dimensions and determine whether the correlations between omics and phenotype data can be reproduced.
Integration of data across cohorts and across data types requires interoperability. We applied different practices to make the data FAIR, including converting files to community-standard formats and capturing experimental metadata using the ISA (Investigation, Study, Assay) metadata framework (Johnson 2021) and ontology-based annotations. All data analysis steps, including pre-processing of the different omics data types, were implemented in either R or Python and combined in a modular Nextflow (Di Tommaso 2017) workflow, where the environment for each step is provided as a Singularity (Kurtzer 2017) container. The analysis workflow is packaged in a Research Object Crate (RO-Crate) (Soiland-Reyes 2022). The RO-Crate is a FAIR digital object that contains the Nextflow workflow, including ontology-based annotations of each analysis step. Since omics data are considered potentially personally identifiable, the packaged workflow contains a minimal synthetic data set resembling the original data structure. Finally, the code is made available on GitHub and the workflow is registered at WorkflowHub (Goble 2021). Since our Nextflow workflow is set up in a modular manner, the individual analysis steps can be reused in other workflows. We demonstrate this reusability by applying different sub-workflows to data from two different cohorts.
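The core of an RO-Crate is a JSON-LD file named `ro-crate-metadata.json` at the crate root. The sketch below builds a minimal one for a Nextflow workflow, assuming RO-Crate 1.1 and Workflow RO-Crate conventions (`mainEntity`, the `ComputationalWorkflow` type). The file name `main.nf`, the crate name and the license URL are hypothetical placeholders, and the sketch omits the ontology-based step annotations, container references and synthetic test data that the packaged workflow described above contains.

```python
import json
from pathlib import Path


def minimal_workflow_crate(workflow_file, name, license_url):
    """Build a minimal RO-Crate 1.1 metadata document for one Nextflow workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # metadata descriptor: says which spec this file follows
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root data entity: the crate as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "mainEntity": {"@id": workflow_file},
                "hasPart": [{"@id": workflow_file}],
            },
            {   # the workflow definition file itself
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#nextflow"},
            },
            {   # contextual entity describing the workflow language
                "@id": "#nextflow",
                "@type": "ComputerLanguage",
                "name": "Nextflow",
                "url": {"@id": "https://www.nextflow.io/"},
            },
        ],
    }


def write_crate(crate, directory="."):
    """Write the metadata file into the crate's root directory and return its path."""
    path = Path(directory) / "ro-crate-metadata.json"
    path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return path
```

In practice, tooling such as the `ro-crate-py` library or WorkflowHub's own upload forms generate richer crates; writing the JSON by hand mainly helps in seeing which entities a registry expects to find.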
Variation in metabolite levels reflects individual differences in genetic and environmental factors. Here, we investigated the role of these factors in urinary metabolomics data in children. We examined the effects of sex and age on 86 metabolites, as measured on three metabolomics platforms that target amines, organic acids, and steroid hormones. Next, we estimated their heritability in a cohort of 1300 twins (age range: 5.7–12.9 years). We observed associations between age and 50 metabolites and between sex and 21 metabolites. The monozygotic (MZ) and dizygotic (DZ) twin correlations for the urinary metabolites indicated a role for non-additive genetic factors for 50 amines, 13 organic acids, and 6 steroids. The average broad-sense heritability for these amines, organic acids, and steroids was 0.49 (range: 0.25–0.64), 0.50 (range: 0.33–0.62), and 0.64 (range: 0.43–0.81), respectively. For 6 amines, 7 organic acids, and 4 steroids, the twin correlations indicated a role for shared environmental factors, and the average narrow-sense heritability for these amines, organic acids, and steroids was 0.50 (range: 0.37–0.68), 0.50 (range: 0.23–0.61), and 0.47 (range: 0.32–0.70), respectively. We conclude that urinary metabolites in children have substantial heritability, with similar estimates for amines and organic acids, and higher estimates for steroid hormones.
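The step from twin correlations to a "non-additive" versus "shared environmental" pattern, and from there to broad- versus narrow-sense heritability, can be made concrete with Falconer-style moment estimates. This is a simplified stand-in chosen for illustration: the abstract does not describe the estimation method, and twin studies usually fit maximum-likelihood structural equation models instead. The rule used is the classic one: r_MZ > 2 × r_DZ suggests dominance (an ADE pattern), otherwise additive genetic plus shared environmental effects (an ACE pattern).

```python
def twin_variance_components(r_mz, r_dz):
    """Falconer-style variance components from MZ and DZ twin correlations.

    ADE pattern (r_mz > 2 * r_dz): a2 = 4*r_dz - r_mz, d2 = 2*r_mz - 4*r_dz,
    broad-sense heritability H2 = a2 + d2 (which equals r_mz).
    ACE pattern otherwise: a2 = 2*(r_mz - r_dz), c2 = 2*r_dz - r_mz,
    narrow-sense heritability h2 = a2.
    In both cases the unique environment component is e2 = 1 - r_mz.
    """
    if r_mz > 2 * r_dz:
        a2 = 4 * r_dz - r_mz
        d2 = 2 * r_mz - 4 * r_dz
        return {"model": "ADE", "a2": a2, "d2": d2, "H2": a2 + d2, "e2": 1 - r_mz}
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    return {"model": "ACE", "a2": a2, "c2": c2, "h2": a2, "e2": 1 - r_mz}
```

For example, r_MZ = 0.5 and r_DZ = 0.15 gives an ADE pattern with H² = 0.5, whereas r_MZ = 0.6 and r_DZ = 0.4 gives an ACE pattern with h² = 0.4 and c² = 0.2. With noisy correlations these moment estimates can fall outside [0, 1], which is one reason model fitting is preferred in practice.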