Metabolites are small molecules involved in cellular metabolism, which can be detected in biological samples using metabolomic techniques. Here we present the results of genome-wide association and meta-analyses for variation in the blood serum levels of 129 metabolites as measured by the Biocrates metabolomic platform. In a discovery sample of 7,478 individuals of European descent, we find 4,068 genome- and metabolome-wide significant (Z-test, P < 1.09 × 10⁻⁹) associations between single-nucleotide polymorphisms (SNPs) and metabolites, involving 59 independent SNPs and 85 metabolites. Five of the fifty-nine independent SNPs are new for serum metabolite levels and were followed up for replication in an independent sample (N = 1,182). The novel SNPs are located in or near genes encoding metabolite transporter proteins or enzymes (SLC22A16, ARG1, AGPS and ACSL1) that have demonstrated biomedical or pharmaceutical importance. The further characterization of genetic influences on metabolic phenotypes is important for progress in biological and medical research.
Insights into individual differences in gene expression and its heritability (h²) can help in understanding pathways from DNA to phenotype. We estimated the heritability of gene expression of 52,844 genes measured in whole blood in the largest twin RNA-Seq sample to date (1497 individuals including 459 monozygotic twin pairs and 150 dizygotic twin pairs) from classical twin modeling and identity-by-state-based approaches. We estimated for each gene h², composed of cis-heritability (h²_cis, the variance explained by single nucleotide polymorphisms in the cis-window of the gene), and trans-heritability (h²_trans, the residual variance explained by all other genome-wide variants). Mean h² was 0.26, which was significantly higher than heritability estimates earlier found in a microarray-based study using largely overlapping (>60%) RNA samples (mean h² = 0.14, p = 6.15 × 10^…). Mean h²_cis was 0.06 and strongly correlated with beta of the top cis expression quantitative trait loci (eQTL, ρ = 0.76, p < 10^…) and with estimates from earlier RNA-Seq-based studies. Mean h²_trans was 0.20 and correlated with the beta of the corresponding trans-eQTL (ρ = 0.04, p < 1.89 × 10^…) and was significantly higher for genes involved in cytokine-cytokine interactions (p = 4.22 × 10^…), many other immune system pathways, and genes identified in genome-wide association studies for various traits including behavioral disorders and cancer. This study provides a thorough characterization of cis- and trans-h² estimates of gene expression, which is of value for interpretation of GWAS and gene expression studies.
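As a rough illustration of the quantities in the abstract above, the sketch below uses Falconer's formula, a textbook simplification of classical twin modeling and not necessarily the model fitted in the study, and then splits a total heritability into cis and trans parts. The function names and the twin correlations are hypothetical; only the mean values 0.26 and 0.06 come from the abstract.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability: h2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are the trait correlations within monozygotic and
    dizygotic twin pairs. This is a simplification of twin modeling.
    """
    return 2.0 * (r_mz - r_dz)


def trans_h2(h2_total: float, h2_cis: float) -> float:
    """Residual (trans) heritability, assuming h2 = h2_cis + h2_trans."""
    return h2_total - h2_cis


# Hypothetical twin correlations for the expression level of one gene.
h2_example = falconer_h2(r_mz=0.30, r_dz=0.17)

# Mean values reported in the abstract: total 0.26, cis 0.06.
h2_trans_mean = trans_h2(h2_total=0.26, h2_cis=0.06)
```

With the reported means, the residual comes out at approximately 0.20, which matches the mean trans-heritability stated in the abstract.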
Handedness has low heritability and epigenetic mechanisms have been proposed as an etiological mechanism. To examine this hypothesis, we performed an epigenome-wide association study of left-handedness. In a meta-analysis of whole-blood DNA methylation in 3914 adults, we observed that CpG sites located in proximity of handedness-associated genetic variants were more strongly associated with left-handedness than other CpG sites (P = 0.04), but did not identify any differentially methylated positions. In longitudinal analyses of DNA methylation in peripheral blood and buccal cells from children (N = 1737), we observed moderately stable associations across age (correlation range 0.355 to 0.578), but inconsistent associations across tissues (correlation range −0.384 to 0.318). We conclude that DNA methylation in peripheral tissues captures little of the variance in handedness. Future investigations should consider other more targeted sources of tissue, such as the brain.
The association between circadian rhythms and diseases has been well established, while the association with mental health is less explored. Given the heritable nature of circadian rhythms, this study aimed to investigate the relationship between genes underlying circadian rhythms and mental health outcomes, as well as a possible gene-environment correlation for circadian rhythms. Polygenic scores (PGSs) represent the genetic predisposition to develop a certain trait or disease. In a sample from the Netherlands Twin Register (N = 14,021), PGSs were calculated for two circadian rhythm measures: morningness and relative amplitude (RA). The PGSs were used to predict mental health outcomes such as subjective happiness, quality of life, and depressive symptoms. In addition, we performed the same prediction analysis in a within-family design in a subset of dizygotic twins. The PGS for morningness significantly predicted morningness (R² = 1.55%) and depressive symptoms (R² = 0.22%). The PGS for RA significantly predicted general health (R² = 0.12%) and depressive symptoms (R² = 0.20%). Item analysis of the depressive symptoms showed that 4 out of 14 items were significantly associated with the PGSs. Overall, the results showed that people with a genetic predisposition to being a morning person or with a high RA are likely to have fewer depressive symptoms. The four associated depressive symptoms described symptoms related to decision-making, energy, and feeling worthless or inferior, rather than sleep. Based on our findings, future research should include a substantial role for circadian rhythms in depression research and should further explore the gene-environment correlation in circadian rhythms.
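A polygenic score is commonly computed as a weighted sum of allele dosages, and its predictive value is often summarized as variance explained. The sketch below shows that general idea on toy data; it is not the study's pipeline, which would typically also adjust for covariates and family structure. The function names, effect weights, and dosages are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 per SNP) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def variance_explained(scores, phenotype):
    """Squared Pearson correlation between scores and a phenotype."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_p = sum(phenotype) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in phenotype)
    return cov * cov / (var_s * var_p)


# Hypothetical per-SNP effect weights and dosages for four individuals.
weights = [0.10, -0.05, 0.20]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
scores = [polygenic_score(g, weights) for g in genotypes]

# A toy phenotype that is an exact linear function of the score.
phenotype = [2.0 * s + 1.0 for s in scores]
r2 = variance_explained(scores, phenotype)
```

In real data the variance explained is far smaller than in this toy case, as the R² values of 0.12% to 1.55% in the abstract show.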
We compute the potential of mean force for two gold nanocrystals capped by alkylthiols from atomistic simulations and show how variables such as temperature, capping molecule length, and the presence of solvent affect these interactions. Our main findings are (1) the equilibrium distance in vacuum always equals ∼1.25 times the core diameter, (2) incomplete capping layers promote sintering, and (3) the presence of a good solvent results in purely repulsive interactions.
In current biomedical and complex trait research, increasing numbers of large molecular profiling (omics) data sets are being generated. At the same time, many studies fail to be reproduced (Baker 2016, Kim 2018). In order to improve study reproducibility and data reuse, including integration of data sets of different types and origins, it is imperative to work with omics data that is findable, accessible, interoperable, and reusable (FAIR, Wilkinson 2016) at the source. The data analysis, integration and stewardship pillar of the Netherlands X-omics Initiative aims to facilitate multi-omics research by providing tools to create, analyze and integrate FAIR omics data. We here report a joint activity of X-omics and the Netherlands Twin Register demonstrating the FAIRification of a multi-omics data set and the development of a FAIR multi-omics data analysis workflow.
The implementation of FAIR principles (Wilkinson 2016) can improve scientific transparency and facilitate data reuse. However, Kim (2018) showed in a case study that the availability of data and code is required but not sufficient to reproduce data analyses. They highlighted the importance of interoperable and open formats, and structured metadata. In order to increase research reproducibility on the data analysis level, additional practices such as version control, code licensing, and documentation have been proposed. These include recommendations for FAIR software by the Netherlands eScience Center and the Dutch Data Archiving and Networked Services (DANS), and FAIR principles for research software proposed by the Research Data Alliance (Chue Hong 2022). Data analysis in biomedical research usually comprises multiple steps, often resulting in complex data analysis workflows and requiring additional practices, such as containerization, to ensure transparency and reproducibility (Goble 2020, Stoudt 2021).
We apply these practices to a multi-omics data set that comprises genome-wide DNA methylation profiles, targeted metabolomics, and behavioral data of two cohorts that participated in the ACTION Biomarker Study (ACTION, Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies, see consortium members in Suppl. material 1) (Boomsma 2015, Bartels 2018, Hagenbeek 2020, van Dongen 2021, Hagenbeek 2022). The ACTION-NTR cohort consists of twins that are either longitudinally concordant or discordant for childhood aggression. The ACTION-Curium-LUMC cohort consists of children referred to the Dutch LUMC Curium academic center for child and youth psychiatry. With the joint analysis of multi-omics data and behavioral data, we aim to identify substructures in the ACTION-NTR cohort and link them to aggressive behavior. First, the individuals are clustered using Similarity Network Fusion (SNF, Wang 2014), and latent feature dimensions are uncovered using different unsupervised methods including Multi-Omics Factor Analysis (MOFA) (Argelaguet 2018) and Multiple Correspondence Analysis (MCA, Lê 2008, Husson 2017). In a second step, we determine correlations between -omics and phenotype dimensions, and use them to explain the subgroups of individuals from the ACTION-NTR cohort. In order to validate the results, we project data of the ACTION-Curium-LUMC cohort onto the latent dimensions and determine if correlations between omics and phenotype data can be reproduced.
Integration of data across cohorts and across data types requires interoperability. We applied different practices to make the data FAIR, including conversion of files to community-standard formats, and capturing experimental metadata using the ISA (Investigation, Study, Assay) metadata framework (Johnson 2021) and ontology-based annotations. All data analysis steps, including pre-processing of different omics data types, were implemented in either R or Python and combined in a modular Nextflow (Di Tommaso 2017) workflow, where the environment for each step is provided as a Singularity (Kurtzer 2017) container. The analysis workflow is packaged in a Research Object Crate (RO-Crate) (Soiland-Reyes 2022). The RO-Crate is a FAIR digital object that contains the Nextflow workflow including ontology-based annotations of each analysis step. Since omics data is considered to be potentially personally identifiable, the packaged workflow contains a minimal synthetic data set resembling the original data structure. Finally, the code is made available on GitHub and the workflow is registered at WorkflowHub (Goble 2021). Since our Nextflow workflow is set up in a modular manner, the individual analysis steps can be reused in other workflows. We demonstrate this replicability by applying different sub-workflows to data from two different cohorts.
Molecular simulation techniques are increasingly being used to study biomolecular systems at an atomic level. Such simulations rely on empirical force fields to represent the intermolecular interactions. There are many different force fields available, each based on a different set of assumptions and thus requiring different parametrization procedures. Recently, efforts have been made to fully automate the assignment of force-field parameters, including atomic partial charges, for novel molecules. In this work, we focus on a problem arising in the automated parametrization of molecules for use in combination with the GROMOS family of force fields: namely, the assignment of atoms to charge groups such that for every charge group the sum of the partial charges is ideally equal to its formal charge. In addition, charge groups are required to have size at most k. We show NP-hardness and give an exact algorithm that solves practical problem instances to provable optimality in a fraction of a second.
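To make the charge-group problem concrete, the following is a brute-force sketch for very small inputs, not the exact algorithm from the paper. It rests on two labeled assumptions: the nearest integer to a group's summed partial charge stands in for its formal charge, and any atoms may share a group. The published formulation may include further constraints that are not modeled here. All names and the example charges are hypothetical.

```python
def partitions(items, k):
    """Yield every partition of items into blocks of size at most k."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for sub in partitions(rest, k):
        # Option 1: the first item forms its own block.
        yield [[first]] + sub
        # Option 2: the first item joins an existing block that has room.
        for i, block in enumerate(sub):
            if len(block) < k:
                yield sub[:i] + [[first] + block] + sub[i + 1:]


def best_charge_groups(charges, k):
    """Exhaustively find the partition with minimal total charge deviation.

    Assumption: the target for each group is the nearest integer to the
    sum of its partial charges (a stand-in for the formal charge).
    """
    def cost(groups):
        total = 0.0
        for group in groups:
            q = sum(charges[i] for i in group)
            total += abs(q - round(q))
        return total

    best = min(partitions(list(range(len(charges))), k), key=cost)
    return best, cost(best)


# Hypothetical partial charges for five atoms, groups of at most 3 atoms.
charges = [0.4, -0.4, 0.25, 0.25, -0.5]
groups, deviation = best_charge_groups(charges, k=3)
```

The number of partitions grows very quickly with the number of atoms, so exhaustive search is only usable for toy inputs; that growth is consistent with the NP-hardness result and is why a dedicated exact algorithm matters in practice.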
Most disease-associated genetic variants are noncoding, making it challenging to design experiments to understand their functional consequences. Identification of expression quantitative trait loci (eQTLs) has been a powerful approach to infer the downstream effects of disease-associated variants, but most of these variants remain unexplained. The analysis of DNA methylation, a key component of the epigenome, offers highly complementary data on the regulatory potential of genomic regions. Here we show that disease-associated variants have widespread effects on DNA methylation in trans that likely reflect differential occupancy of trans binding sites by cis-regulated transcription factors. Using multiple omics data sets from 3,841 Dutch individuals, we identified 1,907 established trait-associated SNPs that affect the methylation levels of 10,141 different CpG sites in trans (false discovery rate (FDR) < 0.05). These included SNPs that affect both the expression of a nearby transcription factor (such as NFKB1, CTCF and NKX2-3) and methylation of its respective binding site across the genome. Trans methylation QTLs effectively expose the downstream effects of disease-associated variants.
Metformin is used as a first-line oral treatment for type 2 diabetes (T2D). However, the underlying mechanism is not fully understood. Here, we aimed to comprehensively investigate the pleiotropic effects of metformin.
We analyzed both metabolomic and genomic data of the population-based KORA cohort. To evaluate the effect of metformin treatment on metabolite concentrations, we quantified 131 metabolites in fasting serum samples and used multivariable linear regression models in three independent cross-sectional studies (n = 151 patients with T2D treated with metformin [mt-T2D]). Additionally, we used linear mixed-effect models to study the longitudinal KORA samples (n = 912) and performed mediation analyses to investigate the effects of metformin intake on blood lipid profiles. We combined genotyping data with the identified metformin-associated metabolites in KORA individuals (n = 1,809) and explored the underlying pathways.
We found significantly lower (P < 5.0E-06) concentrations of three metabolites (acyl-alkyl phosphatidylcholines [PCs]) when comparing mt-T2D with four control groups who were not using glucose-lowering oral medication. These findings were controlled for conventional risk factors of T2D and replicated in two independent studies. Furthermore, we observed that the levels of these metabolites decreased significantly in patients after they started metformin treatment during 7 years' follow-up. The reduction of these metabolites was also associated with a lowered blood level of LDL cholesterol (LDL-C). Variations of these three metabolites were significantly associated with 17 genes (including FADS1 and FADS2) and controlled by AMPK, a metformin target.
Our results indicate that metformin intake activates AMPK and consequently suppresses FADS, which leads to reduced levels of the three acyl-alkyl PCs and LDL-C. Our findings suggest potential beneficial effects of metformin in the prevention of cardiovascular disease.