Studying how genetic predispositions come together with environmental factors to contribute to complex behavioral outcomes has great potential for advancing the understanding of the development of psychopathology. It represents a clear theoretical advance over studying these factors in isolation. However, research at the intersection of multiple fields creates many challenges. We review several reasons why the rapidly expanding candidate gene-environment interaction (cG×E) literature should be considered with a degree of caution. We discuss lessons learned about candidate gene main effects from the evolving genetics literature and how these inform the study of cG×E. We review the importance of the measurement of the gene and environment of interest in cG×E studies. We discuss statistical concerns with modeling cG×E that are frequently overlooked. Furthermore, we review other challenges that have likely contributed to the cG×E literature being difficult to interpret, including low power and publication bias. Many of these issues are similar to other concerns about research integrity (e.g., high false-positive rates) that have received increasing attention in the social sciences. We provide recommendations for rigorous research practices for cG×E studies that we believe will increase the potential of this research to contribute more robustly to the understanding of complex behavioral phenotypes.
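Low power is named above as one reason the cG×E literature is hard to interpret. The Monte Carlo sketch below assumes an ordinary least-squares model with a single G × E product term, one additive SNP, a standard-normal environment, and made-up effect sizes. It shows how the power to detect the same small interaction depends on sample size. It does not reproduce any analysis from the review.

```python
import math

import numpy as np


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def gxe_power(n, beta_gxe, n_reps=200, maf=0.3, alpha=0.05, seed=1):
    """Fraction of simulated datasets in which the G x E coefficient is significant.

    Hypothetical setup: one additive SNP (0/1/2 copies), a standard-normal
    environment, main effects of 0.1 each, and unit-variance residual noise.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        g = rng.binomial(2, maf, size=n).astype(float)
        e = rng.standard_normal(n)
        y = 0.1 * g + 0.1 * e + beta_gxe * g * e + rng.standard_normal(n)
        # OLS with intercept, G, E, and the G x E product term
        X = np.column_stack([np.ones(n), g, e, g * e])
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
        if two_sided_p(coef[3] / se) < alpha:
            hits += 1
    return hits / n_reps


# Same small interaction effect at two sample sizes
print(gxe_power(500, 0.05), gxe_power(5000, 0.05))
```

With an interaction this small, the smaller sample detects it only rarely. Any significant estimate from such a sample is therefore likely to be inflated, which links low power to the false-positive concerns raised above.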
Multiple methods have been developed to estimate narrow-sense heritability, $h^2$, using single nucleotide polymorphisms (SNPs) in unrelated individuals. However, a comprehensive evaluation of these methods has not yet been performed, leading to confusion and discrepancy in the literature. We present the most thorough and realistic comparison of these methods to date. We used thousands of real whole-genome sequences to simulate phenotypes under varying genetic architectures and confounding variables, and we used array, imputed, or whole genome sequence SNPs to obtain 'SNP-heritability' estimates. We show that SNP-heritability can be highly sensitive to assumptions about the frequencies, effect sizes, and levels of linkage disequilibrium of underlying causal variants, but that methods that bin SNPs according to minor allele frequency and linkage disequilibrium are less sensitive to these assumptions across a wide range of genetic architectures and possible confounding factors. These findings provide guidance for best practices and proper interpretation of published estimates.
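The estimators described above as less sensitive bin SNPs by minor allele frequency (MAF) and linkage disequilibrium (LD). The Python sketch below shows only that binning step. The MAF cut points, the number of LD groups, and the use of a precomputed per-SNP LD score are all assumptions. It is not the stratification rule of any specific published method.

```python
import numpy as np


def maf_ld_strata(maf, ld_score, maf_cuts=(0.01, 0.05), n_ld_bins=2):
    """Label each SNP with a (MAF bin, LD bin) stratum.

    MAF bins are split at `maf_cuts`; within each MAF bin, SNPs are divided
    into `n_ld_bins` quantile groups of their LD score. Cut points and the
    choice of LD score are illustrative assumptions.
    """
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)
    maf_bin = np.digitize(maf, maf_cuts)  # 0 = rarest bin
    ld_bin = np.zeros(len(maf), dtype=int)
    for b in np.unique(maf_bin):
        idx = np.flatnonzero(maf_bin == b)
        # Interior quantiles of LD score within this MAF bin
        cuts = np.quantile(ld_score[idx], np.linspace(0.0, 1.0, n_ld_bins + 1)[1:-1])
        ld_bin[idx] = np.digitize(ld_score[idx], cuts)
    return [f"maf{m}_ld{l}" for m, l in zip(maf_bin, ld_bin)]


print(maf_ld_strata([0.02, 0.03, 0.2, 0.4], [1.0, 5.0, 10.0, 2.0], maf_cuts=(0.05,)))
# ['maf0_ld0', 'maf0_ld1', 'maf1_ld1', 'maf1_ld0']
```

Each stratum would then get its own variance component, so rare or low-LD variants are not forced to share one per-SNP effect-size assumption with common, high-LD variants.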
The classical twin design (CTD) is the most common method used to infer genetic and environmental causes of phenotypic variation. As has long been acknowledged, different combinations of common environment/assortative mating and additive, dominant, and epistatic genetic effects can lead to the same observed covariation between twin pairs, meaning that there is an inherent indeterminacy in parameter estimates arising from the CTD. The CTD circumvents this indeterminacy by assuming that higher-order epistasis is negligible and that the effects of either dominant genetic variation or the common environment are nonexistent. These assumptions, however, lead to consistent biases in parameter estimation. The current paper quantifies these biases and discusses alternative strategies for dealing with parameter indeterminacy in twin designs. One strategy is to model the similarity among other relatives in addition to twins (extended twin-family designs), which reduces but does not eliminate indeterminacy in parameter estimates. A more general strategy, applicable to all twin designs, is to present the parameter indeterminacy explicitly, as in a graph. Presenting the space of mathematically equally likely parameter values is important, not only because it aids the proper interpretation of twin design findings, but also because it keeps behavioral geneticists themselves mindful of methodological assumptions that can easily go unexamined.
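The indeterminacy can be made concrete with the standard expectations for standardized twin correlations: $r_{MZ} = a^2 + c^2 + d^2$ and $r_{DZ} = \tfrac{1}{2}a^2 + c^2 + \tfrac{1}{4}d^2$. These hold only when epistasis and assortative mating are ignored, which this sketch assumes. Two observed correlations cannot identify three components, so the Python sketch below traces the line of non-negative solutions. This is one way to present the indeterminacy explicitly. The correlations 0.60 and 0.35 are made-up inputs.

```python
def implied_twin_correlations(a2, c2, d2):
    """MZ and DZ correlations implied by standardized additive (a2),
    common-environment (c2), and dominance (d2) variance."""
    r_mz = a2 + c2 + d2
    r_dz = 0.5 * a2 + c2 + 0.25 * d2
    return r_mz, r_dz


def solve_given_d2(r_mz, r_dz, d2):
    """Once d2 is fixed by assumption, the two correlations identify a2 and c2."""
    a2 = 2.0 * (r_mz - r_dz) - 1.5 * d2
    c2 = 2.0 * r_dz - r_mz + 0.5 * d2
    return a2, c2


def solution_line(r_mz, r_dz, steps=5):
    """Non-negative (a2, c2, d2) triples that all reproduce the same (r_mz, r_dz)."""
    d_lo = max(0.0, 2.0 * r_mz - 4.0 * r_dz)  # keeps c2 >= 0
    d_hi = 4.0 / 3.0 * (r_mz - r_dz)          # keeps a2 >= 0
    if d_hi < d_lo:
        return []  # no admissible non-negative solution
    out = []
    for k in range(steps):
        d2 = d_lo + (d_hi - d_lo) * k / (steps - 1)
        a2, c2 = solve_given_d2(r_mz, r_dz, d2)
        out.append((a2, c2, d2))
    return out


for a2, c2, d2 in solution_line(0.60, 0.35):
    print(f"a2={a2:.3f} c2={c2:.3f} d2={d2:.3f}")
```

The first row ($d^2 = 0$) is the ACE solution. Every other row fits the same twin correlations equally well, so choosing ACE over a model with dominance is an assumption, not a finding.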
Whole genome pathway analysis is a powerful tool for the exploration of the combined effects of gene-sets within biological pathways. This study applied Interval Based Enrichment Analysis (INRICH) to perform whole-genome pathway analysis of body-mass index (BMI). We used a discovery set composed of summary statistics from a meta-analysis of 123,865 subjects performed by the GIANT Consortium, and an independent sample of 8,632 subjects to assess replication of significant pathways. We examined SNPs within nominally significant pathways using linear mixed models to estimate their contribution to overall BMI heritability. Six pathways replicated as having significant enrichment for association after correcting for multiple testing, including the previously unknown relationships between BMI and the Reactome regulation of ornithine decarboxylase pathway, the KEGG lysosome pathway, and the Reactome stabilization of P53 pathway. Two non-overlapping sets of genes emerged from the six significant pathways. The clustering of shared genes based on previously identified protein-protein interactions listed in PubMed and OMIM supported the relatively independent biological effects of these two gene-sets. We estimate that the SNPs located in examined pathways explain ∼20% of the heritability for BMI that is tagged by common SNPs (3.35% of the 16.93% total).
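Interval-based enrichment asks whether associated genomic intervals overlap a pathway's genes more often than randomly placed intervals of the same lengths would. The Python sketch below implements a deliberately simplified permutation test. A single stretch of sequence stands in for the genome, and the null places each interval uniformly at random. INRICH's own permutation scheme matches intervals on additional genomic features. The gene coordinates and intervals are hypothetical.

```python
import random


def overlaps(interval, gene):
    """True if two closed intervals (start, end) overlap."""
    return interval[0] <= gene[1] and gene[0] <= interval[1]


def count_hits(intervals, genes, gene_set):
    """Number of gene-set genes overlapped by at least one interval."""
    return sum(
        1 for name in gene_set
        if any(overlaps(iv, genes[name]) for iv in intervals)
    )


def enrichment_p(intervals, genes, gene_set, region_len, n_perm=1000, seed=0):
    """Observed hit count and permutation p-value under a uniform-placement null.

    Each permuted interval keeps its length but gets a random start in
    [0, region_len - length]; this null is a simplification.
    """
    rng = random.Random(seed)
    observed = count_hits(intervals, genes, gene_set)
    at_least = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in intervals:
            length = end - start
            new_start = rng.uniform(0, region_len - length)
            shuffled.append((new_start, new_start + length))
        if count_hits(shuffled, genes, gene_set) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


genes = {"G1": (100, 200), "G2": (300, 400), "G3": (5000, 5100)}
pathway = {"G1", "G2"}
associated = [(150, 160), (350, 360)]
print(enrichment_p(associated, genes, pathway, region_len=10_000))
```

Adding 1 to both numerator and denominator keeps the permutation p-value strictly positive, a common convention for Monte Carlo tests.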
The classical twin design uses data on the variation of and covariation between monozygotic and dizygotic twins to infer underlying genetic and environmental causes of phenotypic variation in the population. By using data from additional relative classes, such as parents, extended twin family designs more comprehensively describe the causes of phenotypic variation. This article introduces an extension of previous extended twin family models, the Cascade model, which uses information on twins as well as their siblings, spouses, parents, and children to differentiate two genetic and six environmental sources of phenotypic variation. The Cascade model also relaxes assumptions regarding mating and cultural transmission that existed in previous extended twin family designs. Estimating additional parameters and relaxing assumptions is potentially important, not only because it allows more fine-grained descriptions of the causes of phenotypic variation, but, more importantly, because it can reduce the biases in parameter estimates that exist in earlier designs.
We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.
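GREML-type methods fit one genomic relationship matrix (GRM) per group of variants and then estimate a variance component for each by REML. The Python sketch below shows only the simple standardized GRM and the construction of one GRM per stratum; the REML fit itself is omitted. Software such as GCTA may compute the GRM diagonal differently. The simulated genotypes and the two MAF strata are hypothetical.

```python
import numpy as np


def standardized_grm(genotypes):
    """GRM A = Z Z^T / m from an n x m matrix of 0/1/2 allele counts.

    Each SNP is centered by 2p and scaled by sqrt(2p(1 - p)), with p the
    sample allele frequency; monomorphic SNPs are dropped.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    x, p = x[:, keep], p[keep]
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def stratified_grms(genotypes, labels):
    """One GRM per SNP stratum (e.g. MAF x LD groups) for a multi-component model."""
    x = np.asarray(genotypes)
    labels = np.asarray(labels)
    return {lab: standardized_grm(x[:, labels == lab]) for lab in np.unique(labels)}


rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=400)
geno = rng.binomial(2, freqs, size=(100, 400))
grms = stratified_grms(geno, np.where(freqs < 0.3, "lowMAF", "highMAF"))
print({str(k): v.shape for k, v in grms.items()})
```

For unrelated individuals the diagonal averages close to 1 and the off-diagonal entries scatter around 0. Splitting SNPs into strata lets each group's contribution to heritability be estimated separately.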
Computer simulations are excellent tools for understanding the evolutionary and genetic consequences of complex processes that cannot be analytically predicted and for creating realistic genetic data. There are many software packages that simulate genetic data, but they are typically not fast or memory efficient enough to simulate realistic, individual-level genome-wide SNP/sequence data.
GeneEvolve is a user-friendly and efficient population genetics simulator that handles complex evolutionary and life history scenarios and generates individual-level phenotypes and realistic whole-genome sequence or SNP data. GeneEvolve runs forward-in-time, which allows it to model a wide range of scenarios for mating systems, selection, population size and structure, migration, recombination and environmental effects. The software is designed to take real or previously simulated phased haplotypes as input, allowing it to mimic very closely the properties of real genomic data.
GeneEvolve is freely available at https://github.com/rtahmasbi/GeneEvolve
Contact: Rasool.Tahmasbi@Colorado.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
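The Python toy below illustrates the general forward-in-time idea. It advances a population of phased haplotypes one generation at a time with random mating and recombination, then adds an additive phenotype with Gaussian environmental noise. It is hypothetical and does not use or mirror GeneEvolve's actual interface or input formats. Selection, migration, population structure, and realistic recombination maps are all omitted.

```python
import numpy as np


def make_gamete(hap_a, hap_b, rng, rate):
    """Copy one gamete from two parental haplotypes, switching source haplotype
    between adjacent loci with probability `rate` (a uniform toy map)."""
    switches = rng.random(hap_a.shape[0] - 1) < rate
    start = rng.integers(2)
    # phase[k] is 0 if locus k is copied from hap_a, 1 if from hap_b
    phase = (start + np.concatenate([[0], np.cumsum(switches)])) % 2
    return np.where(phase == 0, hap_a, hap_b)


def next_generation(pop, rng, rate=0.01):
    """pop has shape (individuals, 2, loci) with 0/1 alleles; random mating, constant size."""
    n = pop.shape[0]
    kids = np.empty_like(pop)
    for i in range(n):
        mom, dad = rng.choice(n, size=2, replace=False)
        kids[i, 0] = make_gamete(pop[mom, 0], pop[mom, 1], rng, rate)
        kids[i, 1] = make_gamete(pop[dad, 0], pop[dad, 1], rng, rate)
    return kids


def phenotypes(pop, effects, rng, env_sd=1.0):
    """Additive genetic value (allele dosage times effect size) plus Gaussian environment."""
    dosage = pop.sum(axis=1)  # (individuals, loci), values 0/1/2
    return dosage @ effects + rng.normal(0.0, env_sd, size=pop.shape[0])


rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(200, 2, 50))  # stand-in for phased input haplotypes
for _ in range(10):
    pop = next_generation(pop, rng)
y = phenotypes(pop, rng.normal(0.0, 0.1, size=50), rng)
```

Seeding the founder generation with real phased haplotypes instead of random alleles is what lets a forward simulator of this kind inherit realistic allele frequencies and linkage disequilibrium.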
In a companion paper (Balbona et al., Behav Genet, in press), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arise from assortative mating and vertical transmission, respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met but, as expected, are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample size and with increasing $r^2$ values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation ($r^2 = .05$), standard errors of these standardized estimates are reasonable ($< .05$) for $n = 16$K trios, and can even be reasonable for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., $r^2 > .025$) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWASs on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring.
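The logic of using nontransmitted alleles can be illustrated without the full structural models. Parental alleles that are not passed on can influence the offspring trait only through the parental environment. The Python sketch below is a regression-based illustration, not the authors' OpenMx models. It assumes random mating, treats each polygenic score as the trait's entire genetic component, and uses hypothetical values for the genetic path `delta` and the vertical-transmission path `f`.

```python
import numpy as np


def simulate_trios(n, delta=0.3, f=0.2, seed=0):
    """Simulate haplotypic polygenic scores and traits for n parent-offspring trios.

    Each parent's score is split into a transmitted (T) and a nontransmitted (NT)
    half. Parental trait = delta * score + noise; offspring trait =
    delta * T + f * (sum of parental traits) + noise. Random mating is assumed.
    """
    rng = np.random.default_rng(seed)
    t_m, nt_m, t_p, nt_p = rng.normal(0.0, np.sqrt(0.5), size=(4, n))
    y_m = delta * (t_m + nt_m) + rng.standard_normal(n)
    y_p = delta * (t_p + nt_p) + rng.standard_normal(n)
    y_o = delta * (t_m + t_p) + f * (y_m + y_p) + rng.standard_normal(n)
    return t_m + t_p, nt_m + nt_p, y_o


def t_nt_regression(t, nt, y):
    """OLS of the offspring trait on transmitted and nontransmitted scores: (b_T, b_NT)."""
    X = np.column_stack([np.ones_like(t), t, nt])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], coef[2]


b_t, b_nt = t_nt_regression(*simulate_trios(200_000))
print(round(b_t, 3), round(b_nt, 3))
```

Under these assumptions $b_T \approx \delta(1 + f)$ and $b_{NT} \approx f\delta$, so a nonzero nontransmitted coefficient signals vertical transmission. Assortative mating would correlate T and NT, which is one reason the full models estimate that covariation explicitly.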
Traditional genome-wide association studies (GWAS) are generally limited in their ability to explain a large portion of the genetic risk for most common diseases. We sought to use both traditional GWAS methods and more recently developed polygenic genome-wide analysis techniques to identify subsets of single-nucleotide polymorphisms (SNPs) that may be involved in risk of cardiovascular disease, and to estimate the heritability explained by common SNPs.
Using data from the Framingham SNP Health Association Resource (SHARe), three complementary methods were applied to examine the genetic factors associated with the Framingham Risk Score, a widely accepted indicator of underlying cardiovascular disease risk. The first method adopted a traditional GWAS approach, independently testing each SNP for association with the Framingham Risk Score. The other two approaches used polygenic methods intended to provide estimates of aggregate genetic risk and heritability.
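The traditional approach tests each SNP on its own against a stringent genome-wide threshold. The Python sketch below shows that single-SNP scan on simulated data. It assumes simple linear regression on allele counts and a normal approximation for p-values. It ignores covariates and the relatedness that a family-based sample such as Framingham would require accounting for. The data, the effect size, and the single causal SNP are hypothetical.

```python
import math

import numpy as np

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def single_snp_scan(genotypes, phenotype):
    """Regress the phenotype on each SNP's allele count (0/1/2) separately.

    Returns per-SNP slopes and two-sided p-values from a normal approximation
    to the t statistic; covariates and relatedness are ignored.
    """
    x = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = y.shape[0]
    yc = y - y.mean()
    betas = np.full(x.shape[1], np.nan)
    pvals = np.full(x.shape[1], np.nan)
    for j in range(x.shape[1]):
        g = x[:, j] - x[:, j].mean()
        sxx = g @ g
        if sxx == 0.0:  # monomorphic SNP: nothing to test
            continue
        beta = (g @ yc) / sxx
        resid = yc - beta * g
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        betas[j] = beta
        pvals[j] = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return betas, pvals


rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.3, size=(2000, 20))
trait = 0.5 * geno[:, 0] + rng.standard_normal(2000)  # only SNP 0 is causal here
betas, pvals = single_snp_scan(geno, trait)
print(np.flatnonzero(pvals < GENOME_WIDE_ALPHA))
```

Because every SNP must clear the threshold individually, many small true effects can go undetected. Polygenic methods are designed to capture exactly this aggregate signal.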
While no SNPs were independently associated with the Framingham Risk Score based on the results of the traditional GWAS analysis, we were able to identify cardiovascular disease-related SNPs as reported by previous studies. A predictive polygenic analysis was only able to explain approximately 1% of the genetic variance when predicting the 10-year risk of general cardiovascular disease. However, 20% to 30% of the variation in the Framingham Risk Score was explained using a recently developed method that considers the joint effect of all SNPs simultaneously.
The results of this study imply that common SNPs explain a large amount of the variation in the Framingham Risk Score and suggest that future, better-powered genome-wide association studies, possibly informed by knowledge of gene-pathways, will uncover more risk variants that will help to elucidate the genetic architecture of cardiovascular disease.
Objective:
The authors sought to determine whether, in a general population sample, different categories of adverse life events were associated with different patterns of depressive symptoms.
Method:
A total of 4,856 individuals (53% female) who experienced depressive symptoms in the previous year were assessed in up to four waves over a maximum of 12 years. At each wave, participants reported the severity of 12 symptoms disaggregated from the nine DSM-III-R criteria for major depression and the self-identified cause of these symptoms, which were classified into nine categories of adverse life events.
Results:
The patterns of depressive symptoms associated with the nine categories of adverse life events differed significantly. Deaths of loved ones and romantic breakups were marked by high levels of sadness, anhedonia, appetite loss, and (for romantic breakups) guilt. Chronic stress and, to a lesser degree, failures were associated with fatigue and hypersomnia, but less so with sadness, anhedonia, and appetite loss. Those who reported that no adverse life events caused their dysphoric episodes reported fatigue, appetite gain, and thoughts of self-harm, but less sadness or trouble concentrating. These symptom patterns were found in a between-persons analysis of participants who had a single dysphoric episode, and they were replicated in an independent within-persons analysis of episode-specific symptom deviations among individuals with multiple episodes. Similar results were obtained when the sample was restricted to those meeting DSM-III-R diagnostic criteria for major depression.
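The within-persons analysis compares each episode's symptoms with the same person's other episodes, so stable differences between people drop out. The Python sketch below shows that centering step for a single symptom. The data layout, category labels, and scores are hypothetical. The study's actual models, covering 12 symptoms across up to four waves, are not reproduced.

```python
from collections import defaultdict
from statistics import mean


def within_person_deviations(episodes):
    """Average episode-specific symptom deviations by event category.

    `episodes` is a list of (person_id, event_category, symptom_score) tuples
    for people with multiple episodes. Each score is centered on that person's
    own mean across episodes before averaging by category.
    """
    scores_by_person = defaultdict(list)
    for person, _, score in episodes:
        scores_by_person[person].append(score)
    person_mean = {p: mean(s) for p, s in scores_by_person.items()}

    deviations = defaultdict(list)
    for person, category, score in episodes:
        deviations[category].append(score - person_mean[person])
    return {category: mean(d) for category, d in deviations.items()}


# Hypothetical sadness ratings for two people, each with two episodes
episodes = [
    (1, "death of loved one", 3), (1, "chronic stress", 1),
    (2, "death of loved one", 2), (2, "chronic stress", 0),
]
print(within_person_deviations(episodes))
```

Person 2 reports lower sadness overall, but after centering both people show the same pattern: higher sadness after a loss than during chronic stress. A between-persons comparison could confound that pattern with stable individual differences.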
Conclusions:
Depression is a pathoplastic syndrome. Different types of life events are related to different depressive symptom profiles. The results from the within-persons analysis suggest that these relationships are causal.