In this article, we review some of the data that contribute to our understanding of the genetic architecture of psychiatric disorders. These include results from evolutionary modelling (hence no data), the observed recurrence risk to relatives and data from molecular markers. We briefly discuss the common-disease common-variant hypothesis, the success (or otherwise) of genome-wide association studies, the evidence for polygenic variance and the likely success of exome and whole-genome sequencing studies. We conclude that the perceived dichotomy between 'common' and 'rare' variants is not only false, but unhelpful in making progress towards increasing our understanding of the genetic basis of psychiatric disorders. Strong evidence has been accumulated that is consistent with the contribution of many genes to risk of disease, across a wide range of allele frequencies and with a substantial proportion of genetic variation in the population in linkage disequilibrium with single-nucleotide polymorphisms (SNPs) on commercial genotyping arrays. At the same time, most causal variants that segregate in the population are likely to be rare and in total these variants also explain a significant proportion of genetic variation. It is the combination of allele frequency, effect size and functional characteristics that will determine the success of new experimental paradigms such as whole exome/genome sequencing to detect such loci. Empirical results suggest that roughly half the genetic variance is tagged by SNPs on commercial genome-wide chips, but that individual causal variants have a small effect size, on average. We conclude that larger experimental sample sizes are essential to further our understanding of the biology underlying psychiatric disorders.
Dense SNP genotypes are often combined with complex trait phenotypes to map causal variants, study genetic architecture and provide genomic predictions for individuals with genotypes but no phenotype. A single method of analysis that jointly fits all genotypes in a Bayesian mixture model (BayesR) has been shown to competitively address all 3 purposes simultaneously. However, BayesR and other similar methods ignore prior biological knowledge and assume all genotypes are equally likely to affect the trait. While this assumption is reasonable for SNP array genotypes, it is less sensible if genotypes are whole-genome sequence variants which should include causal variants.
We introduce a new method (BayesRC) based on BayesR that incorporates prior biological information in the analysis by defining classes of variants likely to be enriched for causal mutations. The information can be derived from a range of sources, including variant annotation, candidate gene lists and known causal variants. This information is then incorporated objectively in the analysis based on evidence of enrichment in the data. We demonstrate the increased power of BayesRC compared to BayesR using real dairy cattle genotypes with simulated phenotypes. The genotypes were imputed whole-genome sequence variants in coding regions combined with dense SNP markers. BayesRC increased the power to detect causal variants and increased the accuracy of genomic prediction. The relative improvement for genomic prediction was most apparent in validation populations that were not closely related to the reference population. We also applied BayesRC to real milk production phenotypes in dairy cattle using independent biological priors from gene expression analyses. Although current biological knowledge of which genes and variants affect milk production is still very incomplete, our results suggest that the new BayesRC method was equal to or more powerful than BayesR for detecting candidate causal variants and for genomic prediction of milk traits.
BayesRC provides a novel and flexible approach to simultaneously improving the accuracy of QTL discovery and genomic prediction by taking advantage of prior biological knowledge. Approaches such as BayesRC will become increasingly useful as biological knowledge accumulates regarding functional regions of the genome for a range of traits and species.
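To make the idea of class-specific priors concrete, the sketch below draws SNP effects from a BayesR-style mixture of normal distributions in which each variant class has its own mixing proportions. The four variance multipliers, the class names and the proportions are illustrative assumptions, not values from the study; in BayesRC the proportions are estimated from the data rather than fixed.

```python
import random

# Assumed variance multipliers for the mixture components;
# the first component sets the SNP effect to exactly zero.
VARIANCE_MULTIPLIERS = [0.0, 1e-4, 1e-3, 1e-2]

# Hypothetical mixing proportions per variant class. The "enriched" class
# puts more prior weight on the non-zero components.
CLASS_PROPORTIONS = {
    "other": [0.99, 0.007, 0.002, 0.001],
    "enriched": [0.90, 0.06, 0.03, 0.01],
}


def draw_effect(variant_class, genetic_variance, rng):
    """Draw one SNP effect from the mixture prior of its class."""
    weights = CLASS_PROPORTIONS[variant_class]
    multiplier = rng.choices(VARIANCE_MULTIPLIERS, weights=weights, k=1)[0]
    if multiplier == 0.0:
        return 0.0
    return rng.gauss(0.0, (multiplier * genetic_variance) ** 0.5)


def fraction_nonzero(variant_class, n_variants, genetic_variance=1.0, seed=1):
    """Share of simulated variants in a class that receive a non-zero effect."""
    rng = random.Random(seed)
    effects = [draw_effect(variant_class, genetic_variance, rng)
               for _ in range(n_variants)]
    return sum(e != 0.0 for e in effects) / n_variants
```

With these assumed proportions, `fraction_nonzero("enriched", 20000)` comes out well above `fraction_nonzero("other", 20000)`, which is the sense in which a class is "enriched for causal mutations".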
Genetic correlations are the genome-wide aggregate effects of causal variants affecting multiple traits. Traditionally, genetic correlations between complex traits are estimated from pedigree studies, but such estimates can be confounded by shared environmental factors. Moreover, for diseases, low prevalence rates imply that even if the true genetic correlation between disorders was high, co-aggregation of disorders in families might not occur or could not be distinguished from chance. We have developed and implemented statistical methods based on linear mixed models to obtain unbiased estimates of the genetic correlation between pairs of quantitative traits or pairs of binary traits of complex diseases using population-based case-control studies with genome-wide single-nucleotide polymorphism data. The method is validated in a simulation study and applied to estimate genetic correlation between various diseases from Wellcome Trust Case Control Consortium data in a series of bivariate analyses. We estimate a significant positive genetic correlation between risk of Type 2 diabetes and hypertension of ~0.31 (SE 0.14, P = 0.024).
Our methods, appropriate for both quantitative and binary traits, are implemented in the freely available software GCTA (http://www.complextraitgenomics.com/software/gcta/reml_bivar.html).
Contact: hong.lee@uq.edu.au
Supplementary data are available at Bioinformatics online.
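As a small illustration of the quantity being estimated, the sketch below computes a genetic correlation from the genetic variance of each trait and their genetic covariance, using the standard definition. The estimation of those components by bivariate linear mixed models is not shown, and the function name and input values are hypothetical.

```python
import math


def genetic_correlation(var_g1, var_g2, cov_g12):
    """Return r_g = cov_g(1,2) / sqrt(var_g(1) * var_g(2))."""
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g12 / math.sqrt(var_g1 * var_g2)


# Hypothetical variance components for two traits.
r_g = genetic_correlation(var_g1=0.40, var_g2=0.25, cov_g12=0.10)
```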
Achieving accurate genomic estimated breeding values for dairy cattle requires a very large reference population of genotyped and phenotyped individuals. Assembling such reference populations has been achieved for breeds such as Holstein, but is challenging for breeds with fewer individuals. An alternative is to use a multi-breed reference population, such that smaller breeds gain some advantage in accuracy of genomic estimated breeding values (GEBV) from information from larger breeds. However, this requires that marker-quantitative trait loci associations persist across breeds. Here, we assessed the gain in accuracy of GEBV in Jersey cattle as a result of using a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers. The surrogate used for accuracy was the correlation of GEBV with daughter trait deviations in a validation population. Two methods were used to predict breeding values, either a genomic BLUP (GBLUP_mod), or a new method, BayesR, which used a mixture of normal distributions as the prior for SNP effects, including one distribution that set SNP effects to zero. The GBLUP_mod method scaled both the genomic relationship matrix and the additive relationship matrix to a base at the time the breeds diverged, and regressed the genomic relationship matrix to account for sampling errors in estimating relationship coefficients due to a finite number of markers, before combining the 2 matrices. Although these modifications did result in less biased breeding values for Jerseys compared with an unmodified genomic relationship matrix, BayesR gave the highest accuracies of GEBV for the 3 traits investigated (milk yield, fat yield, and protein yield), with an average increase in accuracy compared with GBLUP_mod across the 3 traits of 0.05 for both Jerseys and Holsteins.
The advantage was limited for either Jerseys or Holsteins in using 624,213 SNP rather than 39,745 SNP (0.01 for Holsteins and 0.03 for Jerseys, averaged across traits). Even this limited and nonsignificant advantage was only observed when BayesR was used. An alternative panel, which extracted the SNP in the transcribed part of the bovine genome from the 624,213 SNP panel (to give 58,532 SNP), performed better, with an increase in accuracy of 0.03 for Jerseys across traits. This panel captures much of the increased genomic content of the 624,213 SNP panel, with the advantage of a greatly reduced number of SNP effects to estimate. Taken together, using this panel, a combined breed reference and using BayesR rather than GBLUP_mod increased the accuracy of GEBV in Jerseys from 0.43 to 0.52, averaged across the 3 traits.
A new technology called genomic selection is revolutionizing dairy cattle breeding. Genomic selection refers to selection decisions based on genomic breeding values (GEBV). The GEBV are calculated as the sum of the effects of dense genetic markers, or haplotypes of these markers, across the entire genome, thereby potentially capturing all the quantitative trait loci (QTL) that contribute to variation in a trait. The QTL effects, inferred from either haplotypes or individual single nucleotide polymorphism markers, are first estimated in a large reference population with phenotypic information. In subsequent generations, only marker information is required to calculate GEBV. The reliability of GEBV predicted in this way has already been evaluated in experiments in the United States, New Zealand, Australia, and the Netherlands. These experiments used reference populations of between 650 and 4,500 progeny-tested Holstein-Friesian bulls, genotyped for approximately 50,000 genome-wide markers. Reliabilities of GEBV for young bulls without progeny test results in the reference population were between 20 and 67%. The reliability achieved depended on the heritability of the trait evaluated, the number of bulls in the reference population, the statistical method used to estimate the single nucleotide polymorphism effects in the reference population, and the method used to calculate the reliability. A common finding in 3 countries (United States, New Zealand, and Australia) was that a straightforward BLUP method for estimating the marker effects gave reliabilities of GEBV almost as high as more complex methods. The BLUP method is attractive because the only prior information required is the additive genetic variance of the trait. All countries included a polygenic effect (parent average breeding value) in their GEBV calculation.
This inclusion is recommended to capture any genetic variance not associated with the markers, and to put some selection pressure on low-frequency QTL that may not be captured by the markers. The reliabilities of GEBV achieved were significantly greater than the reliability of parental average breeding values, the current criteria for selection of bull calves to enter progeny test teams. The increase in reliability is sufficiently high that at least 2 dairy breeding companies are already marketing bull teams for commercial use based on their GEBV only, at 2 yr of age. This strategy should at least double the rate of genetic gain in the dairy industry. Many challenges with genomic selection and its implementation remain, including increasing the accuracy of GEBV, integrating genomic information into national and international genetic evaluations, and managing long-term genetic gain.
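The calculation of a GEBV as a sum of marker effects plus a polygenic term can be sketched in a few lines. This is a minimal illustration, assuming genotypes are coded as allele counts (0, 1 or 2) and that marker effects have already been estimated in a reference population; the function name and the numbers are hypothetical.

```python
def gebv(genotypes, marker_effects, polygenic_ebv=0.0):
    """Genomic breeding value: sum of genotype * marker effect over all
    markers, plus an optional polygenic term such as the parent average."""
    if len(genotypes) != len(marker_effects):
        raise ValueError("one effect is needed per genotyped marker")
    marker_part = sum(g * b for g, b in zip(genotypes, marker_effects))
    return polygenic_ebv + marker_part


# Hypothetical young bull with three markers and a parent-average term.
value = gebv(genotypes=[0, 1, 2],
             marker_effects=[0.5, -0.2, 0.1],
             polygenic_ebv=0.3)
```

Once the effects are estimated, only the genotype vector changes from animal to animal, which is why later generations need marker information alone.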
Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of approximately 50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to the finite population size (Ne = 100), the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, all haplotype effects could not be estimated simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated to each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
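The contrast between least squares and BLUP can be illustrated with a small sketch. With more effects than records, the least-squares equations have no unique solution, whereas BLUP with an equal variance for every effect adds a shrinkage term to the diagonal and remains solvable. The code below solves those equations by Gauss-Seidel iteration with residual updating. It uses generic centred marker covariates rather than the haplotypes of the study, and the shrinkage parameter `lam` (residual variance divided by per-marker variance) is assumed known.

```python
def snp_blup(X, y, lam, n_iter=200):
    """Solve (X'X + lam * I) b = X'y by Gauss-Seidel with residual updating.

    X   : list of rows (one per individual) of centred marker covariates
    y   : list of phenotypes, assumed already corrected for the mean
    lam : residual variance / per-marker variance (assumed known)
    """
    n, m = len(X), len(X[0])
    b = [0.0] * m
    resid = list(y)  # equals y - X b while b is all zeros
    xtx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            # Right-hand side for marker j, adding back its own contribution.
            rhs = sum(X[i][j] * resid[i] for i in range(n)) + xtx[j] * b[j]
            new_bj = rhs / (xtx[j] + lam)
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new_bj
    return b


# Hypothetical data: 3 records, 4 markers, so least squares is not unique.
X = [[1.0, 0.0, -1.0, 1.0],
     [-1.0, 1.0, 0.0, 1.0],
     [0.0, -1.0, 1.0, -2.0]]
y = [1.0, -0.5, -0.5]
effects = snp_blup(X, y, lam=1.0)
```

Gauss-Seidel is used here only because it needs no linear-algebra library; any solver for the same equations gives the same estimates.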
General intelligence is an important human quantitative trait that accounts for much of the variation in diverse cognitive abilities. Individual differences in intelligence are strongly associated with many important life outcomes, including educational and occupational attainments, income, health and lifespan. Data from twin and family studies are consistent with a high heritability of intelligence, but this inference has been controversial. We conducted a genome-wide analysis of 3511 unrelated adults with data on 549,692 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits. We estimate that 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. These estimates provide lower bounds for the narrow-sense heritability of the traits. We partitioned genetic variation on individual chromosomes and found that, on average, longer chromosomes explain more variation. Finally, using just SNP data we predicted ∼1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample (P=0.009 and 0.028, respectively). Our results unequivocally confirm that a substantial proportion of individual differences in human intelligence is due to genetic variation, and are consistent with many genes of small effects underlying the additive genetic influences on intelligence.
Genomic prediction of future phenotypes or genetic merit using dense SNP genotypes can be used for prediction of disease risk, forensics, and genomic selection of livestock and domesticated plant species. The reliability of genomic predictions is their squared correlation with the true genetic merit and indicates the proportion of the genetic variance that is explained. As reliability relies heavily on the number of phenotypes, combining data sets from multiple populations may be attractive as a way to increase reliabilities, particularly when phenotypes are scarce. However, this strategy may also decrease reliabilities if the marker effects are very different between the populations. The effect of combining multiple populations on the reliability of genomic predictions was assessed for two simulated cattle populations, A and B, that had diverged for T = 6, 30, or 300 generations. The training set comprised phenotypes of 1000 individuals from population A and 0, 300, 600, or 1000 individuals from population B, while marker density and trait heritability were varied. Adding individuals from population B to the training set increased the reliability in population A by up to 0.12 when the marker density was high and T = 6, whereas it decreased the reliability in population A by up to 0.07 when the marker density was low and T = 300. Without individuals from population B in the training set, the reliability in population B was up to 0.77 lower than in population A, especially for large T. Adding individuals from population B to the training set increased the reliability in population B to close to the same level as in population A when the marker density was sufficiently high for the marker-QTL linkage disequilibrium to persist across populations. Our results suggest that the most accurate genomic predictions are achieved when phenotypes from all populations are combined in one training set, while for more diverged populations a higher marker density is required.
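The definition of reliability used above, the squared correlation between predictions and true genetic merit, can be written out directly. The sketch below is only computable where true merit is known, as in a simulation; the function names and the example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reliability(predicted, true_merit):
    """Squared correlation of genomic predictions with true genetic merit."""
    return pearson(predicted, true_merit) ** 2


# Hypothetical predictions and simulated true merits for four animals.
rel = reliability([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

In this toy example the correlation is 0.8, so the reliability is 0.64.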
When a genetic marker and a quantitative trait locus (QTL) are in linkage disequilibrium (LD) in one population, they may not be in LD in another population or their LD phase may be reversed. The objectives of this study were to compare the extent of LD and the persistence of LD phase across multiple cattle populations. LD measures r and r² were calculated for syntenic marker pairs using genome-wide single-nucleotide polymorphisms (SNP) that were genotyped in Dutch and Australian Holstein-Friesian (HF) bulls, Australian Angus cattle, and New Zealand Friesian and Jersey cows. Average r² was approximately 0.35, 0.25, 0.22, 0.14, and 0.06 at marker distances 10, 20, 40, 100, and 1000 kb, respectively, which indicates that genomic selection within cattle breeds with r² ≥ 0.20 between adjacent markers would require approximately 50,000 SNPs. The correlation of r values between populations for the same marker pairs was close to 1 for pairs of very close markers (<10 kb) and decreased with increasing marker distance and the extent of divergence between the populations. To find markers that are in LD with QTL across diverged breeds, such as HF, Jersey, and Angus, would require approximately 300,000 markers.
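The difference between r and r² matters for phase: two populations can share the same r² while the sign of r is reversed. The sketch below computes the signed LD correlation from phased haplotypes with alleles coded 0/1, using the standard definition r = D / sqrt(pA(1 - pA)pB(1 - pB)). The two toy populations are hypothetical and chosen to show a complete phase reversal.

```python
import math


def ld_r(haplotypes, locus_a, locus_b):
    """Signed LD correlation r between two biallelic loci.

    haplotypes: sequence of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_a = sum(h[locus_a] for h in haplotypes) / n
    p_b = sum(h[locus_b] for h in haplotypes) / n
    p_ab = sum(h[locus_a] * h[locus_b] for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("r is undefined when a locus is monomorphic")
    return d / denom


# Hypothetical populations with the same r squared but opposite LD phase.
population_1 = [(1, 1), (1, 1), (0, 0), (0, 0)]
population_2 = [(1, 0), (1, 0), (0, 1), (0, 1)]
r_1 = ld_r(population_1, 0, 1)  # 1.0
r_2 = ld_r(population_2, 0, 1)  # -1.0
```

A marker effect estimated in the first population would carry the wrong sign in the second, which is why persistence of phase is assessed from the correlation of r across populations and not from r² alone.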
The longevity of dairy cattle has economic, animal welfare, and health implications and is influenced by the frequency of mortality on the farm and sale for slaughter. In this study cows removed from the herd due to death or slaughter during the lactation were coded 1 and cows that were not terminated were coded 0. Genetic parameters for mortality rates (MR) and slaughter rates (SR) were estimated for Holstein (H) and Jersey (J) breeds by applying both linear (LM) and threshold (TM) sire models using about 1.2 million H and 286,000 J cows. Estimated breeding values (EBV) for MR and SR were predicted using animal models to assess the opportunity for selection and genetic trends. Cow termination data, recorded between 1990 and 2020 on a voluntary basis by Australian dairy farmers, were analyzed. Cow MR has increased from below 1% in the 1990s to 4.1% and 3.6% in recent years in H and J cows, respectively. Most dead cows (∼36%) left the herd before 120 d of lactation, while cows that were slaughtered left the herd toward the end of the lactation. Using the LM, heritability (h²) estimates for MR were lower (1%) than those for SR (2%–3.5%). When h² were estimated using a TM, the estimates for both traits varied between 4% and 20%, suggesting that the difference in incidence level is one of the reasons for the difference in the h² values between MR and SR. Early test-day milk yield (MY) and 305-d MY have unfavorable genetic correlations (0.32–0.41) with MR in both breeds. The genetic correlations of calving interval with MR were stronger (0.54–0.68) than with SR (0.28–0.45) suggesting that poor fertility can serve as an early indicator of poor cow health that may lead to increased risk of death. High early test-day somatic cell count is genetically associated with increased likelihood of slaughter (0.24–0.46), but not with increased likelihood of death.
In H, 305-d protein yield (PY) had the strongest genetic correlation (−0.34 to −0.40) with SR whereas in J, both 305-d PY and fat yield showed high genetic (−0.64 to −0.70) and moderate environmental (−0.35 to −0.37) correlations with SR. The genetic correlation of removal from the herd due to death and slaughter was negative (−0.3) in J and zero in H. Strong selection for improved fertility and survival, with less selection emphasis on MY, has led to an improvement in the genetic trend for cow MR in H, and the trend in J has stabilized. Although genetic evaluations for cow MR are feasible, the reliabilities of the EBV are low and the level of cow MR in Australia is relatively low compared with similar countries. Therefore, genetic evaluation for survival based on mortality and slaughter data could be sufficient in the current selection circumstances where breeding objectives are broadly defined. Nevertheless, all Australian farmers should be encouraged to continue recording mortality and slaughter data for monitoring of the trends and for future development of genetic evaluations.
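One way to see why incidence affects heritability estimated on the observed 0/1 scale is the Dempster-Lerner approximation, which rescales an observed-scale heritability to the underlying liability scale. The sketch below is an illustration of that textbook approximation only; it is not the threshold sire model fitted in the study, and the input values are hypothetical.

```python
from statistics import NormalDist


def liability_h2(h2_observed, incidence):
    """Dempster-Lerner approximation:
    h2_liability = h2_observed * p * (1 - p) / z**2,
    where p is the incidence and z is the standard normal density at the
    threshold that cuts off a proportion p in the upper tail."""
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - incidence)
    z = std_normal.pdf(threshold)
    return h2_observed * incidence * (1.0 - incidence) / z ** 2


# Hypothetical inputs: the same observed-scale h2 at two incidence levels.
rare_trait = liability_h2(h2_observed=0.01, incidence=0.04)
common_trait = liability_h2(h2_observed=0.01, incidence=0.20)
```

For the same observed-scale value, the rarer trait is scaled up more, which is consistent with incidence level contributing to the gap between the linear-model estimates for MR and SR.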