The increased transcription of the Cyp6g1 gene of Drosophila melanogaster, and consequent resistance to insecticides such as DDT, is a widely cited example of adaptation mediated by cis-regulatory change. A fragment of an Accord transposable element inserted upstream of the Cyp6g1 gene is causally associated with resistance and has spread to high frequencies in populations around the world since the 1940s. Here we report the existence of a natural allelic series at this locus of D. melanogaster, involving copy number variation of Cyp6g1, and two additional transposable element insertions (a P and an HMS-Beagle). We provide evidence that this genetic variation underpins phenotypic variation, as the more derived the allele, the greater the level of DDT resistance. Tracking the spatial and temporal patterns of allele frequency changes indicates that the multiple steps of the allelic series are adaptive. Further, a DDT association study shows that the most resistant allele, Cyp6g1-BP, is greatly enriched in the top 5% of the phenotypic distribution and accounts for approximately 16% of the underlying phenotypic variation in resistance to DDT. In contrast, copy number variation for another candidate resistance gene, Cyp12d1, is not associated with resistance. Thus the Cyp6g1 locus is a major contributor to DDT resistance in field populations, and evolution at this locus features multiple adaptive steps occurring in rapid succession.
Prediction of genetic merit using dense SNP genotypes can be used for estimation of breeding values for selection of livestock, crops, and forage species; for prediction of disease risk; and for forensics. The accuracy of these genomic predictions depends in part on the genetic architecture of the trait, in particular the number of loci affecting the trait and the distribution of their effects. Here we investigate the difference among three traits in distribution of effects and the consequences for the accuracy of genomic predictions. Proportion of black coat colour in Holstein cattle was used as one model complex trait. Three loci, KIT, MITF, and a locus on chromosome 8, together explain 24% of the variation of proportion of black. However, a surprisingly large number of loci of small effect are necessary to capture the remaining variation. A second trait, fat concentration in milk, had one locus of large effect and a host of loci with very small effects. Both these distributions of effects were in contrast to that for a third trait, an index of scores for a number of aspects of cow conformation ("overall type"), which had only loci of small effect. The differences in distribution of effects among the three traits were quantified by estimating the distribution of variance explained by chromosome segments containing 50 SNPs. This approach was taken to account for the imperfect linkage disequilibrium between the SNPs and the QTL affecting the traits. We also show that the accuracy of predicting genetic values is higher for traits with a proportion of large effects (proportion black and fat percentage) than for a trait with no loci of large effect (overall type), provided the method of analysis takes advantage of the distribution of loci effects.
Two key findings from genomic selection experiments are 1) the reference population used must be very large to subsequently predict accurate genomic estimated breeding values (GEBV), and 2) prediction equations derived in one breed do not predict accurate GEBV when applied to other breeds. Both findings are a problem for breeds where the number of individuals in the reference population is limited. A multi-breed reference population is a potential solution, and here we investigate the accuracies of GEBV in Holstein dairy cattle and Jersey dairy cattle when the reference population is single breed or multi-breed. The accuracies were obtained both as a function of elements of the inverse coefficient matrix and from the realised accuracies of GEBV.
Best linear unbiased prediction with a multi-breed genomic relationship matrix (GBLUP) and two Bayesian methods (BAYESA and BAYES_SSVS), which estimate individual SNP effects, were used to predict GEBV for 400 young Holstein bulls and 77 young Jersey bulls from a reference population of 781 Holstein and 287 Jersey bulls. Genotypes of 39,048 SNP markers were used. Phenotypes in the reference population were de-regressed breeding values for production traits. For the GBLUP method, expected accuracies calculated from the diagonal of the inverse of the coefficient matrix were compared to realised accuracies.
When GBLUP was used, expected accuracies from a function of elements of the inverse coefficient matrix agreed reasonably well with realised accuracies calculated from the correlation between GEBV and EBV in single breed populations, but not in multi-breed populations. When the Bayesian methods were used, realised accuracies of GEBV were up to 13% higher when the multi-breed reference population was used than when a pure breed reference was used. However, no consistent increase in accuracy across traits was obtained.
Predicting genomic breeding values using a genomic relationship matrix is an attractive approach to implement genomic selection, as expected accuracies of GEBV can be readily derived. However, in multi-breed populations, Bayesian approaches give higher accuracies for some traits. Finally, multi-breed reference populations will be a valuable resource to fine map QTL.
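The idea of deriving an expected accuracy from the diagonal of the inverse coefficient matrix can be illustrated with a minimal sketch. It assumes a simple model with an overall mean and one random genomic effect per animal, and the usual textbook relationship: prediction error variance (PEV) equals the diagonal element of the inverse coefficient matrix times the residual variance, and accuracy is the square root of 1 - PEV / (G_ii × genetic variance). The function name, the toy genotypes and the variance components are all hypothetical and are not taken from the study.

```python
import numpy as np

def gblup_expected_accuracy(G, has_record, var_g, var_e):
    """Expected GEBV accuracy from the inverse of the mixed-model coefficient matrix.

    Assumed model (illustrative): y = 1*mu + Z*g + e, with g ~ N(0, G*var_g).
    """
    n = G.shape[0]
    Z = np.eye(n)[has_record]          # one row per phenotyped animal
    X = np.ones((Z.shape[0], 1))       # overall mean as the only fixed effect
    lam = var_e / var_g                # variance ratio used in the equations
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(G)]])
    C_inv = np.linalg.inv(C)
    pev = np.diag(C_inv)[1:] * var_e   # skip the first element (the mean)
    reliability = 1.0 - pev / (np.diag(G) * var_g)
    return np.sqrt(np.clip(reliability, 0.0, 1.0))

# Toy data: 8 animals, 200 markers; the last 2 animals have no phenotype.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(8, 200)).astype(float)
M -= M.mean(axis=0)                                   # centre marker genotypes
G = M @ M.T / M.shape[1] + 0.01 * np.eye(8)           # small ridge keeps G invertible
has_record = np.array([True] * 6 + [False] * 2)
acc = gblup_expected_accuracy(G, has_record, var_g=0.3, var_e=0.7)
print(acc)
```

Realised accuracy, by contrast, would be computed as a correlation between GEBV and EBV in validation animals, which needs real phenotypes and is not shown here.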
The genetic architecture of complex traits in cattle includes very large numbers of loci affecting any given trait. Most of these loci have small effects but occasionally there are loci with moderate-to-large effects segregating due to recent selection for the mutant allele. Genomic markers capture most but not all of the additive genetic variance for traits, probably because there are causal mutations with low allele frequency and therefore in incomplete linkage disequilibrium with the markers. The prediction of genetic value from genomic markers can achieve high accuracy by using statistical models that include all markers and assuming that marker effects are random variables drawn from a specified prior distribution. Recent effective population size is in the order of 100 within cattle breeds and ≈ 2500 animals with genotypes and phenotypes are sufficient to predict the genetic value of animals with an accuracy of 0.65. Recent effective population size for humans is much larger, in the order of 10,000-15,000, and more than 145,000 records would be required to reach a similar accuracy for people. However, our calculations assume that genomic markers capture all the genetic variance. This may be possible in the future as causal polymorphisms are genotyped using genome sequence data.
Continued production of food in areas predicted to be most affected by climate change, such as dairy farming regions of Australia, will be a major challenge in coming decades. Along with rising temperatures and water shortages, scarcity of inputs such as high energy feeds is predicted. With the motivation of selecting cattle adapted to these changing environments, we conducted a genome wide association study to detect DNA markers (single nucleotide polymorphisms) associated with the sensitivity of milk production to environmental conditions. To do this we combined historical milk production and weather records with dense marker genotypes on dairy sires with many daughters milking across a wide range of production environments in Australia. Markers associated with sensitivity of milk production to feeding level (on chromosome 9) and with sensitivity of milk production to temperature humidity index (on chromosome 29) were validated in two independent populations, one a different breed of cattle. As the extent of linkage disequilibrium across cattle breeds is limited, the underlying causative mutations have been mapped to a small genomic interval containing two promising candidate genes. The validated marker panels we have reported here will aid selection for high milk production under anticipated climate change scenarios, for example selection of sires whose daughters will be most productive at low levels of feeding.
Meta-analysis describes a category of statistical methods that aim at combining the results of multiple studies to increase statistical power by exploiting summary statistics. Different industries that use genomic prediction do not share their raw data due to logistic or privacy restrictions, which can limit the size of their reference populations and create a need for a practical meta-analysis method.
We developed a meta-analysis method, named MetaGS, that duplicates the results of multi-trait best linear unbiased prediction (mBLUP) analysis without accessing raw data. MetaGS exploits the correlations among different populations to produce more accurate population-specific single nucleotide polymorphism (SNP) effects. The method improves SNP effect estimations for a given population depending on its relations to other populations. MetaGS was tested on milk, fat and protein yield data of Australian Holstein and Jersey cattle and it generated very similar genomic estimated breeding values to those produced using the mBLUP method for all traits in both breeds. One of the major difficulties when combining SNP effects across populations is the use of different variants for the populations, which limits the applications of meta-analysis in practice. We solved this issue by developing a method to impute missing summary statistics without using raw data. Our results showed that imputing summary statistics can be done with high accuracy (r > 0.9) even when more than 70% of the SNPs were missing, with a minimal effect on prediction accuracy.
We demonstrated that MetaGS can replace the mBLUP model when raw data cannot be shared, which can lead to more flexible collaborations compared to the single-trait BLUP model.
Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes is available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to data at the individual level of the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision.
To maximise the power to identify quantitative trait loci (QTL), we combined the results of nine within-population GWAS that used imputed sequence variant genotypes of 94,321 cattle from eight breeds, to perform a large-scale meta-analysis for fat and protein percentage in cattle. The meta-analysis detected (p ≤ 10^…) 138 QTL for fat percentage and 176 QTL for protein percentage. This was more than the number of QTL detected in all within-population GWAS together (124 QTL for fat percentage and 104 QTL for protein percentage). Among all the lead variants, 100 QTL for fat percentage and 114 QTL for protein percentage had the same direction of effect in all within-population GWAS. This indicates either persistence of the linkage phase between the causal variant and the lead variant across breeds or that some of the lead variants might indeed be causal or tightly linked with causal variants. The percentage of intergenic variants was substantially lower for significant variants than for non-significant variants, and significant variants had mostly moderate to high minor allele frequencies. Significant variants were also clustered in genes that are known to be relevant for fat and protein percentages in milk.
Our study identified a large number of QTL associated with fat and protein percentage in dairy cattle. We demonstrated that large-scale multi-breed meta-analysis reveals more QTL at the nucleotide resolution than within-population GWAS. Significant variants were more often located in genic regions than non-significant variants and a large part of them was located in potentially regulatory regions.
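How summary statistics from several GWAS can be combined without individual-level data can be sketched with a fixed-effects, inverse-variance weighted meta-analysis of a single variant. The abstract does not say which weighting scheme was used, so this scheme is an assumption chosen only because it is a common default. The function name and the effect estimates below are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Combine per-study effect estimates for one variant.

    betas: effect estimates from each within-population GWAS
    ses:   their standard errors
    Returns the pooled effect, its standard error, z-score and two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]           # precision of each study
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return pooled_beta, pooled_se, z, p

# Hypothetical summary statistics for one variant in three populations.
beta, se, z, p = inverse_variance_meta(betas=[0.10, 0.08, 0.12],
                                       ses=[0.05, 0.04, 0.06])
print(beta, se, z, p)
```

Because the pooled standard error shrinks as studies are added, a variant that is not significant in any single population can still pass the threshold in the combined analysis, provided its effects point in the same direction.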
Effective population size (N(e)) determines the amount of genetic variation, genetic drift, and linkage disequilibrium (LD) in populations. Here, we present the first genome-wide estimates of human effective population size from LD data. Chromosome-specific effective population size was estimated for all autosomes and the X chromosome from estimated LD between SNP pairs <100 kb apart. We account for variation in recombination rate by using coalescent-based estimates of fine-scale recombination rate from one sample and correlating these with LD in an independent sample. Phase I of the HapMap project produced between 18 and 22 million SNP pairs in samples from four populations: Yoruba from Ibadan (YRI), Nigeria; Japanese from Tokyo (JPT); Han Chinese from Beijing (HCB); and residents from Utah with ancestry from northern and western Europe (CEU). For CEU, JPT, and HCB, the estimate of effective population size, adjusted for SNP ascertainment bias, was approximately 3100, whereas the estimate for the YRI was approximately 7500, consistent with the out-of-Africa theory of ancestral human population expansion and concurrent bottlenecks. We show that the decay in LD over distance between SNPs is consistent with recent population growth. The estimates of N(e) are lower than previously published estimates based on heterozygosity, possibly because they represent one or more bottlenecks in human population size that occurred approximately 10,000 to 200,000 years ago.
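The link between LD and effective population size can be illustrated with the classic approximation E[r²] ≈ 1 / (1 + 4·N(e)·c), where c is the recombination distance between two SNPs in Morgans. The abstract does not state the exact estimator, and the study also adjusts for ascertainment bias and fine-scale recombination rate, which this sketch ignores. The function names and input values are hypothetical.

```python
def expected_r2(ne, c):
    """Expected LD (r^2) between two loci under the approximation 1 / (1 + 4*Ne*c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)

def ne_from_ld(r2, c):
    """Invert the approximation to estimate Ne from observed r^2 at distance c (Morgans)."""
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)

# Hypothetical input: mean r^2 observed for SNP pairs about 0.001 Morgans apart.
c = 0.001
r2 = expected_r2(ne=3000, c=c)   # simulate the LD expected for Ne = 3000
print(ne_from_ld(r2, c))         # recovers the Ne used to generate r2
```

Under this approximation LD at short distances reflects population size further back in time than LD at long distances, which is why the pattern of LD decay over distance carries information about past growth.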
Female fertility is an important trait in dairy cattle. Identifying putative causal variants associated with fertility may help to improve the accuracy of genomic prediction of fertility. Combining expression data (eQTL) of genes, exons, gene splicing and allele specific expression is a promising approach to fine map QTL to get closer to the causal mutations. Another approach is to identify genomic differences between cows selected for high and low fertility, and a selection experiment in New Zealand has created exactly this resource. Our objective was to combine multiple types of expression data, fertility traits and allele frequency in high- (POS) and low-fertility (NEG) cows with a genome-wide association study (GWAS) on calving interval in Australian cows to fine-map QTL associated with fertility in both Australian and New Zealand dairy cattle populations. Variants that were significantly associated with calving interval (CI) were strongly enriched for variants associated with gene, exon, gene splicing and allele-specific expression, indicating that there is substantial overlap between QTL associated with CI and eQTL. We identified 671 genes with significant differential expression between POS and NEG cows, with the largest fold change detected for the CCDC196 gene on chromosome 10. Our results provide numerous candidate genes associated with female fertility in dairy cattle, including GYS2 and TIGAR on chromosome 5 and SYT3 and HSD17B14 on chromosome 18. Multiple QTL regions were located in regions with large numbers of copy number variants (CNV). To identify the causal mutations for these variants, long read sequencing may be useful. Variants that were significantly associated with CI were highly enriched for eQTL. We detected 671 genes that were differentially expressed between POS and NEG cows. Several QTL detected for CI overlapped with eQTL, providing candidate genes for fertility in dairy cattle.
Topological association domains (TADs) are chromosomal domains characterised by frequent internal DNA-DNA interactions. The transcription factor CTCF binds to conserved DNA sequence patterns called CTCF binding motifs to either prohibit or facilitate chromosomal interactions. TADs and CTCF binding motifs control gene expression, but they are not yet well defined in the bovine genome. In this paper, we sought to improve the annotation of bovine TADs and CTCF binding motifs, and assess whether the new annotation can reduce the search space for cis-regulatory variants.
We used genomic synteny to map TADs and CTCF binding motifs from humans, mice, dogs and macaques to the bovine genome. We found that our mapped TADs exhibited the same hallmark properties as those sourced from experimental data, such as housekeeping genes, transfer RNA genes, CTCF binding motifs, short interspersed elements, H3K4me3 and H3K27ac. We showed that runs of genes with the same pattern of allele-specific expression (ASE) (either favouring paternal or maternal allele) were often located in the same TAD or between the same conserved CTCF binding motifs. Analyses of variance showed that when averaged across all bovine tissues tested, TADs explained 14% of ASE variation (standard deviation, SD: 0.056), while CTCF explained 27% (SD: 0.078). Furthermore, we showed that the quantitative trait loci (QTLs) associated with gene expression variation (eQTLs) or ASE variation (aseQTLs), which were identified from mRNA transcripts from 141 lactating cows' white blood and milk cells, were highly enriched at putative bovine CTCF binding motifs. The linearly-furthermost and most-significant aseQTL and eQTL for each genic target were located within the same TAD as the gene more often than expected (Chi-Squared test P-value < 0.001).
Our results suggest that genomic synteny can be used to functionally annotate conserved transcriptional components, and provides a tool to reduce the search space for causative regulatory variants in the bovine genome.