Some present-day humans derive up to ∼5% 1 of their ancestry from archaic Denisovans, an even larger proportion than the ∼2% from Neanderthals 2. We developed methods that can disambiguate the ...locations of segments of Denisovan and Neanderthal ancestry in present-day humans and applied them to 257 high-coverage genomes from 120 diverse populations, among which were 20 individual Oceanians with high Denisovan ancestry 3. In Oceanians, the average size of Denisovan fragments is larger than Neanderthal fragments, implying a more recent average date of Denisovan admixture in the history of these populations (p = 0.00004). We document more Denisovan ancestry in South Asia than is expected based on existing models of history, reflecting a previously undocumented mixture related to archaic humans (p = 0.0013). Denisovan ancestry, just like Neanderthal ancestry, has been deleterious on a modern human genetic background, as reflected by its depletion near genes. Finally, the reduction of both archaic ancestries is especially pronounced on chromosome X and near genes more highly expressed in testes than other tissues (p = 1.2 × 10−7 to 3.2 × 10−7 for Denisovan and 2.2 × 10−3 to 2.9 × 10−3 for Neanderthal ancestry even after controlling for differences in level of selective constraint across gene classes). This suggests that reduced male fertility may be a general feature of mixtures of human populations diverged by >500,000 years.
•Denisovan admixture into modern humans occurred after Neanderthal admixture•There is more Denisovan ancestry in South Asians than expected from current models•Denisovan ancestry has been subject to positive and negative selection after admixture•Male infertility most likely occurred after modern human interbreeding with Denisovans
Sankararaman et al. present a map of Denisovan and Neanderthal ancestry in 120 diverse populations and show that Denisovan admixture post-dated Neanderthal admixture. South Asians have more Denisovan ancestry than expected. There was selection both for and against archaic ancestry. Hybridization with Denisovans was probably associated with reduced male fertility.
Statistical analyses of genomic data from diverse human populations have demonstrated that archaic hominins, such as Neanderthals and Denisovans, interbred or admixed with the ancestors of ...present-day humans. Central to these analyses are methods for inferring archaic ancestry along the genomes of present-day individuals (archaic local ancestry). Methods for archaic local ancestry inference rely on the availability of reference genomes from the ancestral archaic populations for accurate inference. However, several instances of archaic admixture lack reference archaic genomes, making it difficult to characterize these events. We present a statistical method that combines diverse population genetic summary statistics to infer archaic local ancestry without access to an archaic reference genome. We validate the accuracy and robustness of our method in simulations. When applied to genomes of European individuals, our method recovers segments that are substantially enriched for Neanderthal ancestry, even though our method did not have access to any Neanderthal reference genomes.
In a pair of seminal papers, Sewall Wright and Gustave Malécot introduced FST as a measure of structure in natural populations. In the decades that followed, a number of papers provided differing ...definitions, estimation methods, and interpretations beyond Wright's. While this diversity in methods has enabled many studies in genetics, it has also introduced confusion regarding how to estimate FST from available data. Considering this confusion, wide variation in published estimates of FST for pairs of HapMap populations is a cause for concern. These estimates changed-in some cases more than twofold-when comparing estimates from genotyping arrays to those from sequence data. Indeed, changes in FST from sequencing data might be expected due to population genetic factors affecting rare variants. While rare variants do influence the result, we show that this is largely through differences in estimation methods. Correcting for this yields estimates of FST that are much more concordant between sequence and genotype data. These differences relate to three specific issues: (1) estimating FST for a single SNP, (2) combining estimates of FST across multiple SNPs, and (3) selecting the set of SNPs used in the computation. Changes in each of these aspects of estimation may result in FST estimates that are highly divergent from one another. Here, we clarify these issues and propose solutions.
Genomic studies have shown that Neanderthals interbred with modern humans, and that non-Africans today are the products of this mixture. The antiquity of Neanderthal gene flow into modern humans ...means that genomic regions that derive from Neanderthals in any one human today are usually less than a hundred kilobases in size. However, Neanderthal haplotypes are also distinctive enough that several studies have been able to detect Neanderthal ancestry at specific loci. We systematically infer Neanderthal haplotypes in the genomes of 1,004 present-day humans. Regions that harbour a high frequency of Neanderthal alleles are enriched for genes affecting keratin filaments, suggesting that Neanderthal alleles may have helped modern humans to adapt to non-African environments. We identify multiple Neanderthal-derived alleles that confer risk for disease, suggesting that Neanderthal alleles continue to shape human biology. An unexpected finding is that regions with reduced Neanderthal ancestry are enriched in genes, implying selection to remove genetic material derived from Neanderthals. Genes that are more highly expressed in testes than in any other tissue are especially reduced in Neanderthal ancestry, and there is an approximately fivefold reduction of Neanderthal ancestry on the X chromosome, which is known from studies of diverse species to be especially dense in male hybrid sterility genes. These results suggest that part of the explanation for genomic regions of reduced Neanderthal ancestry is Neanderthal alleles that caused decreased fertility in males when moved to a modern human genetic background.
Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to ...interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000-86,000 years before the present (BP), and most likely 47,000-65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
To investigate the consequences of hybridization between species, we studied three replicate hybrid populations that formed naturally between two swordtail fish species, estimating their fine-scale ...genetic map and inferring ancestry along the genomes of 690 individuals. In all three populations, ancestry from the "minor" parental species is more common in regions of high recombination and where there is linkage to fewer putative targets of selection. The same patterns are apparent in a reanalysis of human and archaic admixture. These results support models in which ancestry from the minor parental species is more likely to persist when rapidly uncoupled from alleles that are deleterious in hybrids. Our analyses further indicate that selection on swordtail hybrids stems predominantly from deleterious combinations of epistatically interacting alleles.
The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual’s disease risk ...through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trail locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.
Although the cohort-level accuracy of polygenic risk scores (PRSs)-estimates of genetic value at the individual level-has been widely assessed, uncertainty in PRSs remains underexplored. In the ...present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic ...variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD) consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.
Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those ...problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.