High-quality haploid reference genomes are an important resource in genomic research projects. A consequence of mapping against a single haploid reference is that DNA fragments carrying the reference allele are more likely to map successfully, or to receive higher mapping quality scores. This reference bias can affect downstream population genomic analyses when heterozygous sites are falsely called homozygous for the reference allele. In palaeogenomic studies of human populations, mapping against the human reference genome is used to identify endogenous human sequences. Ancient DNA studies usually operate with low sequencing coverage, and fragmentation of DNA molecules causes a large proportion of the sequenced fragments to be shorter than 50 bp. This reduces the number of mismatches that can be tolerated and increases the probability of multiple matching sites in the genome. These ancient-DNA-specific properties potentially exacerbate the impact of reference bias on downstream analyses, especially since most studies of ancient human populations use pseudo-haploid data, i.e. they randomly sample only one sequencing read per site. We show that reference bias is pervasive in published ancient DNA sequence data of prehistoric humans, with some differences between individual genomic regions. We illustrate that the strength of reference bias is negatively correlated with fragment length. Most genomic regions we investigated show little to no mapping bias, but even a small proportion of biased sites can affect analyses of those particular loci or slightly skew genome-wide estimates. Reference bias therefore has the potential to cause minor but significant differences in the results of downstream analyses such as population allele sharing, heterozygosity estimates and estimates of archaic ancestry. These spurious results highlight how important it is to be aware of such technical artifacts, and the need for strategies to mitigate their effect.
To address this, we suggest post-mapping filtering strategies that substantially reduce the impact of reference bias.
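A minimal sketch of two ideas from this abstract: pseudo-haploid calling (sampling one read per site) and measuring reference bias as a function of fragment length. It assumes a hypothetical input format: for each site already known to be heterozygous, the reference allele plus a list of (read allele, fragment length) pairs. Without reference bias the reference-allele fraction at such sites should be close to 0.5; values above 0.5 in short-fragment bins would reflect the length dependence described above. The function names are illustrative and are not taken from the authors' pipeline.

```python
import random
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical per-read record: (allele observed on the read, fragment length in bp)
Read = Tuple[str, int]


def pseudo_haploid_call(reads: Sequence[Read], rng: random.Random) -> Optional[str]:
    """Randomly sample one read at a site and return its allele (None if uncovered)."""
    if not reads:
        return None
    allele, _length = rng.choice(reads)
    return allele


def ref_fraction_by_length(
    sites: Sequence[Tuple[str, Sequence[Read]]], bin_width: int = 10
) -> Dict[int, float]:
    """Fraction of reads carrying the reference allele, per fragment-length bin.

    `sites` holds (reference allele, reads) for positions assumed to be truly
    heterozygous; without reference bias every bin should be close to 0.5.
    Keys are the lower bounds of the length bins.
    """
    ref_counts: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for ref_allele, reads in sites:
        for allele, length in reads:
            bin_start = (length // bin_width) * bin_width
            totals[bin_start] += 1
            if allele == ref_allele:
                ref_counts[bin_start] += 1
    return {b: ref_counts[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    # Toy data only: two heterozygous sites with short and long fragments.
    toy_sites = [
        ("A", [("A", 32), ("A", 35), ("G", 71), ("A", 68)]),
        ("C", [("C", 38), ("T", 64), ("C", 77), ("T", 33)]),
    ]
    print(ref_fraction_by_length(toy_sites, bin_width=30))  # {30: 0.75, 60: 0.5}
    print(pseudo_haploid_call(toy_sites[0][1], random.Random(1)))
```

In the toy data the 30 to 59 bp bin shows a reference fraction of 0.75 while the longer bin is balanced, which is the kind of pattern a length-stratified check would flag.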
Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by imperfect knowledge of population allele frequencies and by neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here, building on a Bayesian model previously implemented in the software Bayenv, we develop a set of methods to robustly test for unusual allele frequency patterns and for correlations between environmental variables and allele frequencies while accounting for these complications. Using this model, we calculate a set of "standardized allele frequencies" that allows investigators to apply tests of their choice to multiple populations while accounting for sampling and for covariance due to population history. We first illustrate this by showing that these standardized frequencies can be used to detect nonparametric correlations with environmental variables; these correlations are also less prone to spurious results due to outlier populations. We then demonstrate how the standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST and is shown to be more powerful because it accounts for population history. We also extend the model to next-generation sequencing of population pools, a cost-efficient way to estimate population allele frequencies that, however, introduces an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by reanalyzing human SNP data from the Human Genome Diversity Panel populations and pooled next-generation sequencing data from Atlantic herring. An implementation of our method is available from http://gcbias.org.
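The standardization step can be sketched as follows, assuming a Bayenv-style model in which the vector θ of population allele frequencies at a SNP is approximately multivariate normal with mean ε (the ancestral frequency) and covariance ε(1 − ε)Ω, where Ω is the population covariance matrix. Decorrelating with the Cholesky factor C of Ω (Ω = CCᵀ) gives X = C⁻¹(θ − ε)/√(ε(1 − ε)), and the sum of squares XᵀX gives an FST-like statistic, called XtX here. This is only a sketch: it plugs in point estimates of θ, ε and Ω, whereas the method described above also accounts for sampling uncertainty. The function names are hypothetical; the authors' implementation is at the URL given above.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(omega: Sequence[Sequence[float]]) -> Matrix:
    """Lower-triangular C with C C^T == omega (omega must be symmetric positive definite)."""
    k = len(omega)
    c = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(c[i][m] * c[j][m] for m in range(j))
            if i == j:
                c[i][j] = math.sqrt(omega[i][i] - s)
            else:
                c[i][j] = (omega[i][j] - s) / c[j][j]
    return c


def standardized_frequencies(
    theta: Sequence[float], epsilon: float, omega: Sequence[Sequence[float]]
) -> List[float]:
    """X = C^{-1} (theta - epsilon) / sqrt(epsilon (1 - epsilon)), by forward substitution."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    c = cholesky(omega)
    scale = math.sqrt(epsilon * (1.0 - epsilon))
    residuals = [(t - epsilon) / scale for t in theta]
    x: List[float] = []
    for i, r in enumerate(residuals):
        x.append((r - sum(c[i][m] * x[m] for m in range(i))) / c[i][i])
    return x


def xtx(theta: Sequence[float], epsilon: float, omega: Sequence[Sequence[float]]) -> float:
    """Sum of squared standardized frequencies; large values flag unusual differentiation."""
    return sum(v * v for v in standardized_frequencies(theta, epsilon, omega))


if __name__ == "__main__":
    omega = [[1.0, 0.5], [0.5, 1.0]]  # toy covariance between two populations
    print(standardized_frequencies([0.7, 0.7], 0.5, omega))
    print(xtx([0.7, 0.7], 0.5, omega))
```

Under this model the components of X are approximately independent standard normal variables, which is why they can be passed to a test of the investigator's choice, for example a rank correlation with an environmental variable.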
Archaeogenomic research has proven to be a valuable tool for tracing migrations of historic and prehistoric individuals and groups, whereas relationships within a group or burial site have not been investigated to a large extent. Knowing the genetic kinship of historic and prehistoric individuals would give important insights into the social structures of ancient and historic cultures. Most archaeogenetic research concerning kinship has been restricted to uniparental markers, while studies using genome-wide information have mainly focused on comparisons between populations. Applications that infer the degree of relationship from modern-day DNA typically require diploid genotype data. Low concentrations of endogenous DNA, fragmentation and other post-mortem damage to ancient DNA (aDNA) make such tools unfeasible for most archaeological samples. To infer family relationships for degraded samples, we developed the software READ (Relationship Estimation from Ancient DNA). We show that our heuristic approach can successfully infer relationships up to the second degree with as little as 0.1x shotgun coverage per genome for pairs of individuals. By applying READ to published aDNA data from several human remains excavated from different cultural contexts, we uncover previously unknown relationships among prehistoric individuals. In particular, we find a group of five closely related males from the same Corded Ware culture site in modern-day Germany, suggesting patrilocality, which highlights the possibility of uncovering social structures of ancient populations by applying READ to genome-wide aDNA data. READ is publicly available from https://bitbucket.org/tguenther/read.
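A simplified sketch of a READ-style classification, under stated assumptions: each individual is represented by pseudo-haploid calls (None where a site is uncovered), the pairwise proportion of mismatching calls (called P0 here) is normalized by the median over all pairs on the assumption that most pairs are unrelated, and the class boundaries are the midpoints between the expected normalized values 1 − φ for kinship coefficients φ = 1/2, 1/4, 1/8 and 1/16. The published tool may differ in detail (for example by computing P0 in genomic windows); the names below are hypothetical and this genome-wide version is only illustrative.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical class boundaries: midpoints between the expected normalized
# mismatch rates 1 - phi (identical 0.5, first degree 0.75, second degree 0.875,
# third degree 0.9375). Pairs above the last cutoff are reported as unrelated.
THRESHOLDS: List[Tuple[float, str]] = [
    (0.625, "identical/twins"),
    (0.8125, "first degree"),
    (0.90625, "second degree"),
]


def mismatch_rate(calls_a: Sequence[Optional[str]], calls_b: Sequence[Optional[str]]) -> float:
    """Proportion of differing pseudo-haploid calls among sites covered in both individuals."""
    shared = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not shared:
        raise ValueError("the two individuals share no covered sites")
    return sum(a != b for a, b in shared) / len(shared)


def classify(normalized_p0: float) -> str:
    """Map a normalized mismatch rate to a relationship degree."""
    for cutoff, label in THRESHOLDS:
        if normalized_p0 < cutoff:
            return label
    return "unrelated"


def classify_pairs(calls: Dict[str, Sequence[Optional[str]]]) -> Dict[Tuple[str, str], str]:
    """Classify every pair, normalizing by the median P0 (assumes most pairs are unrelated)."""
    names = sorted(calls)
    p0 = {
        (a, b): mismatch_rate(calls[a], calls[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    baseline = median(p0.values())
    return {pair: classify(rate / baseline) for pair, rate in p0.items()}
```

For example, if the pairwise P0 values have a median of 0.5, a pair with P0 = 0.375 (normalized 0.75) is reported as first degree, while a pair with P0 = 0.5 is reported as unrelated.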
Southern Africa is consistently placed as a potential region for the evolution of Homo sapiens. We present genome sequences, up to 13x coverage, from seven ancient individuals from KwaZulu-Natal, South Africa. The remains of three Stone Age hunter-gatherers (about 2000 years old) were genetically similar to present-day southern San groups, and those of four Iron Age farmers (300 to 500 years old) were genetically similar to present-day Bantu-language speakers. We estimate that all present-day Khoe-San groups have been influenced by 9 to 30% genetic admixture from East Africans/Eurasians. Using traditional and new approaches, we estimate the first modern human population divergence time at between 350,000 and 260,000 years ago. This estimate pushes back the deepest divergence among modern humans, coinciding with anatomical developments of archaic humans into modern humans, as represented in the local fossil record.
Prehistoric population structure associated with the transition to an agricultural lifestyle in Europe remains a contentious idea. Population-genomic data from 11 Scandinavian Stone Age human remains suggest that hunter-gatherers had lower genetic diversity than farmers. Despite their close geographical proximity, the genetic differentiation between the two Stone Age groups was greater than that observed among extant European populations. Additionally, the Scandinavian Neolithic farmers exhibited a greater degree of hunter-gatherer-related admixture than the Tyrolean Iceman, who also originated from a farming context. In contrast, Scandinavian hunter-gatherers displayed no significant evidence of introgression from farmers. Our findings suggest that Stone Age foraging groups were historically small, likely owing to oscillating living conditions or restricted carrying capacity, and that they were partially incorporated into expanding farming groups.
The consequences of the Neolithic transition in Europe, one of the most important cultural changes in human prehistory, are a subject of great interest. However, its effect on prehistoric and modern-day people in Iberia, the westernmost frontier of the European continent, remains unresolved. We present, to our knowledge, the first genome-wide sequence data from eight human remains, dated to between 5,500 and 3,500 years before present, excavated in the El Portalón cave at Sierra de Atapuerca, Spain. We show that these individuals emerged from the same ancestral gene pool as early farmers in other parts of Europe, suggesting that migration was the dominant mode of transferring farming practices throughout western Eurasia. In contrast to central and northern early European farmers, the Chalcolithic El Portalón individuals additionally mixed with local southwestern hunter-gatherers. The proportion of hunter-gatherer-related admixture into early farmers also increased over the course of two millennia. The Chalcolithic El Portalón individuals showed the greatest genetic affinity to modern-day Basques, who have long been considered linguistic and genetic isolates linked to the Mesolithic, whereas all other European early farmers show greater genetic similarity to modern-day Sardinians. These genetic links suggest that Basques and their language may be linked with the spread of agriculture during the Neolithic. Furthermore, all modern-day Iberian groups except the Basques display distinct admixture with Caucasus/Central Asian and North African groups, possibly related to historical migration events. The El Portalón genomes uncover important pieces of the demographic history of Iberia and Europe and reveal how prehistoric groups relate to modern-day people.
The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, the 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species' native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.
Altitudinal gradients in mountain regions are short‐range clines of different environmental parameters such as temperature or radiation. We investigated genomic and phenotypic signatures of adaptation to such gradients by resequencing pools of 19–29 individuals from each of five Arabidopsis thaliana populations from the North Italian Alps, originating from altitudes between 580 and 2350 m. The sample includes two pairs of low‐ and high‐altitude populations from two different valleys. High‐altitude populations showed lower nucleotide diversity and negative Tajima's D values and were more closely related to each other than to low‐altitude populations from the same valley. Despite their close geographic proximity, demographic analysis revealed that low‐ and high‐altitude populations split between 260 000 and 15 000 years before present. Single nucleotide polymorphisms whose allele frequencies were highly differentiated between low‐ and high‐altitude populations identified genomic regions of up to 50 kb in length where patterns of genetic diversity are consistent with signatures of local selective sweeps. These regions harbour multiple genes involved in stress response. Variation among populations in two putative adaptive phenotypic traits, frost tolerance and response to light/UV stress, was not correlated with altitude. Taken together, the spatial distribution of genetic diversity reflects a potentially adaptive differentiation between low‐ and high‐altitude populations, whereas the phenotypic differentiation in the two traits investigated does not. This pattern may reflect an interaction between adaptation to the local microhabitat and a demographic history influenced by historical glaciation cycles, recent seed dispersal and genetic drift in local populations.
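Nucleotide diversity and Tajima's D, both used above to characterize the high-altitude populations, can be computed from per-site allele counts. The sketch below uses the standard formulas for n sampled chromosomes: π sums 2j(n − j)/(n(n − 1)) over sites where one allele is observed j times, and D compares π with Watterson's estimator S/a₁. It assumes individually sampled chromosomes; pooled sequencing data as used in this study need pool-specific corrections (as implemented, for example, in PoPoolation) that are not shown here. Function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def pi_and_segregating_sites(allele_counts: Sequence[int], n: int) -> Tuple[float, int]:
    """Mean pairwise differences (summed over sites) and number of segregating sites.

    allele_counts[i] is the count of one allele at site i among n chromosomes.
    """
    pi = 0.0
    s = 0
    for j in allele_counts:
        if 0 < j < n:  # monomorphic sites contribute nothing
            s += 1
            pi += 2.0 * j * (n - j) / (n * (n - 1))
    return pi, s


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D; negative values indicate an excess of rare variants relative to neutrality."""
    if n < 4 or s == 0:
        raise ValueError("Tajima's D needs n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy counts for 10 chromosomes, dominated by singletons (an excess of rare variants).
    counts = [1, 1, 1, 1, 2, 1, 1, 5, 1, 1]
    pi, s = pi_and_segregating_sites(counts, n=10)
    print(pi, s, tajimas_d(10, s, pi))
```

An excess of rare variants pulls π below Watterson's estimator and makes D negative, which is consistent with, but not proof of, a sweep or a population expansion.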
Population genomic studies of ancient human remains have shown how modern-day European population structure has been shaped by a number of prehistoric migrations. The Neolithization of Europe has been associated with large-scale migrations from Anatolia, which were followed by migrations of herders from the Pontic steppe at the onset of the Bronze Age. Southwestern Europe was one of the last parts of the continent reached by these migrations, and modern-day populations from this region show intriguing similarities to the initial Neolithic migrants. Partly due to climatic conditions that are unfavorable for DNA preservation, regional studies in the Mediterranean remain challenging. Here, we present genome-wide sequence data from 13 individuals, combined with stable isotope analysis, from the north and south of Iberia covering a four-millennial temporal transect (7,500–3,500 BP). Early Iberian farmers and Early Central European farmers exhibit significant genetic differences, suggesting two independent fronts of the Neolithic expansion. The first Neolithic migrants that arrived in Iberia had low levels of genetic diversity, potentially reflecting a small number of individuals; this diversity gradually increased over time through mixing with local hunter-gatherers and potential population expansion. The impact of post-Neolithic migrations on Iberia was much smaller than on the rest of the continent, showing little external influence from the Neolithic to the Bronze Age. Paleodietary reconstruction shows that these populations had a remarkable degree of dietary homogeneity across space and time, suggesting a strong reliance on terrestrial food resources despite changing culture and genetic make-up.
There are longstanding questions about the origins and ancestry of the Picts of early medieval Scotland (ca. 300-900 CE), prompted in part by exotic medieval origin myths, their enigmatic symbols and inscriptions, and the meagre textual evidence. The Picts, first mentioned in the late 3rd century CE, resisted the Romans and went on to form a powerful kingdom that ruled over a large territory in northern Britain. In the 9th and 10th centuries Gaelic language, culture and identity became dominant, transforming the Pictish realm into Alba, the precursor to the medieval kingdom of Scotland. To date, no comprehensive analysis of Pictish genomes has been published, and questions about their biological relationships to other cultural groups living in Britain remain unanswered. Here we present two high-quality Pictish genomes (2.4 and 16.5X coverage) from central and northern Scotland, dated to the 5th-7th centuries, which we impute and co-analyse with >8,300 previously published ancient and modern genomes. Using allele frequency and haplotype-based approaches, we can firmly place the genomes within the Iron Age gene pool in Britain and demonstrate regional biological affinity. We also demonstrate the presence of population structure within Pictish groups, with Orcadian Picts being genetically distinct from their mainland contemporaries. When investigating Identity-By-Descent (IBD) with present-day genomes, we observe broad affinities between the mainland Pictish genomes and the present-day people living in western Scotland, Wales, Northern Ireland and Northumbria, but less with the rest of England, the Orkney islands and eastern Scotland, where the political centres of Pictland were located. The pre-Viking Age Orcadian Picts show a high degree of IBD sharing across modern Scotland, Wales, Northern Ireland, and the Orkney islands, demonstrating substantial genetic continuity in Orkney for the last ~2,000 years.
Analysis of mitochondrial DNA diversity at the Pictish cemetery of Lundin Links (n = 7) reveals an absence of direct common female ancestors, with implications for broader social organisation. Overall, our study provides novel insights into the genetic affinities and population structure of the Picts and into direct relationships between ancient and present-day groups of the UK.