Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced ...a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data ...to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Since its introduction in 2016, the msprime simulator has grown in popularity and is now one of the most commonly used tools in population genetics. This article marks the 1.0 release of msprime and summarizes the many features it has accumulated through an open source community development model. Despite its generality, msprime’s performance is excellent—in many cases orders of magnitude faster and more memory efficient than more specialized methods.
The extent of the devastation of the Black Death pandemic (1346-1353) on European populations is known from documentary sources and its bacterial source illuminated by studies of ancient pathogen ...DNA. What has remained less understood is the effect of the pandemic on human mobility and genetic diversity at the local scale. Here, we report 275 ancient genomes, including 109 with coverage >0.1×, from later medieval and postmedieval Cambridgeshire of individuals buried before and after the Black Death. Consistent with the function of the institutions, we found a lack of close relatives among the friars and the inmates of the hospital in contrast to their abundance in general urban and rural parish communities. While we detect long-term shifts in local genetic ancestry in Cambridgeshire, we find no evidence of major changes in genetic ancestry nor higher differentiation of immune loci between cohorts living before and after the Black Death.