Our knowledge of ancient human population structure in sub-Saharan Africa, particularly prior to the advent of food production, remains limited. Here we report genome-wide DNA data from four children (two of whom were buried approximately 8,000 years ago and two 3,000 years ago) from Shum Laka (Cameroon), one of the earliest known archaeological sites within the probable homeland of the Bantu language group. One individual carried the deeply divergent Y chromosome haplogroup A00, which today is found almost exclusively in the same region. However, the genome-wide ancestry profiles of all four individuals are most similar to those of present-day hunter-gatherers from western Central Africa, which implies that populations in western Cameroon today, as well as speakers of Bantu languages from across the continent, are not descended substantially from the population represented by these four people. We infer an Africa-wide phylogeny that features widespread admixture and three prominent radiations, including one that gave rise to at least four major lineages deep in the history of modern humans.
Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.
Large data sets on human genetic variation have been collected recently, but their usefulness for learning about history and natural selection has been limited by biases in the ways polymorphisms were chosen. We report large subsets of SNPs from the International HapMap Project that allow us to overcome these biases and to provide accurate measurement of a quantity of crucial importance for understanding genetic variation: the allele frequency spectrum. Our analysis shows that East Asian and northern European ancestors shared the same population bottleneck expanding out of Africa but that both also experienced more recent genetic drift, which was greater in East Asians.
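For readers unfamiliar with the allele frequency spectrum, the following is a minimal sketch of how it can be tabulated from a sample. The input layout (one list of 0/1 derived-allele indicators per SNP), the function name `allele_frequency_spectrum`, and the toy data are assumptions made for illustration; they are not the data format or the ascertainment-corrected procedure used in the study.

```python
from collections import Counter


def allele_frequency_spectrum(genotypes, n_chromosomes):
    """Count SNPs in each derived-allele count class from 1 to n-1.

    genotypes: list of per-SNP lists, each holding 0/1 derived-allele
    indicators for n_chromosomes sampled chromosomes (toy input format).
    """
    counts = Counter(sum(snp) for snp in genotypes)
    # Classes 0 and n are skipped: such sites are not polymorphic in the sample.
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]


# Hypothetical toy data: five sites scored on four chromosomes.
toy_snps = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],  # monomorphic, so it is excluded from the spectrum
]
spectrum = allele_frequency_spectrum(toy_snps, 4)
print(spectrum)  # [2, 1, 1]
```

Demographic events such as bottlenecks and drift shift the relative weight of the rare and common classes in this vector, which is why SNPs chosen in a biased way distort inferences drawn from it.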
Inherited lung cancer risk, particularly in nonsmokers, is poorly understood. Genomic and ancestry analysis of 1,153 lung cancers from Latin America revealed striking associations between Native American ancestry and their somatic landscape, including tumor mutational burden, and specific driver mutations in
, and
. A local Native American ancestry risk score was more strongly correlated with
mutation frequency compared with global ancestry correlation, suggesting that germline genetics (rather than environmental exposure) underlie these disparities. SIGNIFICANCE: The frequency of somatic
and
mutations in lung cancer varies by ethnicity, but we do not understand why. Our study suggests that the variation in
and
mutation frequency is associated with genetic ancestry and suggests further studies to identify germline alleles that underpin this association.
The genetic divergence time between two species varies substantially across the genome, conveying important information about the timing and process of speciation. Here we develop a framework for studying this variation and apply it to about 20 million base pairs of aligned sequence from humans, chimpanzees, gorillas and more distantly related primates. Human-chimpanzee genetic divergence varies from less than 84% to more than 147% of the average, a range of more than 4 million years. Our analysis also shows that human-chimpanzee speciation occurred less than 6.3 million years ago and probably more recently, conflicting with some interpretations of ancient fossils. Most strikingly, chromosome X shows an extremely young genetic divergence time, close to the genome minimum along nearly its entire length. These unexpected features would be explained if the human and chimpanzee lineages initially diverged, then later exchanged genes before separating permanently.
Identifying the ancestry of chromosomal segments of distinct ancestry has a wide range of applications from disease mapping to learning about history. Most methods require the use of unlinked markers; but, using all markers from genome-wide scanning arrays, it should in principle be possible to infer the ancestry of even very small segments with exquisite accuracy. We describe a method, HAPMIX, which employs an explicit population genetic model to perform such local ancestry inference based on fine-scale variation data. We show that HAPMIX outperforms other methods, and we explore its utility for inferring ancestry, learning about ancestral populations, and inferring dates of admixture. We validate the method empirically by applying it to populations that have experienced recent and ancient admixture: 935 African Americans from the United States and 29 Mozabites from North Africa. HAPMIX will be of particular utility for mapping disease genes in recently admixed populations, as its accurate estimates of local ancestry permit admixture and case-control association signals to be combined, enabling more powerful tests of association than with either signal alone.
Previous genetic studies have suggested a history of sub-Saharan African gene flow into some West Eurasian populations after the initial dispersal out of Africa that occurred at least 45,000 years ago. However, there has been no accurate characterization of the proportion of mixture, or of its date. We analyze genome-wide polymorphism data from about 40 West Eurasian groups to show that almost all Southern Europeans have inherited 1%-3% African ancestry with an average mixture date of around 55 generations ago, consistent with North African gene flow at the end of the Roman Empire and subsequent Arab migrations. Levantine groups harbor 4%-15% African ancestry with an average mixture date of about 32 generations ago, consistent with close political, economic, and cultural links with Egypt in the late middle ages. We also detect 3%-5% sub-Saharan African ancestry in all eight of the diverse Jewish populations that we analyzed. For the Jewish admixture, we obtain an average estimated date of about 72 generations. This may reflect descent of these groups from a common ancestral population that already had some African ancestry prior to the Jewish Diasporas.
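The historical interpretations above depend on converting dates in generations into years. A minimal sketch of that arithmetic follows; the generation interval of 29 years is an assumption chosen for illustration (the abstract does not state the value used), and the function name `admixture_date_years_ago` is hypothetical.

```python
def admixture_date_years_ago(generations, years_per_generation=29.0):
    """Convert an admixture date in generations into years before present.

    years_per_generation is an assumed average human generation interval,
    not a value taken from the abstract.
    """
    return generations * years_per_generation


# Average mixture dates in generations, as reported in the abstract.
estimates = {
    "Southern Europeans": 55,
    "Levantine groups": 32,
    "Jewish populations": 72,
}
for group, gens in estimates.items():
    print(f"{group}: about {admixture_date_years_ago(gens):.0f} years ago")
```

Under this assumed interval the three estimates fall roughly 1,600, 900, and 2,100 years before present; a different generation interval rescales all three dates proportionally.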
More than two hundred papers have reported genome-wide data from ancient humans. While the raw data for the vast majority are fully publicly available, testifying to the commitment of the paleogenomics community to open data, formats for both raw data and meta-data differ. There is thus a need for uniform curation and a centralized, version-controlled compendium that researchers can download, analyze, and reference. Since 2019, we have been maintaining the Allen Ancient DNA Resource (AADR), which aims to provide an up-to-date, curated version of the world's published ancient human DNA data, represented at more than a million single nucleotide polymorphisms (SNPs) at which almost all ancient individuals have been assayed. The AADR has gone through six public releases at the time of writing and review of this manuscript, and crossed the threshold of >10,000 individuals with published genome-wide ancient DNA data at the end of 2022. This note is intended as a citable descriptor of the AADR.
Genome-wide association studies (GWAS) have proven to be a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here, we show that extremely low-coverage sequencing (0.1-0.5×) captures almost as much of the common (>5%) and low-frequency (1-5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r² of 0.71 using off-target data (0.24× average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome-sequencing data sets, we show that association statistics obtained using extremely low-coverage sequencing data attain similar P values at known associated variants as data from genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in extremely low-coverage sequencing can yield several times the effective sample size of GWAS based on SNP array data and a commensurate increase in statistical power.
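The notion of effective sample size can be made concrete with the standard approximation that an association test on imperfectly inferred genotypes behaves like a test on perfectly typed genotypes in r² times as many samples. The sketch below applies that approximation to the figures quoted above; the function name `effective_sample_size` is hypothetical, and the paper's own power calculations may differ in detail.

```python
def effective_sample_size(n_samples, r_squared):
    """Approximate effective sample size under imperfect genotype inference.

    Uses the common approximation N_eff = N * r^2, where r^2 is the squared
    correlation between inferred and true genotypes.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    return n_samples * r_squared


# Figures from the abstract: 909 samples, mean r^2 of 0.71.
n_eff = effective_sample_size(909, 0.71)
print(f"Effective sample size: about {n_eff:.0f}")
```

Because the loss per sample is a fixed factor, a design that sequences several times as many samples at low coverage for the same budget can still come out ahead in effective sample size, which is the trade-off the abstract describes.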
This paper examines how ancient DNA data can enhance radiocarbon dating. Because there is a limit to the number of years that can separate the dates of death of related individuals, the ability to identify relatives through ancient DNA analysis can serve as a constraint on radiocarbon date range estimates. To determine the number of years that can separate related individuals, we modeled maximums derived from biological extremes of human reproduction and death ages and compiled data from historic and genealogical death records. We used these data to jointly study the date ranges of a global dataset of individuals that have been radiocarbon dated and for which ancient DNA analysis identified at least one relative. We found that many of these individuals could have their date uncertainties reduced by building in date of death separation constraints. We examined possible reasons for date discrepancies of related individuals, such as dating of different skeletal elements or wiggles in the radiocarbon curve. We also developed a program, refinedate, which researchers can download and use to help refine the radiocarbon date distributions of related individuals. Our research demonstrates that when combined, radiocarbon dating and ancient DNA analysis can provide a refined and richer view of the past.
• Ancient DNA allows archaeologists to identify individuals that are biologically related.
• There are biological limitations to the number of years that can separate the death dates of related individuals.
• We examined how biological relatedness can constrain C14 date ranges with OxCal and a program we developed, refinedate.
• We examined if applying constraints to a large dataset can reveal dating issues in the archaeological record.
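The idea of a date-of-death separation constraint can be illustrated with simple interval arithmetic. The sketch below is not the algorithm used by refinedate or OxCal, which operate on full probability distributions; it treats each date range as a hard interval, and the function name `constrain_ranges`, the example ranges, and the 100-year maximum gap are all hypothetical.

```python
def constrain_ranges(range_a, range_b, max_gap):
    """Narrow two date ranges given a maximum separation between death dates.

    range_a, range_b: (earliest, latest) bounds on the same numeric time scale.
    max_gap: largest allowed absolute difference between the two death dates.
    Returns the two narrowed ranges, or None if the ranges cannot be reconciled.
    """
    a_lo, a_hi = range_a
    b_lo, b_hi = range_b
    # Each individual must lie within max_gap of some date allowed for the other.
    new_a = (max(a_lo, b_lo - max_gap), min(a_hi, b_hi + max_gap))
    new_b = (max(b_lo, a_lo - max_gap), min(b_hi, a_hi + max_gap))
    if new_a[0] > new_a[1] or new_b[0] > new_b[1]:
        return None  # no pair of dates satisfies the constraint
    return new_a, new_b


# Hypothetical relatives with overlapping date ranges and a 100-year maximum gap.
result = constrain_ranges((3000, 3400), (3300, 3500), 100)
print(result)  # ((3200, 3400), (3300, 3500))
```

A return value of `None` corresponds to the situation discussed above in which the dates of two relatives are discrepant, for example because different skeletal elements were dated.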