Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections represent fantastic sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the analysis of aDNA generally faces two major issues. First, sequences consist of a mixture of endogenous and various exogenous backgrounds, mostly microbial. Second, high nucleotide misincorporation rates can be observed as a result of severe post-mortem DNA damage. Such misincorporation patterns are instrumental in authenticating ancient sequences against modern contaminants. We recently developed the user-friendly mapDamage package, which identifies such patterns in next-generation sequencing (NGS) datasets. The absence of formal statistical modeling of the DNA damage process, however, precluded rigorous quantitative comparisons across samples.
Here, we describe mapDamage 2.0, which extends the original features of mapDamage by incorporating a statistical model of DNA damage. Assuming that damage events depend only on sequencing position and post-mortem deamination, our Bayesian statistical framework provides estimates of four key features of aDNA molecules: the average length of overhangs (λ), nick frequency (ν), and cytosine deamination rates in both double-stranded regions (δD) and overhangs (δS). Our model enables rescaling of base quality scores according to their probability of being damaged. mapDamage 2.0 handles NGS datasets with ease and is compatible with a wide range of DNA library protocols.
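To illustrate how position-dependent damage can feed into quality rescaling, here is a minimal sketch under simplifying assumptions that are not mapDamage's full Bayesian model: the 5' overhang length is assumed to be geometric with a given mean, nicks (ν) are ignored, a C→T change occurs at rate δS inside the overhang and δD elsewhere, and damage is treated as an error source independent of sequencing error. The function names and parameter values are hypothetical.

```python
import math


def damage_probability(pos: int, mean_overhang: float,
                       delta_s: float, delta_d: float) -> float:
    """C->T probability at 0-based distance `pos` from the read's 5' end.

    Simplifying assumption (not mapDamage's exact model): the overhang
    length is geometric on {0, 1, 2, ...} with the given mean; nicks are ignored.
    """
    q = 1.0 / (1.0 + mean_overhang)       # geometric "stop" probability
    p_single = (1.0 - q) ** (pos + 1)     # P(overhang length > pos)
    return p_single * delta_s + (1.0 - p_single) * delta_d


def rescale_quality(phred: int, p_damage: float) -> int:
    """Downscale the Phred score of a T observed at a reference C.

    Treats damage as a second, independent error source:
    P(wrong) = 1 - (1 - P(sequencing error)) * (1 - P(damage)).
    """
    p_err = 10 ** (-phred / 10)
    p_new = 1.0 - (1.0 - p_err) * (1.0 - p_damage)
    p_new = max(p_new, 1e-10)             # cap at Q100 to avoid log10(0)
    return max(0, round(-10 * math.log10(p_new)))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for pos in range(5):
        p = damage_probability(pos, mean_overhang=1.0, delta_s=0.5, delta_d=0.01)
        print(pos, round(p, 3), rescale_quality(40, p))
```

Only bases where the read shows a T over a reference C (or an A over a G for the complementary strand) would be rescaled; other bases keep their original quality.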
mapDamage 2.0 is available at ginolhac.github.io/mapDamage/ as a Python package, and documentation is maintained at the Centre for GeoGenetics website (geogenetics.ku.dk/publications/mapdamage2.0/).
Supplementary data are available at Bioinformatics online.
Large-scale genotype datasets can help track the dispersal patterns of epidemiological outbreaks and predict the geographic origins of individuals. Such genetically based geographic assignments also have a range of possible applications in forensics, for profiling both victims and criminals, and in wildlife management, where poaching hotspots can be located. They require, however, fast and accurate statistical methods to handle the growing amount of genetic information generated by genotype arrays and next-generation sequencing technologies.
We introduce a novel statistical method for geopositioning individuals of unknown origin from genotypes. Our method is based on a geostatistical model trained with a dataset of georeferenced genotypes. Statistical inference under this model can be implemented within the theoretical framework of Integrated Nested Laplace Approximation (INLA), which represents one of the major recent breakthroughs in statistics because it does not require Monte Carlo simulations. We compare the performance of our method with that of an alternative method for geospatial inference, SPA, in a simulation framework. We highlight the accuracy and limits of continuous spatial assignment methods at various scales by analyzing genotype datasets from a diversity of species, including Florida scrub-jays (Aphelocoma coerulescens), Arabidopsis thaliana and humans, representing 41 to 197,146 SNPs. Our method appears best suited to the analysis of medium-sized datasets (a few tens of thousands of loci), such as the reduced-representation sequencing data that are becoming increasingly available in ecology.
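The INLA-based geostatistical model is too involved for a short example. As a swapped-in and much simpler illustration of what assigning an individual to a location from its genotype means, the sketch below scores a hypothetical set of candidate locations whose allele frequencies are simply assumed to be known (standing in for whatever the trained model would provide), under Hardy-Weinberg equilibrium and independent loci, and returns the most likely location. All names are hypothetical; this is not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Location = Tuple[float, float]  # hypothetical (longitude, latitude) of a candidate site


def log_likelihood(genotype: Sequence[int], freqs: Sequence[float]) -> float:
    """Log-likelihood of diploid genotypes (0, 1 or 2 copies of the counted
    allele) given local allele frequencies, assuming Hardy-Weinberg
    equilibrium and independent loci."""
    ll = 0.0
    for g, p in zip(genotype, freqs):
        p = min(max(p, 1e-6), 1.0 - 1e-6)   # keep log() finite
        if g == 0:
            ll += 2.0 * math.log(1.0 - p)
        elif g == 1:
            ll += math.log(2.0 * p * (1.0 - p))
        else:
            ll += 2.0 * math.log(p)
    return ll


def assign_origin(genotype: Sequence[int],
                  freq_map: Dict[Location, List[float]]) -> Location:
    """Return the candidate location with the highest log-likelihood."""
    return max(freq_map, key=lambda loc: log_likelihood(genotype, freq_map[loc]))


if __name__ == "__main__":
    # Hypothetical frequencies at two candidate sites for three loci.
    freq_map = {(0.0, 0.0): [0.9, 0.8, 0.7], (5.0, 5.0): [0.1, 0.2, 0.3]}
    print(assign_origin([2, 2, 1], freq_map))
```

A continuous method differs mainly in that the frequencies vary smoothly over space and the output is a surface of probabilities rather than a single best site.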
http://www2.imm.dtu.dk/~gigu/Spasiba/
gilles.b.guillot@gmail.com
Supplementary data are available at Bioinformatics online.
MicroRNAs (miRNAs) are now recognized as a major class of developmental regulators. Sequences of many miRNAs are highly conserved, yet they often exhibit temporal and spatial heterogeneity in expression among species and have been proposed as an important reservoir for adaptive evolution and divergence. With this in mind, we studied miRNA expression during embryonic development in offspring of two contrasting morphs of the highly polymorphic salmonid Arctic charr (Salvelinus alpinus): a small benthic morph from Lake Thingvallavatn (SB) and an aquaculture stock (AC). These morphs differ extensively in morphology and adult body size. We established offspring groups of the two morphs and sampled them at several time points during development. Four time points (three embryonic and one just before first feeding) were selected for high-throughput small-RNA sequencing. We identified a total of 326 conserved and 427 novel miRNA candidates in Arctic charr, of which 51 conserved and 6 novel candidates were differentially expressed among developmental stages. Furthermore, 53 known and 19 novel miRNAs showed significantly different expression levels between the two morphs. Hierarchical clustering of the 53 conserved miRNAs revealed that the expression differences are confined to the embryonic stages: miRNAs such as sal-miR-130, 30, 451, 133, 26 and 199a were highly expressed in AC embryos, whereas sal-miR-146, 183, 206 and 196a were highly expressed in SB embryos. Most of these miRNAs have previously been implicated in key developmental processes in other species, such as development of the brain and sensory epithelia, skeletogenesis and myogenesis. Four of the novel miRNA candidates were detected only in either AC or SB. The miRNA candidates identified in this study will be combined with available mRNA expression data to identify potential targets and their involvement in developmental regulation.
Next-generation sequencing technologies have revolutionized the field of paleogenomics, allowing the reconstruction of complete ancient genomes and their comparison with modern references. However, this requires processing vast amounts of data and involves many steps that use a variety of computational tools. Here we present PALEOMIX (http://geogenetics.ku.dk/publications/paleomix), a flexible and user-friendly pipeline, applicable to both modern and ancient genomes, that largely automates the in silico analyses behind whole-genome resequencing. Starting from next-generation sequencing reads, PALEOMIX carries out adapter removal, mapping against reference genomes, PCR duplicate removal, characterization of and compensation for postmortem damage, SNP calling and maximum-likelihood phylogenomic inference, and it profiles the metagenomic contents of the samples. PALEOMIX therefore has potential applications in paleogenomics, comparative genomics and metagenomics. Applying the PALEOMIX pipeline to the three ancient and seven modern Phytophthora infestans genomes described here takes 5 days on a 16-core server.
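One of the listed steps, PCR duplicate removal, can be sketched in a few lines. This is a generic illustration, not PALEOMIX's implementation: it assumes reads whose mates were collapsed into a single sequence (common for short aDNA fragments), so both endpoints of the original molecule are known, and it treats reads sharing chromosome, both endpoints and strand as copies of one molecule, keeping the copy with the highest mean base quality. The record type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical minimal record for a mapped, collapsed read."""
    name: str
    chrom: str
    start: int          # leftmost aligned position
    end: int            # rightmost aligned position
    reverse: bool       # mapped to the reverse strand
    mean_quality: float


def remove_pcr_duplicates(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chrom, start, end, strand), preferring the
    highest mean base quality among the copies."""
    best: Dict[Tuple[str, int, int, bool], AlignedRead] = {}
    for read in reads:
        key = (read.chrom, read.start, read.end, read.reverse)
        kept = best.get(key)
        if kept is None or read.mean_quality > kept.mean_quality:
            best[key] = read
    return list(best.values())
```

For reads that were not collapsed, only the 5' coordinate of a single-end read is reliable, so a real tool would need a different key for those.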
We examined the rate and nature of mitochondrial DNA (mtDNA) mutations in humans using sequence data from 64,806 contemporary Icelanders from 2,548 matrilines. Based on 116,663 mother-child transmissions, we detected 8,199 mutations, providing robust rate estimates by nucleotide type, functional impact, position, and different alleles at the same position. We document the true extent of hypermutability in mtDNA, which mainly affects the control region but also some coding-region variants. The results reveal negative selection against viable deleterious mutations, including the rapidly mutating, disease-associated variants 3243A>G and 1555A>G, as well as prenatal selection that most likely occurs during oocyte development. Finally, we show that the fate of new mutations is determined by a drastic germline bottleneck, amounting to an average of 3 mtDNA units effectively transmitted from mother to child.
• Detection of 8,199 mutations in 116,663 mother-child mtDNA transmissions
• Position- and allele-specific mutation rates reveal asymmetric hypermutability
• Evidence for both prenatal and postnatal selection against mtDNA variants
• Children effectively inherit only ∼3 units of mtDNA from their mothers
This large-scale pedigree study of human mtDNA mutations reveals substantial selection against deleterious variants both before and after birth, characterizes extensive differences in mutability by position and allele, and shows that children only inherit around 3 units of mtDNA from their mothers.
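To make the bottleneck figure concrete, here is a sketch under the simplest textbook assumption (binomial sampling, not the study's inference method): each of n effectively transmitted units independently carries a variant with probability equal to the mother's heteroplasmy fraction p. With n = 3 and p = 0.5, a child becomes homoplasmic for the variant with probability 0.5³ = 0.125, and the variance of the child's heteroplasmy is p(1 − p)/n ≈ 0.083, which is why a small bottleneck shifts variant frequencies so sharply in one generation. The snippet also derives the average mutation rate implied by the counts above. Function names are hypothetical.

```python
from math import comb
from typing import List


def transmitted_fractions(p: float, n: int) -> List[float]:
    """P(k of the n transmitted mtDNA units carry the variant), k = 0..n,
    under binomial sampling from a mother with heteroplasmy fraction p."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def heteroplasmy_variance(p: float, n: int) -> float:
    """Variance of the child's heteroplasmy fraction under the same model."""
    return p * (1.0 - p) / n


# Average rate implied by the counts in the text (about 0.07 per transmission).
rate_per_transmission = 8199 / 116663

if __name__ == "__main__":
    for k, pk in enumerate(transmitted_fractions(0.5, 3)):
        print(f"child heteroplasmy {k}/3: {pk:.3f}")
    print(f"variance: {heteroplasmy_variance(0.5, 3):.4f}")
    print(f"mutations per transmission: {rate_per_transmission:.4f}")
```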
Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe PopDel, an approach that directly identifies deletions of about 500 to at least 10,000 bp in length jointly in data from many genomes, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes, as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare and de novo deletions. On genomes with available high-confidence reference call sets, PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel's running time is competitive with that of the fastest previous tools tested. The demonstrated scalability and accuracy of PopDel enable routine scans for deletions in large-scale sequencing studies.
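Trio-based evaluation of genotypes rests on Mendelian consistency: a child must carry one allele from each parent. The sketch below shows that check for biallelic deletion genotypes coded as the number of deletion alleles (0, 1 or 2). The encoding and function names are assumptions for illustration, not PopDel's evaluation code. A genuine de novo deletion also appears as an inconsistency (for example 0, 0 → 1), so such calls need separate validation rather than being counted automatically as genotyping errors.

```python
from itertools import product
from typing import Iterable, Tuple

# Alleles a parent can transmit, keyed by genotype
# (number of deletion alleles; 0 = reference allele, 1 = deletion allele).
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return any(a + b == child
               for a, b in product(TRANSMISSIBLE[father], TRANSMISSIBLE[mother]))


def mendelian_error_rate(trios: Iterable[Tuple[int, int, int]]) -> float:
    """Fraction of (father, mother, child) genotype trios that are inconsistent."""
    trios = list(trios)
    errors = sum(not mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)
```

Comparing this error rate across callers on the same trios is one straightforward way to rank genotype reliability.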
Kennewick Man, referred to as the Ancient One by Native Americans, is a male human skeleton discovered in Washington state (USA) in 1996 and initially radiocarbon dated to 8,340–9,200 calibrated years before present (BP). His population affinities have been the subject of scientific debate and legal controversy. Based on an initial study of cranial morphology, it was asserted that Kennewick Man was neither Native American nor closely related to the claimant Plateau tribes of the Pacific Northwest, which claimed an ancestral relationship and requested repatriation under the Native American Graves Protection and Repatriation Act (NAGPRA). The morphological analysis was important to the judicial decisions that Kennewick Man was not Native American and that NAGPRA therefore did not apply. Instead of repatriation, additional studies of the remains were permitted. Subsequent craniometric analysis affirmed that Kennewick Man was more closely related to circumpacific groups such as the Ainu and Polynesians than to modern Native Americans. To resolve Kennewick Man's ancestry and affiliations, we sequenced his genome to ∼1× coverage and compared it with worldwide genomic data, including data for the Ainu and Polynesians. We find that Kennewick Man is closer to modern Native Americans than to any other population worldwide. Among the Native American groups for which genome-wide data are available for comparison, several seem to be descended from a population closely related to that of Kennewick Man, including the Confederated Tribes of the Colville Reservation (Colville), one of the five tribes claiming Kennewick Man. We revisit the cranial analyses and find that, unlike genome-wide comparisons, they do not allow Kennewick Man to be affiliated with specific contemporary groups. We therefore conclude, based on genetic comparisons, that Kennewick Man shows continuity with Native North Americans over at least the last eight millennia.
De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent of origin, mutation type and genomic position. We detected 57.2% of ssDNMs in parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely than paternal ssDNMs to be T>C mutations and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator for estimating DNM recurrence probability for research purposes.
Significance
Thirty years after the first DNA fragment from the extinct quagga zebra was sequenced, we set another milestone in equine genomics by sequencing its entire genome, along with the genomes of the surviving equine species. This extensive dataset allows us to decipher the genetic makeup underlying lineage-specific adaptations and to reveal the complex history of equine speciation. We find that Equus first diverged in the New World, spread across the Old World 2.1–3.4 Mya, and finally experienced major demographic expansions and collapses coinciding with past climate changes. Strikingly, we find multiple instances of hybridization throughout the equine tree, despite extremely divergent chromosomal structures. This contrasts with theories that invoke chromosomal incompatibilities as drivers of the origin of equine species.
Horses, asses, and zebras belong to a single genus, Equus, which emerged 4.0–4.5 Mya. Although the equine fossil record represents a textbook example of evolution, the succession of events that gave rise to the diversity of species existing today remains unclear. Here we present six genomes, one from each living species of asses and zebras. This completes the set of genomes available for all extant species in the genus, which was previously represented only by the horse and the domestic donkey. In addition, we used a museum specimen to characterize the genome of the quagga zebra, which was driven to extinction in the early 1900s. We scan the genomes for lineage-specific adaptations and identify 48 genes that have evolved under positive selection and are involved in olfaction, immune response, development, locomotion, and behavior. Our extensive genome dataset reveals a highly dynamic demographic history, with synchronous expansions and collapses on different continents during the last 400 ky following major climatic events. We show that the earliest speciation occurred with gene flow in North America, and that the ancestor of present-day asses and zebras dispersed into the Old World 2.1–3.4 Mya. Strikingly, we also find evidence for gene flow involving three contemporary equine species, despite chromosome numbers varying from 16 to 31 pairs. These findings challenge the claim that the accumulation of chromosomal rearrangements drives complete reproductive isolation, and promote equids as a fundamental model for understanding the interplay between chromosomal structure, gene flow, and, ultimately, speciation.