The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
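The idea of using Mendelian error rates as an accuracy indicator can be illustrated with a minimal sketch. It assumes haploid progeny clones, integer allele codes with -1 for a missing call, and the simple rule that a progeny call is an error when it matches neither parent; the function name and the toy genotypes are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mendelian_error_rate(
    parent_a: Sequence[int],
    parent_b: Sequence[int],
    progeny: Sequence[Sequence[int]],
) -> float:
    """Fraction of called progeny genotypes that match neither parent.

    Alleles are integer codes per site; -1 marks a missing call (assumption).
    """
    errors = 0
    called = 0
    for clone in progeny:
        for a, b, g in zip(parent_a, parent_b, clone):
            if a < 0 or b < 0 or g < 0:
                continue  # skip sites with any missing call
            called += 1
            if g != a and g != b:
                errors += 1  # allele seen in neither parent
    return errors / called if called else 0.0


# Hypothetical toy data: four sites, two parents, three progeny clones.
parent_a = [0, 0, 1, 0]
parent_b = [0, 1, 1, 2]
progeny = [
    [0, 1, 1, 0],
    [0, 0, 0, 2],   # third site carries an allele found in neither parent
    [0, 1, 1, -1],  # last site is missing
]
rate = mendelian_error_rate(parent_a, parent_b, progeny)
print(f"Mendelian error rate: {rate:.3f}")
```

In this toy case one of the eleven called genotypes is inconsistent with both parents. A lower rate under a given filtering scheme would suggest more accurate genotype calls.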
Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.
Objectives
Adult age at death estimation continues to challenge physical anthropologists. One estimation method involves counting tooth cementum annulations (TCA). Non‐destructively accessing TCA is a critical step to approaching fossil teeth of unknown age and to verifying life history profiles of human ancestors. This pilot study aims to (a) non‐destructively image TCA in teeth from a known‐age archeological human population by propagation phase contrast X‐ray synchrotron μCT (PPC‐SR‐μCT), and (b) test the correlation between real and estimated ages, and the accuracy, precision and bias of age estimates.
Materials and Methods
We examined 20 permanent human canines (aged 20–81 years) from an 18th to 19th century known‐age collection from St. Luke's Church (London, England). We scanned transverse segments of acellular cementum in the apical portion of the middle root third using PPC‐SR‐μCT. We generated virtual transverse sections on which two observers performed two sessions of blind TCA counts. We calculated the estimated ages at death by adding 10 years to the TCA counts.
Results
A moderately strong positive linear relationship exists between real and estimated ages (r = 0.76, p < .001), with an average inaccuracy of 16.1 years and an average bias towards underestimation of 15.7 years. Both errors are lower in individuals <50 years (6.8 and 6.5 years, respectively, n = 10) than in those >50 years (24.9 years, n = 10).
Discussion
We reliably imaged and identified TCA in individuals <50 years from a known‐age archeological sample. Scanning refinement should yield a promising alternative to current destructive methods of TCA analysis and aid access to life history events in adult fossil hominins.
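The age arithmetic in this abstract can be sketched in a few lines. The 10-year offset comes from the Materials and Methods; the definitions used here (inaccuracy as the mean absolute error, bias as the mean signed error) are the conventional ones and are an assumption, since the abstract does not spell them out. The counts and ages below are hypothetical, not the study's data.

```python
from statistics import mean
from typing import Sequence

OFFSET_YEARS = 10  # years added to the TCA count, as stated in the abstract


def estimate_age(tca_count: int) -> int:
    """Estimated age at death from a cementum annulation count."""
    return tca_count + OFFSET_YEARS


def inaccuracy(real: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean absolute difference between estimated and real ages (assumed definition)."""
    return mean(abs(e - r) for r, e in zip(real, estimated))


def bias(real: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean signed difference; negative values indicate underestimation."""
    return mean(e - r for r, e in zip(real, estimated))


# Hypothetical example values for three individuals.
tca_counts = [15, 30, 40]
real_ages = [25, 45, 70]
estimated_ages = [estimate_age(c) for c in tca_counts]

print(estimated_ages)
print(f"inaccuracy = {inaccuracy(real_ages, estimated_ages):.2f} years")
print(f"bias = {bias(real_ages, estimated_ages):.2f} years")
```

With these made-up values the bias is negative, mirroring the direction of the underestimation reported for older individuals.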
Whole genome sequencing (WGS) of Plasmodium vivax is problematic due to the reliance on clinical isolates which are generally low in parasitaemia and sample volume. Furthermore, clinical isolates contain a significant contaminating background of host DNA which confounds efforts to map short read sequence of the target P. vivax DNA. Here, we discuss a methodology to significantly improve the success of P. vivax WGS on natural (non-adapted) patient isolates. Using 37 patient isolates from Indonesia, Thailand, and travellers, we assessed the application of CF11-based white blood cell filtration alone and in combination with short term ex vivo schizont maturation. Although CF11 filtration reduced human DNA contamination in 8 Indonesian isolates tested, additional short-term culture increased the P. vivax DNA yield from a median of 0.15 to 6.2 ng µl⁻¹ packed red blood cells (pRBCs) (p = 0.001) and reduced the human DNA percentage from a median of 33.9% to 6.22% (p = 0.008). Furthermore, post-CF11 and culture samples from Thailand gave a median P. vivax DNA yield of 2.34 ng µl⁻¹ pRBCs, and 2.65% human DNA. In 22 P. vivax patient isolates prepared with the 2-step method, we demonstrate high depth (median 654X coverage) and breadth (≥89%) of coverage on the Illumina GAII and HiSeq platforms. In contrast to the A+T-rich P. falciparum genome, negligible bias was observed in coverage depth between coding and non-coding regions of the P. vivax genome. This uniform coverage will greatly facilitate the detection of SNPs and copy number variants across the genome, enabling unbiased exploration of the natural diversity in P. vivax populations.
Phylogenetic profiling encompasses an important set of methodologies for in silico high throughput inference of functional relationships between genes. The simplest profiles represent the distribution of gene presence-absence in a set of species as a sequence of 0's and 1's, and it is assumed that functionally related genes will have more similar profiles. The methodology has been successfully used in numerous studies of prokaryotic genomes, although its application in eukaryotes appears problematic, with reported low accuracy due to the complex genomic organization within this domain of life. Recently some groups have proposed an alternative approach based on the correlation of homologous gene group sizes, taking into account all potentially informative genetic events leading to a change in group size, regardless of whether they result in a de novo group gain or total gene group loss.
We have compared the performance of classical presence-absence and group size based approaches using a large, diverse set of eukaryotic species. In contrast to most previous comparisons in Eukarya, we take into account the species phylogeny. We also compare the approaches using two different group categories, based on orthology and on domain-sharing. Our results confirm a limited overall performance of phylogenetic profiling in eukaryotes. Although group size based approaches initially showed an increase in performance for the domain-sharing based groups, this seems to be an overestimation due to a simplistic negative control dataset and the choice of null hypothesis rejection criteria.
Presence-absence profiling represents a more accurate classifier of related versus non-related profile pairs when the profiles under consideration have enough information content. Group size based approaches provide a complementary means of detecting domain or family level co-evolution between groups that may be elusive to presence-absence profiling. Moreover, the positive correlation between co-evolution scores and functional links implies that these methods could be used to estimate functional distances between gene groups and to cluster them based on their functional relatedness. This study should have important implications for the future development and application of phylogenetic profiling methods, not only in eukaryotic, but also in prokaryotic datasets.
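The two profile types compared in this abstract can be illustrated with a small sketch. It scores presence-absence profiles with the Jaccard index and group-size profiles with the Pearson correlation; both measures are assumptions chosen for illustration, the profiles are hypothetical, and the sketch deliberately ignores the species phylogeny that the study takes into account.

```python
import math
from typing import Sequence


def jaccard(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Jaccard similarity of two 0/1 presence-absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two group-size profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical profiles over six species.
presence_a = [1, 1, 0, 1, 0, 0]
presence_b = [1, 1, 0, 1, 0, 1]
sizes_a = [2, 4, 0, 6, 0, 0]  # homologous group sizes per species
sizes_b = [1, 2, 0, 3, 0, 0]

print(f"Jaccard (presence-absence): {jaccard(presence_a, presence_b):.2f}")
print(f"Pearson (group size): {pearson(sizes_a, sizes_b):.2f}")
```

Group-size profiles keep information about expansions and contractions that a 0/1 profile discards, which is why the two scores can disagree for the same pair of groups.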
Abstract
The presence of somatic copy-number alterations in tumor genomes can be used to predict both patient sensitivity to treatments and outcomes. The inclusion of allelic data improves statistical power to detect copy-number events and allows for discovery of copy-neutral events. We present GATK ACNV, an allelic copy-number variation method built on the Genome Analysis Toolkit. ACNV is a tool for detecting somatic copy-number activity from whole exome and whole genome sequencing data by segmenting the genome into regions of constant copy number and estimating copy ratio and minor-allele fraction in those regions.
ACNV uses a novel probabilistic model to account for reference bias (optionally using a panel of normals), which improves the estimation of minor-allele fraction. We combine this with the coverage model from GATK CNV by segmenting with a unified hidden Markov model, improving the statistical power to detect copy-number variation.
We validate ACNV using a purity series of the cell line HCC1143 and cancer samples from The Cancer Genome Atlas. Our results show that ACNV is able to discover regions of somatic copy-number activity accurately and with high resolution in both whole exome and whole genome sequencing data.
Citation Format: Aaron Chevalier, Lee Lichtenstein, Andrey Smirnov, Samuel K. Lee, Mehrtash Babadi, David I. Benjamin, Valentin Ruano-Rubio. GATK ACNV: allelic copy-number variation discovery from SNPs and coverage data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3581. doi:10.1158/1538-7445.AM2017-3581
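The two per-segment quantities named in the ACNV abstract, copy ratio and minor-allele fraction, can be illustrated with a deliberately naive sketch. This is not the ACNV model: it applies no reference-bias correction, no panel of normals and no segmentation, and the function names and read counts are hypothetical.

```python
from statistics import mean
from typing import Sequence


def naive_copy_ratio(
    tumor_depth: Sequence[float], normal_depth: Sequence[float]
) -> float:
    """Ratio of mean tumor coverage to mean normal coverage within one segment."""
    return mean(tumor_depth) / mean(normal_depth)


def naive_minor_allele_fraction(
    ref_counts: Sequence[int], alt_counts: Sequence[int]
) -> float:
    """Average of min(ref, alt) / (ref + alt) over heterozygous sites in one segment.

    Taking the minimum per site biases the estimate downwards at low depth;
    a probabilistic model is needed to correct for that.
    """
    fractions = [
        min(r, a) / (r + a) for r, a in zip(ref_counts, alt_counts) if r + a > 0
    ]
    return mean(fractions)


# Hypothetical counts for one segment.
tumor_depth = [150, 160, 140]
normal_depth = [100, 100, 100]
ref_counts = [30, 10, 28]
alt_counts = [10, 30, 12]

print(f"copy ratio = {naive_copy_ratio(tumor_depth, normal_depth):.2f}")
print(f"minor-allele fraction = {naive_minor_allele_fraction(ref_counts, alt_counts):.3f}")
```

A copy ratio above 1 together with a minor-allele fraction well below 0.5 would be consistent with an allele-specific gain, whereas a copy ratio near 1 with a low minor-allele fraction points to a copy-neutral event of the kind the abstract says allelic data can reveal.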
Abstract
We propose and evaluate a novel algorithm for inferring germline and somatic copy number variation from whole exome sequencing (WES) and whole genome sequencing (WGS) data. Starting with the depth of aligned short reads from a cohort of samples, we use a Bayesian model for learning sequencing bias and simultaneously detecting CNV events using a hidden Markov model for change-point detection. A unified framework is used to call both germline and somatic CNVs. Denoising and event discovery are performed self-consistently to achieve maximum accuracy. In contrast to previous methods, our model naturally accounts for mixed sex cohorts and can detect events on sex chromosomes. Furthermore, we can detect excessively noisy samples and extract useful information within a probabilistic framework. Our implementation can also utilize Spark clusters, enabling the processing of larger cohorts and allowing for improved runtime performance.
We benchmark the new method for precision, recall, and reproducibility of both germline and somatic variants. Evaluations are performed on a cohort of WES samples from The Cancer Genome Atlas with matching WGS data. For germline variants, we use blood normal samples and compare our calls on WES data against Genome STRiP calls on WGS data. We find that GATK CNV yields remarkably higher precision and recall than the XHMM and CODEX software packages. For somatic variants, we compare our calls against TITAN and find a remarkably high concordance.
Citation Format: Mehrtash Babadi, David I. Benjamin, Samuel K. Lee, Andrey Smirnov, Aaron Chevalier, Lee Lichtenstein, Valentin Ruano Rubio. GATK CNV: copy-number variation discovery from coverage data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3580. doi:10.1158/1538-7445.AM2017-3580
The evolutionary transition from homo-oligomerism to hetero-oligomerism in multimeric proteins and its contribution to function innovation and organism complexity remain to be investigated. Here, we undertake the challenge of contributing to this theoretical ground by investigating the hetero-oligomerism in the molecular chaperonin cytosolic chaperonin containing tailless complex polypeptide 1 (CCT) from archaea. CCT is amenable to this study because, in contrast to eukaryotic CCTs where sub-functionalization after gene duplication has been taken to completion, archaeal CCTs present no evidence for subunit functional specialization. Our analyses yield additional information to previous reports on archaeal CCT paralogy by identifying new duplication events. Analyses of selective constraints show that amino acid sites from 1 subunit have fixed slightly deleterious mutations at inter-subunit interfaces after gene duplication. These mutations have been followed by compensatory mutations in nearby regions of the same subunit and in the interface contact regions of its paralogous subunit. The strong selective constraints in these regions after speciation support the evolutionary entrapment of CCTs as hetero-oligomers. In addition, our results unveil different evolutionary dynamics depending on the degree of CCT hetero-oligomerism. Archaeal CCT protein complexes comprising 3 distinct classes of subunits present 2 evolutionary processes. First, slightly deleterious and compensatory mutations were fixed neutrally at inter-subunit regions. Second, sub-functionalization may have occurred at substrate-binding and adenosine triphosphate-binding regions after the 2nd gene duplication event took place. CCTs with 2 distinct types of subunits did not present evidence of sub-functionalization.
Our results provide the 1st in silico evidence for the neutral fixation of hetero-oligomerism in archaeal CCTs and provide information on the evolution of hetero-oligomerism toward sub-functionalization in archaeal CCTs.