The nature and scale of recombination rate variation are largely unknown for most species. In humans, pedigree analysis has documented variation at the chromosomal level, and sperm studies have identified specific hotspots in which crossing-over events cluster. To address whether this picture is representative of the genome as a whole, we have developed and validated a method for estimating recombination rates from patterns of genetic variation. From extensive single-nucleotide polymorphism surveys in European and African populations, we find evidence for extreme local rate variation spanning four orders of magnitude, in which 50% of all recombination events take place in less than 10% of the sequence. We demonstrate that recombination hotspots are a ubiquitous feature of the human genome, occurring on average every 200 kilobases or less, but that recombination occurs preferentially outside genes.
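The "50% of recombination in less than 10% of the sequence" summary can be illustrated with a small calculation. The following is a minimal sketch, not the authors' method: it assumes a recombination map given as per-interval physical lengths and rates (the function name and inputs are hypothetical), and finds the smallest fraction of sequence that accounts for a given share of the total genetic map length.

```python
def min_fraction_for_share(lengths, rates, share=0.5):
    """Smallest fraction of the sequence holding `share` of all recombination.

    lengths: physical length of each interval (e.g. in kb); hypothetical input.
    rates:   recombination rate of each interval (e.g. in cM/Mb), same order.
    """
    # Visit the hottest intervals first, so the covered sequence is minimal.
    intervals = sorted(zip(rates, lengths), reverse=True)
    total_map = sum(rate * length for rate, length in intervals)
    total_len = sum(lengths)
    if total_map <= 0 or total_len <= 0:
        raise ValueError("map must contain some recombination")

    target = share * total_map
    acc_map = 0.0  # genetic length accumulated so far
    acc_len = 0.0  # physical length accumulated so far
    for rate, length in intervals:
        contribution = rate * length
        if acc_map + contribution >= target:
            # Only part of this interval is needed to reach the target.
            acc_len += (target - acc_map) / rate
            return acc_len / total_len
        acc_map += contribution
        acc_len += length
    return 1.0


# Toy map: one hot interval of length 1 and one cold interval of length 9.
fraction = min_fraction_for_share(lengths=[1, 9], rates=[9, 1])
```

In this toy map the single hot interval carries half of the total genetic length while covering one tenth of the sequence.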
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
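To make the idea of coalescence records concrete, here is a minimal sketch under stated assumptions. The record layout (a genomic interval, a parent node, its children and a coalescence time) and all names below are illustrative choices, not the authors' implementation. The point it shows is that a record spanning several adjacent trees is stored once, and the tree at any position is recovered as a sparse child-to-parent map.

```python
from typing import Dict, List, NamedTuple, Tuple


class CoalescenceRecord(NamedTuple):
    """Hypothetical record: `children` coalesce into `node` over [left, right)."""
    left: float
    right: float
    node: int
    children: Tuple[int, ...]
    time: float


def tree_at(records: List[CoalescenceRecord], position: float) -> Dict[int, int]:
    """Return the sparse tree (child -> parent map) covering `position`."""
    parent = {}
    for rec in records:
        if rec.left <= position < rec.right:
            for child in rec.children:
                parent[child] = rec.node
    return parent


# Three samples (0, 1, 2) on a sequence of length 10 with one breakpoint at 5.
records = [
    # Samples 0 and 1 coalesce identically in both trees: stored only once.
    CoalescenceRecord(0.0, 10.0, 3, (0, 1), 0.5),
    # The root differs on either side of the breakpoint.
    CoalescenceRecord(0.0, 5.0, 4, (2, 3), 1.0),
    CoalescenceRecord(5.0, 10.0, 5, (2, 3), 1.5),
]

left_tree = tree_at(records, 2.0)   # {0: 3, 1: 3, 2: 4, 3: 4}
right_tree = tree_at(records, 7.0)  # {0: 3, 1: 3, 2: 5, 3: 5}
```

Three records describe two trees here, where storing each tree separately would need four parent-child groupings; the saving grows as neighbouring trees share more structure.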
We compared fine-scale recombination rates at orthologous loci in humans and chimpanzees by analyzing polymorphism data in both species. Strong statistical evidence for hotspots of recombination was obtained in both species. Despite ~99% identity at the level of DNA sequence, however, recombination hotspots were found rarely (if at all) at the same positions in the two species, and no correlation was observed in estimates of fine-scale recombination rates. Thus, local patterns of recombination rate have evolved rapidly, in a manner disproportionate to the change in DNA sequence.
Var genes encode the major surface antigen (PfEMP1) of the blood stages of the human malaria parasite Plasmodium falciparum. Differential expression of up to 60 diverse var genes in each parasite genome underlies immune evasion. We compared the diversity of the DBLalpha domain of var genes sampled from 30 parasite isolates from a malaria-endemic area of Papua New Guinea (PNG) and 59 from widespread geographic origins (global). Overall, we obtained over 8,000 quality-controlled DBLalpha sequences. Within our sampling frame, the global population had a total of 895 distinct DBLalpha "types" and negligible overlap among repertoires. This indicated that var gene diversity on a global scale is so immense that many genomes would need to be sequenced to capture its true extent. In contrast, we found a much lower diversity in PNG of 185 DBLalpha types, with an average of approximately 7% overlap among repertoires. While we identify marked geographic structuring, nearly 40% of types identified in PNG were also found in samples from different countries, showing a cosmopolitan distribution for much of the diversity. We also present evidence to suggest that recombination plays a key role in maintaining the unprecedented levels of polymorphism found in these immune evasion genes. This population genomic framework provides a cost-effective molecular epidemiological tool to rapidly explore the geographic diversity of var genes.
Genes mutated in congenital malformation syndromes are frequently implicated in oncogenesis, but the causative germline and somatic mutations occur in separate cells at different times of an organism's life. Here we unify these processes to a single cellular event for mutations arising in male germ cells that show a paternal age effect. Screening of 30 spermatocytic seminomas for oncogenic mutations in 17 genes identified 2 mutations in FGFR3 (both 1948A>G, encoding K650E, which causes thanatophoric dysplasia in the germline) and 5 mutations in HRAS. Massively parallel sequencing of sperm DNA showed that levels of the FGFR3 mutation increase with paternal age and that the mutation spectrum at the Lys650 codon is similar to that observed in bladder cancer. Most spermatocytic seminomas show increased immunoreactivity for FGFR3 and/or HRAS. We propose that paternal age-effect mutations activate a common 'selfish' pathway supporting proliferation in the testis, leading to diverse phenotypes in the next generation including fetal lethality, congenital syndromes and cancer predisposition.
Approximating the coalescent with recombination
McVean, Gilean A.T; Cardin, Niall J
Philosophical transactions of the Royal Society of London. Series B. Biological sciences, 07/2005, Volume 360, Issue 1459
Journal Article
Peer reviewed
Open access
The coalescent with recombination describes the distribution of genealogical histories and resulting patterns of genetic variation in samples of DNA sequences from natural populations. However, using the model as the basis for inference is currently severely restricted by the computational challenge of estimating the likelihood. We discuss why the coalescent with recombination is so challenging to work with and explore whether simpler models, under which inference is more tractable, may prove useful for genealogy-based inference. We introduce a simplification of the coalescent process in which coalescence between lineages with no overlapping ancestral material is banned. The resulting process has a simple Markovian structure when generating genealogies sequentially along a sequence, yet has very similar properties to the full model, both in terms of describing patterns of genetic variation and as the basis for statistical inference.
Observed mutation rates in humans appear higher in male than female gametes and often increase with paternal age. This bias, usually attributed to the accumulation of replication errors or inefficient repair processes, has been difficult to study directly. Here, we describe a sensitive method to quantify substitutions at nucleotide 755 of the fibroblast growth factor receptor 2 (FGFR2) gene in sperm. Although substitution levels increase with age, we show that even high levels originate from infrequent mutational events. We propose that these FGFR2 mutations, although harmful to embryonic development, are paradoxically enriched because they confer a selective advantage to the spermatogonial cells in which they arise.
Understanding the influences of population structure, selection, and recombination on polymorphism and linkage disequilibrium (LD) is integral to mapping genes contributing to drug resistance or virulence in Plasmodium falciparum. The parasite's short generation time, coupled with a high cross-over rate, can cause rapid LD break-down. However, observations of low genetic variation have led to suggestions of effective clonality: selfing, population admixture, and selection may preserve LD in populations. Indeed, extensive LD surrounding drug-resistant genes has been observed, indicating that recombination and selection play important roles in shaping recent parasite genome evolution. These studies, however, provide only limited information about haplotype variation at local scales. Here we describe the first (to our knowledge) chromosome-wide SNP haplotype and population recombination maps for a global collection of malaria parasites, including the 3D7 isolate, whose genome has been sequenced previously. The parasites are clustered according to continental origin, but alternative groupings were obtained using SNPs at 37 putative transporter genes that are potentially under selection. Geographic isolation and highly variable multiple infection rates are the major factors affecting haplotype structure. Variation in effective recombination rates is high, both among populations and along the chromosome, with recombination hotspots conserved among populations at chromosome ends. This study supports the feasibility of genome-wide association studies in some parasite populations.
One goal in sequencing the Plasmodium falciparum genome, the agent of the most lethal form of malaria, is to discover vaccine and drug targets. However, identifying those targets in a genome in which ∼60% of genes have unknown functions is an enormous challenge. Because the majority of known malaria antigens and drug-resistant genes are highly polymorphic and under various selective pressures, genome-wide analysis for signatures of selection may lead to discovery of new vaccine and drug candidates. Here we surveyed 3,539 P. falciparum genes (∼65% of the predicted genes) for polymorphisms and identified various highly polymorphic loci and genes, some of which encode new antigens that we confirmed using human immune sera. Our collections of genome-wide SNPs (∼65% nonsynonymous) and polymorphic microsatellites and indels provide a high-resolution map (one marker per ∼4 kb) for mapping parasite traits and studying parasite populations. In addition, we report new antigens, providing urgently needed vaccine candidates for disease control.
The variant call format and VCFtools
Danecek, Petr; Auton, Adam; Abecasis, Goncalo ...
Bioinformatics, 08/2011, Volume 27, Issue 15
Journal Article
Peer reviewed
Open access
The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
Availability: http://vcftools.sourceforge.net
Contact: rd@sanger.ac.uk
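As an illustration of the record layout described in the abstract above, the following is a minimal sketch of reading one tab-separated VCF data line in plain Python. It is not the VCFtools API (which the abstract describes as a Perl API); the function name `parse_vcf_line` and the returned dictionary keys are hypothetical. It relies only on the standard fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO, followed by an optional FORMAT column and per-sample columns.

```python
def parse_vcf_line(line):
    """Parse a single VCF data line (not a '#' header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                 # 1-based position on the reference
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),           # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
        "format": fields[8].split(":") if len(fields) > 8 else [],
        "samples": fields[9:],           # raw per-sample strings
    }


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\tGT:GQ\t0|0:48\t1|0:48"
record = parse_vcf_line(example)
```

For real files, which are usually compressed and indexed, a dedicated tool or library is the better choice; this sketch only shows how the columns fit together.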