Datamonkey is a popular web-based suite of phylogenetic analysis tools for use in evolutionary biology. Since the original release in 2005, we have expanded the analysis options to include recently developed algorithmic methods for recombination detection, evolutionary fingerprinting of genes, codon model selection, co-evolution between sites, identification of sites that rapidly escape host-immune pressure, and HIV-1 subtype assignment. The traditional selection tools have also been augmented to include recent developments in the field. Here, we summarize the analysis options currently available on Datamonkey, and provide guidelines for their use in evolutionary biology. Availability and documentation: http://www.datamonkey.org Contact: spond@ucsd.edu
Adaptive evolution frequently occurs in episodic bursts, localized to a few sites in a gene, and to a small number of lineages in a phylogenetic tree. A popular class of "branch-site" evolutionary models provides a statistical framework to search for evidence of such episodic selection. For computational tractability, current branch-site models unrealistically assume that all branches in the tree can be partitioned a priori into two rigid classes: "foreground" branches that are allowed to undergo diversifying selective bursts and "background" branches that are negatively selected or neutral. We demonstrate that this assumption leads to unacceptably high rates of false positives or false negatives when the evolutionary process along background branches strongly deviates from modeling assumptions. To address this problem, we extend Felsenstein's pruning algorithm to allow efficient likelihood computations for models in which variation over branches (and not just sites) is described in the random effects likelihood framework. This enables us to model the process at every branch-site combination as a mixture of three Markov substitution models: our model treats the selective class of every branch at a particular site as an unobserved state that is chosen independently of that at any other branch. When benchmarked on a previously published set of simulated sequences, our method consistently matched or outperformed existing branch-site tests in terms of power and error rates. Using three empirical data sets, previously analyzed for episodic selection, we discuss how modeling assumptions can influence inference in practical situations.
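The independence assumption in the abstract above is what keeps the likelihood tractable: if the class of each branch is drawn independently, the sum over all class assignments can be folded into the pruning recursion by giving every branch a weighted average of the per-class transition matrices. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a toy symmetric two-state Markov chain in place of a codon model, three rate classes standing in for the three substitution models, and a hypothetical four-branch tree; all names, rates, weights, and branch lengths are illustrative.

```python
import math
from itertools import product

def transition(rate, t):
    """2x2 transition matrix of a symmetric two-state chain (toy stand-in for a codon model)."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return [[same, 1.0 - same], [1.0 - same, same]]

def mixture_transition(rates, weights, t):
    """Weighted average of the per-class transition matrices for one branch."""
    mats = [transition(r, t) for r in rates]
    return [[sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(2)]
            for i in range(2)]

def partial(node, branch_matrix):
    """Felsenstein pruning: conditional likelihoods of the data below `node`.

    A leaf is an observed state (int); an internal node is a list of
    (child, branch_id) pairs. `branch_matrix` maps a branch id to its matrix.
    """
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in range(2)]
    out = [1.0, 1.0]
    for child, branch_id in node:
        below = partial(child, branch_matrix)
        P = branch_matrix(branch_id)
        for s in range(2):
            out[s] *= sum(P[s][x] * below[x] for x in range(2))
    return out

def site_likelihood(tree, branch_matrix):
    """Site likelihood with a uniform distribution over the root state."""
    return sum(0.5 * p for p in partial(tree, branch_matrix))

# Hypothetical tree ((0, 1), 1) with branch ids a-d; all values are illustrative.
tree = [([(0, "a"), (1, "b")], "c"), (1, "d")]
lengths = {"a": 0.1, "b": 0.2, "c": 0.05, "d": 0.3}
rates = [0.1, 1.0, 5.0]      # three classes, standing in for three substitution models
weights = [0.6, 0.3, 0.1]    # class probabilities, shared by all branches in this sketch

# One pruning pass with mixture matrices on every branch.
mixture = site_likelihood(
    tree, lambda b: mixture_transition(rates, weights, lengths[b]))

# Brute force: enumerate every assignment of classes to branches (3**4 terms).
branches = sorted(lengths)
brute = 0.0
for classes in product(range(3), repeat=len(branches)):
    assign = dict(zip(branches, classes))
    prior = math.prod(weights[k] for k in classes)
    brute += prior * site_likelihood(
        tree, lambda b: transition(rates[assign[b]], lengths[b]))

print(mixture, brute)
```

The two numbers agree because the site likelihood is linear in each branch's transition matrix, so the exponential enumeration collapses into a single pruning pass. The cost of the mixture version grows linearly with the number of branches, whereas the brute-force sum grows as the number of classes raised to the number of branches.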
Ecological Speciation in South Atlantic Island Finches
Ryan, Peter G; Bloomer, Paulette; Moloney, Coleen L ...
Science (American Association for the Advancement of Science), 03/2007, Volume 315, Issue 5817
Journal Article, Peer reviewed
Examples of sympatric speciation in nature are rare and hotly debated. We describe the parallel speciation of finches on two small islands in the Tristan da Cunha archipelago in the South Atlantic Ocean. Nesospiza buntings are a classic example of a simple adaptive radiation, with two species on each island: an abundant small-billed dietary generalist and a scarce large-billed specialist. Their morphological diversity closely matches the available spectrum of seed sizes, and genetic evidence suggests that they evolved independently on each island. Speciation is complete on the smaller island, where there is a single habitat with strongly bimodal seed size abundance, but is incomplete on the larger island, where a greater diversity of habitats has resulted in three lineages. Our study suggests that the buntings have undergone parallel ecological speciation.
Host immune responses against infectious pathogens exert strong selective pressures favouring the emergence of escape mutations that prevent immune recognition. Escape mutations within or flanking functionally conserved epitopes can occur at a significant cost to the pathogen in terms of its ability to replicate effectively. Such mutations come under selective pressure to revert to the wild type in hosts that do not mount an immune response against the epitope. Amino acid positions exhibiting this pattern of escape and reversion are of interest because they tend to coincide with immune responses that control pathogen replication effectively. We have used a probabilistic model of protein coding sequence evolution to detect sites in HIV-1 exhibiting a pattern of rapid escape and reversion. Our model is designed to detect sites that toggle between a wild type amino acid, which is susceptible to a specific immune response, and amino acids with lower replicative fitness that evade immune recognition. Through simulation, we show that this model has significantly greater power to detect selection involving immune escape and reversion than standard models of diversifying selection, which are sensitive to an overall increased rate of non-synonymous substitution. Applied to alignments of HIV-1 protein coding sequences, the model of immune escape and reversion detects a significantly greater number of adaptively evolving sites in env and nef. In all genes tested, the model provides a significantly better description of adaptively evolving sites than standard models of diversifying selection. Several of the sites detected are corroborated by association between Human Leukocyte Antigen (HLA) and viral sequence polymorphisms. Overall, there is evidence for a large number of sites in HIV-1 evolving under strong selective pressure, but exhibiting low sequence diversity.
A phylogenetic model designed to detect rapid toggling between wild type and escape amino acids identifies a larger number of adaptively evolving sites in HIV-1, and can in some cases correctly identify the amino acid that is susceptible to the immune response.
Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a "corrected" empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators.
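The role of stop codons in the bias described above can be illustrated with a short sketch. It assumes one common construction, in which each sense codon's frequency is the product of position-specific nucleotide frequencies, renormalized after stop codons are dropped. Whether this is the exact estimator the abstract refers to is an assumption, since the estimator's name is missing from the text as reproduced here. The sketch does not implement the corrected estimator; it only shows that the frequencies implied by the renormalized codon distribution no longer match the nucleotide frequencies that were put in. Function names and the uniform input are illustrative.

```python
from itertools import product

NUCS = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the universal genetic code

def product_codon_frequencies(pos_freqs):
    """Sense-codon frequencies as renormalized products of position-specific
    nucleotide frequencies (assumed construction, see the lead-in)."""
    raw = {}
    for codon in map("".join, product(NUCS, repeat=3)):
        if codon in STOPS:
            continue  # stop codons are excluded from the state space
        p = 1.0
        for pos, nuc in enumerate(codon):
            p *= pos_freqs[pos][nuc]
        raw[codon] = p
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}

def implied_position_frequencies(codon_freqs):
    """Nucleotide frequencies at each codon position implied by codon frequencies."""
    out = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for codon, p in codon_freqs.items():
        for pos, nuc in enumerate(codon):
            out[pos][nuc] += p
    return out

# Illustrative input: uniform nucleotide frequencies at all three positions.
uniform = [dict.fromkeys(NUCS, 0.25) for _ in range(3)]
codon_freqs = product_codon_frequencies(uniform)
implied = implied_position_frequencies(codon_freqs)

for pos in range(3):
    print(pos + 1, {n: round(f, 4) for n, f in implied[pos].items()})
```

Because all three stop codons begin with T, dropping them and renormalizing lowers the implied frequency of T at the first position and raises the others, so the codon distribution is inconsistent with the nucleotide frequencies it was built from. A corrected estimator has to account for this composition of the excluded codons.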
Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
Although it is known that most HIV-1 infections worldwide result from exposure to virus in semen, it has not yet been established whether transmitted strains originate as RNA virions in seminal plasma or as integrated proviral DNA in infected seminal leukocytes. We present phylogenetic evidence that among six transmitting pairs of men who have sex with men, blood plasma virus in the recipient is consistently more closely related to the seminal plasma virus in the source. All sequences were subtype B, and the env C2V3 of transmitted variants tended to have higher mean isoelectric points, contain potential N-linked glycosylation sites, and favor CCR5 co-receptor usage. A statistically robust phylogenetically corrected analysis did not detect genetic signatures reliably associated with transmission, but further investigation of larger samples of transmitting pairs holds promise for determining which structural and genetic features of viral genomes are associated with transmission.
► Romanian HIV epidemic derives from the Angolan epidemic via multiple transmissions.
► The Romanian pediatric outbreak was a complex outbreak.
► Phylogenetic analysis of HIV demonstrates effects of historical events on epidemics.
During the late 1980s and early 1990s, an estimated 10,000 Romanian children were infected with HIV-1 subtype F nosocomially through contaminated needles and blood transfusions. However, the geographic source and origins of this epidemic remain unclear.
Here we used phylogenetic inference and “relaxed” molecular clock dating analysis to further characterize the Romanian HIV-1 subtype F epidemic.
These analyses revealed a major lineage of Romanian HIV sequences consisting nearly entirely of virus sampled from adolescents and children and a distinct cluster that included a much higher ratio of adult sequences. Divergence time estimates placed the time of the most recent common ancestor of subtype F1 sequences at 1973 (1966–1980) and that of all Angolan sequences at 1975 (1968–1980). The most recent common ancestor of the Romanian sequences was dated to 1978 (1972–1983), with pediatric and adolescent sequences interspersed throughout the lineage. The phylogenetic structure of the entire subtype F epidemic suggests that multiple introductions of subtype F into Romania occurred either from the Angolan epidemic or from more distant ancestors. Since the historical records note that the Romanian pediatric epidemic did not begin until the late 1980s, the inferred time of the most recent common ancestor of the Romanian lineage, 1978, suggests that multiple introductions of subtype F into the pediatric population occurred from HIV already circulating in Romania.
Analysis of the subtype F HIV-1 epidemic in an historical context allows for a deeper appreciation of how the HIV pandemic has been influenced by socio-political events.