The detection of genomic regions involved in local adaptation is an important topic in current population genetics. Several detection strategies are available, depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers, i.e. when there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60% to 95% in several of the scenarios, whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript.
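As an illustration of the differentiation statistic mentioned above, the following is a minimal sketch of Wright's FST for a single biallelic locus in two equally weighted populations. It is not the estimator or the outlier test implemented in HacDivSel; the function name `fst_biallelic` and the equal weighting of populations are assumptions made for this example.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Hypothetical helper for illustration only.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if both populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # the locus is monomorphic overall, no differentiation
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    for p1, p2 in [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
        print(p1, p2, fst_biallelic(p1, p2))
```

Identical frequencies give an FST of 0 and alternatively fixed alleles give 1; loci under divergent selection are expected to fall toward the upper end of this range relative to neutral loci.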
Epistasis is defined as the interaction between different genes when expressing a specific phenotype. The most common way to characterize an epistatic relationship is with a penetrance table, which contains the probability of expressing the phenotype under study given a particular allele combination. Available simulators can only create penetrance tables for well-known epistasis models involving a small number of genes, and they are subject to many limitations.
Toxo is a MATLAB library designed to calculate penetrance tables of epistasis models of any interaction order which resemble real data more closely. The user specifies the desired heritability (or prevalence) and the program maximizes the table's prevalence (or heritability) according to the input epistatic model boundaries.
Toxo extends the capabilities of existing simulators that define epistasis using penetrance tables. These tables can be directly used as input for software simulators such as GAMETES so that they are able to generate data samples with larger interactions and more realistic prevalences/heritabilities.
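To make the relationship between a penetrance table, prevalence and heritability concrete, here is a small sketch in Python (not Toxo's MATLAB code). It assumes Hardy-Weinberg genotype frequencies and independent loci, and it uses the common definition of heritability as the variance of the penetrance across genotypes divided by K(1 - K), where K is the prevalence. The function names and the dictionary layout of the table are hypothetical.

```python
from itertools import product


def genotype_freqs(maf: float) -> list[float]:
    """Hardy-Weinberg frequencies for genotypes AA, Aa, aa (a = minor allele)."""
    q = maf
    p = 1 - q
    return [p * p, 2 * p * q, q * q]


def prevalence_and_heritability(penetrance, mafs):
    """Return (prevalence, heritability) for a penetrance table.

    penetrance maps a tuple of genotype indices (0, 1 or 2 per locus)
    to P(phenotype | genotype combination). Loci are assumed independent.
    """
    per_locus = [genotype_freqs(m) for m in mafs]
    prevalence = 0.0
    second_moment = 0.0
    for combo in product(range(3), repeat=len(mafs)):
        freq = 1.0
        for locus, genotype in enumerate(combo):
            freq *= per_locus[locus][genotype]
        prevalence += freq * penetrance[combo]
        second_moment += freq * penetrance[combo] ** 2
    if prevalence in (0.0, 1.0):
        return prevalence, 0.0  # no phenotypic variance to explain
    variance = second_moment - prevalence ** 2
    return prevalence, variance / (prevalence * (1 - prevalence))


if __name__ == "__main__":
    # Fully penetrant recessive model at a single locus with MAF 0.5.
    table = {(0,): 0.0, (1,): 0.0, (2,): 1.0}
    print(prevalence_and_heritability(table, [0.5]))
```

The same function accepts tables of any interaction order, since the table is keyed by genotype tuples whose length equals the number of loci.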
Abstract
Motivation
There are many multiple testing correction methods. Some of them are robust to various dependencies in the data while others are not. Some implementations have problems handling the high-dimensional lists of P-values currently demanded by microarray and other omic technologies.
Results
The program Myriads, formerly SGoF+, provides some of the most important P-value-based correction methods jointly with a test of dependency and a P-value simulator. Myriads easily manages hundreds of thousands of P-values.
Availability and implementation
http://myriads.webs.uvigo.es
Supplementary information
Supplementary data are available at Bioinformatics online.
The detection of true significant cases under multiple testing is becoming a fundamental issue when analyzing high-dimensional biological data. Unfortunately, known multitest adjustments lose statistical power as the number of tests increases. We propose a new multitest adjustment, based on a sequential goodness of fit metatest (SGoF), which increases its statistical power with the number of tests. The method is compared with Bonferroni and FDR-based alternatives by simulating a multitest context via two different kinds of tests: 1) one-sample t-test, and 2) homogeneity G-test.
It is shown that SGoF behaves especially well with small sample sizes when 1) the alternative hypothesis is weakly to moderately deviated from the null model, 2) there are widespread effects through the family of tests, and 3) the number of tests is large.
Therefore, SGoF should become an important tool for multitest adjustment when working with high-dimensional biological data.
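The idea of a sequential goodness of fit metatest can be sketched as follows. This is a simplified reading, not the published implementation: it assumes an exact binomial upper-tail test with the number of tests held fixed, compares the count of P-values at or below a threshold gamma with its expectation under the complete null, and removes one candidate at a time until the excess is no longer significant. The names `binom_upper_tail` and `sgof_sketch` and the default values of gamma and alpha are assumptions.

```python
from math import comb


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sgof_sketch(pvalues, gamma=0.05, alpha=0.05):
    """Return the P-values declared significant by a simplified SGoF-like rule.

    Under the complete null, the number of P-values <= gamma follows
    Binomial(n, gamma). While the observed count is significantly larger
    than expected, declare one more of the smallest P-values as an effect.
    """
    n = len(pvalues)
    observed = sum(p <= gamma for p in pvalues)
    rejected = 0
    while (observed - rejected > 0
           and binom_upper_tail(n, observed - rejected, gamma) <= alpha):
        rejected += 1
    return sorted(pvalues)[:rejected]


if __name__ == "__main__":
    pvals = [0.001] * 10 + [0.5] * 10
    print(len(sgof_sketch(pvals)))
```

Because the rule tests the excess of small P-values over the whole family rather than each P-value against a shrinking threshold, the number of effects it can declare grows with the number of tests when effects are widespread.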
Pollution and other anthropogenic effects have driven a decrease in Atlantic salmon (Salmo salar) in the Iberian Peninsula. The restocking effort carried out in the 1980s, which used salmon from northern latitudes with the aim of mitigating the decline of native populations, failed, probably because salmon from northern Europe were poorly adapted to the warm waters of the Iberian Peninsula. This result would imply that the Iberian populations of Atlantic salmon have experienced local adaptation in their past evolutionary history, as has been described for other populations of this species and other salmonids. Local adaptation can occur by divergent selection between environments, favoring the fixation of alleles that increase the fitness of a population in the environment it inhabits relative to other alleles favored in another population. In this work, we compared the genomes of different populations from the Iberian Peninsula (Atlantic and Cantabric basins) and Scotland in order to provide tentative evidence of candidate SNPs responsible for the adaptive differences between populations, which may explain the failure of the restocking carried out during the 1980s. For this purpose, the samples were genotyped with a high-density 220,000-SNP array (Affymetrix) specific to Atlantic salmon. Our results revealed potential evidence of local adaptation for northern Spanish and Scottish populations. As expected, most differences concerned the comparison of the Iberian Peninsula with Scotland, although there were also differences between Atlantic and Cantabric populations. A high proportion of the genes identified are related to development and cellular metabolism, DNA transcription and anatomical structure. A particular SNP was identified within the NADP-dependent malic enzyme-2 (mMEP-2*), previously reported by independent studies as a candidate for local adaptation in salmon from the Iberian Peninsula.
Interestingly, the corresponding SNP within the mMEP-2* region was consistent with a genomic pattern of divergent selection.
There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient because they are based on the history of lineages with surviving offspring in the current population. In contrast, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation, and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation.
I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios.
GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools. They include forward simulation under General Time Reversible (GTR) mutation or GTRxMG94 codon models with intra-codon recombination, arbitrary user-defined migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page 1.
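The forward-in-time strategy can be illustrated with a deliberately small sketch: a haploid Wright-Fisher population with one biallelic locus, constant size and fitness-based selection. This is not GenomePop's algorithm and omits mutation, recombination and migration; the function name `wright_fisher` and its parameters are hypothetical, and the selection coefficient `s` is assumed to be greater than -1.

```python
import random


def wright_fisher(pop_size, generations, init_freq, s=0.0, seed=None):
    """Simulate the frequency of one allele forward in time.

    Each generation the whole population is replaced: selection reweights
    the allele by relative fitness 1 + s, then drift is modelled by
    sampling pop_size offspring. Returns the frequency trajectory.
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Expected frequency after selection.
        weighted = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # Binomial sampling of the next generation (genetic drift).
        count = sum(rng.random() < weighted for _ in range(pop_size))
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    path = wright_fisher(pop_size=200, generations=50, init_freq=0.1, s=0.05, seed=1)
    print(path[0], path[-1])
```

Every individual is sampled in every generation, which is why forward simulation costs more than a coalescent simulation that only follows the ancestors of the final sample, and also why selection and other forces are easy to add at each step.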
The mode in which sexual organisms choose mates is a key evolutionary process, as it can have a profound impact on fitness and speciation. One way to study mate choice in the wild is by measuring trait correlation between mates. Positive assortative mating is inferred when individuals of a mating pair display traits that are more similar than those expected under random mating, while negative assortative mating is the opposite. A recent review of 1134 trait correlations found that positive estimates of assortative mating were more frequent and larger in magnitude than negative estimates. Here, we describe the scale-of-choice effect (SCE), which occurs when mate choice exists at a smaller scale than that of the investigator's sampling, while simultaneously the trait is heterogeneously distributed at the true scale-of-choice. We demonstrate the SCE by Monte Carlo simulations and estimate it in two organisms showing positive (Littorina saxatilis) and negative (L. fabalis) assortative mating. Our results show that both positive and negative estimates are biased by the SCE by different magnitudes, typically toward positive values. Therefore, the low frequency of negative assortative mating observed in the literature may be due to the SCE's impact on correlation estimates, which demands new experimental evaluation.
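The scale-of-choice effect described above can be reproduced with a small Monte Carlo sketch. It is not the simulation design of the study: it assumes normally distributed traits, strictly random mating within each patch, and patches that differ only in their trait mean. The function names and parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def pooled_mate_correlation(patch_means, pairs_per_patch, sd=1.0, seed=None):
    """Correlation between mates when pairs from all patches are pooled.

    Within a patch both mates are drawn independently (random mating),
    so any correlation in the pooled sample is due to sampling at a
    larger scale than the one at which mates are chosen.
    """
    rng = random.Random(seed)
    females, males = [], []
    for mean in patch_means:
        for _ in range(pairs_per_patch):
            females.append(rng.gauss(mean, sd))
            males.append(rng.gauss(mean, sd))
    return pearson(females, males)


if __name__ == "__main__":
    print(pooled_mate_correlation([0.0, 0.0], 500, seed=1))  # homogeneous trait
    print(pooled_mate_correlation([0.0, 5.0], 500, seed=1))  # heterogeneous trait
```

With equal patch means the pooled correlation stays near zero, whereas differing patch means produce a clearly positive correlation even though no mate choice was simulated, which is the direction of bias reported above.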
The importance of simulation software in current and future evolutionary and genomic studies is confirmed by the recent publication of several new simulation tools. The forward-in-time simulation strategy has, therefore, re-emerged as a complement to coalescent simulation. Additionally, more efficient coalescent algorithms, as well as new ideas about the combined use of backward and forward strategies, have recently appeared. In the present work, a previous review is updated to include some new forward simulation tools. When simulating at the genome scale, a conflict emerges between efficiency (i.e. execution speed and memory usage) and flexibility (i.e. complex modeling capabilities). This is the pivot around which simulation of evolutionary processes should improve. In addition, some effort should be made to consider the process of developing simulation tools from the point of view of software engineering theory. Finally, some new ideas and technologies, such as general-purpose graphics processing units, are discussed.
Natural color polymorphisms are widespread across animal species and usually have a simple genetic basis. This makes them an ideal system to study the evolutionary mechanisms responsible for maintaining biodiversity. In some populations of the intertidal snail Littorina fabalis, variation in shell color has remained stable for years, but the mechanisms responsible are unknown. Previous studies suggest that this stability could be caused by frequency-dependent sexual selection, but this hypothesis has not been tested. We analyzed shell color polymorphism in mating pairs and surrounding unmated individuals in two different populations of L. fabalis to estimate sexual fitness for color, as well as assortative mating. The estimated effective population size from neutral markers allowed us to disregard genetic drift as the main source of color frequency changes across generations. Shell color frequency was significantly correlated with sexual fitness showing a pattern of negative frequency dependent selection with high disassortative mating for color. The results suggested a contribution of male mate choice to maintain the polymorphism. Finally, the implementation of a multi-model inference approach based on information theory allowed us to test for the relative contribution of mate choice and mate competition to explain the maintenance of color polymorphism in this snail species.
We developed a new multiple hypothesis testing adjustment called SGoF+, implemented as a sequential goodness of fit metatest. It is a modification of a previous algorithm, SGoF, that takes advantage of the information in the distribution of p-values in order to fix the rejection region. The new method uses a discriminant rule based on the maximum distance between the uniform distribution of p-values and the observed one, to set the null for a binomial test. This new approach shows a better power/pFDR ratio than SGoF. In fact SGoF+ automatically sets the threshold leading to the maximum power and the minimum false non-discovery rate inside the SGoF' family of algorithms. Additionally, we suggest combining the information provided by SGoF+ with the estimate of the FDR that has been committed when rejecting a given set of nulls. We study different methods for estimating the positive false discovery rate (pFDR) in order to combine q-value estimates with the information provided by the SGoF+ method. Simulations suggest that the combination of the SGoF+ metatest with the q-value information is an interesting strategy to deal with multiple testing issues. These techniques are provided in the latest version of the SGoF+ software freely available at http://webs.uvigo.es/acraaj/SGoF.htm.