Smelly parallel MCMC chains Martino, L.; Elvira, V.; Luengo, D. ...
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
04/2015
Conference Proceeding
Monte Carlo (MC) methods are useful tools for Bayesian inference and stochastic optimization that have been widely applied in signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In this work, we introduce a novel parallel interacting MCMC scheme, where the parallel chains share information, thus yielding a faster exploration of the state space. The interaction is carried out by generating a dynamic repulsion among the "smelly" parallel chains that takes into account the entire population of current states. The ergodicity of the scheme and its relationship with other sampling methods are discussed. Numerical results show the advantages of the proposed approach in terms of mean square error and robustness with respect to initial values and parameter choice.
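As a point of reference for the scheme described above, the following is a minimal sketch of the non-interacting baseline: several random-walk Metropolis chains started from dispersed points and pooled after burn-in. It does not implement the repulsion between chains proposed in the paper; the standard normal target, the step size, and the function names (`log_target`, `metropolis_chain`, `run_parallel_chains`) are assumptions chosen purely for illustration.

```python
import math
import random


def log_target(x):
    # Unnormalised log-density of a standard normal (illustrative target only).
    return -0.5 * x * x


def metropolis_chain(x0, n_steps, step, rng):
    """Run one random-walk Metropolis chain and return all visited states."""
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_accept)).
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples


def run_parallel_chains(starts, n_steps=5000, burn_in=1000, step=1.0, seed=0):
    """Run independent chains from each start and pool post-burn-in samples.

    The chains here do NOT interact; an interacting scheme would let each
    proposal depend on the current states of the other chains.
    """
    rng = random.Random(seed)
    pooled = []
    for x0 in starts:
        chain = metropolis_chain(x0, n_steps, step, rng)
        pooled.extend(chain[burn_in:])
    return pooled


if __name__ == "__main__":
    pooled = run_parallel_chains([-10.0, -3.0, 3.0, 10.0])
    mean = sum(pooled) / len(pooled)
    print(f"pooled samples: {len(pooled)}, mean: {mean:.3f}")
```

Dispersed starting points are used here so that sensitivity to initial values, one of the criteria mentioned in the abstract, can be inspected by comparing the individual chains.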
Bayesian statistical methods based on simulation techniques have recently been shown to provide powerful tools for the analysis of genetic population structure. We have previously developed a Markov chain Monte Carlo (MCMC) algorithm for characterizing genetically divergent groups based on molecular markers and geographical sampling design of the dataset. However, for large-scale datasets such algorithms may get stuck in local maxima in the parameter space. Therefore, we have modified our earlier algorithm to support multiple parallel MCMC chains, with enhanced features that enable considerably faster and more reliable estimation compared to the earlier version of the algorithm. We also consider a hierarchical tree representation, from which a Bayesian model-averaged structure estimate can be extracted. The algorithm is implemented in a computer program that features a user-friendly interface and built-in graphics. The enhanced features are illustrated by analyses of simulated data and an extensive human molecular dataset. Availability: Freely available at http://www.rni.helsinki.fi/~jic/bapspage.html
We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design. The joint posterior distribution of the substructure and allele frequencies of the respective populations is available in an analytical form when the number of populations is small, whereas an approximation based on a Markov chain Monte Carlo simulation approach can be obtained for a moderate or large number of populations. Using the joint posterior distribution, posteriors can also be derived for any evolutionary population parameters, such as the traditional fixation indices. A major advantage compared to most earlier methods is that the number of populations is treated here as an unknown parameter. What is traditionally considered as two genetically distinct populations, either recently founded or connected by considerable gene flow, is here considered as one panmictic population with a certain probability based on marker data and prior information. Analyses of previously published data on the Moroccan argan tree (Argania spinosa) and of simulated data sets suggest that our method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist. The software (BAPS) used for the computations is freely available from http://www.rni.helsinki.fi/~mjs.
CYP2D6, a member of the cytochrome P450 superfamily, is responsible for the metabolism of about 25% of the commonly prescribed drugs. Its activity ranges from complete deficiency to excessive activity, potentially causing toxicity of medication or therapeutic failure with recommended drug dosages. This study aimed to describe the CYP2D6 diversity at the global level.
A total of 1060 individuals belonging to 52 worldwide-distributed populations were genotyped at 12 highly informative variable sites, as well as for gene deletion and duplications. Phenotypes were predicted on the basis of haplotype combinations.
Our study shows that (i) CYP2D6 diversity is far greater within than between populations and groups thereof, (ii) null or low-activity variants occur at high frequencies in various areas of the world, (iii) linkage disequilibrium is lowest in Africa and highest in the Americas. Patterns of variation, within and among populations, are similar to those observed for other autosomal markers (e.g. microsatellites and protein polymorphisms), suggesting that the diversity observed at the CYP2D6 locus reflects the same factors affecting variation at random genome markers.
Bayesian inference often requires efficient numerical approximation algorithms, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique, widely applied in many signal processing problems. Drawing samples from univariate full-conditional distributions efficiently is essential for the practical application of the Gibbs sampler. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from these univariate target densities. The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Instead, the proposal is adjusted during an initial optimization stage, following a simple and extremely effective procedure. Hence, we have named the newly proposed approach FUSS (Fast Universal Self-tuned Sampler), as it can be used to sample from any bounded univariate distribution and also from any bounded multivariate distribution, either directly or by embedding it within a Gibbs sampler. Numerical experiments, on several synthetic data sets (including a challenging parameter estimation problem in a chaotic system) and a high-dimensional financial signal processing problem, show its good performance in terms of speed and estimation accuracy.
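To make the role of univariate full conditionals concrete, here is a minimal sketch of a Gibbs sampler for a zero-mean, unit-variance bivariate normal with correlation `rho`. This is not the FUSS algorithm: the target, the function name `gibbs_bivariate_normal`, and all parameter values are assumptions for illustration, and the full conditionals are Gaussian, so they can be drawn exactly. In harder problems that exact draw is the step a univariate sampler such as the one described above would replace.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples=20000, burn_in=500, seed=1):
    """Gibbs sampling from a standard bivariate normal with correlation rho.

    Each full conditional is univariate normal:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.gauss(rho * y, cond_sd)  # draw x from its full conditional
        y = rng.gauss(rho * x, cond_sd)  # draw y from its full conditional
        if i >= burn_in:
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    syy = sum((y - my) ** 2 for _, y in samples)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8)
    print(f"estimated correlation: {sample_correlation(draws):.3f}")
```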
Sex-biased dispersal is often connected to the mating behaviour of the species. Even if patterns of natal dispersal are reasonably well documented for monogamous birds, only a few data are available for polygynous and especially lekking species. We investigated the dispersal of the capercaillie (Tetrao urogallus) by examining sex-specific gene flow among the leks. Genetic information was extracted using nuclear and mitochondrial molecular markers for sexed faecal samples and analysed by novel Bayesian statistical methods. Contrary to the traditional view that the males are highly philopatric and females are the dispersing sex, we found roughly equivalent gross and effective dispersal of the sexes. The level of polygamy has a strong influence on the effective population size and on the effective dispersal. The results do not support the theories that dispersal evolves solely as a result of resource competition or other advantages to males obtained through kin selection in lekking species.
Molecular markers have been demonstrated to be useful for the estimation of stock mixture proportions where the origin of individuals is determined from baseline samples. Bayesian statistical methods are widely recognized as providing a preferable strategy for such analyses. In general, Bayesian estimation is based on standard latent class models using data augmentation through Markov chain Monte Carlo techniques. In this study, we introduce a novel approach based on recent developments in the estimation of genetic population structure. Our strategy combines analytical integration with stochastic optimization to identify stock mixtures. An important enhancement over previous methods is the possibility of appropriately handling data where only partial baseline sample information is available. We address the potential use of nonmolecular, auxiliary biological information in our Bayesian model.
Streptococcus suis is part of the pig commensal microbiome but strains can also be pathogenic, causing pneumonia and meningitis in pigs as well as zoonotic meningitis. According to genomic analysis, S. suis is divided into asymptomatic carriage, respiratory and systemic strains with distinct genomic signatures. Because the strategies to target pathogenic S. suis are limited, new therapeutic approaches are needed. The virulence factor S. suis adhesin P (SadP) recognizes the galabiose Galα1–4Gal-oligosaccharide. Based on its oligosaccharide fine specificity, SadP can be divided into subtypes PN and PO. We show here that subtype PN is distributed in the systemic strains causing meningitis, whereas type PO is found in asymptomatic carriage and respiratory strains. Both types of SadP are shown to predominantly bind to pig lung globotriaosylceramide (Gb3). However, SadP adhesin from systemic subtype PN strains also binds to globotetraosylceramide (Gb4). Mutagenesis studies of the galabiose-binding domain of type PN SadP adhesin showed that the amino acid asparagine 285, which is replaced by an aspartate residue in type PO SadP, was required for binding to Gb4 and, strikingly, was also required for interaction with the glycomimetic inhibitor phenylurea-galabiose. Molecular dynamics simulations provided insight into the role of Asn-285 for Gb4 and phenylurea-galabiose binding, suggesting additional hydrogen bonding to terminal GalNAc of Gb4 and the urea group. Thus, the Asn-285–mediated molecular mechanism of type PN SadP binding to Gb4 could be used to selectively target S. suis in systemic disease without interfering with commensal strains, opening up new avenues for interventional strategies against this pathogen.
Pneumococcal disease outbreaks of vaccine-preventable serotype 4 sequence type (ST)801 in shipyards have been reported in several countries. We aimed to use genomics to establish any international links between them.
Sequence data from ST801-related outbreak isolates from Norway (n = 17), Finland (n = 11) and Northern Ireland (n = 2) were combined with invasive pneumococcal disease surveillance from the respective countries, and ST801-related genomes from an international collection (n = 41 of > 40,000), totalling 106 genomes. Raw data were mapped and recombination excluded before phylogenetic dating.
Outbreak isolates were relatively diverse, with up to 100 SNPs (single nucleotide polymorphisms) and a common ancestor estimated around the year 2000. However, 19 Norwegian and Finnish isolates were nearly indistinguishable (0–2 SNPs) with the common ancestor dated around 2017.
The total diversity of ST801 within the outbreaks could not be explained by recent transmission alone, suggesting that harsh environmental and associated living conditions reported in the shipyards may facilitate invasion of colonising pneumococci. However, nearly identical strains in the Norwegian and Finnish outbreaks do suggest that transmission between international shipyards also contributed to those outbreaks. This indicates the need for improved preventative measures in this working population including pneumococcal vaccination.
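Assuming the SNP counts reported for the outbreak isolates refer to pairwise differences between aligned genomes, the quantity can be sketched as a count of mismatching positions. The sequences, isolate names, and helper functions below (`snp_distance`, `pairwise_snp_distances`) are hypothetical toy examples; they are not data or code from the study, which also mapped raw reads and excluded recombination before any comparison.

```python
def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (any character in `ignore`) are skipped. Sequences are assumed to be
    upper-case and already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(seqs):
    """Return {(name_1, name_2): distance} for every unordered pair of isolates."""
    names = sorted(seqs)
    return {
        (p, q): snp_distance(seqs[p], seqs[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    toy = {
        "iso_a": "ACGTACGTAC",
        "iso_b": "ACGTACGTAT",
        "iso_c": "TCGAACGNAC",
    }
    for pair, dist in pairwise_snp_distances(toy).items():
        print(pair, dist)
```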