The construction of decision-theoretical Bayesian designs for realistically complex nonlinear models is computationally challenging, as it requires the optimization of analytically intractable expected utility functions over high-dimensional design spaces. We provide the most general solution to date for this problem through a novel approximate coordinate exchange algorithm. This methodology uses a Gaussian process emulator to approximate the expected utility as a function of a single design coordinate in a series of conditional optimization steps. It has flexibility to address problems for any choice of utility function and for a wide range of statistical models with different numbers of variables, numbers of runs and randomization restrictions. In contrast to existing approaches to Bayesian design, the method can find multi-variable designs in large numbers of runs without resorting to asymptotic approximations to the posterior distribution or expected utility. The methodology is demonstrated on a variety of challenging examples of practical importance, including design for pharmacokinetic models and design for mixed models with discrete data. For many of these models, Bayesian designs are not currently available. Comparisons are made to results from the literature, and to designs obtained from asymptotic approximations. Supplementary materials for this article are available online.
Bayesian optimal design is considered for experiments where the response distribution depends on the solution to a system of nonlinear ordinary differential equations. The motivation is an experiment to estimate parameters in the equations governing the transport of amino acids through cell membranes in human placentas. Decision-theoretic Bayesian design of experiments for such nonlinear models is conceptually very attractive, allowing the formal incorporation of prior knowledge to overcome the parameter dependence of frequentist design and being less reliant on asymptotic approximations. However, the necessary approximation and maximization of the typically analytically intractable expected utility results in a computationally challenging problem. These issues are further exacerbated if the solution to the differential equations is not available in closed form. This article proposes a new combination of a probabilistic solution to the equations embedded within a Monte Carlo approximation to the expected utility with cyclic descent of a smooth approximation to find the optimal design. A novel precomputation algorithm reduces the computational burden, making the search for an optimal design feasible for bigger problems. The methods are demonstrated by finding new designs for a number of common models derived from differential equations, and by providing optimal designs for the placenta experiment.
Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Recently, multiple systems estimation (MSE) has been applied to estimate the number of victims of human trafficking in different countries. The estimation procedure consists of a log-linear analysis of a contingency table of population registers and covariates. As the number of potential models increases exponentially with the number of registers and covariates, it is practically impossible to fit and compare all models. Therefore, the model search needs to be restricted to a small subset of all potential models. This paper addresses principles and criteria for model assessment and selection for MSE of human trafficking, with special attention to sparsity, which is typical of human trafficking data. The concepts are illustrated on data from Slovakia and Romania.
This article proposes a novel adaptive design algorithm that can be used to find optimal treatment allocations in N‐of‐1 clinical trials. This new methodology uses two Laplace approximations to provide a computationally efficient estimate of population and individual random effects within a repeated measures, adaptive design framework. Given the efficiency of this approach, it is also adopted for treatment selection to target the collection of data for the precise estimation of treatment effects. To evaluate this approach, we consider both a simulated and a motivating N‐of‐1 clinical trial from the literature. For each trial, our methods were compared with the multiarmed bandit approach and a randomized N‐of‐1 trial design in terms of identifying the best treatment for each patient and the information gained about the model parameters. The results show that our new approach selects designs that are highly efficient in achieving each of these objectives. As such, we propose our Laplace‐based algorithm as an efficient approach for designing adaptive N‐of‐1 trials.
Performing censuses on stigmatized or vulnerable populations is challenging; however, for such populations, partial enumeration is often possible using different lists or sources. If the sources overlap, then multiple systems estimation (MSE) methods can be applied to obtain an estimate of the total population. These are typically expressed by a log-linear model which permits positive/negative dependencies between lists. This paper considers issues that arise for the application of MSE to modern slavery where there is little to no overlap of individuals across lists. We investigate the robustness of MSE in terms of the importance of each list and the impact of combining lists on the estimation process. We undertake a simulation study and consider real national modern slavery data from the UK and Romania.
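To make the role of list overlap concrete, the following is a minimal sketch of the simplest two-list case under an independence log-linear model. This is an assumption made purely for illustration: the models discussed in the abstract allow dependence between lists and more than two sources. The function name and the counts are hypothetical. Under independence, the unobserved cell is estimated by n10 * n01 / n11, so the estimate grows rapidly as the overlap n11 shrinks.

```python
def two_list_estimate(n11, n10, n01):
    """Total population estimate from two lists under an assumed independence model.

    n11: individuals on both lists (the overlap)
    n10: individuals on list 1 only
    n01: individuals on list 2 only
    """
    if n11 == 0:
        raise ValueError("no overlap: the unobserved cell cannot be estimated")
    # Fitted value for the cell of individuals missed by both lists.
    n00_hat = n10 * n01 / n11
    return n11 + n10 + n01 + n00_hat


# Hypothetical counts: both lists keep the same sizes (200 and 150),
# only the overlap changes.
for overlap in (50, 2, 1):
    total = two_list_estimate(overlap, 200 - overlap, 150 - overlap)
    print(overlap, total)
```

In this sketch, a change of a single individual in the overlap (from 2 to 1) doubles the estimated total, which is one way to see why sparse overlap makes the estimate fragile.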
We present a common framework for Bayesian emulation methodologies for multivariate output simulators, or computer models, that employ either parametric linear models or non-parametric Gaussian processes. Novel diagnostics suitable for multivariate covariance separable emulators are developed and techniques to improve the adequacy of an emulator are discussed and implemented. A variety of emulators are compared for a humanitarian relief simulator, modelling aid missions to Sicily after a volcanic eruption and earthquake, and a sensitivity analysis is conducted to determine the sensitivity of the simulator output to changes in the input variables. The results from parametric and non-parametric emulators are compared in terms of prediction accuracy, uncertainty quantification and scientific interpretability.
The design of an experiment can always be considered at least implicitly Bayesian, with prior knowledge used informally to aid decisions such as the variables to be studied and the choice of a plausible relationship between the explanatory variables and measured responses. Bayesian methods allow uncertainty in these decisions to be incorporated into design selection through prior distributions that encapsulate information available from scientific knowledge or previous experimentation. Further, a design may be explicitly tailored to the aim of the experiment through a decision-theoretic approach using an appropriate loss function. We review the area of decision-theoretic Bayesian design, with particular emphasis on recent advances in computational methods. For many problems arising in industry and science, experiments result in a discrete response that is well described by a member of the class of generalized linear models. Bayesian design for such nonlinear models is often seen as impractical as the expected loss is analytically intractable and numerical approximations are usually computationally expensive. We describe how Gaussian process emulation, commonly used in computer experiments, can play an important role in facilitating Bayesian design for realistic problems. A main focus is the combination of Gaussian process regression to approximate the expected loss with cyclic descent (coordinate exchange) optimization algorithms to allow optimal designs to be found for previously infeasible problems. We also present the first optimal design results for statistical models formed from dimensional analysis, a methodology widely employed in the engineering and physical sciences to produce parsimonious and interpretable models. Using the famous paper helicopter experiment, we show the potential for the combination of Bayesian design, generalized linear models, and dimensional analysis to produce small but informative experiments.
Bayesian analysis often concerns an evaluation of models with different dimensionality as is necessary in, for example, model selection or mixture models. To facilitate this evaluation, transdimensional Markov chain Monte Carlo (MCMC) relies on sampling a discrete indexing variable to estimate the posterior model probabilities. However, little attention has been paid to the precision of these estimates. If only few switches occur between the models in the transdimensional MCMC output, precision may be low and assessment based on the assumption of independent samples misleading. Here, we propose a new method to estimate the precision based on the observed transition matrix of the model-indexing variable. Assuming a first-order Markov model, the method samples from the posterior of the stationary distribution. This allows assessment of the uncertainty in the estimated posterior model probabilities, model ranks, and Bayes factors. Moreover, the method provides an estimate for the effective sample size of the MCMC output. In two model selection examples, we show that the proposed approach provides a good assessment of the uncertainty associated with the estimated posterior model probabilities.
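The idea of sampling from the posterior of the stationary distribution can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes an independent Dirichlet prior on each row of the transition matrix (the `prior_count` value is an arbitrary choice), and the function names and the simulated chain are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of a row-stochastic matrix P."""
    # The stationary distribution is the left eigenvector for eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()


def posterior_model_probs(model_index, n_models, n_draws=2000,
                          prior_count=1.0, seed=0):
    """Posterior draws of the stationary distribution of the model index.

    Assumes a first-order Markov chain with an independent Dirichlet prior
    (an assumption of this sketch) on each row of the transition matrix.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(model_index)
    # Observed transition counts between consecutive model indices.
    counts = np.zeros((n_models, n_models))
    np.add.at(counts, (z[:-1], z[1:]), 1)
    draws = np.empty((n_draws, n_models))
    for d in range(n_draws):
        # Each row has a conjugate Dirichlet posterior.
        P = np.vstack([rng.dirichlet(counts[i] + prior_count)
                       for i in range(n_models)])
        draws[d] = stationary_distribution(P)
    return draws


# Hypothetical "sticky" model-index chain with few switches between 2 models.
rng = np.random.default_rng(1)
true_P = np.array([[0.95, 0.05], [0.10, 0.90]])
z = [0]
for _ in range(4999):
    z.append(rng.choice(2, p=true_P[z[-1]]))

draws = posterior_model_probs(z, n_models=2)
print("posterior mean:", draws.mean(axis=0))
print("95% interval:", np.quantile(draws, [0.025, 0.975], axis=0))
```

The spread of `draws` gives an interval for each posterior model probability that accounts for the autocorrelation of the model index, unlike a standard error computed as if the samples were independent.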
We describe the R package acebayes and demonstrate its use to find Bayesian optimal experimental designs. A decision-theoretic approach is adopted, with the optimal design maximizing an expected utility. Finding Bayesian optimal designs for realistic problems is challenging, as the expected utility is typically intractable and the design space may be high-dimensional. The package implements the approximate coordinate exchange algorithm to optimize (an approximation to) the expected utility via a sequence of conditional one-dimensional optimization steps. At each step, a Gaussian process regression model is used to approximate, and subsequently optimize, the expected utility as a function of a single design coordinate (the value taken by one controllable variable for one run of the experiment). In addition to functions for bespoke design problems with user-defined utility functions, acebayes provides functions tailored to finding designs for common generalized linear and nonlinear models. The package provides a step-change in the complexity of problems that can be addressed, enabling designs to be found for much larger numbers of variables and runs than previously possible. We provide tutorials on the application of the methodology for four illustrative examples of varying complexity where designs are found for the goals of parameter estimation, model selection and prediction. These examples demonstrate previously unseen functionality of acebayes.
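The conditional one-dimensional steps described above can be sketched in a few lines. This is a toy illustration in Python, not the acebayes API or the published algorithm: the utility is a stand-in (the log-determinant of the information matrix for a simple linear model, plus artificial noise to mimic Monte Carlo error), the Gaussian process uses fixed, assumed hyperparameters, and the accept/reject step is simplified to a comparison of averaged fresh evaluations. All function names are hypothetical.

```python
import numpy as np


def true_utility(design):
    """Toy utility: log det of X'X for the model y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(design), design])
    return np.linalg.slogdet(X.T @ X)[1]


def noisy_utility(design, rng, noise_sd=0.02):
    """Stand-in for a Monte Carlo approximation to the expected utility."""
    return true_utility(design) + rng.normal(0.0, noise_sd)


def gp_mean(x_train, y_train, x_new, length=0.5, nugget=0.05):
    """Posterior mean of a GP emulator with fixed (assumed) hyperparameters."""
    def kern(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    y = (y_train - y_train.mean()) / y_train.std()  # standardize responses
    K = kern(x_train, x_train) + nugget * np.eye(len(x_train))
    return kern(x_new, x_train) @ np.linalg.solve(K, y)


def ace_sketch(n_runs=6, n_passes=5, n_candidates=12, seed=0):
    rng = np.random.default_rng(seed)
    initial = rng.uniform(-1.0, 1.0, n_runs)
    design = initial.copy()
    cand = np.linspace(-1.0, 1.0, n_candidates)
    grid = np.linspace(-1.0, 1.0, 201)
    for _ in range(n_passes):
        for i in range(n_runs):
            # Noisy utility at candidate values of coordinate i only.
            u = np.empty(n_candidates)
            for j, c in enumerate(cand):
                trial = design.copy()
                trial[i] = c
                u[j] = noisy_utility(trial, rng)
            # Emulate the utility along this coordinate and maximize the emulator.
            proposal = design.copy()
            proposal[i] = grid[np.argmax(gp_mean(cand, u, grid))]
            # Simplified accept/reject: compare averaged fresh evaluations.
            new = np.mean([noisy_utility(proposal, rng) for _ in range(20)])
            old = np.mean([noisy_utility(design, rng) for _ in range(20)])
            if new > old:
                design = proposal
    return initial, design


initial, final = ace_sketch()
print("initial design:", np.round(initial, 2), "utility:", true_utility(initial))
print("final design:  ", np.round(final, 2), "utility:", true_utility(final))
```

For this toy utility the best designs place runs at the ends of the interval, so the final design can be checked by eye. The point of the sketch is only the structure: evaluate a noisy utility along one coordinate, smooth it with an emulator, propose the emulator's maximizer, and guard the move with a separate comparison.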
The aim of this paper is to demonstrate the R package conting for the Bayesian analysis of complete and incomplete contingency tables using hierarchical log-linear models. This package allows a user to identify interactions between categorical factors (via complete contingency tables) and to estimate closed population sizes using capture-recapture studies (via incomplete contingency tables). The models are fitted using Markov chain Monte Carlo methods. In particular, implementations of the Metropolis-Hastings and reversible jump algorithms appropriate for log-linear models are employed. The conting package is demonstrated on four real examples.