Global envelope tests for spatial processes
Myllymäki, Mari; Mrkvička, Tomáš; Grabarnik, Pavel; et al.
Journal of the Royal Statistical Society. Series B, Statistical Methodology, 03/2017, Volume 79, Issue 2
Journal Article
Peer reviewed
Open access
Envelope tests are a popular tool in spatial statistics, where they are used in goodness-of-fit testing. These tests graphically compare an empirical function T(r) with its simulated counterparts from the null model. However, the type I error probability α is conventionally controlled for a fixed distance r only, whereas the functions are inspected on an interval of distances I. In this study, we propose two approaches related to Barnard's Monte Carlo test for building global envelope tests on I: ordering the empirical and simulated functions on the basis of their r-wise ranks among each other, and the construction of envelopes for a deviation test. These new tests allow the a priori choice of the global α and they yield p-values. We illustrate these tests by using simulated and real point pattern data.
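The ordering idea in this abstract can be sketched in a few lines. Below is a minimal, illustrative Python version of a two-sided rank envelope test, not the authors' published algorithm or their software: curves are ranked pointwise from the most extreme value at each distance, each curve's global measure is its most extreme pointwise rank, and the envelope is the pointwise range of the least extreme curves.

```python
import numpy as np

def rank_envelope_test(t0, sims, alpha=0.05):
    """Illustrative two-sided rank envelope test.

    t0   : (k,) empirical function T(r) on a grid of k distances
    sims : (s, k) functions simulated under the null model
    Ties are broken arbitrarily here; the published test reports a
    p-value interval for exactly this reason.
    """
    curves = np.vstack([t0, sims])                    # row 0 is the data
    n = curves.shape[0]
    lo = curves.argsort(axis=0).argsort(axis=0) + 1   # 1 = smallest at that r
    hi = n + 1 - lo                                   # 1 = largest at that r
    ptwise = np.minimum(lo, hi)                       # two-sided extremeness
    R = ptwise.min(axis=1)                            # global extreme rank

    # Monte Carlo p-value bounds (ties make the rank measure coarse)
    p_upper = (1 + np.sum(R[1:] <= R[0])) / n
    p_lower = (1 + np.sum(R[1:] < R[0])) / n

    # Global 100(1 - alpha)% envelope: drop the most extreme curves and
    # keep the pointwise range of the rest
    keep = np.argsort(R)[int(np.floor(alpha * n)):]
    return p_lower, p_upper, curves[keep].min(axis=0), curves[keep].max(axis=0)
```

The data curve leaving the envelope anywhere on I corresponds to rejection at level α, which is what makes the test graphical.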
Extensions of linear models are very commonly used in the analysis of biological data. Whereas goodness-of-fit measures such as the coefficient of determination (R²) or the adjusted R² are well established for linear models, it is not obvious how such measures should be defined for generalized linear and mixed models. There are by now several proposals but no consensus has yet emerged as to the best unified approach in these settings. In particular, it is an open question how to best account for heteroscedasticity and for covariance among observations present in residual error or induced by random effects. This paper proposes a new approach that addresses this issue and is universally applicable for arbitrary variance-covariance structures including spatial models and repeated measures. It is exemplified using three biological examples.
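For reference, the linear-model measures the abstract treats as well established have standard definitions; with n observations, p predictors, fitted values \(\hat{y}_i\) and mean \(\bar{y}\):

```latex
R^2 \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} \;=\; 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}.
```

Once the residual variance is no longer a single scalar but an arbitrary variance-covariance structure, these definitions have no unique analogue; that is precisely the gap the paper addresses.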
Despite clear evidence that manifest variable path analysis requires highly reliable measures, path analyses with fallible measures are commonplace even in premier journals. Using fallible measures in path analysis can cause several serious problems: (a) As measurement error pervades a given data set, many path coefficients may be either over- or underestimated. (b) Extensive measurement error diminishes power and can prevent invalid models from being rejected. (c) Even a little measurement error can cause valid models to appear invalid. (d) Differential measurement error in various parts of a model can change the substantive conclusions that derive from path analysis. (e) All of these problems become increasingly serious and intractable as models become more complex. Methods to prevent and correct these problems are reviewed. The conclusion is that researchers should use more reliable measures (or correct for measurement error in the measures they do use), obtain multiple measures for use in latent variable modeling, and test simpler models containing fewer variables.
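The attenuation effects listed in (a) through (e) are easy to reproduce. A small simulation (my own illustration, not from the paper) shows estimated path coefficients shrinking as measurement reliability drops:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True latent path model X -> M -> Y with standardized path coefficients 0.5
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(scale=np.sqrt(1 - 0.25), size=n)
y = 0.5 * m + rng.normal(scale=np.sqrt(1 - 0.25), size=n)

def observe(latent, reliability):
    """Add measurement error so the observed score has the given reliability
    (latent variance is 1 by construction above)."""
    err_var = (1 - reliability) / reliability
    return latent + rng.normal(scale=np.sqrt(err_var), size=latent.size)

for rel in (1.0, 0.8, 0.6):
    xo, mo, yo = (observe(v, rel) for v in (x, m, y))
    b_xm = np.cov(xo, mo)[0, 1] / np.var(xo)   # estimated X -> M path
    b_my = np.cov(mo, yo)[0, 1] / np.var(mo)   # estimated M -> Y path
    print(f"reliability={rel:.1f}  X->M: {b_xm:.3f}  M->Y: {b_my:.3f}")
```

With reliability 0.6 the recovered paths are roughly 0.30 instead of 0.50, which is problem (a) in miniature.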
We propose a family of tests to assess the goodness of fit of a high dimensional generalized linear model. Our framework is flexible and may be used to construct an omnibus test or tests directed against specific non-linearities and interaction effects, or to test the significance of groups of variables. The methodology is based on extracting left-over signal in the residuals from an initial fit of a generalized linear model. This can be achieved by predicting this signal from the residuals using modern powerful regression or machine learning methods such as random forests or boosted trees. Under the null hypothesis that the generalized linear model is correct, no signal is left in the residuals and our test statistic has a Gaussian limiting distribution, translating to asymptotic control of type I error. Under a local alternative, we establish a guarantee on the power of the test. We illustrate the effectiveness of the methodology on simulated and real data examples by testing goodness of fit in logistic regression models. Software implementing the methodology is available in the R package GRPtests.
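The recipe in this abstract translates almost directly into code. The sketch below illustrates the residual-prediction idea only and is not the GRPtests algorithm: it calibrates by permutation rather than the paper's Gaussian limit, ignores the effect of estimating the GLM (which the paper handles carefully), and uses out-of-bag predictions to avoid rewarding overfitting. The function name rp_gof_test is mine.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

def rp_gof_test(X, y, n_perm=99, seed=0):
    """If the logistic regression is correct, a flexible learner should not
    be able to predict its residuals from X. Test statistic: out-of-bag
    reduction in residual mean square; calibration: permutation."""
    rng = np.random.default_rng(seed)
    glm = LogisticRegression(max_iter=1000).fit(X, y)
    resid = y - glm.predict_proba(X)[:, 1]        # raw residuals

    def signal(r):
        rf = RandomForestRegressor(n_estimators=100, oob_score=True,
                                   random_state=seed).fit(X, r)
        return np.mean(r ** 2) - np.mean((r - rf.oob_prediction_) ** 2)

    stat = signal(resid)
    perm = [signal(rng.permutation(resid)) for _ in range(n_perm)]
    p = (1 + sum(s >= stat for s in perm)) / (n_perm + 1)
    return stat, p

# Toy check: an interaction the GLM cannot capture should yield a small p
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] * X[:, 1] + rng.normal(size=500) > 0).astype(int)
stat, p = rp_gof_test(X, y)
```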
Checking that models adequately represent data is an essential component of applied statistical inference. Ecologists increasingly use hierarchical Bayesian statistical models in their research. The appeal of this modeling paradigm is undeniable, as researchers can build and fit models that embody complex ecological processes while simultaneously accounting for observation error. However, ecologists tend to be less focused on checking model assumptions and assessing potential lack of fit when applying Bayesian methods than when applying more traditional modes of inference such as maximum likelihood. There are also multiple ways of assessing the fit of Bayesian models, each of which has strengths and weaknesses. For instance, Bayesian P values are relatively easy to compute, but are well known to be conservative, producing P values biased toward 0.5. Alternatively, lesser known approaches to model checking, such as prior predictive checks, cross-validation probability integral transforms, and pivot discrepancy measures may produce more accurate characterizations of goodness-of-fit but are not as well known to ecologists. In addition, a suite of visual and targeted diagnostics can be used to examine violations of different model assumptions and lack of fit at different levels of the modeling hierarchy, and to check for residual temporal or spatial autocorrelation. In this review, we synthesize existing literature to guide ecologists through the many available options for Bayesian model checking. We illustrate methods and procedures with several ecological case studies including (1) analysis of simulated spatiotemporal count data, (2) N-mixture models for estimating abundance of sea otters from an aircraft, and (3) hidden Markov modeling to describe attendance patterns of California sea lion mothers on a rookery. We find that commonly used procedures based on posterior predictive P values detect extreme model inadequacy, but often do not detect more subtle cases of lack of fit. Tests based on cross-validation and pivot discrepancy measures (including the "sampled predictive P value") appear to be better suited to model checking and to have better overall statistical performance. We conclude that model checking is necessary to ensure that scientific inference is well founded. As an essential component of scientific discovery, it should accompany most Bayesian analyses presented in the literature.
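As a concrete example of the posterior predictive P value discussed here, the sketch below checks a Poisson model against over-dispersed count data using the variance-to-mean ratio as the discrepancy. The conjugate Gamma posterior is assumed for simplicity; the example is mine, not one of the paper's case studies.

```python
import numpy as np

def posterior_predictive_pvalue(y, post_draws, discrepancy, rng=None):
    """For each posterior draw of the Poisson rate, simulate a replicate
    data set and compare its discrepancy with that of the observed data.
    Values near 0 or 1 signal lack of fit; near 0.5 indicates no conflict."""
    rng = rng or np.random.default_rng(0)
    more_extreme = 0
    for lam in post_draws:
        y_rep = rng.poisson(lam, size=y.size)
        more_extreme += discrepancy(y_rep) >= discrepancy(y)
    return more_extreme / len(post_draws)

# Over-dispersed "data" that a Poisson model cannot capture
y = np.random.default_rng(1).negative_binomial(5, 0.5, size=200)
# Conjugate posterior for the rate under a Gamma(1, 1) prior
draws = np.random.default_rng(2).gamma(1 + y.sum(), 1 / (1 + y.size), size=1000)
p = posterior_predictive_pvalue(y, draws, lambda z: z.var() / z.mean())
print(p)   # expected to be near 0: replicates are less dispersed than the data
```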
The literature proposes numerous so-called pseudo-R² measures for evaluating "goodness of fit" in regression models with categorical dependent variables. Unlike the ordinary least squares R², log-likelihood-based pseudo-R²s do not represent the proportion of explained variance but rather the improvement in model likelihood over a null model. The multitude of available pseudo-R² measures and the absence of benchmarks often lead to confusing interpretations and unclear reporting. Drawing on a meta-analysis of 274 published logistic regression models as well as simulated data, this study investigates fundamental differences between distinct pseudo-R² measures, focusing on their dependence on basic study design characteristics. Results indicate that almost all pseudo-R²s are influenced to some extent by sample size, number of predictor variables, and the number of categories of the dependent variable and its distribution asymmetry. Hence, an interpretation by goodness-of-fit benchmark values must explicitly consider these characteristics. The authors derive a set of goodness-of-fit benchmark values for this measure with respect to ranges of sample size and distribution of observations. This study raises awareness of fundamental differences in the characteristics of pseudo-R²s and the need for greater precision in reporting these measures.
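For concreteness, one of the log-likelihood-based measures the abstract refers to, McFadden's pseudo-R², compares the fitted log-likelihood with that of an intercept-only model. A minimal sketch (sklearn's default ridge penalty is weakened via a large C so the fit approximates plain maximum likelihood):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def mcfadden_r2(X, y):
    """McFadden's pseudo-R^2 = 1 - lnL(fitted) / lnL(null). Other
    log-likelihood-based measures (Cox-Snell, Nagelkerke, ...) rescale
    the same two likelihoods in different ways."""
    model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
    ll_model = -log_loss(y, model.predict_proba(X), normalize=False)
    p0 = np.full_like(y, y.mean(), dtype=float)   # intercept-only prediction
    ll_null = -log_loss(y, p0, normalize=False)
    return 1 - ll_model / ll_null
```

The abstract's central caveat applies directly: the value this function returns shifts with sample size and outcome balance, so a single universal benchmark is misleading.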
Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic ...performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, which retains SAA’s tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective enabling us to characterize the finite sample and asymptotic guarantees of various other data-driven procedures that are based upon distributionally robust optimization. This analysis provides insight into the practical performance of these various methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.
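For orientation, plain SAA in the simplest setting the paper's experiments touch (inventory management) reduces to an empirical quantile; this toy sketch shows that baseline only. Robust SAA itself replaces the empirical distribution with the set of distributions a goodness-of-fit test cannot reject, which is not implemented here.

```python
import numpy as np

def saa_newsvendor(demand_sample, cost_under=4.0, cost_over=1.0):
    """Plain SAA for the newsvendor problem: minimize the sample average of
    underage/overage costs. For this piecewise-linear cost, the SAA optimum
    is the empirical quantile at the critical ratio."""
    q = cost_under / (cost_under + cost_over)
    return np.quantile(demand_sample, q)

demand = np.random.default_rng(0).lognormal(3.0, 0.5, size=50)
order = saa_newsvendor(demand)   # data-driven order quantity
```

With only 50 observations the empirical quantile can be far from the true one, which is exactly the finite-sample weakness of SAA that motivates the robust variant.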
This article provides a practical guide to conducting latent profile analysis (LPA) in the Mplus software system. The guide is intended for researchers familiar with some latent variable modeling but not LPA specifically. A general procedure for conducting LPA is provided in six steps: (a) data inspection, (b) iterative evaluation of models, (c) model fit and interpretability, (d) investigation of patterns of profiles in a retained model, (e) covariate analysis, and (f) presentation of results. A worked example is provided with syntax and results to exemplify the steps.
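Readers without Mplus can rehearse steps (b) through (d) with a Gaussian mixture model, which is the statistical core of LPA. The toy Python sketch below is my analogue, not Mplus syntax: it fits 1 to 6 profiles, compares BIC, and inspects the retained solution.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy indicators: two latent profiles differing in means on 4 variables
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 4)),
               rng.normal(2.0, 1.0, size=(100, 4))])

# Step (b): iteratively fit models with more profiles;
# step (c): compare fit (lower BIC is better)
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=10, random_state=0).fit(X)
    print(f"{k} profiles: BIC = {gm.bic(X):.1f}")

# Step (d): inspect the retained model's profile pattern
best = GaussianMixture(n_components=2, covariance_type="diag",
                       n_init=10, random_state=0).fit(X)
print(best.means_)        # profile-specific indicator means
labels = best.predict(X)  # modal profile membership for each case
```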
1. Information criteria (ICs) are used widely for data summary and model building in ecology, especially in applied ecology and wildlife management. Although ICs are useful for distinguishing among rival candidate models, ICs do not necessarily indicate whether the "best" model (or a model-averaged version) is a good representation of the data or whether the model has useful "explanatory" or "predictive" ability.
2. As editors and reviewers, we have seen many submissions that did not evaluate whether the nominal "best" model(s) found using IC is a useful model in the above sense.
3. We scrutinized six leading ecological journals for papers that used IC to compare models. More than half of the papers using IC for model comparison did not evaluate the adequacy of the best model(s) in either "explaining" or "predicting" the data.
4. Synthesis and applications. Authors need to evaluate the adequacy of the model identified as the "best" by information criteria methods to provide convincing evidence to readers and users that inferences from the best models are useful and reliable.
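The distinction in point 1 is worth making concrete: an IC can rank candidate models while the winner still explains almost nothing. A small illustration (mine, not from the paper):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + rng.normal(scale=3.0, size=n)   # weak signal, much noise

candidates = {
    "x1":      sm.add_constant(np.column_stack([x1])),
    "x2":      sm.add_constant(np.column_stack([x2])),
    "x1 + x2": sm.add_constant(np.column_stack([x1, x2])),
}
fits = {name: sm.OLS(y, X).fit() for name, X in candidates.items()}

best_name = min(fits, key=lambda name: fits[name].aic)
best = fits[best_name]
# AIC picks the best of the candidates, but that ranking is only relative:
# also report the winner's explanatory adequacy.
print(f"best by AIC: {best_name},  AIC = {best.aic:.1f}")
print(f"R^2 = {best.rsquared:.3f}")   # small even for the 'best' model
```

Reporting the second line alongside the first is the practice the authors are asking for.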