Repeatability (more precisely the common measure of repeatability, the intra‐class correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between‐subject (or between‐group) variation. As a consequence, the non‐repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non‐Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non‐Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation‐based, analysis of variance (ANOVA)‐based and linear mixed‐effects model (LMM)‐based methods, while for non‐Gaussian data, we focus on generalised linear mixed‐effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM‐ and GLMM‐based approaches mainly because of the ease with which confounding variables can be controlled for. Furthermore, we compare two types of repeatability (ordinary repeatability and extrapolated repeatability) in relation to narrow‐sense heritability.
This review serves as a collection of guidelines and recommendations for biologists to calculate repeatability and heritability from both Gaussian and non‐Gaussian data.
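As an illustration of the ANOVA‐based approach named above, the following is a minimal Python sketch, assuming a one‐way design with subjects as groups. The function name `anova_repeatability` and the toy data are hypothetical; the estimator is the standard one‐way ANOVA variance‐component formula, not code from the review itself.

```python
from statistics import mean

def anova_repeatability(groups):
    """One-way ANOVA estimate of repeatability (ICC) for Gaussian data.

    groups: list of lists, one inner list of repeated measurements per subject.
    """
    k = len(groups)                          # number of subjects
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Among-subject and within-subject sums of squares.
    ss_among = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)

    # n0: effective number of measurements per subject (equals n if balanced).
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    var_among = (ms_among - ms_within) / n0  # between-subject variance
    return var_among / (var_among + ms_within)

# Hypothetical toy data: three subjects, three measurements each.
data = [[10.1, 9.9, 10.0], [12.0, 12.2, 11.8], [8.1, 7.9, 8.0]]
r = anova_repeatability(data)
print(round(r, 3))
```

Because the subjects in the toy data differ far more than the repeated measurements within a subject, the estimate is close to 1. Note that this estimator can return negative values when the within‐subject mean square exceeds the among‐subject mean square.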
The coefficient of determination R2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R2 that we called R2GLMM for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments.
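The two concepts highlighted above can be illustrated numerically. The sketch below is a hedged Python example, not the paper's R code: it assumes a log link with normally distributed latent values, and the function names and parameter values are hypothetical. It shows that back‐transforming the latent mean understates the data‐scale mean (Jensen's inequality), and applies the first‐order delta method to the variance of a log‐transformed variable.

```python
import math
import random

def log_link_mean(mu, sigma2):
    """Mean of exp(x) for x ~ Normal(mu, sigma2), i.e. the log-normal mean."""
    return math.exp(mu + sigma2 / 2.0)

def delta_method_log_variance(mean, variance):
    """First-order delta method: Var[ln y] is approximately Var[y] / E[y]^2."""
    return variance / mean ** 2

# Jensen's inequality: exp(E[x]) is smaller than E[exp(x)] for a convex link inverse.
mu, sigma2 = 1.0, 1.0                     # assumed latent-scale mean and variance
naive = math.exp(mu)                      # back-transformed latent mean
expected = log_link_mean(mu, sigma2)      # true mean on the data scale

# Simulation check of the analytical data-scale mean.
rng = random.Random(42)
draws = [math.exp(rng.gauss(mu, math.sqrt(sigma2))) for _ in range(50_000)]
simulated = sum(draws) / len(draws)

# Delta method for a Poisson variable (variance equals the mean, lambda):
lam = 4.0
latent_var = delta_method_log_variance(lam, lam)  # equals 1 / lambda

print(naive < expected, round(latent_var, 2))
```

The simulated mean should sit close to `expected` rather than `naive`, which is why latent‐scale estimates cannot simply be exponentiated to obtain data‐scale means.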
Summary
Intra‐class correlations (ICC) and repeatabilities (R) are fundamental statistics for quantifying the reproducibility of measurements and for understanding the structure of biological variation. Linear mixed effects models offer a versatile framework for estimating ICC and R. However, while point estimation and significance testing by likelihood ratio tests are straightforward, the quantification of uncertainty is not as easily achieved.
A further complication arises when the analysis is conducted on data with non‐Gaussian distributions because the separation of the mean and the variance is less clear‐cut for non‐Gaussian than for Gaussian models. Nonetheless, there are solutions to approximate repeatability for the most widely used families of generalized linear mixed models (GLMMs).
Here, we introduce the R package rptR for the estimation of ICC and R for Gaussian, binomial and Poisson‐distributed data. Uncertainty in estimators is quantified by parametric bootstrapping and significance testing is implemented by likelihood ratio tests and through permutation of residuals. The package allows control for fixed effects and thus the estimation of adjusted repeatabilities (that remove fixed effect variance from the estimate) and enhanced agreement repeatabilities (that add fixed effect variance to the denominator). Furthermore, repeatability can be estimated from random‐slope models. The package features convenient summary and plotting functions.
Besides repeatabilities, the package also allows the quantification of coefficients of determination R2 as well as of raw variance components. We present an example analysis to demonstrate the core features and discuss some of the limitations of rptR.
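The idea of quantifying uncertainty by parametric bootstrapping can be sketched as follows. This is a simplified Python illustration under stated assumptions (balanced Gaussian data, ANOVA variance components instead of a fitted mixed model, percentile intervals); it is not the rptR implementation, and all function names are hypothetical.

```python
import random

def variance_components(groups):
    """ANOVA estimates (between-subject, within-subject) for a balanced design."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_among = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    # Negative between-subject estimates are truncated at zero.
    return max((ms_among - ms_within) / n, 0.0), ms_within

def repeatability(groups):
    var_among, var_within = variance_components(groups)
    return var_among / (var_among + var_within)

def bootstrap_ci(groups, n_boot=1000, level=0.95, seed=0):
    """Percentile interval from data simulated under the fitted Gaussian model."""
    rng = random.Random(seed)
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)
    var_among, var_within = variance_components(groups)
    estimates = []
    for _ in range(n_boot):
        fake = []
        for _ in range(k):
            subject = grand + rng.gauss(0.0, var_among ** 0.5)
            fake.append([subject + rng.gauss(0.0, var_within ** 0.5) for _ in range(n)])
        estimates.append(repeatability(fake))
    estimates.sort()
    lower = estimates[round((1 - level) / 2 * n_boot)]
    upper = estimates[round((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Hypothetical data: 20 subjects with 3 measurements each.
rng = random.Random(1)
data = []
for _ in range(20):
    subject_mean = rng.gauss(10.0, 2.0)
    data.append([subject_mean + rng.gauss(0.0, 1.0) for _ in range(3)])

point = repeatability(data)
lower, upper = bootstrap_ci(data, n_boot=500)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

Each bootstrap replicate simulates a new data set from the estimated variance components and re‐estimates repeatability, so the spread of the replicates reflects sampling uncertainty under the fitted model.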
Summary
The use of both linear and generalized linear mixed‐effects models (LMMs and GLMMs) has become popular not only in social and medical sciences, but also in biological sciences, especially in the field of ecology and evolution. Information criteria, such as Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed‐effects models.
The presentation of ‘variance explained’ (R2) as a relevant summarizing statistic of mixed‐effects models, however, is rare, even though R2 is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R2 has the extremely useful property of providing an absolute value for the goodness‐of‐fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R2 can also be a quantity of biological interest.
One reason for the under‐appreciation of R2 for mixed‐effects models lies in the fact that R2 can be defined in a number of ways. Furthermore, most definitions of R2 for mixed‐effects have theoretical problems (e.g. decreased or negative R2 values in larger models) and/or their use is hindered by practical difficulties (e.g. implementation).
Here, we make a case for the importance of reporting R2 for mixed‐effects models. We first provide the common definitions of R2 for LMs and GLMs and discuss the key problems associated with calculating R2 for mixed‐effects models. We then recommend a general and simple method for calculating two types of R2 (marginal and conditional R2) for both LMMs and GLMMs, which are less susceptible to common problems.
This method is illustrated by examples and can be widely employed by researchers in any fields of research, regardless of software packages used for fitting mixed‐effects models. The proposed method has the potential to facilitate the presentation of R2 for a wide range of circumstances.
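The distinction between marginal and conditional R2 can be made concrete with a short Python sketch. It assumes a Gaussian LMM whose variance components have already been estimated: marginal R2 is the share of total variance explained by fixed effects alone, and conditional R2 the share explained by fixed and random effects together. The function name and the numbers are hypothetical.

```python
def r2_marginal_conditional(var_fixed, var_random, var_residual):
    """Marginal and conditional R2 from the variance components of a Gaussian LMM.

    var_fixed:    variance of the fixed-effect predictions
    var_random:   list of random-effect variances (one per random term)
    var_residual: residual variance
    """
    var_random_total = sum(var_random)
    total = var_fixed + var_random_total + var_residual
    r2_marginal = var_fixed / total                           # fixed effects only
    r2_conditional = (var_fixed + var_random_total) / total   # fixed + random
    return r2_marginal, r2_conditional

# Hypothetical variance components.
r2m, r2c = r2_marginal_conditional(var_fixed=2.0, var_random=[1.0, 0.5], var_residual=1.5)
print(round(r2m, 2), round(r2c, 2))
```

Conditional R2 can never be smaller than marginal R2, because the random‐effect variance is added to the numerator while the denominator stays the same.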
We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p hacking, and hypothesising after the results are known (HARKing). We also asked them to estimate the proportion of their colleagues that use each of these QRPs. Several of the QRPs were prevalent within the ecology and evolution research community. Across the two groups, we found 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking); 42% had collected more data after inspecting whether results were statistically significant (a form of p hacking) and 51% had reported an unexpected finding as though it had been hypothesised from the start (HARKing). Such practices have been directly implicated in the low rates of reproducible results uncovered by recent large scale replication studies in psychology and other disciplines. The rates of QRPs found in this study are comparable with the rates seen in psychology, indicating that the reproducibility problems discovered in psychology are also likely to be present in ecology and evolution.
Null hypothesis significance testing (NHST) is the dominant statistical approach in biology, although it has many, frequently unappreciated, problems. Most importantly, NHST does not provide us with two crucial pieces of information: (1) the magnitude of an effect of interest, and (2) the precision of the estimate of the magnitude of that effect. All biologists should be ultimately interested in biological importance, which may be assessed using the magnitude of an effect, but not its statistical significance. Therefore, we advocate presentation of measures of the magnitude of effects (i.e. effect size statistics) and their confidence intervals (CIs) in all biological journals. Combined use of an effect size and its CIs enables one to assess the relationships within data more effectively than the use of p values, regardless of statistical significance. In addition, routine presentation of effect sizes will encourage researchers to view their results in the context of previous research and facilitate the incorporation of results into future meta‐analysis, which has been increasingly used as the standard method of quantitative review in biology. In this article, we extensively discuss two dimensionless (and thus standardised) classes of effect size statistics: d statistics (standardised mean difference) and r statistics (correlation coefficient), because these can be calculated from almost all study designs and also because their calculations are essential for meta‐analysis. However, our focus on these standardised effect size statistics does not mean unstandardised effect size statistics (e.g. mean difference and regression coefficient) are less important. We provide potential solutions for four main technical problems researchers may encounter when calculating effect size and CIs: (1) when covariates exist, (2) when bias in estimating effect size is possible, (3) when data have non‐normal error structure and/or variances, and (4) when data are non‐independent.
Although interpretations of effect sizes are often difficult, we provide some pointers to help researchers. This paper serves both as a beginner’s instruction manual and a stimulus for changing statistical practice for the better in the biological sciences.
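As a minimal illustration of reporting an effect size with its confidence interval, the Python sketch below computes a standardised mean difference (Cohen's d with a pooled standard deviation) and a correlation coefficient with a CI based on Fisher's z transformation. The standard error used for d is one common large‐sample approximation, chosen here as an assumption and not necessarily the formula recommended in the article; the function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def d_confidence_interval(d, n1, n2, level=0.95):
    """CI for d from a large-sample approximation of its standard error (assumed)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf((1 + level) / 2)
    return d - z * se, d + z * se

def r_confidence_interval(r, n, level=0.95):
    """CI for a correlation via Fisher's z: z = atanh(r), SE = 1 / sqrt(n - 3)."""
    z_r = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z = NormalDist().inv_cdf((1 + level) / 2)
    return math.tanh(z_r - z * se), math.tanh(z_r + z * se)

# Hypothetical data for two groups.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
d = cohens_d(treatment, control)
d_lo, d_hi = d_confidence_interval(d, len(treatment), len(control))

# Hypothetical correlation of 0.5 estimated from 28 observations.
r_lo, r_hi = r_confidence_interval(0.5, 28)
print(round(d, 2), round(r_lo, 2), round(r_hi, 2))
```

With only five observations per group, the interval around d is wide and includes zero, which shows how a CI conveys precision that a point estimate alone does not.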
ABSTRACT
The effects of sex hormones on immune function have received much attention, especially following the proposal of the immunocompetence handicap hypothesis. Many studies, both experimental and correlational, have been conducted to test the relationship between immune function and the sex hormones testosterone in males and oestrogen in females. However, the results are mixed. We conducted four cross‐species meta‐analyses to investigate the relationship between sex hormones and immune function: (i) the effect of testosterone manipulation on immune function in males, (ii) the correlation between circulating testosterone level and immune function in males, (iii) the effect of oestrogen manipulation on immune function in females, and (iv) the correlation between circulating oestrogen level and immune function in females. The results from the experimental studies showed that testosterone had a medium‐sized immunosuppressive effect on immune function. The effect of oestrogen, on the other hand, depended on the immune measure used. Oestrogen suppressed cell‐mediated immune function while reducing parasite loads. The overall correlation (meta‐analytic relationship) between circulating sex hormone level and immune function was not statistically significant for either testosterone or oestrogen despite the power of meta‐analysis. These results suggest that correlational studies have limited value for testing the effects of sex hormones on immune function. We found little evidence of publication bias in the four data sets using indirect tests. There was a weak and positive relationship between year of publication and effect size for experimental studies of testosterone that became non‐significant after we controlled for castration and immune measure, suggesting that the temporal trend was due to changes in these moderators over time. Graphical analyses suggest that the temporal trend was due to an increased use of cytokine measures across time.
We found substantial heterogeneity in effect sizes, except in correlational studies of testosterone, even after we accounted for the relevant random and fixed factors. In conclusion, our results provide good evidence that testosterone suppresses immune function and that the effect of oestrogen varies depending on the immune measure used.
ABSTRACT
Although a small set of external factors account for much of the spatial variation in plant and animal diversity, the search continues for general drivers of variation in parasite species richness among host species. Qualitative reviews of existing evidence suggest idiosyncrasies and inconsistent predictive power for all proposed determinants of parasite richness. Here, we provide the first quantitative synthesis of the evidence using a meta‐analysis of 62 original studies testing the relationship between parasite richness across animal, plant and fungal hosts, and each of its four most widely used presumed predictors: host body size, host geographical range size, host population density, and latitude. We uncover three universal predictors of parasite richness across host species, namely host body size, geographical range size and population density, applicable regardless of the taxa considered and independently of most aspects of study design. A proper match in the primary studies between the focal predictor and both the spatial scale of study and the level at which parasite species richness was quantified (i.e. within host populations or tallied across a host species' entire range) also affected the magnitude of effect sizes. By contrast, except for a couple of indicative trends in subsets of the full dataset, there was no strong evidence for an effect of latitude on parasite species richness; where found, this effect ran counter to the general latitude gradient in diversity, with parasite species richness tending to be higher further from the equator. Finally, the meta‐analysis also revealed a negative relationship between the magnitude of effect sizes and the year of publication of original studies (i.e. a time‐lag bias). This temporal bias may be due to the increasing use of phylogenetic correction in comparative analyses of parasite richness over time, as this correction yields more conservative effect sizes.
Overall, these findings point to common underlying processes of parasite diversification fundamentally different from those controlling the diversity of free‐living organisms.
Global abundance estimates for 9,700 bird species. Callaghan, Corey T; Nakagawa, Shinichi; Cornwell, William K. Proceedings of the National Academy of Sciences, 05/2021, Volume 118, Issue 21. Journal article, peer reviewed, open access.
Quantifying the abundance of species is essential to ecology, evolution, and conservation. The distribution of species abundances is fundamental to numerous longstanding questions in ecology, yet the empirical pattern at the global scale remains unresolved, with a few species' abundance well known but most poorly characterized. In large part because of heterogeneous data, few methods exist that can scale up to all species across the globe. Here, we integrate data from a suite of well-studied species with a global dataset of bird occurrences throughout the world, for 9,700 species (∼92% of all extant species), and use missing data theory to estimate species-specific abundances with associated uncertainty. We find strong evidence that the distribution of species abundances is log left skewed: there are many rare species and comparatively few common species. By aggregating the species-level estimates, we find that there are ∼50 billion individual birds in the world at present. The global-scale abundance estimates that we provide will allow for a line of inquiry into the structure of abundance across biogeographic realms and feeding guilds as well as the consequences of life history (e.g., body size, range size) on population dynamics. Importantly, our method is repeatable and scalable: as data quantity and quality increase, our accuracy in tracking temporal changes in global biodiversity will increase. Moreover, we provide the methodological blueprint for quantifying species-specific abundance, along with uncertainty, for any organism in the world.