A key challenge for community ecology is to understand to what extent observational data can be used to infer the underlying community assembly processes. As different processes can lead to similar or even identical patterns, statistical analyses of non‐manipulative observational data never yield undisputable causal inference on the underlying processes. Still, most empirical studies in community ecology are based on observational data, and hence understanding under which circumstances such data can shed light on assembly processes is a central concern for community ecologists. We simulated a spatial agent‐based model that generates variation in metacommunity dynamics across multiple axes, including the four classic metacommunity paradigms as special cases. We further simulated a virtual ecologist who analysed snapshot data sampled from the simulations using eighteen output metrics derived from beta‐diversity and habitat variation indices, variation partitioning and joint species distribution modelling. Our results indicated two main axes of variation in the output metrics. The first axis of variation described whether the landscape has patchy or continuous variation, and thus was essentially independent of the properties of the species community. The second axis of variation related to the level of predictability of the metacommunity. The most predictable communities were niche‐based metacommunities inhabiting static landscapes with marked environmental heterogeneity, such as metacommunities following the species sorting paradigm or the mass effects paradigm. The most unpredictable communities were neutral‐based metacommunities inhabiting dynamic landscapes with little spatial heterogeneity, such as metacommunities following the neutral or patch sorting paradigms. The output metrics from joint species distribution modelling generally yielded the highest resolution for disentangling the simulated scenarios.
Yet, the different types of statistical approaches utilized in this study carried complementary information, and thus our results suggest that the most comprehensive evaluation of metacommunity structure can be obtained by combining them.
A large array of species distribution model (SDM) approaches has been developed for explaining and predicting the occurrences of individual species or species assemblages. Given the wealth of existing models, it is unclear which models perform best for interpolation or extrapolation of existing data sets, particularly when one is concerned with species assemblages. We compared the predictive performance of 33 variants of 15 widely applied and recently emerged SDMs in the context of multispecies data, including both joint SDMs that model multiple species together, and stacked SDMs that model each species individually combining the predictions afterward. We offer a comprehensive evaluation of these SDM approaches by examining their performance in predicting withheld empirical validation data of different sizes representing five different taxonomic groups, and for prediction tasks related to both interpolation and extrapolation. We measure predictive performance by 12 measures of accuracy, discrimination power, calibration, and precision of predictions, for the biological levels of species occurrence, species richness, and community composition. Our results show large variation among the models in their predictive performance, especially for communities comprising many species that are rare. The results do not reveal any major trade-offs among measures of model performance; the same models performed generally well in terms of accuracy, discrimination, and calibration, and for the biological levels of individual species, species richness, and community composition. In contrast, the models that gave the most precise predictions were not well calibrated, suggesting that poorly performing models can make overconfident predictions. However, none of the models performed well for all prediction tasks.
As a general strategy, we therefore propose that researchers fit a small set of models showing complementary performance, and then apply a cross-validation procedure involving separate data to establish which of these models performs best for the goal of the study.
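The strategy proposed above (fit a few complementary models, then let held-out data decide) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the two candidate "models" (`fit_prevalence` and `fit_two_bins`), the interleaved five-fold split, and the use of the Brier score as the performance measure. None of these are the SDMs or the measures evaluated in the study.

```python
from statistics import mean

# Hypothetical toy data: one covariate x and presence-absence y for 40 sites.
xs = [i / 40 for i in range(40)]
ys = [int(x >= 0.5) for x in xs]
ys[5], ys[30] = 1, 0  # two deliberate exceptions so the pattern is not perfect


def fit_prevalence(train):
    """Candidate 1: ignore the covariate, always predict the training prevalence."""
    p = mean(y for _, y in train)
    return lambda x: p


def fit_two_bins(train):
    """Candidate 2: predict the training prevalence within each half of the covariate range."""
    low = mean(y for x, y in train if x < 0.5)
    high = mean(y for x, y in train if x >= 0.5)
    return lambda x: high if x >= 0.5 else low


def cv_brier(fit, data, k=5):
    """Brier score (mean squared error of predicted probabilities) over k held-out folds."""
    errors = []
    for fold in range(k):
        train = [d for i, d in enumerate(data) if i % k != fold]
        test = [d for i, d in enumerate(data) if i % k == fold]
        predict = fit(train)  # the model never sees the held-out fold
        errors.extend((predict(x) - y) ** 2 for x, y in test)
    return mean(errors)


data = list(zip(xs, ys))
candidates = {"prevalence_only": fit_prevalence, "two_bins": fit_two_bins}
scores = {name: cv_brier(fit, data) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lower Brier score is better
print(scores, "-> selected:", best)
```

In practice the folds would be chosen to match the goal of the study, for example spatially separated folds when the task is extrapolation rather than interpolation.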
Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hierarchical Modelling of Species Communities (HMSC) as a general, flexible framework for modern analysis of community data. While non‐manipulative data allow for only correlative and not causal inference, this framework facilitates the formulation of data‐driven hypotheses regarding the processes that structure communities. We model environmental filtering by variation and covariation in the responses of individual species to the characteristics of their environment, with potential contingencies on species traits and phylogenetic relationships. We capture biotic assembly rules by species‐to‐species association matrices, which may be estimated at multiple spatial or temporal scales. We operationalise the HMSC framework as a hierarchical Bayesian joint species distribution model, and implement it as R and Matlab packages that enable computationally efficient analyses of large data sets. Armed with this tool, community ecologists can make sense of many types of data, including spatially explicit data and time‐series data. We illustrate the use of this framework through a series of diverse ecological examples.
Community ecologists and conservation biologists often work with data that are too sparse for achieving reliable inference with species-specific approaches. Here we explore the idea of combining species-specific models into a single hierarchical model. The community component of the model seeks shared patterns in how the species respond to environmental covariates. We illustrate the modeling framework in the context of logistic regression and presence–absence data, but a similar hierarchical structure could also be used in many other types of applications. We first use simulated data to illustrate that the community component can improve parameterization of species-specific models especially for rare species, for which the data would be too sparse to be informative alone. We then apply the community model to real data on 500 diatom species to show that it has much greater predictive power than a collection of independent species-specific models. We use the modeling approach to show that roughly one-third of distance decay in community similarity can be explained by two variables characterizing water quality, rare species typically preferring nutrient-poor waters with high pH, and common species showing a more general pattern of resource use.
Signals of species interactions can be inferred from survey data by asking if some species occur more or less often together than what would be expected at random, or more generally, if any structural aspect of the community deviates from that expected from a set of independent species. However, a positive (or negative) association between two species does not necessarily signify a direct or indirect interaction, as it can result simply from the species having similar (or dissimilar) habitat requirements. We show how these two factors can be separated by multivariate logistic regression, with the regression part accounting for species-specific habitat requirements, and a correlation matrix for the positive or negative residual associations. We parameterize the model using Bayesian inference with data on 22 species of wood-decaying fungi acquired in 14 dissimilar forest sites. Our analyses reveal that some of the species commonly found to occur together in the same logs are likely to do so merely because of similar habitat requirements, whereas other species combinations are systematically either over- or underrepresented also or only after accounting for the habitat requirements. We use our results to derive hypotheses on species interactions that can be tested in future experimental work.
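The distinction drawn above between raw co-occurrence and residual association can be illustrated with a small simulation. This is a deliberately simplified stand-in and not the multivariate logistic regression of the study: it assumes a single binary habitat variable, two hypothetical species that respond to it identically and are otherwise independent, and it removes the habitat effect by computing the association within each habitat class instead of fitting a regression.

```python
import random


def phi(a, b):
    """Pearson (phi) correlation between two presence-absence vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # For 0/1 data the variance is mean * (1 - mean).
    return cov / (mean_a * (1 - mean_a) * mean_b * (1 - mean_b)) ** 0.5


rng = random.Random(1)
n_sites = 20000

# Assumed habitat: half of the sites are suitable (True), half are not (False).
habitat = [rng.random() < 0.5 for _ in range(n_sites)]


def occupy(suitable):
    """Occurrence probability 0.8 in suitable habitat and 0.2 elsewhere."""
    return int(rng.random() < (0.8 if suitable else 0.2))


# Both species share the habitat requirement but do not interact.
species_a = [occupy(h) for h in habitat]
species_b = [occupy(h) for h in habitat]

raw_association = phi(species_a, species_b)

# Association left after holding habitat constant.
residual_association = {}
for level in (False, True):
    idx = [i for i, h in enumerate(habitat) if h == level]
    residual_association[level] = phi(
        [species_a[i] for i in idx], [species_b[i] for i in idx]
    )

print("raw:", round(raw_association, 3))
print("within habitat classes:", residual_association)
```

Under these assumptions the raw association is clearly positive although the species do not interact, while the within-habitat associations are close to zero. A residual correlation matrix estimated after a regression on habitat covariates plays the same role when covariates are continuous or numerous.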
When comparing multiple models of species distribution, models yielding higher predictive performance are clearly to be favored. A more difficult question is how to decide whether even the best model is “good enough”. Here, we clarify key choices and metrics related to evaluating the predictive performance of presence–absence models. We use a hierarchical case study to evaluate how four metrics of predictive performance (AUC, Tjur's R2, max‐Kappa, and max‐TSS) relate to each other, the random and fixed effects parts of the model, the spatial scale at which predictive performance is measured, and the cross‐validation strategy chosen. We demonstrate that the very same metric can achieve different values for the very same model, even when similar cross‐validation strategies are followed, depending on the spatial scale at which predictive performance is measured. Among metrics, Tjur's R2 and max‐Kappa generally increase with species' prevalence, whereas AUC and max‐TSS are largely independent of prevalence. Thus, Tjur's R2 and max‐Kappa often reach lower values when measured at the smallest scales considered in the study, while AUC and max‐TSS reach similar values across the different spatial levels included in the study. However, they provide complementary insights on predictive performance. The very same model may appear excellent or poor not only due to the applied metric, but also due to how exactly predictive performance is calculated, calling for great caution in the interpretation of predictive performance. The most comprehensive evaluation of predictive performance can be obtained by evaluating predictive performance through the combination of measures providing complementary insights. Instead of following simple rules of thumb or focusing on absolute values, we recommend comparing the achieved predictive performance to the researcher's own a priori expectations on how easy it is to make predictions related to the same question that the model is used for.
We show that a low value of Tjur's R2 or max‐Kappa should not uncritically be taken as evidence for poor predictive performance, nor should a high value of AUC or max‐TSS be accepted as proof of high predictive performance.
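For readers who want to see how the four metrics named above are computed, the following is a minimal sketch using their standard textbook definitions. The function names, the toy vectors `y` and `p`, and the choice to search thresholds over the observed predicted values are assumptions made for illustration, not the code used in the study.

```python
from statistics import mean


def _split(y, p):
    """Predicted probabilities at sites with presences and with absences."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    return pos, neg


def auc(y, p):
    """Probability that a presence is ranked above an absence (ties count 0.5)."""
    pos, neg = _split(y, p)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def tjur_r2(y, p):
    """Mean prediction at presences minus mean prediction at absences."""
    pos, neg = _split(y, p)
    return mean(pos) - mean(neg)


def _confusion(y, p, t):
    """Confusion-matrix counts when predicting presence for p >= t."""
    tp = sum(yi == 1 and pi >= t for yi, pi in zip(y, p))
    fp = sum(yi == 0 and pi >= t for yi, pi in zip(y, p))
    fn = sum(yi == 1 and pi < t for yi, pi in zip(y, p))
    tn = sum(yi == 0 and pi < t for yi, pi in zip(y, p))
    return tp, fp, fn, tn


def max_tss(y, p):
    """True skill statistic (sensitivity + specificity - 1), maximised over thresholds."""
    best = -1.0
    for t in set(p):
        tp, fp, fn, tn = _confusion(y, p, t)
        best = max(best, tp / (tp + fn) + tn / (tn + fp) - 1)
    return best


def max_kappa(y, p):
    """Cohen's kappa, maximised over thresholds."""
    n = len(y)
    best = -1.0
    for t in set(p):
        tp, fp, fn, tn = _confusion(y, p, t)
        observed = (tp + tn) / n
        expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
        if expected < 1:  # kappa is undefined when chance agreement is total
            best = max(best, (observed - expected) / (1 - expected))
    return best


# Hypothetical validation data: 3 presences and 5 absences.
y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1]
results = {f.__name__: f(y, p) for f in (auc, tjur_r2, max_tss, max_kappa)}
print(results)
```

All four functions assume that `y` contains at least one presence and one absence. Recomputing them on pooled versus per-site predictions is one way to explore the scale dependence described above.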
Summary
We present a hierarchical latent variable model that partitions variation in species occurrences and co‐occurrences simultaneously at multiple spatial scales. We illustrate how the parameterized model can be used to predict the occurrences of a species by using as predictors not only the environmental covariates, but also the occurrences of all other species, at all spatial scales.
We leverage recent progress in Bayesian latent variable models to implement a computationally effective algorithm that enables one to consider large communities and extensive sampling schemes.
We exemplify the framework with a community of 98 fungal species sampled in c. 22 500 dead wood units in 230 plots in 29 beech forests.
The networks identified by correlations and partial correlations were consistent, as were networks for natural and managed forests, but networks at different spatial scales were dissimilar.
Accounting for the occurrences of the other species roughly doubled the predictive powers of the models compared to accounting for environmental covariates only.
Technological advances have enabled a new class of multivariate models for ecology, with the potential now to specify a statistical model for abundances jointly across many taxa, to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where one can leverage knowledge of some species to predict others. We demonstrate this by example and discuss recent computation tools and future directions.
Many ecological questions require the joint analysis of abundances collected simultaneously across many taxonomic groups, and, if organisms are identified using modern tools such as metabarcoding, their number can be in the thousands.
While historically such data have been analyzed using ad hoc algorithms, it is now possible to fully specify joint statistical models for abundance using multivariate extensions of generalized linear mixed models.
These modern ‘joint modeling’ approaches allow the study of correlation patterns across taxa, at the same time as studying environmental response, to tease the two apart.
Latent variable models are an especially exciting tool that has recently been used for ordination as well as for studying the factors driving co-occurrence.
Summary
Joint species distribution models (JSDM) are increasingly used to analyse community ecology data. Recent progress with JSDMs has provided ecologists with new tools for estimating species associations (residual co‐occurrence patterns after accounting for environmental niches) from large data sets, as well as for increasing the predictive power of species distribution models (SDMs) by accounting for such associations. Yet, one critical limitation of JSDMs developed thus far is that they assume constant species associations. However, in real ecological communities, the direction and strength of interspecific interactions are likely to be different under different environmental conditions.
In this paper, we overcome this shortcoming of present JSDMs by allowing species associations to covary with measured environmental covariates. To estimate environment‐dependent species associations, we utilize a latent variable structure, where the factor loadings are modelled as a linear regression on environmental covariates.
We illustrate the performance of the statistical framework with both simulated and real data. Our results show that JSDMs perform substantially better in inferring environment‐dependent species associations than single SDMs, especially with sparse data. Furthermore, JSDMs consistently outperform SDMs in terms of predictive power for generating predictions that account for environment‐dependent biotic associations.
We implemented the statistical framework as a MATLAB package, which includes tools for both model parameterization and post‐processing of results, particularly for addressing whether and how species associations depend on the environmental conditions.
Our statistical framework provides a new tool for ecologists who wish to investigate from non‐manipulative observational community data the dependency of interspecific interactions on environmental context. Our method can be applied to answer the fundamental questions in community ecology about how species’ interactions shift in changing environmental conditions, as well as to predict future changes of species’ interactions in response to global change.
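The idea of factor loadings that are linear in the environment can be made concrete with a small numerical sketch. It assumes the common latent-factor construction in which the residual association between two species is the sum over factors of the products of their loadings. The coefficient values, the single latent factor, and the function names are hypothetical and are not taken from the MATLAB package.

```python
# coefs[f][j] = (intercept, slope) of the loading of species j on latent factor f.
# Hypothetical values: one latent factor, three species, one covariate x.
coefs = [
    [(1.0, 0.0), (0.5, -1.0), (0.0, 0.0)],
]


def loadings_at(coefs, x):
    """Loadings evaluated at environment x: lambda_fj(x) = intercept + slope * x."""
    return [[a + b * x for a, b in factor] for factor in coefs]


def association_at(coefs, x):
    """Residual species-to-species covariance at x: sum over factors of loading products."""
    lam = loadings_at(coefs, x)
    n_species = len(lam[0])
    return [
        [sum(f[i] * f[j] for f in lam) for j in range(n_species)]
        for i in range(n_species)
    ]


for x in (0.0, 1.0):
    print("x =", x, association_at(coefs, x))
```

With these made-up coefficients the association between the first two species is positive at x = 0 and negative at x = 1, while the third species, whose loading is zero, stays unassociated with both. This is the kind of environment-dependent change in sign that a model with constant associations cannot represent.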
Inferring interspecific interactions indirectly from community data is of central interest in community ecology. Data on species communities can be surveyed using different methods, each of which may differ in the amount and type of species detected, and thus produce varying information on interaction networks. Since fruit bodies reflect only a fraction of the wood‐inhabiting fungal diversity, there is an ongoing debate in fungal ecology on whether fruit body‐based surveys are a valid method for studying fungal community dynamics compared to surveys based on DNA metabarcoding. In this paper, we focus on species‐to‐species associations and ask whether the associations inferred from data collected by fruit‐body surveys reflect the ones found from data collected by DNA‐based surveys. We estimate and compare the association networks resulting from different survey methods using a joint species distribution model. We recorded both raw and residual associations that respectively do not and do correct for the influence of the abiotic predictors when estimating the species‐to‐species associations. The analyses of the DNA data yielded a larger number of species‐to‐species associations than the analyses of the fruit body‐based data, as expected. Yet, we estimated unique associations also from the fruit‐body data. Our results show that the directions of estimated residual associations were consistent between the data types, whereas the raw associations were much less consistent, highlighting the need to account for the influence of relevant environmental covariates when estimating association networks. We conclude that even though DNA‐based survey methods are more informative about the total number of interacting species, fruit‐body surveys are also an adequate method for inferring association networks in wood‐inhabiting fungi.
Since the DNA and fruit‐body data carry complementary information on fungal communities, the most comprehensive insights are obtained by combining the two survey methods.