A large array of species distribution model (SDM) approaches has been developed for explaining and predicting the occurrences of individual species or species assemblages. Given the wealth of existing models, it is unclear which models perform best for interpolation or extrapolation of existing data sets, particularly when one is concerned with species assemblages. We compared the predictive performance of 33 variants of 15 widely applied and recently emerged SDMs in the context of multispecies data, including both joint SDMs that model multiple species together, and stacked SDMs that model each species individually and combine the predictions afterward. We offer a comprehensive evaluation of these SDM approaches by examining their performance in predicting withheld empirical validation data of different sizes representing five different taxonomic groups, and for prediction tasks related to both interpolation and extrapolation. We measure predictive performance by 12 measures of accuracy, discrimination power, calibration, and precision of predictions, for the biological levels of species occurrence, species richness, and community composition. Our results show large variation among the models in their predictive performance, especially for communities comprising many species that are rare. The results do not reveal any major trade-offs among measures of model performance; the same models performed generally well in terms of accuracy, discrimination, and calibration, and for the biological levels of individual species, species richness, and community composition. In contrast, the models that gave the most precise predictions were not well calibrated, suggesting that poorly performing models can make overconfident predictions. However, none of the models performed well for all prediction tasks.
As a general strategy, we therefore propose that researchers fit a small set of models showing complementary performance, and then apply a cross-validation procedure involving separate data to establish which of these models performs best for the goal of the study.
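The idea of a stacked SDM, in which each species is modelled separately and the predictions are combined afterward, can be illustrated with a minimal Python sketch. It assumes that stacking is done by summing per-species occurrence probabilities to obtain expected species richness per site, which is one common choice and not necessarily the procedure used in the study. The function name `stacked_richness` and all prediction values are hypothetical.

```python
from typing import Dict, List


def stacked_richness(prob_by_species: Dict[str, List[float]]) -> List[float]:
    """Expected richness per site as the sum of per-species occurrence probabilities.

    Assumption: species are combined by simple summation, ignoring any
    residual association between species (which a joint SDM would model).
    """
    site_counts = {len(probs) for probs in prob_by_species.values()}
    if len(site_counts) != 1:
        raise ValueError("every species needs predictions for the same sites")
    # zip(*...) regroups the per-species lists into one tuple per site.
    return [sum(site_probs) for site_probs in zip(*prob_by_species.values())]


# Hypothetical predictions from three single-species models at four sites.
predictions = {
    "species_a": [0.9, 0.2, 0.5, 0.0],
    "species_b": [0.8, 0.1, 0.5, 0.0],
    "species_c": [0.3, 0.7, 0.0, 0.0],
}
richness = stacked_richness(predictions)
print([round(r, 2) for r in richness])
```

The same per-site sums could then be compared with withheld observations of richness to judge accuracy and calibration at the richness level.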
Summary
Recent studies have demonstrated a need for increased rigour in building and evaluating ecological niche models (ENMs) based on presence‐only occurrence data. Two major goals are to balance goodness‐of‐fit with model complexity (e.g. by ‘tuning’ model settings) and to evaluate models with spatially independent data. These issues are especially critical for data sets suffering from sampling bias, and for studies that require transferring models across space or time (e.g. responses to climate change or spread of invasive species). Efficient implementation of procedures to accomplish these goals, however, requires automation.
We developed ENMeval, an R package that: (i) creates data sets for k‐fold cross‐validation using one of several methods for partitioning occurrence data (including options for spatially independent partitions), (ii) builds a series of candidate models using Maxent with a variety of user‐defined settings and (iii) provides multiple evaluation metrics to aid in selecting optimal model settings. The six methods for partitioning data are n−1 jackknife, random k‐folds (= bins), user‐specified folds and three methods of masked geographically structured folds. ENMeval quantifies six evaluation metrics: the area under the curve of the receiver‐operating characteristic plot for test localities (AUC_TEST), the difference between training and testing AUC (AUC_DIFF), two different threshold‐based omission rates for test localities and the Akaike information criterion corrected for small sample sizes (AICc).
We demonstrate ENMeval by tuning model settings for eight tree species of the genus Coccoloba in Puerto Rico based on AICc. Evaluation metrics varied substantially across model settings, and models selected with AICc differed from default ones.
In summary, ENMeval facilitates the production of better ENMs and should promote future methodological research on many outstanding issues.
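Three of the evaluation metrics named above can be sketched in a few lines of Python. This is an illustration of the general definitions only, not ENMeval's implementation (ENMeval is an R package, and how it derives a likelihood from Maxent output is not covered here). The sketch assumes a rank-based AUC that compares presence scores with background scores, takes the AUC difference as training AUC minus testing AUC, and uses the standard AICc formula. All function names and numbers are hypothetical.

```python
from typing import Sequence


def auc(presence: Sequence[float], background: Sequence[float]) -> float:
    """Rank-based AUC: chance that a presence point outscores a background point."""
    if not presence or not background:
        raise ValueError("need at least one presence and one background score")
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(presence) * len(background))


def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard AICc: AIC plus the small-sample correction term."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


# Hypothetical suitability scores for one candidate model and one fold.
train_scores = [0.9, 0.8, 0.7]
test_scores = [0.6, 0.4]
background_scores = [0.5, 0.3, 0.2, 0.1]

auc_train = auc(train_scores, background_scores)
auc_test = auc(test_scores, background_scores)
auc_diff = auc_train - auc_test  # larger values suggest overfitting
print(auc_train, auc_test, auc_diff)
print(round(aicc(log_likelihood=-100.0, n_params=5, n_obs=50), 3))
```

In a tuning workflow, these values would be computed for every candidate setting and fold, then compared across settings.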
Released 4 years ago, the Wallace EcoMod application (R package wallace) provided an open‐source and interactive platform for modeling species niches and distributions that served as a reproducible toolbox and educational resource. wallace harnesses R package tools documented in the literature and makes them available via a graphical user interface that runs analyses and returns code to document and reproduce them. Since its release, feedback from users and partners helped identify key areas for advancement, leading to the development of wallace 2. Following the vision of growth by community expansion, the core development team engaged with collaborators and undertook a major restructuring of the application to enable: simplified addition of custom modules to expand methodological options, analyses for multiple species in the same session, improved metadata features, new database connections, and saving/loading sessions. wallace 2 features nine new modules and added functionalities that facilitate data acquisition from climate‐simulation, botanical and paleontological databases; custom data inputs; model metadata tracking; and citations for R packages used (to promote documentation and give credit to developers). Three of these modules compose a new component for environmental space analyses (e.g., niche overlap). This expansion was paired with outreach to the biogeography and biodiversity communities, including international presentations and workshops that take advantage of the software's extensive guidance text. Additionally, the advances extend accessibility with a cloud‐computing implementation and include a suite of comprehensive unit tests. The features in wallace 2 greatly improve its expandability, breadth of analyses, and reproducibility options, including the use of emerging metadata standards.
The new architecture serves as an example for other modular software, especially those developed using the rapidly proliferating R package shiny, by showcasing straightforward module ingestion and unit testing. Importantly, wallace 2 sets the stage for future expansions, including those enabling biodiversity estimation and threat assessments for conservation.
The contributions of species to ecosystem functions or services depend not only on their presence but also on their local abundance. Progress in predictive spatial modelling has largely focused on species occurrence rather than abundance. As such, limited guidance exists on the most reliable methods to explain and predict spatial variation in abundance. We analysed the performance of 68 abundance‐based species distribution models fitted to 800 000 standardised abundance records for more than 800 terrestrial bird and reef fish species. We found a large amount of variation in the performance of abundance‐based models. While many models performed poorly, a subset of models consistently reconstructed range‐wide abundance patterns. The best predictions were obtained using random forests for frequently encountered and abundant species and for predictions within the same environmental domain as model calibration. Extending predictions of species abundance outside of the environmental conditions used in model training generated poor predictions. Thus, interpolation of abundances between observations can help improve understanding of spatial abundance patterns, but our results indicate extrapolated predictions of abundance under changing climate have a much greater uncertainty. Our synthesis provides a road map for modelling abundance patterns, a key property of species distributions that underpins theoretical and applied questions in ecology and conservation.
With the expansion in the quantity and types of biodiversity data being collected, there is a need to find ways to combine these different sources to provide cohesive summaries of species’ potential and realized distributions in space and time. Recently, model-based data integration has emerged as a means to achieve this by combining datasets in ways that retain the strengths of each. We describe a flexible approach to data integration using point process models, which provide a convenient way to translate across ecological currencies. We highlight recent examples of large-scale ecological models based on data integration and outline the conceptual and technical challenges and opportunities that arise.
Integrated modeling of species distributions and abundance is emerging as a powerful tool in statistical ecology. Point processes provide a flexible framework for developing integrated models, combining data representing the locations of individual organisms, local population abundance, and species–site occupancy. These methods provide opportunities to make best use of existing and new data sources. We expect that data integration will underpin the next generation of models predicting the current, future, and potential distributions of species.
Information on where species occur is an important component of conservation and management decisions, but knowledge of distributions is often coarse or incomplete. Species distribution models provide a tool for mapping habitat and can produce credible, defensible, and repeatable information with which to inform decisions. However, these models are sensitive to data inputs and methodological choices, making it important to assess the reliability and utility of model predictions. We provide a rubric that model developers can use to communicate a model’s attributes and its appropriate uses. We emphasize the importance of tailoring model development and delivery to the species of interest and the intended use and the advantages of iterative modeling and validation. We highlight how species distribution models have been used to design surveys for new populations, inform spatial prioritization decisions for management actions, and support regulatory decision-making and compliance, tying these examples back to our model assessment rubric.
Spatial and temporal associations between sympatric species underpin biotic interactions, structure ecological assemblages, and sustain ecosystem functioning and stability. However, the resilience of interspecific spatiotemporal associations to human activity remains poorly understood, particularly in mountain forests where anthropogenic impacts are often pervasive. Here, we applied context-dependent Joint Species Distribution Models to a systematic camera-trap survey dataset from a global biodiversity hotspot in the eastern Himalayas to understand how prominent human activities in mountain forests influence species associations within terrestrial mammal communities. We obtained 10,388 independent detections of 17 focal species (12 carnivores and five ungulates) from 322 stations over 43,163 camera days of effort. We identified a higher incidence of positive associations in habitats with higher levels of human modification (87%) and human presence (83%) compared to those located in habitats with lower human modification (64%) and human presence (65%) levels. We also detected a significant reduction of pairwise encounter time at increasing levels of human disturbance, corresponding to more frequent encounters between pairs of species. Our findings indicate that human activities can push mammals together into more frequent encounters and associations, which likely influences the coexistence and persistence of wildlife, with potential far-ranging ecological consequences.
Although numerous species distribution models have been developed, most were based on insufficient distribution data or used older climate change scenarios. We aimed to quantify changes in projected ranges and threat level by the years 2061–2080, for 12 European forest tree species under three climate change scenarios. We combined tree distribution data from the Global Biodiversity Information Facility, EUFORGEN, and forest inventories, and we developed species distribution models using MaxEnt and 19 bioclimatic variables. Models were developed for three climate change scenarios, optimistic (RCP2.6), moderate (RCP4.5), and pessimistic (RCP8.5), using three General Circulation Models, for the period 2061–2080. Our study revealed different responses of tree species to projected climate change. The species may be divided into three groups: “winners,” mostly late‐successional species: Abies alba, Fagus sylvatica, Fraxinus excelsior, Quercus robur, and Quercus petraea; “losers,” mostly pioneer species: Betula pendula, Larix decidua, Picea abies, and Pinus sylvestris; and alien species (Pseudotsuga menziesii, Quercus rubra, and Robinia pseudoacacia), which may also be considered “winners.” Assuming limited migration, most of the species studied would face a significant decrease in suitable habitat area. The threat level was highest for species that currently have the northernmost distribution centers. Ecological consequences of the projected range contractions would be serious for both forest management and nature conservation.
We quantified changes in projected ranges and threat level by the years 2061–2080 for 12 European forest tree species under three climate change scenarios and three General Circulation Models using the MaxEnt model. Due to different responses of tree species to projected climate change, species may be divided into “winners” – mostly late‐successional species, “losers” – mostly pioneer species, and alien species. Assuming limited migration, most of the species studied would face a significant decrease in suitable habitat area, especially species that currently have the northernmost distribution centers.
Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hierarchical Modelling of Species Communities (HMSC) as a general, flexible framework for modern analysis of community data. While non‐manipulative data allow for only correlative and not causal inference, this framework facilitates the formulation of data‐driven hypotheses regarding the processes that structure communities. We model environmental filtering by variation and covariation in the responses of individual species to the characteristics of their environment, with potential contingencies on species traits and phylogenetic relationships. We capture biotic assembly rules by species‐to‐species association matrices, which may be estimated at multiple spatial or temporal scales. We operationalise the HMSC framework as a hierarchical Bayesian joint species distribution model, and implement it as R‐ and Matlab‐packages which enable computationally efficient analyses of large data sets. Armed with this tool, community ecologists can make sense of many types of data, including spatially explicit data and time‐series data. We illustrate the use of this framework through a series of diverse ecological examples.
•We compared seven methods to include spatial restrictions into species distribution models (SDMs) with traditional ways to model species distribution.
•Methods of including spatial layers as explanatory variables in SDMs were called a priori, while methods of overlapping accessible and suitable areas were called a posteriori methods.
•Adding spatial restrictions improves the performance of SDMs by reducing overprediction.
•A priori methods performed well when combined with simpler algorithms, such as GLM.
•A posteriori methods were efficient at reducing overprediction, with the exception of one method, which increased underprediction.
Species distribution models can be affected by overprediction when dispersal movement is not incorporated into the modelling process. We compared the efficiency of seven methods that take into account spatial constraints to reduce overprediction when using four algorithms for species distribution models. By using a virtual ecologist approach, we were able to measure the accuracy of each model in predicting actual species distributions. We built 40 virtual species distributions within the Neotropical realm. Then, we randomly sampled 50 occurrences that were used in seven spatially restricted species distribution models (hereafter called M-SDMs) and a non-spatially restricted ecological niche model (ENM). We used four algorithms: Maximum Entropy, Generalized Linear Models, Random Forest, and Support Vector Machine. M-SDM methods were divided into a priori methods, in which spatial restrictions were inserted with environmental variables in the modelling process, and a posteriori methods, in which reachable and suitable areas were overlapped. M-SDM efficiency was obtained by calculating the difference in commission and omission errors between M-SDMs and ENMs. We used linear mixed-effects models to test if differences in commission and omission errors varied among the M-SDMs and algorithms. Our results indicate that overall M-SDMs reduce overprediction with no increase in underprediction compared to ENMs with few exceptions, such as a priori methods combined with the Support Vector Machine algorithm. There is a high variation in modelling performance among species, but there were only a few cases in which overprediction or underprediction increased. We only compared methods that do not require species dispersal data, guaranteeing that they can be applied to less-studied species.
We advocate that species distribution modellers should not ignore spatial constraints, especially because they can be included in models at low cost and with high benefits in terms of reduced overprediction.
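The a posteriori idea, overlapping reachable and suitable areas and then comparing commission and omission errors with an unrestricted model, can be sketched in Python on a toy grid. This is a simplified illustration and not the study's code. It assumes binary maps stored as flat lists of cells, defines the commission rate as false presences divided by truly absent cells and the omission rate as false absences divided by truly present cells (definitions vary between studies), and uses hypothetical names and data throughout.

```python
from typing import List, Tuple


def restrict_a_posteriori(suitable: List[bool], reachable: List[bool]) -> List[bool]:
    """Keep only cells that are both environmentally suitable and reachable."""
    if len(suitable) != len(reachable):
        raise ValueError("maps must cover the same cells")
    return [s and r for s, r in zip(suitable, reachable)]


def error_rates(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (commission, omission) rates for a binary prediction map."""
    if len(predicted) != len(actual):
        raise ValueError("maps must cover the same cells")
    n_present = sum(actual)
    n_absent = len(actual) - n_present
    if n_present == 0 or n_absent == 0:
        raise ValueError("need both present and absent cells")
    false_presence = sum(p and not a for p, a in zip(predicted, actual))
    false_absence = sum(a and not p for p, a in zip(predicted, actual))
    return false_presence / n_absent, false_absence / n_present


# Hypothetical six-cell landscape for one virtual species.
actual = [True, True, False, False, False, False]     # true distribution
suitable = [True, True, True, True, False, False]     # unrestricted model output
reachable = [True, True, True, False, False, False]   # assumed accessible area

enm_errors = error_rates(suitable, actual)
msdm_errors = error_rates(restrict_a_posteriori(suitable, reachable), actual)

# Efficiency as the change in each error after adding the spatial restriction.
commission_change = msdm_errors[0] - enm_errors[0]
omission_change = msdm_errors[1] - enm_errors[1]
print(enm_errors, msdm_errors, commission_change, omission_change)
```

In this toy case the restriction removes one falsely predicted cell without dropping any true presence, so commission falls while omission is unchanged.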