Proportional data, in which response variables are expressed as percentages or fractions of a whole, are analysed in many subfields of ecology and evolution. The scale‐independence of proportions makes them appropriate for analysing many biological phenomena, but statistical analyses are not straightforward, since proportions can only take values from zero to one and their variance is usually not constant across the range of the predictor. Transformations to overcome these problems are often applied, but can lead to biased estimates and difficulties in interpretation.
In this paper, we provide an overview of the different types of proportional data and discuss the different analysis strategies available. In particular, we review and discuss the use of promising, but little used, techniques for analysing continuous (also called non‐count‐based or non‐binomial) proportions (e.g. percent cover, fraction time spent on an activity): beta and Dirichlet regression, and some of their most important extensions.
A major distinction can be made between proportions arising from counts and those arising from continuous measurements. For proportions consisting of two categories, count‐based data are best analysed using well‐developed techniques such as logistic regression, while continuous proportions can be analysed with beta regression models. In the case of >2 categories, multinomial logistic regression or Dirichlet regression can be applied. Both beta and Dirichlet regression techniques model proportions at their original scale, which makes statistical inference more straightforward and produces less biased estimates relative to transformation‐based solutions. Extensions to beta regression, such as models for variable dispersion, zero‐one augmented data and mixed effects designs, have been developed and are reviewed and applied to case studies. Finally, we briefly discuss some issues regarding model fitting, inference, and reporting that are particularly relevant to beta and Dirichlet regression.
Beta regression and Dirichlet regression overcome some problems inherent in applying classic statistical approaches to proportional data. To facilitate the adoption of these techniques by practitioners in ecology and evolution, we present detailed, annotated demonstration scripts covering all variations of beta and Dirichlet regression discussed in the article, implemented in the freely available language for statistical computing, R.
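As a rough illustration of what modelling a continuous proportion "at its original scale" involves, the sketch below evaluates the log-likelihood of a beta regression under the common mean-precision parametrization with a logit link. It is written in Python with the standard library only, not in R, and it is not one of the demonstration scripts the article refers to. The data values, the function names, the fixed precision `phi` and the crude grid search are all assumptions made for illustration.

```python
import math


def inv_logit(eta):
    """Map a linear predictor to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(y, x, b0, b1, phi):
    """Log-likelihood of a beta regression with a logit link.

    Mean: mu_i = inv_logit(b0 + b1 * x_i); precision: phi.
    Beta shape parameters: a = mu * phi, b = (1 - mu) * phi.
    Every y_i must lie strictly between 0 and 1.
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = inv_logit(b0 + b1 * xi)
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi)
                  + (b - 1.0) * math.log(1.0 - yi))
    return total


def grid_fit(y, x, phi=20.0):
    """Crude grid search over intercept and slope (illustration only;
    real software maximizes the likelihood numerically and also estimates phi)."""
    best = None
    for i in range(-40, 41):
        for j in range(0, 41):
            b0, b1 = i / 10.0, j / 10.0
            ll = beta_loglik(y, x, b0, b1, phi)
            if best is None or ll > best[0]:
                best = (ll, b0, b1)
    return best


# Hypothetical data: a proportion (e.g. percent cover / 100) along a gradient.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.08, 0.15, 0.30, 0.52, 0.71, 0.88]

ll, b0_hat, b1_hat = grid_fit(y, x)
print(f"b0 = {b0_hat}, b1 = {b1_hat}, log-likelihood = {ll:.3f}")
```

Because `math.log(yi)` and `math.log(1.0 - yi)` are undefined at exactly 0 or 1, this plain likelihood cannot handle observed zeros or ones, which is the situation the zero‐one augmented extensions address.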
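Foreign Language Abstract (translated from Chinese)
Proportional data, in which the response variable is expressed as a percentage or fraction of a whole, are analysed in many subfields of ecology and evolution. The independence of proportions from the scale of the data makes them suitable for analysing many biological phenomena. However, because proportions can only take values from 0 to 1, and their variance is usually not constant across the range of the predictor, statistical analysis is not straightforward. To overcome these problems, researchers often apply methods such as mathematical transformations, but these can lead to biased estimates and difficulties in interpretation.
In this paper, we give an overview of the different types of proportional data and discuss the analysis methods currently available, in particular methods for analysing continuous (also called non-count or non-binomial) proportions (e.g. percent cover, the proportion of time an animal spends on a particular behaviour): beta regression and Dirichlet regression, and some of their most important extensions. Although these methods are currently little used, we believe they have broad prospects for application.
A distinction can be made between proportions arising from counts and proportions arising from continuous measurements. For proportions consisting of two categories, count-based data can be analysed with well-established methods such as logistic regression, and continuous data with beta regression models. For proportions with more than two categories, multinomial logistic regression or Dirichlet regression can be used. Both beta and Dirichlet regression model proportions on the original scale of the data, which not only makes statistical inference simpler but also produces less biased estimates than transformation-based methods. We review extensions of beta regression, such as variable dispersion models, zero-one augmented data and mixed effects designs, and apply them to case studies. Finally, we briefly discuss issues of model fitting and statistical inference that are particularly relevant to beta and Dirichlet regression.
Beta regression and Dirichlet regression overcome some of the problems inherent in applying classical statistical methods to proportional data. To help researchers in ecology and evolution adopt these techniques, we provide detailed, annotated demonstration scripts covering all the beta and Dirichlet regression variants discussed in the article, implemented in R, a free language for statistical computing.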
Dynamic Global Vegetation Models (DGVMs) are indispensable for our understanding of climate change impacts. The application of traits in DGVMs is increasingly refined. However, a comprehensive analysis of the direct impacts of trait variation on global vegetation distribution does not yet exist. Here, we present such an analysis as proof of principle. We run regressions of trait observations for leaf mass per area, stem-specific density, and seed mass from a global database against multiple environmental drivers, making use of findings of global trait convergence. This analysis explained up to 52% of the global variation of traits. Global trait maps, generated by coupling the regression equations to gridded soil and climate maps, showed up to orders of magnitude variation in trait values. Subsequently, nine vegetation types were characterized by the trait combinations that they possess using Gaussian mixture density functions. The trait maps were input to these functions to determine global occurrence probabilities for each vegetation type. We prepared vegetation maps, assuming that the most probable (and thus, most suited) vegetation type at each location will be realized. This fully traits-based vegetation map predicted 42% of the observed vegetation distribution correctly. Our results indicate that a major proportion of the predictive ability of DGVMs with respect to vegetation distribution can be attained by three traits alone if traits like stem-specific density and seed mass are included. We envision that our traits-based approach, our observation-driven trait maps, and our vegetation maps may inspire a new generation of powerful traits-based DGVMs.
Significance: Models of vegetation dynamics are indispensable for our understanding of climate change impacts. These models contain variables describing vegetation attributes, so-called traits. However, the direct impacts of trait variation on global vegetation distribution are unknown. We derived global trait maps based on information on environmental drivers. Subsequently, we characterized nine globally representative vegetation types based on their trait combinations and could make valid predictions of their global occurrence probabilities based on trait maps. This study provides a proof of concept for the link between plant traits and vegetation types, stimulating enhanced application of trait-based approaches in vegetation modeling. We envision that our approach, our observation-driven trait maps, and vegetation maps may inspire a new generation of powerful traits-based vegetation models.
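The classification step described above (trait values in, most probable vegetation type out) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian mixture components with diagonal covariance, equal prior weight for every vegetation type, and two made-up vegetation types with made-up parameters for three traits. All names and numbers are hypothetical.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance at trait vector x."""
    density = 1.0
    for xi, m, v in zip(x, mean, var):
        density *= math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return density


def mixture_density(x, components):
    """components: list of (weight, mean, var); weights sum to 1."""
    return sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in components)


def most_probable_type(x, type_models):
    """Return the vegetation type with the highest occurrence probability,
    plus the probabilities normalized across types (equal priors assumed)."""
    dens = {name: mixture_density(x, comps) for name, comps in type_models.items()}
    total = sum(dens.values())
    if total == 0.0:
        raise ValueError("trait vector is too far from every vegetation type")
    probs = {name: d / total for name, d in dens.items()}
    return max(probs, key=probs.get), probs


# Hypothetical mixture parameters for three traits:
# (log leaf mass per area, stem-specific density, log seed mass).
type_models = {
    "grassland": [(1.0, [1.8, 0.4, -0.5], [0.04, 0.01, 1.0])],
    "forest": [(0.6, [2.0, 0.6, 1.5], [0.04, 0.01, 1.0]),
               (0.4, [2.2, 0.7, 2.5], [0.04, 0.01, 1.0])],
}

# One grid cell of a (hypothetical) trait map.
best, probs = most_probable_type([2.0, 0.6, 1.5], type_models)
print(best, probs)
```

Applying `most_probable_type` to every grid cell of the trait maps would give a vegetation map in the sense used above, under the stated assumptions.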
We explain how to obtain a generalized maximum-likelihood chi-square statistic, $X^2_{ML}$, and a full-model Akaike Information Criterion (AIC) statistic for piecewise structural equation modeling (SEM); that is, structural equations without latent variables whose causal topology can be represented as a directed acyclic graph (DAG). The full piecewise SEM is decomposed into submodels as a Markov network, each of which can have different distributional assumptions or functional links and that can be modeled by any method that produces maximum-likelihood parameter estimates. The generalized $X^2_{ML}$ is a function of the difference in the maximum likelihoods of the model and its saturated equivalent and the full-model AIC is calculated by summing the AIC statistics of each of the submodels.
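The two quantities described here can be sketched in a few lines. The AIC part follows the text directly: each submodel's AIC is 2k minus twice its maximized log-likelihood, and the full-model AIC is their sum. For the generalized maximum-likelihood chi-square statistic the text only says it is a function of the difference in maximum likelihoods, so the usual likelihood-ratio form, minus two times the difference in log-likelihoods, is an assumption here. The function names and all numbers are hypothetical.

```python
def full_model_aic(submodels):
    """Sum the AIC of each submodel.

    submodels: list of (maximized_log_likelihood, n_parameters) pairs,
    one per submodel of the piecewise SEM.
    """
    return sum(2 * k - 2 * loglik for loglik, k in submodels)


def ml_chi_square(loglik_model, loglik_saturated):
    """Assumed likelihood-ratio form: -2 * (logL_model - logL_saturated)."""
    return -2.0 * (loglik_model - loglik_saturated)


# Hypothetical fitted submodels: (log-likelihood, number of parameters).
submodels = [(-120.5, 3), (-85.2, 4)]

# Log-likelihood of the hypothesized model, taken as the sum over submodels,
# and a made-up log-likelihood for its saturated equivalent.
loglik_model = sum(loglik for loglik, _ in submodels)
loglik_saturated = -203.2

print(full_model_aic(submodels))
print(ml_chi_square(loglik_model, loglik_saturated))
```

Because each submodel contributes its own log-likelihood, the submodels can be fitted with different distributional assumptions and still be combined, which is the point of the decomposition.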
We provide a generic method of testing path models that include dependent errors and nonlinear functional relationships and that use nonnormal, hierarchically structured data. First, we provide a decomposition of the causal model into smaller, independent sets. These sets can be modeled independently of each other with methods that respect the type of data in these sets. Second, we introduce copulas to model the dependent errors between non-normally distributed variables. Our method yields the same results as classical covariance-based path modelling when the latter's assumptions of linearity and normality are met, outperforms classical SEM given nonlinear functional relationships, and can easily accommodate any parametric probability function and nonlinear functional relationships.
Semi-natural habitats are integral to most agricultural areas and have the potential to support ecosystem services, especially biological control and pollination, by supplying resources for the invertebrates providing these services, and soil conservation, by preventing erosion and run-off. Some habitats are supported through agri-environment scheme funding in the European Union, but their value for ecosystem service delivery has been questioned. An improved understanding of previous research approaches and outcomes will contribute to the development of more sustainable farming systems, improve experimental designs and highlight knowledge gaps, especially for funders and researchers. Here we compiled a systematic map to allow, for the first time, a review of the quantity of evidence collected in Europe that semi-natural habitats support biological control, pollination and soil conservation. A literature search selected 2252 publications, and, following review, 270 met the inclusion criteria and were entered into the database. Most publications were on pest control (143 publications), with fewer on pollination (78 publications) or soil-related aspects (31). For pest control and pollination, most publications reported a positive effect of semi-natural habitats. There were weaknesses in the evidence base, though, because of bias in study location and crops, whilst metrics (e.g. yield) valued by end users were seldom measured. Hedgerows, woodland and grassland were the most heavily investigated semi-natural habitats, and the wider landscape composition was often considered. Study designs varied considerably, yet only 24% included controls or involved manipulation of semi-natural habitats. Service providers were commonly measured and used as a surrogate for ecosystem service delivery.
Key messages for policymakers and funders are that they should encourage research that includes more metrics required by end users, be prepared to fund longer-term studies (61% were of only 1-year duration) and investigate the role of soils within semi-natural habitats in delivering ecosystem services.
Aim: Most vascular plants on Earth form mycorrhizae, a symbiotic relationship between plants and fungi. Despite the broad recognition of the importance of mycorrhizae for global carbon and nutrient cycling, we do not know how soil and climate variables relate to the intensity of colonization of plant roots by mycorrhizal fungi. Here we quantify the global patterns of these relationships. Location: Global. Methods: Data on plant root colonization intensities by the two dominant types of mycorrhizal fungi world-wide, arbuscular (4887 plant species in 233 sites) and ectomycorrhizal fungi (125 plant species in 92 sites), were compiled from published studies. Data for climatic and soil factors were extracted from global datasets. For a given mycorrhizal type, we calculated at each site the mean root colonization intensity by mycorrhizal fungi across all potentially mycorrhizal plant species found at the site, and subjected these data to generalized additive model regression analysis with environmental factors as predictor variables. Results: We show for the first time that at the global scale the intensity of plant root colonization by arbuscular mycorrhizal fungi strongly relates to warm-season temperature, frost periods and soil carbon-to-nitrogen ratio, and is highest at sites featuring continental climates with mild summers and a high availability of soil nitrogen. In contrast, the intensity of ectomycorrhizal infection in plant roots is related to soil acidity, soil carbon-to-nitrogen ratio and seasonality of precipitation, and is highest at sites with acidic soils and relatively constant precipitation levels. Main conclusions: We provide the first quantitative global maps of intensity of mycorrhizal colonization based on environmental drivers, and suggest that environmental changes will affect distinct types of mycorrhizae differently.
Future analyses of the potential effects of environmental change on global carbon and nutrient cycling via mycorrhizal pathways will need to take into account the relationships discovered in this study.
Path models, expressed as Directed Acyclic Graphs (DAGs), and the testing of such DAGs via a d-sep test, have become popular because they can incorporate complicated data structures that are difficult or impossible to accommodate in classical structural equation modeling. However, d-sep tests cannot accommodate DAGs that include unmeasured (latent) variables. We describe (i) how to convert a DAG with latent variables into an observationally equivalent graph without latents (a Mixed Acyclic Graph, MAG), (ii) how this MAG identifies which latents can/cannot be ignored without changing the causal meaning of the original DAG, and (iii) how to perform the MAG equivalent of a d-sep test.
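For readers unfamiliar with the d-sep test, its usual form for DAGs without latents combines the p-values of the k independence claims implied by the graph into Fisher's C statistic, C = -2 Σ ln(p_i), and compares it with a chi-square distribution on 2k degrees of freedom. The sketch below shows only that final combination step, assuming the p-values have already been obtained from suitable tests. It does not implement the DAG-to-MAG conversion or the MAG version of the test described in this abstract, and the p-values are made up.

```python
import math


def fisher_c(p_values):
    """Fisher's C statistic: -2 times the sum of log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical p-values, one per independence claim in the basis set.
p_values = [0.42, 0.18, 0.77]
c_stat = fisher_c(p_values)
p_overall = chi2_sf_even_df(c_stat, 2 * len(p_values))
print(f"C = {c_stat:.3f}, df = {2 * len(p_values)}, p = {p_overall:.3f}")
```

A small overall p-value indicates that the data contradict at least some of the independence claims, and hence the causal structure that implies them.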
Plant species diversity in Eurasian wetlands and grasslands depends not only on productivity but also on the relative availability of nutrients, particularly of nitrogen and phosphorus. Here we show that the impacts of nitrogen:phosphorus stoichiometry on plant species richness can be explained by selected plant life-history traits, notably by plant investments in growth versus reproduction. In 599 Eurasian sites with herbaceous vegetation we examined the relationship between the local nutrient conditions and community-mean life-history traits. We found that compared with plants in nitrogen-limited communities, plants in phosphorus-limited communities invest little in sexual reproduction (for example, less investment in seed, shorter flowering period, longer lifespan) and have conservative leaf economy traits (that is, a low specific leaf area and a high leaf dry-matter content). Endangered species were more frequent in phosphorus-limited ecosystems and they too invested little in sexual reproduction. The results provide new insight into how plant adaptations to nutrient conditions can drive the distribution of plant species in natural ecosystems and can account for the vulnerability of endangered species.
Wild genetic resources and their ability to adapt to environmental change are critically important in light of the projected climate change, while constituting the foundation of agricultural sustainability. To address the expected negative effects of climate change on Robusta coffee trees (Coffea canephora), collecting missions were conducted to explore its current native distribution in Uganda over a broad climatic range. Wild material from seven forests could thus be collected. We used 19 microsatellite (SSR) markers to assess genetic diversity and structure of this material as well as material from two ex-situ collections and a feral population. The Ugandan C. canephora diversity was then positioned relative to the species' global diversity structure. Twenty-two climatic variables were used to explore variations in climatic zones across the sampled forests. Overall, Uganda's native C. canephora diversity differs from other known genetic groups of this species. In northwestern (NW) Uganda, four distinct genetic clusters were distinguished, from the Zoka, Budongo, Itwara and Kibale forests. A large southern-central (SC) cluster included Malabigambo, Mabira, and Kalangala forest accessions, as well as feral and cultivated accessions, suggesting similarity in genetic origin and strong gene flow between wild and cultivated compartments. We also confirmed the introduction of Congolese varieties into the SC region where most Robusta coffee production takes place. Identified populations occurred in divergent environmental conditions and 12 environmental variables significantly explained 16.3% of the total allelic variation across populations. The substantial genetic variation within and between Ugandan populations with different climatic envelopes might contain adaptive diversity to cope with climate change.
The accessions that we collected have substantially enriched the diversity hosted in the Ugandan collections and thus contribute to ex situ conservation of this vital genetic resource. However, there is an urgent need to develop strategies to enhance complementary in-situ conservation of Coffea canephora in native forests in northwestern Uganda.
Signal processing techniques are of vital importance in bringing THz spectroscopy to the level of maturity needed for practical applications. In this work, we illustrate the use of machine learning techniques for THz time-domain spectroscopy assisted by domain knowledge based on light-matter interactions. We target a potential agricultural application: determining the amount of free water on plant leaves, so-called leaf wetness. This quantity is important for understanding and predicting plant diseases that need leaf wetness for disease development. The overall transmission of 12,000 distinct water droplet patterns on a plastized leaf was experimentally acquired using THz time-domain spectroscopy. We report on key insights of applying decision trees and convolutional neural networks to the data using physics-motivated choices. Finally, we discuss the generalizability of these models to determine leaf wetness after testing them on cases with increasing deviations from the training set.