Peer-reviewed; Open access
  • Estimation of missing Ellen...
    Leccese, Letizia; Fanelli, Giuliano; Cambria, Vito Emanuele; Massimi, Marco; Attorre, Fabio; Alfò, Marco; Aćić, Svetlana; Bergmeier, Erwin; Čarni, Andraž; Cuk, Mirjana; Custerevska, Renata; Dimopoulos, Panayotis; Hoda, Petrit; Mullaj, Alfred; Šilc, Urban; Skvorc, Zeljko; Stancic, Zvjezdana; Dajic Stevanovic, Zora; Tzonev, Rossen; Vassilev, Kiril; Malatesta, Luca; De Sanctis, Michele

    Ecological Indicators, March 2024, Volume 160
    Journal Article

    • Estimating missing Ellenberg indicator values (EIVs) could help plant ecology studies.
    • We tested and compared several methods for estimating missing EIVs from existing data.
    • Multiple linear regression and k-nearest neighbour performed better than the other methods.
    • Statistical methods are more effective than imputation based on expert knowledge.
    • This approach would greatly facilitate monitoring species with unknown EIVs.

    Ellenberg indicator values (EIVs) are widely used in vegetation ecology, but values for many species in Southeastern Europe are unavailable because knowledge of their ecology is incomplete; it is therefore of paramount importance to estimate the missing values in existing databases. Either the entire EIV set for a species can be missing, or a single EIV can be missing for a species whose other indicator values are available. Our aim is to provide a simple method to impute missing values for species that lack one or more EIVs. To this end, we adopt a multiple imputation procedure and compare several imputation methods on two datasets: (i) “indices”, the set of 9 Ellenberg indicators taken from the literature, available for 10,824 species; and (ii) “vegetation”, a set describing the physical and climatic characteristics (Light, Temperature, Continentality, Soil moisture, Nitrogen, Soil pH, Hemeroby index, Humidity, Organic_matter) of 29,935 relevés from Southeastern Europe in which at least one tree species is present. The imputation methods considered are k-nearest neighbour, multiple linear regression (with or without collinearity correction), the Reprediction Algorithm, Weighted Averaging (WA) and Weighted Averaging Partial Least Squares (WAPLS) regression. The methods were compared by measuring how far their output deviated from the “true” observed values for a set of species with known EIVs.
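To illustrate the k-nearest-neighbour idea on the “indices” data, here is a minimal sketch assuming the indicators are held as a species × indicator NumPy array with NaN marking missing EIVs. The function name `knn_impute`, the default k = 3 and the root-mean-squared distance over shared indicators are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaN entries of a species x indicator matrix X (hypothetical helper).

    Each missing value is replaced by the mean of that indicator over the k
    nearest species that have it observed; distance is the root mean squared
    difference over the indicators both species have observed.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    n_species = X.shape[0]
    for i in range(n_species):
        missing_cols = np.where(~observed[i])[0]
        if missing_cols.size == 0:
            continue  # nothing to impute for this species
        # Distance from species i to every other species (inf = not comparable).
        dists = np.full(n_species, np.inf)
        for j in range(n_species):
            if j == i:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        for c in missing_cols:
            # Donors: comparable species that have indicator c observed.
            donors = np.where(observed[:, c] & np.isfinite(dists))[0]
            if donors.size == 0:
                continue  # no donor available: leave the value as NaN
            nearest = donors[np.argsort(dists[donors])[:k]]
            filled[i, c] = X[nearest, c].mean()
    return filled
```

For example, `knn_impute(np.array([[5, 6, 3], [5, 6, 4], [1, 2, 8], [5, 6, np.nan]]), k=2)` fills the missing value with 3.5, the mean of the two identical neighbours. In this sketch a species whose entire EIV set is missing shares no indicators with any donor and stays NaN; that case would need other information, for example relevé-level data.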
We took a set of species with known EIVs and performed multiple imputation with each of the methods above; performance was assessed using the mean squared error (MSE) and expert judgement of ecological consistency. Models based on regression and k-nearest neighbour appear to outperform the others, whereas the Reprediction Algorithm, in its different forms, produced less satisfactory results. Imputation of missing values is generally based on expert knowledge or on some variant of weighted averaging (also known as Hill’s method). Here we show that other methods may be more effective and deserve consideration by vegetation scientists, since they may allow EIVs to be applied in other biogeographic regions.
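The evaluation idea (hide known values, re-estimate them, score the estimates with MSE) can be sketched for one indicator and a single train/test split. This assumes a complete species × indicator array and uses plain ordinary least squares as the regression imputer; the helper names `fit_ols`, `predict_ols` and `masked_mse` are hypothetical, and neither the study's multiple imputation procedure nor its collinearity correction is reproduced here.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply coefficients from fit_ols to a predictor matrix."""
    return coef[0] + X @ coef[1:]

def masked_mse(X, target_col, test_idx):
    """Hide target_col for the species in test_idx, re-estimate it from the
    other indicators with OLS fitted on the remaining species, and return
    the MSE against the hidden 'true' values."""
    X = np.asarray(X, dtype=float)
    train = np.setdiff1d(np.arange(X.shape[0]), test_idx)
    predictors = [c for c in range(X.shape[1]) if c != target_col]
    coef = fit_ols(X[np.ix_(train, predictors)], X[train, target_col])
    pred = predict_ols(coef, X[np.ix_(test_idx, predictors)])
    true = X[test_idx, target_col]
    return float(np.mean((pred - true) ** 2))
```

Running the same masking for each candidate imputer and comparing the resulting MSE values mirrors, in simplified form, how the methods were ranked against species with known EIVs.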