Information on species' extinction risk status is critical to support conservation actions. However, full assessments published on the Red List are slow and resource-intensive. To tackle assessments for mega-diverse groups, gains can be made through preliminary assessments that help prioritize efforts toward full assessments. Here, we quantified how incomplete data collation and errors in the taxonomic, spatial, and temporal dimensions of species-occurrence data translate into misclassifications of extinction risk. Using a dataset of >30 million records of terrestrial plants occurring in Brazil, compiled from nine databases, we conducted preliminary risk assessments for ~94 % of the 6046 species assessed by the Brazilian Red List authority. We found that no single database contained sufficient data to assess the extinction risk of all species; e.g., the risk of 78 % of species can be assessed using data from GBIF. Overall accuracy (66–75 %) and specificity (89–98 %; correct prediction of non-threatened species) were less affected by incomplete data collation and issues in species-occurrence records. Sensitivity (correct prediction of threatened species) was commonly low to moderate and strongly affected by incomplete data collation (13–47 %) and spatial issues (38 %). Our results demonstrate that preliminary risk assessments identify non-threatened species with high accuracy, even when data collation is incomplete and species-occurrence data contain errors, highlighting that such an approach can be used to efficiently prioritize species for full Red List assessments. In addition, caution is needed before declaring a species threatened without considering data collation intensity and quality.
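The accuracy, sensitivity, and specificity figures above are standard confusion-matrix quantities comparing preliminary classifications against full Red List assessments. A minimal sketch with invented counts (not the study's data) shows how a preliminary assessment can combine high accuracy and specificity with low sensitivity:

```python
# Illustrative confusion-matrix metrics for preliminary vs. full Red List
# assessments. The counts below are invented for demonstration; they are
# not taken from the study.

def assessment_metrics(tp, fn, tn, fp):
    """tp: threatened species correctly flagged as threatened;
    fn: threatened species missed (flagged non-threatened);
    tn: non-threatened species correctly flagged;
    fp: non-threatened species wrongly flagged as threatened."""
    sensitivity = tp / (tp + fn)   # correct prediction of threatened species
    specificity = tn / (tn + fp)   # correct prediction of non-threatened species
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical example: 100 threatened and 400 non-threatened species.
sens, spec, acc = assessment_metrics(tp=40, fn=60, tn=380, fp=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, accuracy={acc:.2f}")
# Prints sensitivity=0.40, specificity=0.95, accuracy=0.84: high accuracy
# and specificity can coexist with many missed threatened species.
```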
The increase in online and openly accessible biodiversity databases provides a vast and invaluable resource to support research and policy. However, without scrutiny, errors in primary species-occurrence data can lead to erroneous results and misleading information.
Here, we introduce Biodiversity Data Cleaning (bdc), an R package to address quality issues and improve the fitness-for-use of biodiversity datasets. The bdc package brings together several aspects of biodiversity data cleaning in one place. It is organized in thematic modules related to different biodiversity dimensions, including (a) Merge datasets: standardization and integration of different datasets; (b) Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; (c) Taxonomy: cleaning, parsing, and harmonization of scientific names from several taxonomic groups against locally stored taxonomic databases, using exact and partial matching algorithms; (d) Space: flagging of erroneous, suspect, and low-precision geographic coordinates; and (e) Time: flagging and, whenever possible, correction of inconsistent collection dates. In addition, the package contains features to visualize, document, and report data quality, which is essential for making data quality assessment transparent and reproducible. The modules, and the functions within them, are linked to form a proposed reproducible workflow that can also integrate functions from other R packages.
We demonstrated the bdc package's applicability by cleaning more than 30 million occurrence records for terrestrial plant species in Brazil. We found that only around one-fifth of the original dataset met the standard quality requirements.
Compared to other available R packages, the main strength of the bdc package is that it brings together available tools, along with a series of new ones, for assessing the quality of different dimensions of biodiversity data in a single and flexible toolkit. The functions can be applied to many taxonomic groups and datasets (including regional or local repositories), within individual countries or worldwide. We hope the bdc package can facilitate the data-cleaning process and catalyse improvements that allow the wise and efficient use of primary biodiversity data.
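bdc itself is an R package; as a language-agnostic illustration of the flag-based pattern the abstract describes (each check appends a boolean flag column rather than deleting records), here is a minimal Python sketch. The column names and example records are invented, and this is not the package's actual API:

```python
# A minimal Python sketch of the flag-based cleaning pattern described for
# bdc (which is an R package; this is not its API). Each check adds a
# boolean flag column instead of deleting records, keeping the assessment
# transparent and reproducible.
import pandas as pd

records = pd.DataFrame({
    "scientificName": ["Myrcia sp.", "Ocotea pulchella", ""],
    "decimalLatitude": [-23.5, -91.0, -15.8],
    "decimalLongitude": [-46.6, -46.6, -47.9],
    "eventDate": ["2001-05-10", "1850-01-01", "2015-13-40"],
})

# Pre-filter module: flag records missing an interpretable name.
records[".name_valid"] = records["scientificName"].str.strip() != ""

# Space module: flag coordinates outside valid ranges.
records[".coords_valid"] = (
    records["decimalLatitude"].between(-90, 90)
    & records["decimalLongitude"].between(-180, 180)
)

# Time module: flag collection dates that cannot be parsed.
records[".date_valid"] = pd.to_datetime(
    records["eventDate"], errors="coerce"
).notna()

# Summary flag: a record passes only if it passes every test.
flag_cols = [".name_valid", ".coords_valid", ".date_valid"]
records[".summary"] = records[flag_cols].all(axis=1)
print(records[[".summary"] + flag_cols])
```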
In recent years, new settlement-mapping products have become available at global and continental scales. Although accuracy assessments have indicated the high quality of these products, the assessments were performed mainly on urban areas. However, there is also a need to monitor rural settlement development, which is often located in proximity to biodiversity hotspots. In this paper, we verified the suitability of three settlement products (Global Urban Footprint, GUF; European Settlement Map, ESM; and OpenStreetMap, OSM) for detecting rural settlements in the Carpathian ecoregion. Two independent accuracy assessments indicated that the GUF captures rural settlements most effectively (overall accuracy, OA: 65.4% and 92.5%, depending on the procedure). In contrast, the ESM overestimated settlements (OA: 49.5% and 90.8%), while the OSM (OA: 61.2% and 90.2%) was the most inconsistent source of settlement data. A regional comparison indicated some deviations from these accuracies, reflecting the variability of settlement structures within the study area. This study highlights that although the GUF was the best-performing product for mapping rural settlements across the whole study area, the settlement information it provided was rather conservative, and rural settlements are still insufficiently represented in all tested datasets.
Geo-tagged photographs are increasingly used as a source of Volunteered Geographic Information (VGI), which could potentially be used for land use and land cover applications. The purpose of this paper is to analyze the feasibility of using this source of spatial information for three use cases related to land cover: calibration, validation, and verification. We first provide an inventory of the metadata that are collected with geo-tagged photographs and then consider which elements would be essential, desirable, or unnecessary for the aforementioned use cases. Geo-tagged photographs were then extracted from Flickr, Panoramio, and Geograph for an area of London, UK, and classified based on their usefulness for land cover mapping, including an analysis of the accompanying metadata. Finally, we discuss protocols for geo-tagged photographs for the use of VGI in land cover applications.
All digital data contain error, and many are uncertain. Digital models of elevation surfaces consist of files containing large numbers of measurements representing the height of the surface of the earth, and a proportion of those measurements are therefore very likely to be subject to some level of error and uncertainty. The collection and handling of such data and their associated uncertainties have been a subject of considerable research, which has focused largely on describing the effects of interpolation and resolution uncertainties, as well as on modelling the occurrence of errors. However, digital models of elevation derived from new technologies employing active methods of laser and radar ranging are becoming more widespread, and past research will need to be re-evaluated in the near future to accommodate such new data products. In this paper we review the source and nature of errors in digital models of elevation and in the derivatives of such models. We examine the correction of errors and the assessment of fitness for use, and finally we identify some priorities for future research.
This article presents a new methodology for data fitness-for-use assessment. Most current measures of data quality rely on metadata and other data-producer-derived information. This creates a void of options for a user-driven assessment of data quality when metadata are sparse or unavailable, as is often the case with citizen science and volunteered geographic information. This article puts forward data fitness-for-use (DaFFU), a method that can be adapted for a wide range of data uses. Using the mathematical framework of multiple criteria decision making (MCDM), we create a method to select the best data set from multiple options using a select set of user criteria. The DaFFU methodology is demonstrated with both a simple exemplar and a detailed case study for watershed management. The simple exemplar illustrates how varying parameters and weights influence the outcome. The case study on watershed management considers four possible data sets and six data quality criteria for wetland delineation and an application toward watershed nitrogen retention, each of which has a claim on being of the "best" quality, depending on which data quality aspect the user evaluates. The DaFFU methodology allows the user to consider these data in terms of how they will be used and to use selected data quality measures. Case study results show this methodology is a robust and flexible approach to quantitatively assessing multiple data sets in terms of their intended use.
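As a rough illustration of the MCDM machinery behind such a fitness-for-use score, the sketch below ranks candidate datasets by a weighted sum of normalized user criteria. The dataset names, criteria, scores, weights, and the weighted-sum aggregation are all assumptions made for illustration; consult the published DaFFU method for the actual formulation:

```python
# A toy weighted-sum MCDM selection in the spirit of DaFFU: score candidate
# datasets against user-chosen quality criteria. Criteria, scores, and
# weights here are invented; the published method may use a different
# MCDM aggregation.

# Normalized criterion scores in [0, 1] for each candidate dataset.
datasets = {
    "NWI_wetlands":  {"positional_acc": 0.9, "completeness": 0.6, "currency": 0.4},
    "lidar_derived": {"positional_acc": 0.8, "completeness": 0.9, "currency": 0.9},
    "vgi_mapping":   {"positional_acc": 0.5, "completeness": 0.7, "currency": 1.0},
}

# User-supplied weights express which quality dimensions matter for the task.
weights = {"positional_acc": 0.5, "completeness": 0.3, "currency": 0.2}

def fitness(scores, weights):
    return sum(weights[c] * scores[c] for c in weights)

best = max(datasets, key=lambda name: fitness(datasets[name], weights))
for name, scores in datasets.items():
    print(f"{name}: {fitness(scores, weights):.2f}")
print("best for this use:", best)
# Changing the weights can change the winner, which is exactly the point:
# fitness is defined relative to the intended use, not to the data alone.
```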
Volunteered Geographic Information (VGI) offers an alternative or supplement to authoritative mechanisms of geospatial data acquisition, allowing people without professional geospatial skills or knowledge to participate in geospatial data collection. VGI has been boosted by recent advances in geospatial technology and applications, and VGI applications have shown great potential in areas such as disaster management and public health. However, VGI suffers from a lack of quality assurance, because VGI contributors may lack geospatial domain knowledge and credibility. Moreover, VGI data may have different levels of detail and precision, and may have been collected for different purposes; data appropriate for one application may be less appropriate for another. End-users may use VGI data without being aware of its appropriateness to their requirements. This creates a risk that end-users inappropriately use, or are uncertain about using, VGI data in their applications, which may undermine the VGI project in which they are involved. This paper proposes an approach that aims to enhance VGI quality assurance by measuring the spatio-semantic similarity between user requirements and the provided VGI, and thereby evaluating the fitness-for-use of VGI. The approach is based on an algorithm that helps VGI end-users deal with the risks related to the quality of VGI data, supporting appropriate decisions about the data (e.g., considering it, not considering it, or using it with care).
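To make the idea of spatio-semantic matching concrete, the following toy sketch scores a VGI feature against a user requirement by combining a semantic tag similarity (Jaccard) with a distance-decay spatial score, then maps the result onto a three-way decision. Every threshold, weight, and formula here is an invented stand-in, not the paper's algorithm:

```python
# A toy spatio-semantic matching sketch: compare what a user requires with
# what a VGI feature provides, combining a semantic tag similarity (Jaccard)
# with a simple spatial proximity score. All thresholds and the combination
# rule are invented for illustration.
import math

def jaccard(tags_a, tags_b):
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def proximity(p, q, max_dist=100.0):
    """Crude planar distance decay: 1.0 at zero distance, 0.0 beyond max_dist."""
    d = math.dist(p, q)
    return max(0.0, 1.0 - d / max_dist)

required = {"tags": {"building", "hospital"}, "xy": (10.0, 20.0)}
provided = {"tags": {"building", "clinic"},   "xy": (30.0, 40.0)}

semantic = jaccard(required["tags"], provided["tags"])
spatial = proximity(required["xy"], provided["xy"])
score = 0.5 * semantic + 0.5 * spatial   # equal weighting, an assumption

# A simple three-way decision rule, mirroring the abstract's outcomes:
# consider the data, use it with care, or avoid it.
decision = "use" if score > 0.7 else "use with caution" if score > 0.4 else "avoid"
print(f"semantic={semantic:.2f}, spatial={spatial:.2f}, score={score:.2f} -> {decision}")
```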
Mine closure in the Witwatersrand Goldfields of South Africa has resulted in an acid mine drainage (AMD) legacy that is difficult to manage and costly to address. As a short-term measure, three large high-density sludge (HDS) plants were erected that treat 185 megalitres of AMD per day (ML/day), at great cost to taxpayers. Longer-term solutions are sought, as the salt load to the Vaal River System is unacceptable. Long-term modelling was used to assess whether the untreated and HDS-treated AMD could be used for irrigation and to determine the scale of the potential opportunity. The Goldfields waters are not very acidic, and simulations indicate it should be feasible to utilise even the untreated water for irrigation, especially if growers commit to applying limestone to their fields. HDS treatment lowers the corrosivity and trace element concentrations, and because the water is gypsiferous, double cropping will precipitate more than a third of the salts in solution as gypsum in the soil profile, thereby reducing salt load to the water environment. The potential irrigated area depends on the cropping system; it is about 9000 ha for rotational cropping and 30,000 ha for supplemental maize irrigation. It is prudent to seriously consider irrigation as a potential long-term water management option for the Goldfields AMD.
The field of biodiversity informatics is in a massive "grow-out" phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remain unavailable for science and policy applications, the question arises as to how these existing data records can be mobilized most efficiently and effectively. This situation led us to analyse several large-scale biodiversity datasets on birds and plants, detecting information gaps and documenting data "leakage" or attrition in terms of the taxon, time, and place recorded in each data record. We documented significant data leakage in each data dimension in each dataset: significant numbers of records lack crucial information on taxon, time, and/or place. Information on place was consistently the least complete, such that georeferencing presently represents the most significant factor in the degradation of the usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important for developing a complete digital record of existing biodiversity information, the payoff in immediate data usability will be greatest where attention is paid to the georeferencing challenge.
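The taxon/time/place framing of data leakage maps naturally onto a per-dimension completeness audit. A minimal sketch, using Darwin Core-style field names and invented records (not the study's datasets):

```python
# A minimal completeness audit in the spirit of the "data leakage" analysis:
# count how many records retain usable taxon, time, and place information.
# Field names follow Darwin Core conventions; the records are invented.
import pandas as pd

records = pd.DataFrame({
    "scientificName":   ["Turdus merula", "Quercus alba", None, "Poa annua"],
    "eventDate":        ["1998-06-02", None, "2004-09-17", "2010-03-01"],
    "decimalLatitude":  [51.5, None, None, -12.1],
    "decimalLongitude": [-0.1, None, None, -77.0],
})

usable = {
    "taxon": records["scientificName"].notna(),
    "time":  records["eventDate"].notna(),
    "place": records["decimalLatitude"].notna()
             & records["decimalLongitude"].notna(),
}

for dimension, mask in usable.items():
    print(f"{dimension}: {mask.mean():.0%} usable")

# Records usable across all three dimensions at once are typically fewer
# still, which is what drives attrition in downstream analyses.
complete = usable["taxon"] & usable["time"] & usable["place"]
print(f"all three: {complete.mean():.0%}")
```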
Fitness-for-use information should be stored to enable easy identification of data objects that are suitable for re-use, a property that can only be assessed by the data user. With the Quality Maturity Matrix (QMM) described here, we provide a metric for a discrete measurement of the fitness for use of data objects. We use data maturity to describe the degree of formalization and standardization of the data with respect to the quality of data and metadata. Data objects mature as they pass through the different post-production steps, where they undergo different curation measures. The higher the maturity and the level in the QMM, the easier it is for the user to judge the appropriateness of the data for possible re-use. In developing the Quality Maturity Matrix, we link the maturity levels to five phases: concept, production/processing, project collaboration/intended use, long-term archiving, and impact/re-use. Each of the five levels is measured against four criteria: consistency, completeness, accessibility, and accuracy. For the description we use the terms of the Open Archival Information System (OAIS). We relate our data-focused QMM to some existing maturity matrices, which focus on the maturity of the curation process rather than on the data objects themselves. In addition, we attempt to establish a connection between the QMM criteria of data assessment and the FAIR Data Principles. Keywords: data management, fitness for use, data production steps, Quality Maturity Matrix, FAIR Data Principles
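One way to picture the QMM is as a levels-by-criteria grid that a data object is scored against. The sketch below encodes the five levels and four criteria named in the abstract; the per-criterion scoring and the min-aggregation rule are illustrative assumptions, not the paper's rubric:

```python
# A sketch of encoding a Quality Maturity Matrix as levels x criteria.
# The five levels and four criteria come from the abstract; how a data
# object's per-criterion levels are recorded and summarized here is an
# illustrative assumption, not the paper's rubric.

LEVELS = ["concept", "production/processing",
          "project collaboration/intended use",
          "long-term archiving", "impact/re-use"]
CRITERIA = ["consistency", "completeness", "accessibility", "accuracy"]

def maturity_level(assessment):
    """Overall maturity = lowest level reached across all four criteria,
    so one weak criterion caps the data object's fitness-for-use rating."""
    return min(assessment[c] for c in CRITERIA)  # levels indexed 1..5

# Hypothetical assessment of one data object (level per criterion, 1..5):
dataset_qmm = {"consistency": 4, "completeness": 3,
               "accessibility": 5, "accuracy": 4}
level = maturity_level(dataset_qmm)
print(f"overall maturity: level {level} ({LEVELS[level - 1]})")
```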