Data integration enables global biodiversity synthesis
Heberling, J Mason; Miller, Joseph T; Noesgaard, Daniel ...
Proceedings of the National Academy of Sciences - PNAS, 02/2021, Volume: 118, Issue: 6
Journal Article
Peer reviewed
Open access
The accessibility of global biodiversity information has surged in the past two decades, notably through widespread funding initiatives for museum specimen digitization and the emergence of large-scale public participation in community science. Effective use of these data requires the integration of disconnected datasets, but the scientific impacts of consolidated biodiversity data networks have not yet been quantified. To determine whether data integration enables novel research, we carried out a quantitative text analysis and bibliographic synthesis of >4,000 studies published from 2003 to 2019 that use data mediated by the world's largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). Data available through GBIF increased 12-fold since 2007, a trend matched by global data use with roughly two publications using GBIF-mediated data per day in 2019. Data-use patterns were diverse by authorship, geographic extent, taxonomic group, and dataset type. Despite facilitating global authorship, legacies of colonial science remain. Studies involving species distribution modeling were most prevalent (31% of literature surveyed) but recently shifted in focus from theory to application. Topic prevalence was stable across the 17-y period for some research areas (e.g., macroecology), yet other topics proportionately declined (e.g., taxonomy) or increased (e.g., species interactions, disease). Although centered on biological subfields, GBIF-enabled research extends surprisingly across all major scientific disciplines. Biodiversity data mobilization through global data aggregation has enabled basic and applied research use at temporal, spatial, and taxonomic scales otherwise not possible, launching biodiversity sciences into a new era.
UNITE (https://unite.ut.ee/) is a web-based database and sequence management environment for the molecular identification of fungi. It targets the formal fungal barcode, the nuclear ribosomal internal transcribed spacer (ITS) region, and offers all ∼1 000 000 public fungal ITS sequences for reference. These are clustered into ∼459 000 species hypotheses and assigned digital object identifiers (DOIs) to promote unambiguous reference across studies. In-house and web-based third-party sequence curation and annotation have resulted in more than 275 000 improvements to the data over the past 15 years. UNITE serves as a data provider for a range of metabarcoding software pipelines and regularly exchanges data with all major fungal sequence databases and other community resources. Recent improvements include redesigned handling of unclassifiable species hypotheses, integration with the taxonomic backbone of the Global Biodiversity Information Facility, and support for an unlimited number of parallel taxonomic classification systems.
We investigated the interaction between fungal communities of soil and dead wood substrates. For this, we applied molecular species identification and stable isotope tracking to both soil and decaying wood in an unmanaged boreal Norway spruce-dominated stand. Altogether, we recorded 1990 operational taxonomic units, out of which more than 600 were shared by both substrates and 589 were found to exclusively inhabit wood. On average the soil was more species-rich than the decaying wood, but the species richness in dead wood increased monotonically along the decay gradient, reaching the same species richness and community composition as soil in the late stages. Decaying logs at all decay stages locally influenced the fungal communities in soil, with some fungal species occurring in soil only under decaying wood. Stable isotope analyses suggest that mycorrhizal species colonising dead wood in the late decay stages actively transfer nitrogen and carbon between soil and host plants. Most importantly, the mycorrhizal species Piloderma sphaerosporum and Tylospora sp. were highly abundant in decayed wood. Soil- and wood-inhabiting fungal communities interact at all decay phases of wood, which has important implications for fungal community dynamics and thus nutrient transport.
Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence‐only or presence–absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi‐source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter‐ or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide.
These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi‐source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA‐based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals.
Biodiversity loss is a major challenge. Over the past century, the average rate of vertebrate extinction has been about 100-fold higher than the estimated background rate and population declines continue to increase globally. Birth and death rates determine the pace of population increase or decline, thus driving the expansion or extinction of a species. Design of species conservation policies hence depends on demographic data (e.g., for extinction risk assessments or estimation of harvesting quotas). However, an overview of the accessible data, even for better known taxa, is lacking. Here, we present the Demographic Species Knowledge Index, which classifies the available information for 32,144 (97%) of extant described mammals, birds, reptiles, and amphibians. We show that only 1.3% of the tetrapod species have comprehensive information on birth and death rates. We found no demographic measures, not even crude ones such as maximum life span or typical litter/clutch size, for 65% of threatened tetrapods. More field studies are needed; however, some progress can be made by digitalizing existing knowledge, by imputing data from related species with similar life histories, and by using information from captive populations. We show that data from zoos and aquariums in the Species360 network can significantly improve knowledge, yielding an almost eightfold gain. Assessing the landscape of limited demographic knowledge is essential to prioritize ways to fill data gaps. Such information is urgently needed to implement management strategies to conserve at-risk taxa and to discover new unifying concepts and evolutionary relationships across thousands of tetrapod species.
Here, we describe the taxon hypothesis (TH) paradigm, which covers the construction, identification, and communication of taxa as datasets. Defining taxa as datasets of individuals and their traits will make taxon identification and most importantly communication of taxa precise and reproducible. This will allow datasets with standardized and atomized traits to be used digitally in identification pipelines and communicated through persistent identifiers. Such datasets are particularly useful in the context of formally undescribed or even physically undiscovered species if data such as sequences from samples of environmental DNA (eDNA) are available. Implementing the TH paradigm will to some extent remove the impediment to hastily discover and formally describe all extant species in that the TH paradigm allows discovery and communication of new species and other taxa also in the absence of formal descriptions. The TH datasets can be connected to a taxonomic backbone providing access to the vast information associated with the tree of life. In parallel to the description of the TH paradigm, we demonstrate how it is implemented in the UNITE digital taxon communication system. UNITE TH datasets include rich data on individuals and their rDNA ITS sequences. These datasets are equipped with digital object identifiers (DOI) that serve to fix their identity in our communication. All datasets are also connected to a GBIF taxonomic backbone. Researchers processing their eDNA samples using UNITE datasets will, thus, be able to publish their findings as taxon occurrences in the GBIF data portal. UNITE species hypothesis (species level THs) datasets are increasingly utilized in taxon identification pipelines and even formally undescribed species can be identified and communicated by using UNITE. The TH paradigm seeks to achieve unambiguous, unique, and traceable communication of taxa and their properties at any level of the tree of life.
It offers a rapid way to discover and communicate undescribed species in identification pipelines and data portals before they are lost to the sixth mass extinction.
Advancements in environmental DNA (eDNA) metabarcoding have revolutionised our capacity to assess biodiversity, especially for cryptic or less-studied organisms, such as fungi, bacteria and micro-invertebrates. Despite its cost-effectiveness, the spatial selection for sampling sites remains a critical challenge due to the considerable time and resources required for processing and analysing eDNA samples. This study introduces a Biodiversity Digital Twin Prototype, aimed at optimising the selection and prioritisation of eDNA sampling locations. Leveraging available eDNA data and integrating user-defined criteria, this digital twin facilitates informed decision-making in selecting future sampling sites. Through the development of an associated data formatting tool, we also facilitate the accessibility and utility of DNA metabarcoding data for broader conservation efforts. This prototype will serve multiple end-users, from researchers and monitoring initiatives to commercial enterprises, by providing an intuitive interface for interactive exploration and prioritisation, based on estimated complementarity of future samples. The prototype offers a scalable approach to biodiversity sampling. Ultimately, this tool aims to refine our understanding of global biodiversity patterns and support targeted conservation strategies through efficient eDNA sampling.
Fungi and Coleoptera are among the most evolutionarily successful and diverse heterotrophic organisms in the world. Due to their unique adaptive capacities, fungi and beetles co-occur and interact in various terrestrial habitats. In addition to commensal and mutualistic fungus–beetle relations, combative interactions involve aggressors from both sides such as entomopathogenic fungi and fungivorous beetles. Fungivory, most commonly in combination with saprophagy and xylophagy, is characteristic of many families of Coleoptera. The resource-exploiting fungal mycelia are most frequently consumed by beetles together with the woody substrata. The focus of the present review is on Coleoptera with larvae or adults feeding on a primarily fungal diet: fruit bodies and spores.
Uzbekistan, located in Central Asia, harbors a high diversity of woody plants. The diversity of wood-inhabiting fungi in the country, however, has remained poorly known. This study summarizes the wood-inhabiting basidiomycete fungi (poroid and corticioid fungi plus similar taxa such as [...], and [...]) (Agaricomycetes, Basidiomycota) that have been found in Uzbekistan from 1950 to 2020. This work is based on 790 fungal occurrence records: 185 from recently collected specimens, 101 from herbarium specimens made by earlier collectors, and 504 from literature-based records. All data were deposited as a species occurrence record dataset in the Global Biodiversity Information Facility and also summarized in the form of an annotated checklist in this paper. All 286 available specimens were morphologically examined. For 138 specimens, 114 ITS and 85 LSU nrDNA sequences were newly generated and used for phylogenetic analysis. In total, we confirm the presence of 153 species of wood-inhabiting poroid and corticioid fungi in Uzbekistan, of which 31 species are reported for the first time in Uzbekistan, including 19 that are also new to Central Asia. These 153 fungal species inhabit 100 host species from 42 genera of 23 families. Polyporales and Hymenochaetales are the most recorded fungal orders and are most widely distributed around the study area. This study provides the first comprehensive, updated, and annotated checklist of wood-inhabiting poroid and corticioid fungi in Uzbekistan. Such studies should be expanded to other countries to further clarify the species diversity of wood-inhabiting fungi around Central Asia.