Metabarcoding is an emerging genetic tool to rapidly assess biodiversity in ecosystems. It involves high-throughput sequencing of a standard gene from an environmental sample and comparison to a reference database. However, no consensus has emerged regarding laboratory pipelines to screen species diversity and infer species abundances from environmental samples. In particular, the effect of primer bias and the detection limit for specimens with a low biomass have not been systematically examined when processing samples in bulk. We developed and tested a DNA metabarcoding protocol that utilises the standard cytochrome c oxidase subunit I (COI) barcoding fragment to detect freshwater macroinvertebrate taxa. DNA was extracted in bulk, amplified in a single PCR step, and purified, and the libraries were directly sequenced in two independent MiSeq runs (300-bp paired-end reads). Specifically, we assessed the influence of specimen biomass on sequence read abundance by sequencing 31 specimens of a stonefly species with known haplotypes spanning three orders of magnitude in biomass (experiment I). Then, we tested the recovery of 52 different freshwater invertebrate taxa of similar biomass using the same standard barcoding primers (experiment II). Each experiment was replicated ten times to maximise statistical power. The results of both experiments were consistent across replicates. We found a distinct positive correlation between species biomass and resulting numbers of MiSeq reads. Furthermore, we reliably recovered 83% of the 52 taxa used to test primer bias. However, sequence abundance varied by four orders of magnitude between taxa despite the use of similar amounts of biomass. Our metabarcoding approach yielded reliable results for high-throughput assessments. However, the results indicated that primer efficiency is highly species-specific, which would prevent straightforward assessments of species abundance and biomass in a sample.
Thus, PCR-based metabarcoding assessments of biodiversity should rely on presence-absence metrics.
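The recommendation to rely on presence-absence metrics can be illustrated with a minimal sketch. The taxon names, read counts and the `min_reads` threshold below are hypothetical and are not taken from the study; the point is only that read counts are reduced to detected / not detected before any diversity metric is computed.

```python
def to_presence_absence(read_counts, min_reads=1):
    """Convert per-taxon read counts to presence (1) / absence (0).

    min_reads is an assumed detection threshold, not a value from the study.
    """
    return {taxon: int(reads >= min_reads) for taxon, reads in read_counts.items()}


# Hypothetical read counts for one sample (illustrative values only).
sample = {"taxon_A": 125000, "taxon_B": 12, "taxon_C": 0}

presence = to_presence_absence(sample)
richness = sum(presence.values())  # number of taxa detected in the sample
```

Under this scheme, taxon_A and taxon_B count equally towards richness even though their read numbers differ by four orders of magnitude, which is the kind of variation the abstract attributes to primer bias.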
A central challenge in the present era of biodiversity loss is to assess and manage human impacts on freshwater ecosystems. Macroinvertebrates are an important group for bioassessment as many taxa show specific responses to environmental conditions. However, generating accurate macroinvertebrate inventories based on larval morphology is difficult and error-prone. Here, DNA metabarcoding provides new opportunities. Its potential to accurately identify invertebrates in bulk samples to the species level has been demonstrated in several case studies. However, DNA-based identification is often limited by primer bias, potentially leading to taxa in the sample remaining undetected. Thus, the success of DNA metabarcoding as an emerging technique for bioassessment critically relies on carefully evaluating primers. We used the R package PrimerMiner to obtain and process cytochrome c oxidase I (COI) sequence data for the 15 most globally relevant freshwater invertebrate groups for stream assessment. Using these sequence alignments, we developed four primer combinations optimized for freshwater macrozoobenthos. All primers were evaluated by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, popular metabarcoding primers from the literature and the developed primers were tested in silico against the 15 relevant invertebrate groups. The developed primers varied in amplification efficiency and the number of detected taxa, yet all detected more taxa than standard ‘Folmer’ barcoding primers. Two new primer combinations showed more consistent amplification than a previously tested ribosomal marker (16S) and detected all 42 insect taxa present in the mock community samples. In silico evaluation revealed critical design flaws in some commonly used primers from the literature. We demonstrate a reliable strategy to develop optimized primers using the tool PrimerMiner.
The developed primers detected almost all taxa present in the mock samples, and we argue that high base degeneracy is necessary to decrease primer bias, as confirmed by experimental results and in silico primer evaluation. We further demonstrate that some primers currently used in metabarcoding studies may not be suitable for amplification of freshwater macroinvertebrates. Therefore, careful primer evaluation and more region- or ecosystem-specific primers are needed before DNA metabarcoding can be used for routine bioassessment of freshwater ecosystems.
Summary
DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. Its success is often limited due to variable binding sites that introduce amplification biases. Thus, the development of optimized primers for communities or taxa under study in a certain geographic region and/or ecosystem is of critical importance. However, no tool for obtaining and processing reference sequence data in bulk that can serve as a backbone for primer design is currently available.
We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from the BOLD and NCBI databases for specified target taxonomic groups and then clusters the sequences into operational taxonomic units (OTUs) to reduce biases introduced by the different number of available sequences per species. Additionally, PrimerMiner offers functionalities to evaluate primers in silico, which are in our opinion more realistic than the strategy employed in ecoPCR, another software tool available for that purpose.
We used PrimerMiner to download cytochrome c oxidase subunit I (COI) sequences for 15 important freshwater invertebrate groups relevant for ecosystem assessment. By processing COI markers from both databases, we were able to increase the amount of reference data 249‐fold on average, compared to using complete mitochondrial genomes alone. Furthermore, we visualized the generated OTU sequence alignments and describe how to evaluate primers in silico using PrimerMiner.
With PrimerMiner, we provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation. The OTU‐based reference alignments generated with PrimerMiner can be used for manual primer design or processed with bioinformatic tools for primer development.
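The idea behind in silico primer evaluation can be sketched as follows. This is a deliberately simplified illustration, not PrimerMiner's scoring method: it only counts positions where a template base is not covered by the (possibly degenerate) primer base, using the standard IUPAC nucleotide codes. The primer and binding-site sequences are hypothetical, and the binding site is assumed to be already aligned to the primer, equal in length, and free of gaps or ambiguity codes.

```python
# Standard IUPAC nucleotide codes: each primer symbol maps to the bases it covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not covered by the primer base."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p]
    )


# Hypothetical sequences for illustration only.
site = "GGAACTGG"
degenerate_hits = count_mismatches("GGWACWGG", site)   # W covers A and T
plain_hits = count_mismatches("GGAACAGG", site)        # no degenerate positions
```

In this toy case the degenerate primer matches the site at every position, while the non-degenerate primer has one mismatch, which mirrors the argument that base degeneracy can reduce primer bias. A realistic evaluation would also weight mismatches by their position and type.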
Environmental bulk samples often contain many different taxa that vary several orders of magnitude in biomass. This can be problematic in DNA metabarcoding and metagenomic high‐throughput sequencing approaches, as large specimens contribute disproportionately high amounts of DNA template. Thus, a few specimens of high biomass will dominate the dataset, potentially leading to smaller specimens remaining undetected. Sorting of samples by specimen size (as a proxy for biomass) and balancing the amounts of tissue used per size fraction should improve detection rates, but this approach has not been systematically tested. Here, we explored the effects of size sorting on taxa detection using two freshwater macroinvertebrate bulk samples, collected from a low‐mountain stream in Germany. Specimens were morphologically identified and sorted into three size classes (body size < 2.5 × 5, 5 × 10, and up to 10 × 20 mm). Tissue powder from each size category was extracted individually and pooled based on tissue weight to simulate samples that were not sorted by biomass (“Unsorted”). Additionally, size fractions were pooled so that each specimen contributed approximately equal amounts of biomass (“Sorted”). Mock samples were amplified using four different DNA metabarcoding primer sets targeting the Cytochrome c oxidase I (COI) gene. Sorting taxa by size and pooling them proportionately according to their abundance led to a more equal amplification of taxa compared to the processing of complete samples without sorting. The sorted samples recovered 30% more taxa than the unsorted samples at the same sequencing depth. Our results imply that sequencing depth can be decreased approximately fivefold when sorting the samples into three size classes and pooling by specimen abundance. Even coarse size sorting can substantially improve taxa detection using DNA metabarcoding.
While high‐throughput sequencing will become more accessible and cheaper within the next few years, sorting bulk samples by specimen biomass or size is a simple yet efficient method to reduce current sequencing costs.
DNA metabarcoding samples can have highly variable specimen biomass, so that specimens contribute unequal amounts of DNA to the extraction, which can introduce biases and leave taxa undetected. We quantify the effects of unequal specimen sizes and show how size sorting of metabarcoding samples prior to extraction can increase the number of detected taxa and reduce sequencing costs.
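The difference between the two pooling schemes can be made concrete with a small arithmetic sketch. The specimen counts and tissue weights below are hypothetical, not data from the study. The "Unsorted" pool takes each size class in proportion to its total tissue weight, whereas the "Sorted" pool takes each class in proportion to its number of specimens.

```python
def pool_shares(amounts):
    """Return each size class's fraction of the pooled tissue."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}


# Hypothetical size classes: (number of specimens, total tissue weight in mg).
classes = {"small": (80, 40.0), "medium": (15, 150.0), "large": (5, 810.0)}

# "Unsorted": pooled by tissue weight, so large specimens dominate the template.
unsorted_shares = pool_shares({name: weight for name, (count, weight) in classes.items()})

# "Sorted": pooled by specimen abundance, so each specimen contributes about equally.
sorted_shares = pool_shares({name: float(count) for name, (count, weight) in classes.items()})
```

With these illustrative numbers, the small size class supplies 4% of the tissue in the unsorted pool but 80% in the sorted pool, which is why small specimens are less likely to be swamped by a few large ones.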
Summary
DNA metabarcoding holds great promise for the assessment of macroinvertebrates in stream ecosystems. However, few large‐scale studies have compared the performance of DNA metabarcoding with that of routine morphological identification.
We performed metabarcoding using four primer sets on macroinvertebrate samples from 18 stream sites across Finland. The samples were collected in 2013 and identified based on morphology as part of a Finnish stream monitoring program. Specimens were morphologically classified, following standardised protocols, to the lowest taxonomic level for which identification was feasible in the routine national monitoring.
DNA metabarcoding identified more than twice as many taxa as the morphology‐based protocol, and also yielded a higher taxonomic resolution. For each sample, we detected more taxa by metabarcoding than by the morphological method, and all four primer sets exhibited comparably good performance. Sequence read abundance and the number of specimens per taxon (a proxy for biomass) were significantly correlated in each sample, although the adjusted R² values were low. With a few exceptions, the ecological status assessment metrics calculated from morphological and DNA metabarcoding datasets were similar. Given the recent reduction in sequencing costs, metabarcoding is currently approximately as expensive as morphology‐based identification.
Using samples obtained in the field, we demonstrated that DNA metabarcoding can achieve assessment results comparable to those of current protocols relying on morphological identification. Thus, metabarcoding represents a feasible and reliable method to identify macroinvertebrates in stream bioassessment, and offers a powerful advantage over morphological identification in providing identification for taxonomic groups that are unfeasible to identify in routine protocols. To unlock the full potential of DNA metabarcoding for ecosystem assessment, however, it will be necessary to address key problems with current laboratory protocols and reference databases.
Most animal species on Earth are insects, and recent reports suggest that their abundance is in drastic decline. Although these reports come from a wide range of insect taxa and regions, the evidence to assess the extent of the phenomenon is sparse. Insect populations are challenging to study, and most monitoring methods are labor intensive and inefficient. Advances in computer vision and deep learning provide potential new solutions to this global challenge. Cameras and other sensors can effectively, continuously, and noninvasively perform entomological observations throughout diurnal and seasonal cycles. The physical appearance of specimens can also be captured by automated imaging in the laboratory. When trained on these data, deep learning models can provide estimates of insect abundance, biomass, and diversity. Further, deep learning models can quantify variation in phenotypic traits, behavior, and interactions. Here, we connect recent developments in deep learning and computer vision to the urgent demand for more cost-efficient monitoring of insects and other invertebrates. We present examples of sensor-based monitoring of insects. We show how deep learning tools can be applied to exceptionally large datasets to derive ecological information and discuss the challenges that lie ahead for the implementation of such solutions in entomology. We identify four focal areas, which will facilitate this transformation: 1) validation of image-based taxonomic identification; 2) generation of sufficient training data; 3) development of public, curated reference databases; and 4) solutions to integrate deep learning and molecular tools.
DNA metabarcoding workflows produce hundreds to tens of thousands of Operational Taxonomic Units (OTUs) or Exact Sequence Variants (ESVs) per analysis. In most workflows, a taxonomic assignment to these generated sequences is needed. This is typically done using publicly available databases. Especially, yet not exclusively, for Eumetazoan metabarcoding, the Barcode of Life Data system (BOLD) is the most comprehensive and curated reference barcode database and, therefore, typically the first choice for taxonomic assignment. While an application programme interface (API) exists to query data in large batches, no information on the many and important unpublished data is obtained through the API. The alternative approach using the BOLD identification engine on the website provides full access, yet it is restricted to 100 sequences at once. We developed a small platform-independent and graphical user interface (GUI) software package, BOLDigger, which aims to solve this problem by automating the process of sending successive requests of up to 100 sequences without surpassing the capacities of BOLD. BOLDigger can be used to download the results of the identification engine, as well as metadata for the obtained hits. For the selection of the best fitting hit, three different methods are implemented. A new approach, combining a threshold-based approach with the metadata information, was implemented to make use of the metadata.
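The batching step described above can be sketched generically. This is not BOLDigger's code and makes no calls to BOLD; it only shows how a list of sequences might be split into successive requests of at most 100. The function name, the OTU identifiers and the list size are hypothetical.

```python
def batches(items, size=100):
    """Yield successive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical OTU identifiers standing in for the sequences to be identified.
otu_ids = [f"OTU_{i}" for i in range(1, 251)]

chunks = list(batches(otu_ids))
chunk_sizes = [len(chunk) for chunk in chunks]
```

For 250 sequences this gives three requests of 100, 100 and 50 sequences. A real client would additionally need to pause between requests so as not to exceed the capacity of the service.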
The bioassessment of aquatic ecosystems is currently based on various biotic indices that use the occurrence and/or abundance of selected taxonomic groups to define ecological status. These conventional indices have some limitations, often related to difficulties in morphological identification of bioindicator taxa. Recent development of DNA barcoding and metabarcoding could potentially alleviate some of these limitations, by using DNA sequences instead of morphology to identify organisms and to characterize a given ecosystem. In this paper, we review the structure of conventional biotic indices, and we present the results of pilot metabarcoding studies using environmental DNA to infer biotic indices. We discuss the main advantages and pitfalls of metabarcoding approaches to assess parameters such as richness, abundance, taxonomic composition and species ecological values, to be used for calculation of biotic indices. We present some future developments to fully exploit the potential of metabarcoding data and improve the accuracy and precision of their analysis. We also propose some recommendations for the future integration of DNA metabarcoding to routine biomonitoring programs.
• Current biomonitoring approaches are widely used but have some limitations.
• DNA metabarcoding provides a new complementary tool for biomonitoring.
• Metabarcoding allows extending the range of taxa used as bioindicators.
• Metabarcoding data could be used to establish molecular metrics and indices.
• Future work should standardise procedures and improve data analysis.
Effective identification of species using short DNA fragments (DNA barcoding and DNA metabarcoding) requires reliable sequence reference libraries of known taxa. Both taxonomically comprehensive coverage and content quality are important for sufficient accuracy. For aquatic ecosystems in Europe, reliable barcode reference libraries are particularly important if molecular identification tools are to be implemented in biomonitoring and reports in the context of the EU Water Framework Directive (WFD) and the Marine Strategy Framework Directive (MSFD). We analysed gaps in the two most important reference databases, Barcode of Life Data Systems (BOLD) and NCBI GenBank, with a focus on the taxa most frequently used in WFD and MSFD. Our analyses show that coverage varies strongly among taxonomic groups, and among geographic regions. In general, groups that were actively targeted in barcode projects (e.g. fish, true bugs, caddisflies and vascular plants) are well represented in the barcode libraries, while others have fewer records (e.g. marine molluscs, ascidians, and freshwater diatoms). We also found that species monitored in several countries often are represented by barcodes in reference libraries, while species monitored in a single country frequently lack sequence records. A large proportion of species (up to 50%) in several taxonomic groups are only represented by private data in BOLD. Our results have implications for the future strategy to fill existing gaps in barcode libraries, especially if DNA metabarcoding is to be used in the monitoring of European aquatic biota under the WFD and MSFD. For example, missing species relevant to monitoring in multiple countries should be prioritized for future collaborative programs. We also discuss why a strategy for quality control and quality assurance of barcode reference libraries is needed and recommend future steps to ensure full utilisation of metabarcoding in aquatic biomonitoring.
• DNA barcode representation in public databases of 28,000 aquatic species is analysed.
• Gaps in barcode reference libraries are largest for diatoms and invertebrates.
• Sequence coverage varies considerably among invertebrate groups.
• Species monitored by one or few countries more frequently lack reference barcodes.
• Strategies should be implemented to maintain quality of barcode reference libraries.
DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While Cytochrome c oxidase subunit I (COI) haplotype information is limited in resolving intraspecific diversity, it is nevertheless often useful, e.g. in a phylogeographic context, helping to formulate hypotheses on taxon distribution and dispersal.
This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotype information from freshwater macroinvertebrate metabarcoding datasets. This novel approach was added to the R package "JAMP" and can be applied to COI amplicon datasets. We tested our haplotyping method by sequencing (i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and (ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates.
We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly from the small specimens. In the monitoring samples, the different primer sets detected 177-200 OTUs, containing an average of 2.40-3.30 haplotypes per OTU. The derived intraspecific diversity data showed population structures that were consistent between replicates and similar between primer pairs, but resolution depended on the primer length. A closer look at abundant taxa in the dataset revealed various population genetic patterns, e.g. the stonefly and the caddisfly showed a distinct north-south cline with respect to haplotype distribution, while the beetle and the isopod displayed no clear population pattern but differed in genetic diversity.
We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate metabarcoding data. It needs to be stressed that at this point this metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding datasets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that seek information not only about species diversity but also about underlying genetic diversity.
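The abundance-based filtering step that follows denoising can be illustrated with a simplified sketch. This is not the JAMP implementation: the function name, the 1% relative-abundance threshold and the read counts are assumptions chosen for illustration. Haplotypes whose share of an OTU's reads falls below the threshold are discarded.

```python
def filter_haplotypes(otu_haplotypes, min_rel_abundance=0.01):
    """Keep haplotypes that reach a minimum share of their OTU's reads.

    The threshold is an assumed value for illustration, not a recommended setting.
    """
    kept = {}
    for otu, haplotypes in otu_haplotypes.items():
        total = sum(haplotypes.values())
        if total == 0:
            continue  # skip OTUs without reads
        passing = {
            hap: reads
            for hap, reads in haplotypes.items()
            if reads / total >= min_rel_abundance
        }
        if passing:
            kept[otu] = passing
    return kept


# Hypothetical read counts per haplotype within each OTU.
reads = {
    "OTU_1": {"hap_a": 900, "hap_b": 95, "hap_c": 5},
    "OTU_2": {"hap_d": 50},
}

filtered = filter_haplotypes(reads)
```

Here hap_c (0.5% of the reads of OTU_1) is removed. The same trade-off noted in the abstract applies: a stricter threshold removes more spurious variants but can also discard genuine haplotypes from small, low-biomass specimens.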