The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the reference databases underlying high-throughput taxonomic assignments. The objectives of this study are to document trends in COI records and to assess their future usability for metabarcode identification. The number of COI records deposited in the NCBI nucleotide database has increased by a geometric average of 51% per year, from 8,137 records deposited in 2003 to a cumulative total of ~2.5 million by the end of 2017. About half of these records are fully identified to the species rank, 92% are at least 500 bp long, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank, we suggest: 1) improving the geographic representation of COI records, 2) improving the cross-referencing of COI records between the Barcode of Life Data System (BOLD) and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) adhering to the Minimum Information about a Marker Gene Sequence (MIMARKS) guidelines, and 4) integrating metabarcodes from eDNA and mixed community studies with existing reference sequences. The growth of COI reference records over the past 15 years has been substantial, and these records are likely to remain a resource across many fields for years to come.
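The 51% figure can be checked as a compound (geometric mean) annual growth rate. The sketch below assumes that 8,137 is the starting count in 2003, that the ~2.5 million cumulative total is the ending count, and that growth compounds over the 14 year-to-year intervals between 2003 and 2017. Under those assumptions, (2,500,000 / 8,137)^(1/14) - 1 is roughly 0.51. The function name is illustrative only.

```python
def geometric_annual_growth(start, end, years):
    """Constant yearly growth rate r such that start * (1 + r) ** years == end."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures from the text; the ~2.5 million cumulative total is approximate.
rate = geometric_annual_growth(8_137, 2_500_000, 2017 - 2003)
print(f"average growth: {rate:.0%} per year")  # average growth: 51% per year
```

Because the ending count is rounded, the recovered rate is only approximate, but it agrees with the reported 51%.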
Despite significant scientific effort over the past fifty years, biological monitoring has failed to progress beyond simple binary assessment outcomes (impacted/unimpacted) toward more diagnostic frameworks. We assert that this is largely because of the limited information content of biological samples processed by traditional morphology‐based taxonomy, which is slow, imprecise, and focused on restricted groups of organisms. We envision a new paradigm in ecosystem assessment, which we call ‘Biomonitoring 2.0’. This new schema employs DNA‐based identification of taxa coupled with high‐throughput DNA sequencing on next‐generation sequencing platforms. We discuss the transformational nature of DNA‐based approaches in biodiversity discovery and ecosystem assessment and outline a path forward for their widespread future application.
We introduce a method for assigning names to CO1 metabarcode sequences with confidence scores in a rapid, high-throughput manner. We compiled nearly 1 million CO1 barcode sequences appropriate for classifying arthropods and chordates. Compared to our previous Insecta classifier, the current classifier has more than three times the taxonomic coverage, including outgroups, and is based on almost five times as many reference sequences. We show that, unlike for other popular rDNA metabarcoding markers, classification performance is similar across the length of the CO1 barcoding region. We show that the RDP classifier can make taxonomic assignments about 19 times faster than the popular top BLAST hit method and can reduce the false positive rate from nearly 100% to 34%. This is especially important in large-scale biodiversity and biomonitoring studies, where datasets can become very large and the taxonomic assignment problem is not trivial. We also show that reference databases are becoming more representative of current species diversity, but that gaps still exist. We suggest that the field as a whole would benefit if all investigators involved in metabarcoding studies also planned, through collaborations with taxonomic experts, to barcode representatives of their local biota as part of their projects.
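The RDP classifier is a naive Bayesian classifier built on k-mer (word) frequencies, with bootstrap resampling of the query's words to produce confidence scores. The sketch below is a simplified, hypothetical Python illustration of that idea, not the RDP classifier's own code. It assumes 8-mer words, the word-prior pseudocounts described for the RDP classifier (prior = (n(w) + 0.5) / (N + 1); P(w | taxon) = (m(w) + prior) / (M + 1)), a flat list of (taxon, sequence) references classified at a single rank, and 100 bootstrap replicates that each resample one-eighth of the query's words with replacement. All function names are illustrative.

```python
import math
import random
from collections import defaultdict

K = 8  # word (k-mer) length; the RDP classifier uses 8-mers


def words_of(seq, k=K):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(refs, k=K):
    """Count word occurrences in (taxon, sequence) reference records."""
    n_w = defaultdict(int)                        # n(w): references containing w
    m_w = defaultdict(lambda: defaultdict(int))   # m(w): per-taxon counts of w
    sizes = defaultdict(int)                      # M: references per taxon
    for taxon, seq in refs:
        sizes[taxon] += 1
        for w in words_of(seq, k):
            n_w[w] += 1
            m_w[taxon][w] += 1
    return {"n_w": n_w, "m_w": m_w, "sizes": sizes, "N": len(refs)}


def log_score(words, taxon, model):
    """Sum of log P(w | taxon) over query words (naive independence assumption)."""
    n_refs = model["N"]
    size = model["sizes"][taxon]
    counts = model["m_w"][taxon]
    total = 0.0
    for w in words:
        prior = (model["n_w"].get(w, 0) + 0.5) / (n_refs + 1)
        total += math.log((counts.get(w, 0) + prior) / (size + 1))
    return total


def classify(query, model, n_boot=100, seed=0, k=K):
    """Return (best taxon, bootstrap confidence in [0, 1])."""
    words = sorted(words_of(query, k))
    if not words:
        raise ValueError("query is shorter than the word length")
    rng = random.Random(seed)
    taxa = sorted(model["sizes"])

    def best(ws):
        return max(taxa, key=lambda t: log_score(ws, t, model))

    call = best(words)
    subset = max(1, len(words) // 8)  # one-eighth of the query's words
    hits = sum(best(rng.choices(words, k=subset)) == call for _ in range(n_boot))
    return call, hits / n_boot
```

A production classifier trains on full lineages and reports a call and a bootstrap confidence at every rank, so that low-confidence assignments can be truncated to a higher rank; this toy collapses that to a single rank.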
Since 2005, advances in next‐generation sequencing technologies have revolutionized biological science. The analysis of environmental DNA using specific gene markers, such as species‐specific DNA barcodes, has been a key application of next‐generation sequencing in ecological and environmental research. Access to massively parallel sequencing data, together with subsequent improvements in the read length and throughput of different sequencing platforms, is leading to a better representation of sample diversity at a reasonable cost. New technologies are being developed rapidly and have the potential to dramatically accelerate ecological and environmental research. The fast pace of development and improvement in next‐generation sequencing technologies can lead to broader and more robust applications in environmental DNA research. Here, we review the advantages and limitations of current next‐generation sequencing technologies with regard to their application in environmental DNA analysis.
The use of environmental DNA (eDNA) in biodiversity assessments offers a step-change in sensitivity, throughput, and the ability to measure ecosystem diversity and function simultaneously. There remains, however, a need to examine eDNA persistence in the wild through simultaneous temporal measures of eDNA and biota. Here, we use metabarcoding of two markers of different lengths, derived from an annual time series of aqueous lake eDNA, to examine temporal shifts in ecosystem biodiversity and in an ecologically important group of macroinvertebrates (Diptera: Chironomidae). The analyses allow different levels of detection and validation of taxon richness and community composition (β-diversity) through time, with shorter eDNA fragments dominating the eDNA community. Comparisons among eDNA, community DNA, taxonomy, and UK species abundance data further show significant relationships between diversity estimates derived from these disparate methodologies. Our results reveal the temporal dynamics of eDNA and validate the utility of eDNA metabarcoding for tracking seasonal diversity at the ecosystem scale.
Mixed community or environmental DNA marker gene sequencing has become a commonly used technique for biodiversity analyses in freshwater systems. Many cytochrome c oxidase subunit I (COI) primer sets are now available for such work. The purpose of this study is to test whether COI primer choice affects the recovery of arthropod richness, beta diversity, and target assemblages in the benthic kick-net samples typically used in freshwater biomonitoring. We examine six commonly used COI primer sets on samples collected from six freshwater sites. Biodiversity analyses show that richness is sensitive to primer choice and that the combined use of multiple COI amplicons recovers higher richness. Thus, to recover maximum richness, multiple primer sets should be used for COI metabarcoding. In ordination analyses based on community dissimilarity, samples consistently cluster by site regardless of amplicon choice or PCR replicate. Thus, for broad-scale community analyses, overall beta diversity patterns are robust to COI marker choice. Traditional freshwater bioindicator assemblages such as Ephemeroptera, Trichoptera, Plecoptera, and Chironomidae, as well as Arthropoda site indicators, were differentially detected by each amplicon tested. This work will help future biodiversity and biomonitoring studies develop workflows that are not just standardized but optimized, either to maximize taxon detection or to select amplicons suited to detecting water-quality or Arthropoda site indicators.
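Two of the calculations behind these conclusions reduce to set operations on presence-absence data: pooling the taxa detected by several amplicons, and measuring dissimilarity between samples. The sketch below assumes presence-absence data and uses the Jaccard dissimilarity as one common choice; the study's exact dissimilarity index and ordination method are not specified here, and the amplicon labels and detections in the example are hypothetical.

```python
def richness(taxa):
    """Number of distinct taxa detected."""
    return len(set(taxa))


def pooled_richness(detections_by_amplicon):
    """Richness after pooling the taxa detected by several amplicons."""
    pooled = set()
    for taxa in detections_by_amplicon.values():
        pooled |= set(taxa)
    return len(pooled)


def jaccard_dissimilarity(taxa_a, taxa_b):
    """1 - |A intersect B| / |A union B| on presence-absence data."""
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(a & b) / len(a | b)


# Hypothetical detections from one sample with two amplicons.
detections = {
    "amplicon_1": {"Baetis", "Hydropsyche", "Chironomus"},
    "amplicon_2": {"Hydropsyche", "Chironomus", "Leuctra"},
}
per_amplicon = {name: richness(t) for name, t in detections.items()}
print(per_amplicon)                 # {'amplicon_1': 3, 'amplicon_2': 3}
print(pooled_richness(detections))  # 4
print(jaccard_dissimilarity(detections["amplicon_1"], detections["amplicon_2"]))  # 0.5
```

Pooled richness can only match or exceed the richness of any single amplicon, which is why combining primer sets recovers more taxa. For an ordination, a matrix of pairwise dissimilarities between samples would be the input, for example to NMDS or PCoA.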
Environmental DNA (eDNA) metabarcoding is an increasingly popular method for rapid biodiversity assessment. As with any ecological survey, false negatives can arise during sampling and, if unaccounted for, can lead to biased results and potentially misdiagnosed environmental assessments. We developed a multi-scale, multi-species occupancy model for the analysis of community biodiversity data resulting from eDNA metabarcoding; this model accounts for imperfect detection and additional sources of environmental and experimental variation. We present methods for model assessment and model comparison and demonstrate how these tools improve the inferential power of eDNA metabarcoding data using a case study in a coastal marine environment. Using occupancy models to account for factors often overlooked in the analysis of eDNA metabarcoding data will dramatically improve ecological inference, sampling design, and methodologies, empowering practitioners to make full use of the high-resolution biodiversity data generated by next-generation sequencing platforms.
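At the core of a multi-scale occupancy model is a likelihood that separates whether a species occupies a site, whether its DNA is captured in a given sample, and whether it is detected in a given PCR replicate. The sketch below shows a generic three-level likelihood for one species at one site, assuming constant probabilities psi (site occupancy), theta (presence in a sample, given occupancy), and p (detection in a PCR replicate, given presence in the sample), and assuming no false positives. It is a simplified illustration, not the authors' multi-species model with covariates; the function names are hypothetical.

```python
def sample_likelihood(replicates, theta, p):
    """P(replicate detections in one sample | site occupied).

    replicates: list of 0/1 detections, one per PCR replicate.
    """
    k = len(replicates)
    d = sum(replicates)
    # DNA present in the sample: each replicate detects it with probability p.
    present = theta * p ** d * (1 - p) ** (k - d)
    # DNA absent from the sample: only an all-zero history is possible.
    absent = (1 - theta) * (1.0 if d == 0 else 0.0)
    return present + absent


def site_likelihood(samples, psi, theta, p):
    """P(detection history at one site), marginalizing over latent states."""
    occupied = psi
    for replicates in samples:
        occupied *= sample_likelihood(replicates, theta, p)
    # An unoccupied site (no false positives) can only yield all zeros.
    never_detected = all(sum(r) == 0 for r in samples)
    unoccupied = (1 - psi) * (1.0 if never_detected else 0.0)
    return occupied + unoccupied
```

An all-zero history is explained either by an occupied site where detection failed or by an unoccupied site, which is exactly the ambiguity that naive presence-absence analyses ignore. Multiplying this likelihood across sites gives the quantity that would be maximized, or combined with priors in a Bayesian fit.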
Multi-marker metabarcoding is increasingly used to generate biodiversity information across domains of life, from microbes to fungi to animals, for molecular ecology and biomonitoring applications in sectors ranging from academic research to regulatory agencies and industry. Current popular bioinformatic pipelines support microbial and fungal marker analysis, while ad hoc methods are often used to process animal metabarcode markers from the same study. MetaWorks provides a harmonized processing environment, pipeline, and taxonomic assignment approach for demultiplexed Illumina reads for all biota, using a wide range of metabarcoding markers such as 16S, ITS, and COI. A Conda environment is provided to quickly gather most of the programs and dependencies for the pipeline. Several workflows are provided: taxonomic assignment of exact sequence variants, an option to generate operational taxonomic units, and single-read processing. Pipelines are automated using Snakemake to minimize user intervention and facilitate scalability. All pipelines use the RDP classifier to provide taxonomic assignments with confidence measures. We extend the functionality of the RDP classifier beyond taxonomically assigning 16S (bacteria), ITS (fungi), and 28S (fungi) to also support COI (eukaryotes), rbcL (eukaryotes, land plants, diatoms), 12S (fish, vertebrates), 18S (eukaryotes, diatoms), and ITS (fungi, plants). MetaWorks handles ITS properly by trimming the flanking conserved rRNA gene regions, and handles protein-coding genes by providing two options for removing obvious pseudogenes. MetaWorks can be downloaded from https://github.com/terrimporter/MetaWorks, and quickstart instructions, pipeline details, and a tutorial for new users can be found at https://terrimporter.github.io/MetaWorksSite.
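One simple way to flag obvious pseudogenes in a protein-coding marker such as COI is to check whether any forward reading frame is free of in-frame stop codons. The sketch below illustrates that generic heuristic; it does not reproduce either of MetaWorks' two pseudogene-removal options. It assumes reads are already in the forward orientation and uses TAA and TAG, the stop codons of NCBI translation table 5 (invertebrate mitochondrial code); other taxa would need a different table. Function names are hypothetical.

```python
# Stop codons of NCBI translation table 5 (invertebrate mitochondrial code).
INVERT_MITO_STOPS = {"TAA", "TAG"}


def has_stop_in_frame(seq, frame, stops=INVERT_MITO_STOPS):
    """True if any complete codon read from `frame` (0, 1 or 2) is a stop."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return True
    return False


def open_frames(seq, stops=INVERT_MITO_STOPS):
    """Forward frames with no stop codon across the whole fragment."""
    seq = seq.upper()
    return [f for f in range(3) if not has_stop_in_frame(seq, f, stops)]


def is_putative_pseudogene(seq, stops=INVERT_MITO_STOPS):
    """Flag a fragment when every forward frame contains a stop codon."""
    return not open_frames(seq, stops)
```

A stricter filter could also require the open frame to translate to a sequence resembling a COI protein, since a short pseudogene fragment can lack stop codons altogether.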
DNA barcoding and metabarcoding are techniques that focus on signature genomic regions that, in theory, provide species-level resolution, but in practice this is not always possible. We place animal-focused COI metabarcoding in context with respect to the use of marker gene sequencing in microbial and fungal ecology. We focus on three specific aspects of metabarcodes: (1) the process of metabarcode sequence clustering, (2) how metabarcode cluster types affect the results of biodiversity analyses, and (3) the current state of reference sequence databases used for metabarcode identification. Using examples from the arthropod COI metabarcode literature, we show that exact sequence variants (ESVs) detect more unique taxa than operational taxonomic units (OTUs), but with similar patterns of taxonomic resolution. We also show that ordinations based on ESVs and on OTUs recover similar groupings. We compile a list of reference sequence databases useful for multi-marker metabarcoding and present a list of reference sequence databases specifically formatted for use with a naive Bayesian classifier for rigorous metabarcode taxonomic assignments. Sophisticated tools and reference databases are available for analyzing COI sequences, and these compare favorably with those available for other metabarcode markers, such as the ribosomal RNA genes used to target microbes and fungi.
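The difference between the two cluster types can be illustrated with a toy example: ESVs keep every distinct sequence, while OTUs merge sequences within a similarity threshold. The sketch below is a simplification under stated assumptions: exact dereplication stands in for ESV denoising (real denoisers such as DADA2 or UNOISE3 also model and remove sequencing errors), sequences are assumed to be pre-aligned and of equal length so identity is computed position by position, and OTUs are formed by greedy, abundance-ordered centroid clustering at a 97% threshold. Function names and sequences are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(uniques, threshold=0.97):
    """Assign each unique sequence to the first centroid within the threshold."""
    centroids = []  # list of [centroid sequence, pooled read count]
    for seq, count in uniques:
        for centroid in centroids:
            if identity(seq, centroid[0]) >= threshold:
                centroid[1] += count
                break
        else:
            centroids.append([seq, count])  # no match: seed a new OTU
    return centroids


base = "A" * 100
variant_1 = "C" + "A" * 99        # 99% identical to base
variant_2 = "C" * 10 + "A" * 90   # 90% identical to base
reads = [base] * 5 + [variant_2] * 3 + [variant_1] * 2
esvs = dereplicate(reads)
otus = greedy_otus(esvs)
print(len(esvs), len(otus))  # 3 2
```

Because clustering can only merge sequences, the ESV count is always at least the OTU count, which is consistent in miniature with ESVs detecting more unique taxa than OTUs.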