Cymbidium is an orchid genus that has undergone rapid radiation and has high ornamental, economic, ecological and cultural importance, but its classification based on morphology is controversial. The plastid genome (plastome), as an extension of plant standard DNA barcodes, has been widely used as a potential molecular marker for identifying recently diverged species or complicated plant groups. In this study, we newly generated 237 plastomes of 50 species (at least two individuals per species) by genome skimming, covering 71.4% of members of the genus Cymbidium. Sequence‐based analyses (barcoding gaps and automatic barcode gap discovery) and tree‐based analyses (maximum likelihood, Bayesian inference and multirate Poisson tree processes model) were conducted for species identification of Cymbidium. Our work provides a comprehensive DNA barcode reference library for Cymbidium species identification. The results show that compared with standard DNA barcodes (rbcL + matK) as well as the plastid trnH‐psbA, the species identification rate of the plastome increased moderately from 58% to 68%. At the same time, we propose an optimized identification strategy for Cymbidium species. The plastome cannot completely resolve the species identification of Cymbidium, the main reasons being incomplete lineage sorting, artificial cultivation, natural hybridization and chloroplast capture. To further explore the potential use of nuclear data in identifying species, the Skmer method was adopted and the identification rate increased to 72%. It appears that nuclear genome data have a vital role in species identification and are expected to be used as next‐generation nuclear barcodes.
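The barcoding-gap criterion named in the abstract above can be illustrated with a minimal Python sketch. It assumes uncorrected p-distances on pre-aligned sequences of equal length; the function names and the toy sequences are hypothetical and are not taken from the study, which may have used different distance models and software.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(samples):
    """samples: list of (species, aligned_sequence) pairs.

    Returns {species: (max_intra, min_inter, has_gap)}, where has_gap is
    True when the smallest distance to any other species exceeds the
    largest distance within the species.
    """
    result = {}
    for sp in {s for s, _ in samples}:
        own = [seq for s, seq in samples if s == sp]
        other = [seq for s, seq in samples if s != sp]
        max_intra = max(
            (p_distance(a, b) for a, b in combinations(own, 2)), default=0.0
        )
        min_inter = min(
            (p_distance(a, b) for a in own for b in other), default=float("inf")
        )
        result[sp] = (max_intra, min_inter, min_inter > max_intra)
    return result


# Toy, made-up alignment: two individuals for each of two species.
toy_samples = [
    ("A", "ACGTACGTAC"),
    ("A", "ACGTACGTAT"),
    ("B", "TCGAACGTTC"),
    ("B", "TCGAACGTTC"),
]
gaps = barcode_gap(toy_samples)
```

A species with no gap under this criterion (minimum interspecific distance not larger than maximum intraspecific distance) would count as unidentified by the marker being tested.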
Taxonomic identification of biological materials can be achieved through DNA barcoding, where an unknown "barcode" sequence is compared to a reference database. In many disciplines, obtaining accurate taxonomic identifications can be imperative (e.g., evolutionary biology, food regulatory compliance, forensics). The Barcode of Life Data Systems (BOLD) and GenBank are the main public repositories of DNA barcode sequences. In this study, an assessment of the accuracy and reliability of sequences in these databases was performed. To achieve this, 1) curated reference materials for plants, macro-fungi and insects were obtained from national collections, 2) relevant barcode sequences (rbcL, matK, trnH-psbA, ITS and COI) from these reference samples were generated and used for searching against both databases, and 3) optimal search parameters were determined that ensure the best match to the known species in either database. While GenBank outperformed BOLD for species-level identification of insect taxa (53% and 35%, respectively), both databases performed comparably for plants and macro-fungi (~81% and ~57%, respectively). Results illustrated that using a multi-locus barcode approach increased identification success. This study outlines the utility of the BLAST search tool in GenBank and the BOLD identification engine for taxonomic identifications and identifies some precautions needed when using public sequence repositories in applied scientific disciplines.
DNA barcodes are widely used for identification and discovery of species. While such use draws on information at the DNA level, the current amassment of ca. 4.7 million COI barcodes also offers a unique resource for exploring functional constraints on DNA evolution. Here, we explore amino acid variation in a crosscut of the entire animal kingdom. Patterns of DNA variation were linked to functional constraints at the level of the amino acid sequence in functionally important parts of the enzyme. Six amino acid sites show variation with possible effects on enzyme function. Overall, patterns of amino acid variation suggest convergent or parallel evolution at the protein level connected to the transition into a parasitic life style. Denser sampling of two diverse insect taxa revealed that the beetles (Coleoptera) show more amino acid variation than the butterflies and moths (Lepidoptera), indicating a fundamental difference in patterns of molecular evolution in COI. Several amino acid sites were found to be under notably strong purifying selection in Lepidoptera as compared to Coleoptera. Overall, these findings demonstrate that the utility of the global DNA barcode library extends far beyond identification and taxonomy, and will hopefully be followed by much further work.
DNA metabarcoding enables efficient characterization of species composition in environmental DNA or bulk biodiversity samples, and this approach is making significant and unique contributions in the field of ecology. In metabarcoding of animals, the cytochrome c oxidase subunit I (COI) gene is frequently used as the marker of choice because no other genetic region can be found in taxonomically verified databases with sequences covering so many taxa. However, the accuracy of metabarcoding datasets is dependent on recovery of the targeted taxa using conserved amplification primers. We argue that COI does not contain suitably conserved regions for most amplicon-based metabarcoding applications. Marker selection deserves increased scrutiny and available marker choices should be broadened in order to maximize potential in this exciting field of research.
The proliferation of DNA data is revolutionizing all fields of systematic research. DNA barcode sequences, now available for millions of specimens and several hundred thousand species, are increasingly used in algorithmic species delimitations. This is complicated by occasional incongruences between species and gene genealogies, as indicated by situations where conspecific individuals do not form a monophyletic cluster in a gene tree. In two previous reviews, non-monophyly has been reported as being common in mitochondrial DNA gene trees. We developed a novel web service "Monophylizer" to detect non-monophyly in phylogenetic trees and used it to ascertain the incidence of species non-monophyly in COI (a.k.a. cox1) barcode sequence data from 4977 species and 41,583 specimens of European Lepidoptera, the largest data set of DNA barcodes analyzed in this regard. Particular attention was paid to accurate species identification to ensure data integrity. We investigated the effects of tree-building method, sampling effort, and other methodological issues, all of which can influence estimates of non-monophyly. We found a 12% incidence of non-monophyly, a value significantly lower than that observed in previous studies. Neighbor joining (NJ) and maximum likelihood (ML) methods yielded almost equal numbers of non-monophyletic species, but 24.1% of these cases of non-monophyly were only found by one of these methods. Non-monophyletic species tend to show either low genetic distances to their nearest neighbors or exceptionally high levels of intraspecific variability. Cases of polyphyly in COI trees arising as a result of deep intraspecific divergence are negligible, as the detected cases reflected misidentifications or methodological errors. Taking into consideration variation in sampling effort, we estimate that the true incidence of non-monophyly is ~23%, but with operational factors still being included.
Within the operational factors, we separately assessed the frequency of taxonomic limitations (presence of overlooked cryptic and oversplit species) and identification uncertainties. We observed that operational factors are potentially present in more than half (58.6%) of the detected cases of non-monophyly. Furthermore, we observed that in about 20% of non-monophyletic species and entangled species, the lineages involved are either allopatric or parapatric—conditions where species delimitation is inherently subjective and particularly dependent on the species concept that has been adopted. These observations suggest that species-level non-monophyly in COI gene trees is less common than previously supposed, with many cases reflecting misidentifications, the subjectivity of species delimitation or other operational factors.
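To make the notion of species-level non-monophyly in a gene tree concrete, the following Python sketch flags species whose specimens do not form an exclusive clade. It is an illustration under stated assumptions, not the Monophylizer implementation: the tree is assumed to be rooted and is encoded as nested tuples, and the function names, the toy tree and the specimen-to-species mapping are all hypothetical.

```python
def leaf_sets(tree):
    """Return the leaf-label set of every node in a rooted nested-tuple tree."""
    clades = []

    def walk(node):
        if isinstance(node, str):          # a leaf is a specimen label
            leaves = frozenset([node])
        else:                              # an internal node is a tuple of children
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(tree)
    return clades


def non_monophyletic_species(tree, species_of):
    """List species whose specimens are not exactly the leaf set of some clade."""
    clades = set(leaf_sets(tree))
    all_leaves = max(clades, key=len)      # the root clade holds every leaf
    members = {}
    for leaf in all_leaves:
        members.setdefault(species_of[leaf], set()).add(leaf)
    return sorted(
        sp for sp, leaves in members.items() if frozenset(leaves) not in clades
    )


# Toy rooted tree: specimen b1 groups with species A rather than with b2.
toy_tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
toy_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
flagged = non_monophyletic_species(toy_tree, toy_species)
```

Because the result depends on the rooting and on the tree-building method, the same specimens can be flagged under one analysis and not another, which is consistent with the disagreement between NJ and ML reported in the abstract.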
One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited.
In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages.
This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
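Two of the mitigation strategies listed in this abstract, abundance filtering and tool intersection, can be sketched in a few lines of Python. This is a simplified illustration only: the function names, the toy read counts, the 1% abundance threshold and the two-tool agreement rule are assumptions chosen for the example, not parameters reported by the study.

```python
from collections import Counter


def abundance_filter(profile, min_rel_abundance):
    """Convert read counts to relative abundances and drop rare taxa."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {
        taxon: count / total
        for taxon, count in profile.items()
        if count / total >= min_rel_abundance
    }


def intersect_tools(profiles, min_tools):
    """Keep taxa reported by at least min_tools of the given profiles."""
    support = Counter(taxon for profile in profiles for taxon in profile)
    return sorted(taxon for taxon, n in support.items() if n >= min_tools)


# Hypothetical read counts from two classifiers run on the same sample.
tool_a = {"Escherichia coli": 900, "Staphylococcus aureus": 95, "Bacillus anthracis": 5}
tool_b = {"Escherichia coli": 800, "Staphylococcus aureus": 200}

filtered = [abundance_filter(p, 0.01) for p in (tool_a, tool_b)]
consensus = intersect_tools(filtered, min_tools=2)
```

In this toy case the low-abundance call made by only one tool is removed, which is the intended effect; the trade-off is that a genuinely rare taxon detected by a single tool would be removed as well.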
Effective identification of species using short DNA fragments (DNA barcoding and DNA metabarcoding) requires reliable sequence reference libraries of known taxa. Both taxonomically comprehensive coverage and content quality are important for sufficient accuracy. For aquatic ecosystems in Europe, reliable barcode reference libraries are particularly important if molecular identification tools are to be implemented in biomonitoring and reports in the context of the EU Water Framework Directive (WFD) and the Marine Strategy Framework Directive (MSFD). We analysed gaps in the two most important reference databases, Barcode of Life Data Systems (BOLD) and NCBI GenBank, with a focus on the taxa most frequently used in WFD and MSFD. Our analyses show that coverage varies strongly among taxonomic groups, and among geographic regions. In general, groups that were actively targeted in barcode projects (e.g. fish, true bugs, caddisflies and vascular plants) are well represented in the barcode libraries, while others have fewer records (e.g. marine molluscs, ascidians, and freshwater diatoms). We also found that species monitored in several countries often are represented by barcodes in reference libraries, while species monitored in a single country frequently lack sequence records. A large proportion of species (up to 50%) in several taxonomic groups are only represented by private data in BOLD. Our results have implications for the future strategy to fill existing gaps in barcode libraries, especially if DNA metabarcoding is to be used in the monitoring of European aquatic biota under the WFD and MSFD. For example, missing species relevant to monitoring in multiple countries should be prioritized for future collaborative programs. We also discuss why a strategy for quality control and quality assurance of barcode reference libraries is needed and recommend future steps to ensure full utilisation of metabarcoding in aquatic biomonitoring.
• DNA barcode representation in public databases of 28,000 aquatic species is analysed.
• Gaps in barcode reference libraries are largest for diatoms and invertebrates.
• Sequence coverage varies considerably among invertebrate groups.
• Species monitored by one or few countries more frequently lack reference barcodes.
• Strategies should be implemented to maintain quality of barcode reference libraries.
Toward quantitative metabarcoding. Shelton, Andrew Olaf; Gold, Zachary J.; Jensen, Alexander J. ... Ecology (Durham), February 2023, Volume 104, Issue 2. Journal article, peer reviewed, open access.
Amplicon‐sequence data from environmental DNA (eDNA) and microbiome studies provide important information for ecology, conservation, management, and health. At present, amplicon‐sequencing studies (known also as metabarcoding studies, in which the primary data consist of targeted, amplified fragments of DNA sequenced from many taxa in a mixture) struggle to link genetic observations to the underlying biology in a quantitative way, but many applications require quantitative information about the taxa or systems under scrutiny. As metabarcoding studies proliferate in ecology, it becomes more important to develop ways to make them quantitative to ensure that their conclusions are adequately supported. Here we link previously disparate sets of techniques for making such data quantitative, showing that the underlying polymerase chain reaction mechanism explains the observed patterns of amplicon data in a general way. By modeling the process through which amplicon‐sequence data arise, rather than transforming the data post hoc, we show how to estimate the starting DNA proportions from a mixture of many taxa. We illustrate how to calibrate the model using mock communities and apply the approach to simulated data and a series of empirical examples. Our approach opens the door to improve the use of metabarcoding data in a wide range of applications in ecology, public health, and related fields.
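The idea that the PCR mechanism shapes amplicon proportions can be illustrated with a deterministic Python sketch. It assumes the textbook exponential model of PCR, in which a template with per-cycle amplification efficiency a grows by a factor of (1 + a) each cycle; this is a simplification for illustration and not the statistical model fitted in the paper. The function names, efficiencies and cycle count are hypothetical, and in practice the efficiencies would have to be calibrated, for example from a mock community.

```python
def expected_read_proportions(start_props, efficiencies, n_cycles):
    """Expected amplicon proportions after n_cycles of exponential PCR."""
    amplified = [
        p * (1 + a) ** n_cycles for p, a in zip(start_props, efficiencies)
    ]
    total = sum(amplified)
    return [x / total for x in amplified]


def estimate_start_proportions(read_props, efficiencies, n_cycles):
    """Invert the model: recover starting proportions from read proportions,
    given efficiencies that are assumed to be known from calibration."""
    corrected = [
        r / (1 + a) ** n_cycles for r, a in zip(read_props, efficiencies)
    ]
    total = sum(corrected)
    return [x / total for x in corrected]


# Hypothetical three-taxon mixture with unequal amplification efficiencies.
true_start = [0.5, 0.3, 0.2]
assumed_eff = [0.9, 0.8, 0.7]
cycles = 35

reads = expected_read_proportions(true_start, assumed_eff, cycles)
recovered = estimate_start_proportions(reads, assumed_eff, cycles)
```

Even small differences in efficiency are compounded over many cycles, so the taxon that amplifies best dominates the reads far beyond its starting share; correcting for efficiency recovers the starting proportions in this noise-free setting.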
• DNA barcoding is facing many challenges as it incorporates new technological advances.
• DNA barcoding and metabarcoding are highly complementary approaches.
• We need a coordinated advancement of DNA-based species identification.
• We need to unify traditional taxonomy, barcoding, and metabarcoding approaches.
DNA-based species identification, known as barcoding, transformed the traditional approach to the study of biodiversity science. The field is transitioning from barcoding individuals to metabarcoding communities. This revolution involves new sequencing technologies, bioinformatics pipelines, computational infrastructure, and experimental designs. In this dynamic genomics landscape, metabarcoding studies remain insular and biodiversity estimates depend on the particular methods used. In this opinion article, I discuss the need for a coordinated advancement of DNA-based species identification that integrates taxonomic and barcoding information. Such an approach would facilitate access to almost 3 centuries of taxonomic knowledge and 1 decade of building repository barcodes. Conservation projects are time sensitive, research funding is becoming restricted, and informed decisions depend on our ability to embrace integrative approaches to biodiversity science.
The main aim of DNA barcoding is to establish a shared community resource of DNA sequences that can be used for organismal identification and taxonomic clarification. This approach was successfully pioneered in animals using a portion of the cytochrome oxidase 1 (CO1) mitochondrial gene. In plants, establishing a standardized DNA barcoding system has been more challenging. In this paper, we review the process of selecting and refining a plant barcode; evaluate the factors which influence the discriminatory power of the approach; describe some early applications of plant barcoding and summarise major emerging projects; and outline tool development that will be necessary for plant DNA barcoding to advance.