The nuclear ribosomal internal transcribed spacer (ITS) region is the formal fungal barcode and in most cases the marker of choice for the exploration of fungal diversity in environmental samples. Two problems are particularly acute in the pursuit of satisfactory taxonomic assignment of newly generated ITS sequences: (i) the lack of an inclusive, reliable public reference data set and (ii) the lack of means to refer, in a standardized and stable way, to fungal species for which no Latin name is available. Here, we report on progress in these regards through further development of the UNITE database (http://unite.ut.ee) for molecular identification of fungi. All fungal species represented by at least two ITS sequences in the international nucleotide sequence databases are now given a unique, stable name of the accession number type (e.g. Hymenoscyphus pseudoalbidus|GU586904|SH133781.05FU), and their taxonomic and ecological annotations were corrected as far as possible through a distributed, third‐party annotation effort. We introduce the term ‘species hypothesis’ (SH) for the taxa discovered in clustering on different similarity thresholds (97–99%). An automatically or manually designated sequence is chosen to represent each such SH. These reference sequences are released (http://unite.ut.ee/repository.php) for use by the scientific community in, for example, local sequence similarity searches and in the QIIME pipeline. The system and the data will be updated automatically as the number of public fungal ITS sequences grows. We invite everybody in a position to improve the annotation or metadata associated with their particular fungal lineages of expertise to do so through the new Web‐based sequence management system in UNITE.
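The accession-type name quoted above can be taken apart mechanically. The following is a minimal sketch, assuming only what the example string shows: three pipe-separated fields. The field names `taxon`, `accession` and `sh_code`, and the helper `parse_sh_name`, are illustrative labels and not part of any UNITE tool.

```python
from typing import NamedTuple


class SHName(NamedTuple):
    """Hypothetical container for the three pipe-separated fields."""
    taxon: str      # e.g. a Latin binomial, when one is available
    accession: str  # accession number of the representative sequence
    sh_code: str    # species hypothesis code, kept verbatim


def parse_sh_name(name: str) -> SHName:
    """Split a 'taxon|accession|SH code' string into its three fields."""
    parts = [p.strip() for p in name.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, got {len(parts)}")
    return SHName(*parts)


example = parse_sh_name("Hymenoscyphus pseudoalbidus|GU586904|SH133781.05FU")
print(example.taxon, example.accession, example.sh_code)
```

The SH code is kept as an opaque string here, because the abstract does not spell out how its suffix is structured.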
Within uncharacterized groups, DNA barcodes, short DNA sequences that are present in a wide range of species, can be used to assign organisms into species. We propose an automatic procedure that sorts the sequences into hypothetical species based on the barcode gap, which can be observed whenever the divergence among organisms belonging to the same species is smaller than divergence among organisms from different species. We use a range of prior intraspecific divergence to infer from the data a model‐based one‐sided confidence limit for intraspecific divergence. The method, called Automatic Barcode Gap Discovery (ABGD), then detects the barcode gap as the first significant gap beyond this limit and uses it to partition the data. Inference of the limit and gap detection are then recursively applied to previously obtained groups to get finer partitions until there is no further partitioning. Using six published data sets of metazoans, we show that ABGD is computationally efficient and performs well for standard prior maximum intraspecific divergences (a few per cent of divergence for the five data sets), except for one data set where fewer than three sequences per species were sampled. We further explore the theoretical limitations of ABGD through simulation of explicit speciation and population genetics scenarios. Our results emphasize in particular the sensitivity of the method to the presence of recent speciation events, via (unrealistically) high rates of speciation or large numbers of species. In conclusion, ABGD is a fast, simple method to split a sequence alignment data set into candidate species that should be complemented with other evidence in an integrative taxonomic approach.
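The barcode gap idea can be illustrated with a deliberately simplified sketch. This is not ABGD: it omits the model-based confidence limit and the recursive refinement, and simply takes the widest jump between sorted pairwise p-distances as the gap, then groups sequences by single linkage below that threshold. The function names and the toy sequences are assumptions made for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap_partition(seqs: dict):
    """Split sequences at the widest gap in the sorted pairwise distances."""
    names = list(seqs)
    if len(names) < 3:
        raise ValueError("need at least three sequences to look for a gap")
    pairs = {(i, j): p_distance(seqs[i], seqs[j])
             for i, j in combinations(names, 2)}
    dists = sorted(pairs.values())
    # Widest jump between consecutive sorted distances stands in for the gap.
    _, k = max((dists[n + 1] - dists[n], n) for n in range(len(dists) - 1))
    threshold = (dists[k] + dists[k + 1]) / 2

    # Single-linkage grouping with a small union-find structure.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (i, j), d in pairs.items():
        if d < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return threshold, sorted(sorted(g) for g in groups.values())


# Toy alignment: two pairs of near-identical sequences.
toy = {
    "a1": "AAAAAAAAAA",
    "a2": "AAAAAAAAAT",
    "b1": "CCCCCAAAAA",
    "b2": "CCCCCAAAAT",
}
threshold, groups = barcode_gap_partition(toy)
print(threshold, groups)
```

On the toy data the within-pair distances (0.1) and between-pair distances (0.5 to 0.6) are well separated, so the two pairs come out as two candidate species. Real data sets with recent speciation events would not show such a clean gap, which is the sensitivity the abstract describes.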
Most of the present EU Water Framework Directive (WFD) compliant fish‐based assessment methods of European rivers are multi‐metric indices computed from traditional electrofishing (TEF) samples, but this method has known shortcomings, especially in large rivers. The probability of detecting rare species remains limited, which can alter the sensitivity of the indices. In recent years, environmental (e)DNA metabarcoding techniques have progressed sufficiently to allow applications in various ecological domains as well as eDNA‐based ecological assessment methods. A review of the 25 current WFD‐compliant methods for river fish shows that 81% of the metrics used in these methods are expressed in richness or relative abundance and thus compatible with eDNA samples. However, more than half of the member states' methods include at least one metric related to age or size structure and would have to adapt their current fish index if reliant solely on eDNA‐derived information. Most trait‐based metrics expressed in richness are higher when computed from eDNA than when computed from TEF samples. Comparable values are obtained only when the TEF sampling effort increases. Depending on the species trait considered, most trait‐based metrics expressed in relative abundance are significantly higher for eDNA than for TEF samples or vice versa due to over‐estimation of sub‐surface species or under‐estimation of benthic and rare species by TEF sampling, respectively. An existing predictive fish index, adapted to make it compatible with eDNA data, delivers an ecological assessment comparable with the current approved method for 22 of the 25 sites tested. Its associated uncertainty is lower than that of current fish indices. Recommendations for the development of future fish eDNA‐based indices and the associated eDNA water sampling strategy are discussed.
This study presents DNA barcode records for 4118 specimens representing 561 species of bees belonging to the six families of Apoidea (Andrenidae, Apidae, Colletidae, Halictidae, Megachilidae and Melittidae) found in Central Europe. These records provide fully compliant barcode sequences for 503 of the 571 bee species in the German fauna and partial sequences for 43 more. The barcode results are largely congruent with traditional taxonomy as only five closely allied pairs of species could not be discriminated by barcodes. As well, 90% of the species possessed sufficiently deep sequence divergence to be assigned to a different Barcode Index Number (BIN). In fact, 56 species (11%) were assigned to two or more BINs reflecting the high levels of intraspecific divergence among their component specimens. Fifty other species (9.7%) shared the same Barcode Index Number with one or more species, but most of these species belonged to a distinct barcode cluster within a particular BIN. The barcode data contributed to clarifying the status of nearly half the examined taxonomically problematic species of bees in the German fauna. Based on these results, the role of DNA barcoding as a tool for current and future taxonomic work is discussed.
Developmental deconvolution of complex organs and tissues at the level of individual cells remains challenging. Non-invasive genetic fate mapping has been widely used, but the low number of distinct fluorescent marker proteins limits its resolution. Much higher numbers of cell markers have been generated using viral integration sites, viral barcodes, and strategies based on transposons and CRISPR-Cas9 genome editing; however, temporal and tissue-specific induction of barcodes in situ has not been achieved. Here we report the development of an artificial DNA recombination locus (termed Polylox) that enables broadly applicable endogenous barcoding based on the Cre-loxP recombination system. Polylox recombination in situ reaches a practical diversity of several hundred thousand barcodes, allowing tagging of single cells. We have used this experimental system, combined with fate mapping, to assess haematopoietic stem cell (HSC) fates in vivo. Classical models of haematopoietic lineage specification assume a tree with few major branches. More recently, driven in part by the development of more efficient single-cell assays and improved transplantation efficiencies, different models have been proposed, in which unilineage priming may occur in mice and humans at the level of HSCs. We have introduced barcodes into HSC progenitors in embryonic mice, and found that the adult HSC compartment is a mosaic of embryo-derived HSC clones, some of which are unexpectedly large. Most HSC clones gave rise to multilineage or oligolineage fates, arguing against unilineage priming, and suggesting coherent usage of the potential of cells in a clone. The spreading of barcodes, both after induction in embryos and in adult mice, revealed a basic split between common myeloid-erythroid development and common lymphocyte development, supporting the long-held but contested view of a tree-like haematopoietic structure.
The world's seafood supply and trade have increased in the last decades, as has the potential for substitution of marketed species. Currently, seafood safety and authenticity assessment have become central issues, directly related to the identification of improper labeling of processed foods. To detect and prevent mislabeling issues, DNA barcodes have been widely used as effective molecular markers for species identification. Therefore, this review intends to present the current status of the application of DNA barcodes to seafood species authentication. In this regard, the barcode regions, reference databases and related methodologies are described, while applications are listed and summarized. The cytochrome c oxidase subunit I (COI) gene has been the preferred target region in animal species identification, including fish and shellfish, though other mitochondrial (cytb, 12S rRNA, 16S rRNA) and nuclear genes have been used. DNA barcoding relying on Sanger sequencing has been the most used approach for seafood authentication. Nevertheless, in recent years, noteworthy progress has been made toward DNA barcoding strategies involving next-generation sequencing. Methods relying on real-time PCR using species-specific primers and probes or followed by high resolution melting analysis combined with DNA barcodes represent alternative and promising approaches for simple, cost-effective and high-throughput species discrimination in processed seafood. Still, polymerase chain reaction with restriction fragment length polymorphism detection, targeting DNA barcodes, continues to be a well-established and broadly accepted method in seafood authentication.
Metabarcoding is an emerging genetic tool to rapidly assess biodiversity in ecosystems. It involves high-throughput sequencing of a standard gene from an environmental sample and comparison to a reference database. However, no consensus has emerged regarding laboratory pipelines to screen species diversity and infer species abundances from environmental samples. In particular, the effect of primer bias and the detection limit for specimens with a low biomass have not been systematically examined when processing samples in bulk. We developed and tested a DNA metabarcoding protocol that utilises the standard cytochrome c oxidase subunit I (COI) barcoding fragment to detect freshwater macroinvertebrate taxa. DNA was extracted in bulk, amplified in a single PCR step, and purified, and the libraries were directly sequenced in two independent MiSeq runs (300-bp paired-end reads). Specifically, we assessed the influence of specimen biomass on sequence read abundance by sequencing 31 specimens of a stonefly species with known haplotypes spanning three orders of magnitude in biomass (experiment I). Then, we tested the recovery of 52 different freshwater invertebrate taxa of similar biomass using the same standard barcoding primers (experiment II). Each experiment was replicated ten times to maximise statistical power. The results of both experiments were consistent across replicates. We found a distinct positive correlation between species biomass and resulting numbers of MiSeq reads. Furthermore, we reliably recovered 83% of the 52 taxa used to test primer bias. However, sequence abundance varied by four orders of magnitude between taxa despite the use of similar amounts of biomass. Our metabarcoding approach yielded reliable results for high-throughput assessments. However, the results indicated that primer efficiency is highly species-specific, which would prevent straightforward assessments of species abundance and biomass in a sample.
Thus, PCR-based metabarcoding assessments of biodiversity should rely on presence-absence metrics.
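As a rough illustration of what relying on presence-absence metrics can mean in practice, the sketch below collapses per-taxon read counts to detected/not detected and compares two samples with the Jaccard index. The read counts, the taxon labels, the `min_reads` cutoff and both function names are hypothetical and are not taken from the study.

```python
def to_presence_absence(read_counts: dict, min_reads: int = 1) -> set:
    """Return the set of taxa with at least `min_reads` reads (assumed cutoff)."""
    return {taxon for taxon, n in read_counts.items() if n >= min_reads}


def jaccard(a: set, b: set) -> float:
    """Shared taxa divided by all taxa seen in either sample."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical read counts; abundances differ by orders of magnitude.
sample_a = {"Baetis": 12000, "Gammarus": 3, "Dinocras": 450}
sample_b = {"Baetis": 15, "Gammarus": 0, "Dinocras": 9000, "Asellus": 20}

present_a = to_presence_absence(sample_a)
present_b = to_presence_absence(sample_b)
print(sorted(present_a), sorted(present_b), jaccard(present_a, present_b))
```

Because read numbers are discarded after the detection step, taxon-specific primer efficiency changes the result only when it pushes a taxon below the detection cutoff.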
Rodentia is the most diverse order among mammals, with more than 2,000 species currently described. Most of the time, species assignment based solely on morphological data is so difficult that identifying rodents at the species level is a real challenge. In this study, we compared the applicability of 100 bp mini-barcodes from cytochrome b and cytochrome c oxidase 1 genes to enable rodent species identification. Based on GenBank sequence datasets of 115 rodent species, a 136 bp fragment of cytochrome b was selected as the most discriminatory mini-barcode, and rodent universal primers surrounding this fragment were designed. The efficacy of this new molecular tool was assessed on 946 samples including rodent tissues, feces, museum samples and feces/pellets from predators known to ingest rodents. Utilizing next-generation sequencing technologies able to sequence mixes of DNA, 1,140 amplicons were tagged, multiplexed and sequenced together in a single 454 GS-FLX run. Our method was initially validated on a reference sample set including 265 clearly identified rodent tissues, corresponding to 103 different species. Following validation, 85.6% of 555 rodent samples from Europe, Asia and Africa whose species identity was unknown could be identified using the BLASTN program and GenBank reference sequences. In addition, our method proved effective even on degraded rodent DNA samples: 91.8% and 75.9% of samples from feces and museum specimens respectively were correctly identified. Finally, we succeeded in determining the diet of 66.7% of the investigated carnivores from their feces and 81.8% of owls from their pellets. Non-rodent species were also identified, suggesting that our method is sensitive enough to investigate complete predator diets.
This study demonstrates how this molecular identification method combined with high-throughput sequencing can open new realms of possibilities in achieving fast, accurate and inexpensive species identification.
DNA barcoding has had a major impact on biodiversity science. The elegant simplicity of establishing massive scale databases for a few barcode loci is continuing to change our understanding of species diversity patterns, and continues to enhance human abilities to distinguish among species. Capitalizing on the developments of next generation sequencing technologies and decreasing costs of genome sequencing, there is now the opportunity for the DNA barcoding concept to be extended to new kinds of genomic data. We illustrate the benefits and capacity to do this, and also note the constraints and barriers to overcome before it is truly scalable. We advocate a twin track approach: (i) continuation and acceleration of global efforts to build the DNA barcode reference library of life on earth using standard DNA barcodes and (ii) active development and application of extended DNA barcodes using genome skimming to augment the standard barcoding approach.
Taxonomic identification of biological materials can be achieved through DNA barcoding, where an unknown "barcode" sequence is compared to a reference database. In many disciplines, obtaining accurate taxonomic identifications can be imperative (e.g., evolutionary biology, food regulatory compliance, forensics). The Barcode of Life DataSystems (BOLD) and GenBank are the main public repositories of DNA barcode sequences. In this study, an assessment of the accuracy and reliability of sequences in these databases was performed. To achieve this, 1) curated reference materials for plants, macro-fungi and insects were obtained from national collections, 2) relevant barcode sequences (rbcL, matK, trnH-psbA, ITS and COI) from these reference samples were generated and used for searching against both databases, and 3) optimal search parameters were determined that ensure the best match to the known species in either database. While GenBank outperformed BOLD for species-level identification of insect taxa (53% and 35%, respectively), both databases performed comparably for plants and macro-fungi (~81% and ~57%, respectively). Results illustrated that using a multi-locus barcode approach increased identification success. This study outlines the utility of the BLAST search tool in GenBank and the BOLD identification engine for taxonomic identifications and identifies some precautions needed when using public sequence repositories in applied scientific disciplines.