Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic identification, which is time-consuming and prone to identification uncertainties. DNA barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables the collection of a large quantity of barcodes from natural samples. These barcodes are assigned to diatom taxa by algorithms that compare the sequences with a reference barcoding library. Proof of concept was recently demonstrated for synthetic and natural communities, and these studies underlined the importance of the quality of this reference library. We present an open-access, curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematics supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two barcodes were chosen to support the database, 18S (18S ribosomal RNA) and rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (Blast, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). R-Syst::diatom is updated every 6 months. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and through virtual_BiodiversityL@b, a platform dedicated to next-generation sequencing data analysis (https://galaxy-pgtp.pierroton.inra.fr/).
We present here the content of the library in terms of the number of barcodes and diatom taxa. In addition to this information, R-Syst::diatom provides morphological features (e.g. biovolumes, chloroplasts…), life-forms (mobility, colony type) and ecological features (pollution preferenda of taxa). Database URL: http://www.rsyst.inra.fr/.
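The assignment step described above, in which each environmental barcode is compared with the reference library, can be sketched as a best-hit search. This is a minimal sketch, assuming the query and reference barcodes are already aligned to the same length; real pipelines typically rely on BLAST or an explicit alignment step instead. `ReferenceRecord`, `percent_identity` and `best_hit` are hypothetical names, not part of R-Syst::diatom, and the sequences are toy strings, not real barcodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReferenceRecord:
    """Hypothetical reference entry: a barcode linked to a taxon and its traits."""
    marker: str                                  # e.g. "rbcL" or "18S"
    taxon: str                                   # taxonomic identification
    sequence: str                                # barcode, pre-aligned (assumption)
    traits: dict = field(default_factory=dict)   # e.g. {"life_form": "mobile"}


def percent_identity(a: str, b: str) -> float:
    """Identity (%) over aligned columns where neither sequence has a gap '-'."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def best_hit(query: str, library: list[ReferenceRecord], marker: str):
    """Return (record, identity) of the closest reference for the same marker."""
    candidates = [r for r in library if r.marker == marker]
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda r: percent_identity(query, r.sequence))
    return best, percent_identity(query, best.sequence)


# Toy library with two invented rbcL entries.
library = [
    ReferenceRecord("rbcL", "Taxon A", "ACGTACGTAC"),
    ReferenceRecord("rbcL", "Taxon B", "ACGTTCGTTC"),
]
record, identity = best_hit("ACGTACGTAA", library, "rbcL")
print(record.taxon, identity)
```

Restricting candidates to one marker mirrors the fact that 18S and rbcL barcodes are stored side by side but cannot be compared with each other.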
The Gomphonema parvulum complex includes species displaying considerable morphological variability and a wide geographical distribution. These characteristics make them difficult to identify by microscopy and raise the question of their taxonomic validity and of possible biogeographical differentiation between them. In this context, we isolated 39 G. parvulum s.l. strains from rivers on a tropical island (Mayotte) and in mainland Europe. By sequencing three DNA fragments (ITS, rbcL and cox1), four clades (A, B, C and D) were clearly identified, and an additional one (B’) was distinguishable only in the rbcL sequence. The four main clades can be separated by morphological criteria, in particular the shape of the central area, although some overlap was found between them. We therefore consider that the G. parvulum complex contains at least four semi-cryptic species corresponding to the four main clades. One of them (A) was found only on Mayotte, while two others (C and D) were found only in Europe. The last clade (B) contained strains from both Europe and Mayotte. Pyrosequencing data confirmed the geographical differences in the distribution of these species, suggesting that the G. parvulum complex displays biogeographic structure.
Diatoms are among the main bioindicators used to assess the ecological quality of rivers, but their identification is difficult and time-consuming. Next Generation Sequencing (NGS) can be used to study communities of microorganisms, so we tested the reliability of 454 pyrosequencing for estimating diatom inventories in environmental samples. We used small subunit ribosomal deoxyribonucleic acid (SSU rDNA), ribulose-1,5-bisphosphate carboxylase (rbcL) and cytochrome oxidase I (COI) markers, and examined reference libraries to define thresholds between intra- and interspecific and between intra- and intergeneric genetic distances. Based on tests with 1 mock community, we used a threshold of 99% identity for SSU rDNA and rbcL sequences to study freshwater diatoms at the species level. We applied 454 pyrosequencing to 4 contrasting environmental samples (one of them in duplicate), assigned taxon names to environmental sequences, and compared the qualitative and quantitative molecular inventories with those obtained by microscopy. Species richness detected by microscopy was always higher than that detected by pyrosequencing. Some morphologically detected taxa may have been persistent frustules from dead cells. Some taxa detected by molecular analysis were not detected by morphology, and vice versa. The main source of divergence appears to be inadequate taxonomic coverage in DNA reference libraries: only a small percentage of the species (but almost all genera) in morphological inventories were included in DNA reference libraries. DNA reference libraries contained a smaller percentage of species from tropical samples (27.1–38.1%) than from temperate samples (53.7–77.8%). Agreement between morphological and molecular inventories was better for species with relative abundance >1% than for rare species.
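The 99% identity threshold for species-level assignment can be illustrated with a short sketch that turns best-hit results into a quantitative species inventory. Only the 99% species threshold comes from the study; the function names, the optional genus threshold (for which no value is given here) and the toy identities and read counts are hypothetical.

```python
from collections import Counter


def assign_rank(taxon, identity, species_threshold=99.0, genus_threshold=None):
    """Return ("species", name), ("genus", name) or None (unassigned).

    genus_threshold is a hypothetical, optional parameter; by default only
    species-level assignments at >= 99% identity are made.
    """
    if identity >= species_threshold:
        return ("species", taxon)
    if genus_threshold is not None and identity >= genus_threshold:
        return ("genus", taxon.split()[0])
    return None


def species_inventory(hits, species_threshold=99.0):
    """Relative abundance (%) of each species among species-assigned reads.

    hits: iterable of (best_hit_taxon, percent_identity, read_count).
    """
    counts = Counter()
    for taxon, identity, reads in hits:
        result = assign_rank(taxon, identity, species_threshold)
        if result is not None and result[0] == "species":
            counts[result[1]] += reads
    total = sum(counts.values())
    return {sp: 100.0 * n / total for sp, n in counts.items()} if total else {}


# Toy best-hit results (identities and read counts are invented).
hits = [
    ("Nitzschia palea", 99.6, 300),
    ("Nitzschia palea", 99.1, 100),
    ("Gomphonema parvulum", 100.0, 100),
    ("Ulnaria ulna", 97.0, 500),   # below 99%: left unassigned at species level
]
print(species_inventory(hits))
```

Reads below the threshold are simply dropped from the species inventory, which is one way the reference-library gaps described above translate into lower molecular richness.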
The rbcL marker appeared to provide more reproducible results (94.9% species similarity between the 2 duplicates) and was very useful for molecular identification, but procedural standardization is needed. The water-quality ranking assigned to a site via the Pollution Sensitivity diatom index was the same whether calculated with molecular or morphological data. Pyrosequencing is a promising approach for detecting all species, even rare ones, once reference libraries have been developed.
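The abstract does not give the index formula. The sketch below assumes the Zelinka & Marvan weighted average commonly used for diatom indices such as the IPS, in which each taxon j contributes its abundance a_j, a sensitivity s_j (1 to 5) and an indicator value v_j (1 to 3), with a linear rescaling of the 1 to 5 score to a 1 to 20 scale. The taxon profiles and abundances are invented; the point is only that two inventories with the same relative abundances give the same score, whatever their absolute counts.

```python
def ips_1_to_5(abundances, profiles):
    """Zelinka & Marvan weighted average on a 1-5 scale (assumed formula).

    abundances: {taxon: relative abundance (%) or read/valve count}
    profiles:   {taxon: (sensitivity s in 1..5, indicator value v in 1..3)}
    Taxa without a profile are skipped.
    """
    num = den = 0.0
    for taxon, a in abundances.items():
        if taxon not in profiles:
            continue
        s, v = profiles[taxon]
        num += a * s * v
        den += a * v
    if den == 0:
        raise ValueError("no profiled taxa in the inventory")
    return num / den


def to_20_scale(x):
    """Linear rescaling of a 1-5 score to 1-20 (assumed convention)."""
    return 4.75 * x - 3.75


# Invented profiles and inventories for illustration only.
profiles = {"taxon_A": (5, 3), "taxon_B": (2, 1)}
morphological = {"taxon_A": 60, "taxon_B": 40}                    # % of valves
molecular = {"taxon_A": 600, "taxon_B": 400, "unprofiled": 50}    # reads

print(to_20_scale(ips_1_to_5(morphological, profiles)))
print(to_20_scale(ips_1_to_5(molecular, profiles)))
```

Because the index is a ratio of abundance-weighted sums, scaling all abundances by a constant leaves it unchanged, so read counts and valve percentages can be fed in directly.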
DNA barcoding, which is being developed for biomonitoring, requires a database of reference sequences and knowledge of how far sequences can diverge before they are assigned to separate species. The molecular hunt for hidden species also raises the question of species definitions. We examined whether there are objective criteria for sequence-based species delimitation in diatoms, using Nitzschia palea, an important monophyletic indicator species already known to contain cryptic diversity. Strains from a wide geographical range were sequenced for 28S rRNA, COI and rbcL. Neither homogeneity indices, the Chao index, nor an evolutionary method based on coalescence theory could objectively select a precise number of species within N. palea. COI always gave higher diversity estimates than 28S rRNA or rbcL. Mating data did not provide a precise calibration of molecular species thresholds. Rarefaction curves indicated that further MOTUs would be detected with more isolates than we sampled (81 clones, 42 localities). Although some genotypes had intercontinental distributions, there was a positive relationship between genetic and geographical distance, suggesting even higher richness than we assessed, given that many regions were not sampled. Overall, no objective criteria were found for species separation; instead, barcoding will need a consensual approach to molecular species limits.
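The rarefaction and Chao reasoning above can be made concrete with a small sketch. The abstract does not say which variants were used, so this assumes Hurlbert's expected richness for rarefaction and the bias-corrected Chao1 estimator; the input is a hypothetical list giving how many isolates fall in each MOTU.

```python
from collections import Counter
from math import comb


def rarefied_richness(counts, n):
    """Hurlbert's expected number of MOTUs in a random subsample of n isolates.

    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], with N the total isolates.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of isolates")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 (F1 - 1) / (2 (F2 + 1)).

    F1 and F2 are the numbers of MOTUs seen exactly once and exactly twice.
    """
    freq = Counter(c for c in counts if c > 0)
    s_obs = sum(freq.values())
    f1, f2 = freq[1], freq[2]
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical MOTU sizes (isolates per MOTU).
counts = [3, 2, 1, 1]
curve = [rarefied_richness(counts, n) for n in range(1, sum(counts) + 1)]
print(curve, chao1(counts))
```

A curve that is still rising at the full sample size, or a Chao1 estimate above the observed count, are both signs that more MOTUs would turn up with further sampling.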
Diatom species identification with DNA metabarcoding is an economical, fast and reliable alternative to identification by light microscopy for river quality monitoring. Using a short DNA sequence of the rbcL gene and 'Diat.barcode', a reference barcode library, more than 90% of the environmental sequences from French rivers can be identified to species level. However, the completeness of this library is much lower in other regions, such as the tropical French overseas departments. A barcode library completion method combining high-throughput sequencing data with microscopy count data from natural samples (Rimet et al. 2018) was applied and tested in rivers of Martinique and Guadeloupe (West Indies), for which only 45% of the environmental sequences could be identified to species level using Diat.barcode v9. Assigning barcodes to the most abundant species on the islands with this method is illustrated with Ulnaria goulardii and two new species belonging to Nupela and Epithemia, which are also described in this paper. The more complex situation of morphologically similar species is illustrated by Gomphonema designatum and G. bourbonense. Combining molecular and morphological data, including their reference barcodes, we demonstrate that they are conspecific, under the name G. bourbonense. However, when several morphologically similar species and several environmental sequences belonging to the same clade are present, the barcodes cannot be related to the corresponding morphological species.
Applying this method enabled the Diat.barcode library (v.10) to be updated, with 84% of the environmental sequences from the West Indies now identifiable at the species level. However, many morphological species still lack barcodes. In these cases, more classical methods, such as cell isolation, Sanger sequencing and morphological observations of cultures, must be applied.
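The completeness figures quoted above (45% with v9, 84% with v10) are percentages of environmental sequences identified to species level. A minimal sketch of such a coverage metric is given below; the abstract does not state whether sequences are counted as reads or as unique sequences, so both options are shown, and the sequence identifiers and counts are invented.

```python
def species_level_coverage(asv_reads, species_assigned, by_reads=True):
    """Percentage of environmental sequences identified to species level.

    asv_reads:        {sequence_id: read count}  (hypothetical input)
    species_assigned: set of sequence_ids given a species name by the library
    by_reads:         weight by read counts (True) or count unique sequences
    """
    if by_reads:
        total = sum(asv_reads.values())
        hit = sum(n for seq, n in asv_reads.items() if seq in species_assigned)
    else:
        total = len(asv_reads)
        hit = sum(1 for seq in asv_reads if seq in species_assigned)
    return 100.0 * hit / total if total else 0.0


# Invented example: a library update adds a barcode matching asv2.
reads = {"asv1": 500, "asv2": 300, "asv3": 200}
before = species_level_coverage(reads, {"asv1"})
after = species_level_coverage(reads, {"asv1", "asv2"})
print(before, after)
```

Re-running the same environmental data against successive library versions isolates the effect of library completion from any change in sampling.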
The recent emergence of barcoding approaches coupled with next-generation sequencing (NGS) has opened new perspectives for studying environmental communities. In this framework, we tested whether accurate inventories of diatom communities can be derived from pyrosequencing outputs with an available DNA reference library. We used three molecular markers targeting the nuclear, chloroplast and mitochondrial genomes (SSU rDNA, rbcL and cox1) and three samples of a mock community composed of 30 known diatom strains belonging to 21 species. To detect methodological biases, one sample was constituted directly from pooled cultures, whereas the others consisted of pooled PCR products. The NGS reads obtained by pyrosequencing (Roche 454) were compared first with a DNA reference library including the sequences of all the species used to constitute the mock community, and second with a complete DNA reference library with a larger taxonomic coverage. A stringent taxonomic assignment produced inventories that were compared with the true composition. We detected biases due to DNA extraction and PCR amplification that resulted in false-negative detections. Conversely, pyrosequencing errors appeared to generate false positives, especially in the case of closely allied species. The taxonomic coverage of DNA reference libraries appears to be the most crucial factor, together with marker polymorphism, which is essential to identify taxa at the species level. rbcL offers a high resolving power together with a large DNA reference library. Although it needs further optimization, pyrosequencing is suitable for identifying diatom assemblages and may find applications in freshwater biomonitoring.
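Comparing a molecular inventory with the known composition of a mock community amounts to set arithmetic: expected taxa that are missing are false negatives, and detected taxa that were not in the mock are false positives. The sketch below assumes inventories are given as sets of species names; `compare_inventories` and the recall/precision summary are illustrative choices, not metrics reported in the study, and the species labels are placeholders.

```python
def compare_inventories(expected, detected):
    """Compare a detected inventory with the known (mock) composition.

    expected: species actually placed in the mock community
    detected: species assigned from sequencing reads
    """
    expected, detected = set(expected), set(detected)
    true_pos = expected & detected
    false_neg = expected - detected   # e.g. lost through extraction/PCR bias
    false_pos = detected - expected   # e.g. sequencing errors near allied species
    recall = len(true_pos) / len(expected) if expected else 0.0
    precision = len(true_pos) / len(detected) if detected else 0.0
    return {
        "true_positives": true_pos,
        "false_negatives": false_neg,
        "false_positives": false_pos,
        "recall": recall,
        "precision": precision,
    }


# Placeholder species labels.
report = compare_inventories({"sp1", "sp2", "sp3", "sp4"}, {"sp1", "sp2", "sp5"})
print(report["false_negatives"], report["false_positives"], report["recall"])
```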
Although diatom taxa have been observed and described for many years using light and electron microscopy, several taxa require clarification and taxonomic reassessment. This is the case for the order Cymbellales D.G. Mann, which is widely represented in freshwater. The phylogenetic relationships among taxa belonging to this order are not always clear because their taxonomic status has been repeatedly revised. In this study, diatom cells were isolated from rivers in Italy, Luxembourg, Portugal and Spain. In total, 21 18S rDNA gene sequences, representing six genera of Cymbellales (Cymbella C. Agardh, Didymosphenia M. Schmidt, Encyonema Kützing, Gomphoneis Cleve, Gomphonema Ehrenberg and Reimeria Kociolek & Stoermer), were determined. These sequences were analyzed together with other diatom 18S rDNA gene sequences available in GenBank. The results indicate that the Cymbellaceae Greville and Gomphonemataceae Kützing, especially the genus Gomphonema, are paraphyletic, and that the significance of some morphological characteristics traditionally used for classification requires reassessment. These results also demonstrate the importance of a polyphasic approach combining morphological and molecular data to improve the taxonomy and classification system of diatoms.
Sequence analysis of the 18S rDNA gene from 93 taxa belonging to the pennate diatoms, plus one centric, Cyclotella meneghiniana Kützing, was performed using two different alignments (Clustal and secondary structure) and two different types of algorithms (neighbour-joining and maximum likelihood). The monophyly of the bacillariacean taxa depends on the type of alignment used for the 18S gene: a secondary structure alignment does not support their monophyly, whereas a Clustal alignment does, but only in the maximum likelihood analysis. The Eunotiales were basal to all other raphid diatoms in the maximum likelihood analyses, regardless of the alignment, whereas the neighbour-joining analyses, regardless of the alignment, placed the Eunotiales inside the raphid diatoms, sister to one of the bacillariophycean clades in the secondary structure alignment and sister to a monophyletic bacillariophycean clade in the Clustal alignment. The classifications of the Bacillariaceae by Krammer & Lange-Bertalot and by Round, Crawford & Mann were not supported by the 18S phylogeny. Taxa of the section Lanceolatae Grunow were present in different clades, but the sister relationships between these well-supported clades were not themselves supported. Multiseriate striae, which are often considered an important feature, were not supported as clade-defining features. Groups A and B of Krammer & Lange-Bertalot in the section Lanceolatae were not supported by the phylogenetic analyses.
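Neighbour-joining, one of the two tree-building algorithms compared above, builds a tree from a matrix of pairwise distances by repeatedly joining the pair of nodes that minimises the Q criterion. The sketch below is a minimal, unoptimised implementation of the standard Saitou & Nei procedure, not the software used in the study; `neighbour_joining` is a hypothetical name, and the toy matrix is a textbook five-taxon example rather than diatom data. In practice the distances would come from aligned 18S sequences.

```python
from itertools import combinations


def neighbour_joining(labels, dist):
    """Build an unrooted tree by neighbour joining.

    labels: taxon names; dist: symmetric matrix (list of lists) of distances.
    Returns a list of edges (node_a, node_b, branch_length).
    """
    # Distance table keyed by node name, so new internal nodes can be added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each active node from all other active nodes.
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i, j in combinations(nodes, 2):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j.
        bi = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        bj = d[i][j] - bi
        u = f"node{next_id}"
        next_id += 1
        edges += [(u, i, bi), (u, j, bj)]
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes with a single edge.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
for edge in neighbour_joining(labels, dist):
    print(edge)
```

Unlike maximum likelihood, this procedure never evaluates alternative topologies against the sequence data, which is one reason the two methods can place groups such as the Eunotiales differently.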
The European Water Framework Directive requires river quality to be assessed using chemical and biological indicators, which include diatoms. Indices based on the taxonomic composition and relative abundance of diatom taxa are robust. However, many samples must be analysed every year, while identifying these micro-algae by light microscopy is difficult because of taxonomic uncertainties and requires time and expertise. Improvements can therefore still be made to facilitate routine water-quality monitoring. Molecular biology techniques are efficient tools for identifying microorganisms and could therefore be used to improve diatom identification. The objectives of this thesis were thus to extend knowledge of freshwater diatom taxonomy using molecular methods, and to advance the development of a molecular tool for identifying diatoms in natural samples, with a view to its use in bioindication. The taxonomy of several diatom groups was studied by combining morphological and molecular approaches. Our work showed the ability of DNA sequences to discriminate diatom taxa and to reveal their phylogenetic relationships. DNA sequences showed that the morphological criteria used to identify diatoms did not systematically match their phylogenetic relationships. Using different markers allowed discrimination at different taxonomic levels.
Our results also revealed the importance of combining complementary morphological and molecular approaches to improve our understanding of the relationships between diatom taxa, and thus to stabilise their taxonomy. Since DNA sequences allow diatom taxa to be discriminated, we tested a high-throughput sequencing tool, 454 pyrosequencing, with the aim of identifying the taxa that make up diatom communities. We assembled reference sequence databases with taxonomic identifications. We also took part in developing the bioinformatic tools needed to analyse pyrosequencing data. Finally, we tested these tools to establish taxonomic inventories of diatoms in artificial communities (mixtures of strains) and in environmental communities (freshwater biofilms). These studies demonstrated the potential of 454 pyrosequencing for studying diatom communities at fine taxonomic levels. Comparing different nucleic acid markers showed that rbcL was the most suitable marker for identifying diatoms by pyrosequencing. Indeed, taking into account the reference sequence databases, the reproducibility and biases of the method, and the resolving power of the marker, rbcL gave the best estimate of the diatom composition of complex samples. Further progress is still needed before molecular tools can be used to assess water quality with diatoms. However, our studies will help guide future analyses toward water-quality monitoring based on molecular inventories of diatom taxa.