The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error correction are based on an initial analysis of experimental sequences from the older GS20 system by Huse et al. (2007). Here we analyze the quality of 454 sequencing data and identify factors that contribute to sequencing error, using an extensive dataset of Roche control DNA fragments.
The mean error rate of 454 sequences was 1.07%. More importantly, errors were not randomly distributed: the error rate occasionally exceeded 50% at certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, sequence length and, for insertion and deletion errors, spatial localization on PicoTiterPlates (PT plates). These factors can be described by seven variables. No single variable accounts for the error rate distribution, but the combination of all seven explains most of the variation.
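The error figures above come from comparing reads with known control sequences. The sketch below illustrates that kind of bookkeeping; it is not the authors' pipeline. It assumes each read has already been aligned to its reference and is supplied as a pair of equal-length gapped strings ('-' for gaps), and it pools substitutions, insertions and deletions into a single rate, which may not match the study's exact denominator. It also flags homopolymer runs, the main error factor reported above. The function names and the minimum run length of 3 are illustrative assumptions.

```python
import re
from collections import Counter


def classify_columns(ref_aln: str, read_aln: str) -> Counter:
    """Count the columns of one gapped pairwise alignment by type.

    Assumes both strings have the same length and use '-' for gaps.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have the same length")
    counts = Counter()
    for ref_base, read_base in zip(ref_aln.upper(), read_aln.upper()):
        if ref_base == "-" and read_base == "-":
            continue  # gap/gap column carries no information
        if ref_base == "-":
            counts["insertion"] += 1  # extra base present in the read
        elif read_base == "-":
            counts["deletion"] += 1  # reference base missing from the read
        elif ref_base != read_base:
            counts["substitution"] += 1
        else:
            counts["match"] += 1
    return counts


def error_rate(alignments) -> float:
    """Pooled error rate: erroneous columns divided by informative columns."""
    total = Counter()
    for ref_aln, read_aln in alignments:
        total.update(classify_columns(ref_aln, read_aln))
    errors = total["insertion"] + total["deletion"] + total["substitution"]
    columns = errors + total["match"]
    return errors / columns if columns else 0.0


def homopolymer_runs(seq: str, min_len: int = 3):
    """Return (start, run) for every maximal single-base run of >= min_len."""
    return [
        (m.start(), m.group(0))
        for m in re.finditer(r"([ACGT])\1*", seq.upper())
        if len(m.group(0)) >= min_len
    ]


pairs = [("ACGT-ACGT", "ACGTTACGT"), ("ACGTACGT", "ACCTAC-T")]
print(error_rate(pairs))  # 3 errors over 17 informative columns
print(homopolymer_runs("ACGTTTTAGGGC"))  # [(3, 'TTTT'), (8, 'GGG')]
```

Tallying errors by type rather than as one number makes it straightforward to relate insertions and deletions to homopolymer positions, as the factors above suggest.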
The pattern identified here calls for the use of internal controls and error-correcting base callers where available (e.g. when sequencing amplicons). For shotgun libraries, sequencing with both primers and at deep coverage, combined with random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish low-frequency alleles from errors.
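Deep coverage compensates for errors because independent reads rarely repeat the same mistake at the same position. A minimal majority-vote sketch, assuming the reads are already aligned to common coordinates as equal-length strings; the function name and the treatment of 'N' as missing data are illustrative assumptions, not a published error-correction method.

```python
from collections import Counter


def consensus(aligned_reads, ignore="N"):
    """Majority-vote consensus over reads aligned to the same coordinates.

    Assumes every read is a string of the same length ('-' marks a gap and is
    counted like a base; characters in `ignore` are treated as missing data).
    Returns the consensus string and, for each column, the frequency of the
    second most common symbol.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    bases, minor_freqs = [], []
    for i in range(length):
        column = Counter(read[i] for read in aligned_reads if read[i] not in ignore)
        if not column:
            bases.append("N")  # no informative read covers this column
            minor_freqs.append(0.0)
            continue
        ranked = column.most_common()  # ties keep first-seen order
        depth = sum(column.values())
        bases.append(ranked[0][0])
        minor_freqs.append(ranked[1][1] / depth if len(ranked) > 1 else 0.0)
    return "".join(bases), minor_freqs


reads = ["ACGT", "ACGT", "ACTT", "ACGT"]
seq, minor = consensus(reads)
print(seq, minor)  # ACGT [0.0, 0.0, 0.25, 0.0]
```

A minor-symbol frequency close to the local per-base error rate (about 1% on average, but far higher at some positions) cannot by itself separate a low-frequency allele from a recurring error, which is the difficulty noted above.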
QDD is an open-access program providing a user-friendly tool for microsatellite detection and primer design from large sets of DNA sequences. The program is designed to handle all steps in the processing of raw sequences obtained from pyrosequencing of enriched DNA libraries, but it also accepts data obtained through other sequencing methods, using FASTA files as input. QDD performs the following tasks: tag sorting, adapter/vector removal, elimination of redundant sequences, detection of possible genomic multicopies (duplicated loci or transposable elements), stringent selection of target microsatellites and customizable primer design. It can process up to one million sequences of a few hundred base pairs in the tag-sorting step, and up to 50 000 sequences in a single input file for the steps involving estimation of sequence similarity.
Availability: QDD is freely available under the GPL licence for Windows and Linux from the following web site: http://www.univ-provence.fr/gsite/Local/egee/dir/meglecz/QDD.html
Contact: emese.meglecz@univ-provence.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
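The abstract does not say how QDD detects microsatellites. As a hedged illustration of the general idea only, the sketch below finds perfect tandem repeats with a regular expression, for motif lengths 2 to 6 and at least 5 copies; these thresholds, the function names and the "perfect repeat" criterion are assumptions, not QDD's settings.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def find_microsatellites(seq: str, motif_lengths=range(2, 7), min_repeats: int = 5):
    """Find perfect tandem repeats; returns sorted (start, motif, copies) tuples.

    Thresholds are illustrative assumptions, not QDD's settings.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # e.g. k=2, min_repeats=5 gives ([ACGT]{2})\1{4,}
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_repeats - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'ACAC', already found as 'AC'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_microsatellites("TTGACACACACACGGATCATCATCATCATCATCGG"))
# [(3, 'AC', 5), (15, 'ATC', 6)]
```

Rejecting non-primitive motifs avoids reporting the same locus twice (for example an (AC)n repeat also matching as (ACAC)n); imperfect or interrupted repeats are deliberately out of scope for this sketch.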
Microsatellite marker development has been greatly simplified by the use of high‐throughput sequencing followed by in silico microsatellite detection and primer design. However, the selection of markers designed by existing pipelines relies either on arbitrary criteria or on older studies of PCR success. Based on wet laboratory experiments, we identified the factors most likely to influence genotyping success rate: the alignment score between the primers and the amplicon; the distance between primers and microsatellites; the length of the PCR product; target region complexity; and the number of reads underlying the sequence. The QDD pipeline has been modified to include these factors in its output to help with marker selection. New features are also included in the present version: (i) contigs as well as raw sequencing reads are accepted as input, allowing the analysis of assembled high‐coverage data; (ii) input data can be in either FASTA or FASTQ format, facilitating the use of Illumina and IonTorrent reads; (iii) a comparison to known transposable elements allows their detection; (iv) a contamination check can be carried out by BLASTing potential markers against the NCBI nucleotide (nt) database; (v) QDD3 is also available embedded in a virtual machine, making installation easier and operating‐system independent. It can be run from the command line or integrated into a Galaxy server, which provides a user‐friendly interface and access to a large variety of NGS tools.
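Two of the factors listed above, the distance between primers and microsatellite and the length of the PCR product, follow directly from coordinates on the template sequence. A minimal sketch, assuming 0-based half-open coordinates with both primer binding sites expressed on the forward strand; the class and function names are hypothetical, and QDD3's exact definitions of these metrics may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) in 0-based sequence coordinates."""
    start: int
    end: int


def amplicon_metrics(fwd: Interval, ssr: Interval, rev: Interval) -> dict:
    """Compute PCR product length and primer-to-microsatellite distances.

    Assumes both primers are given as their binding sites on the forward
    strand of the template, with fwd before the microsatellite and rev after it.
    """
    if not (fwd.end <= ssr.start and ssr.end <= rev.start):
        raise ValueError("expected forward primer < microsatellite < reverse primer")
    return {
        "pcr_product_length": rev.end - fwd.start,
        "fwd_primer_to_ssr": ssr.start - fwd.end,
        "ssr_to_rev_primer": rev.start - ssr.end,
    }


print(amplicon_metrics(Interval(10, 30), Interval(60, 90), Interval(140, 160)))
# {'pcr_product_length': 150, 'fwd_primer_to_ssr': 30, 'ssr_to_rev_primer': 50}
```

Reporting the two distances separately, rather than only their minimum, is a design choice of this sketch; it leaves the filtering threshold to the user.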
Microsatellites (or SSRs: simple sequence repeats) are among the most frequently used DNA markers in many areas of research. Their use is limited by the difficulty of isolating them de novo from species for which no genomic resources are available. We describe here a high‐throughput method for isolating microsatellite markers that couples multiplex microsatellite enrichment with next‐generation sequencing on 454 GS‐FLX Titanium platforms. The procedure was calibrated on a model species (Apis mellifera) and validated on 13 other species from various taxonomic groups (animals, plants and fungi), including taxa for which severe difficulties had previously been encountered using traditional methods. We obtained 11 497 to 34 483 sequences, depending on the species, and the number of detected microsatellite loci ranged from 199 to 5791. We thus demonstrate that this procedure can be readily and successfully applied to a large variety of taxonomic groups, at much lower cost than traditional protocols. This method is expected to speed up the acquisition of high‐quality genetic markers for nonmodel organisms.
Analyses of high-throughput transcriptome sequences from non-model organisms rely on two main approaches: de novo assembly and genome-guided assembly, which maps reads to a reference to assign them prior to assembly. Given the limits of mapping reads to a highly divergent reference, as is frequently the case for non-model species, we evaluate whether blastn outperforms mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance using simulated reads with lengths corresponding to those generated by the most common sequencing platforms, over a realistic range of genetic divergence (0% to 30%). Here we focus on gene identification rather than on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes at 0% divergence, irrespective of read length; however, the read assignment rate decreases as divergence increases and as read length decreases. Nevertheless, 92.6% of genes are still recovered at 30% divergence, irrespective of read length. This analysis also categorizes genes according to their assignment, and suggests guidelines for data processing prior to comparative transcriptomics and gene expression analyses, to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performance of de novo assembly alone vs. in combination with a transcriptome-guided assembly based on blastn, both via simulation and empirically, using data from a cyprinid fish species and an oak species. In every simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods.
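The simulations above test read assignment at controlled divergence levels. The sketch below shows one simple way to build such test data, under stated assumptions: divergence is modelled as uniformly placed substitutions only (no indels), reads are error-free and sampled uniformly from one strand, and the function names are hypothetical. It is not the simulation protocol used in the study.

```python
import random

BASES = "ACGT"


def diverge(seq: str, divergence: float, rng: random.Random) -> str:
    """Copy seq with exactly round(divergence * len(seq)) random substitutions."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    out = list(seq)
    for i in rng.sample(range(len(seq)), round(divergence * len(seq))):
        out[i] = rng.choice([b for b in BASES if b != out[i]])  # always a change
    return "".join(out)


def simulate_reads(seq: str, read_length: int, n_reads: int, rng: random.Random):
    """Sample n_reads error-free reads of read_length uniformly from seq."""
    if read_length > len(seq):
        raise ValueError("read_length exceeds sequence length")
    starts = [rng.randrange(len(seq) - read_length + 1) for _ in range(n_reads)]
    return [seq[s:s + read_length] for s in starts]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
reference = "".join(rng.choice(BASES) for _ in range(1000))
for level in (0.0, 0.15, 0.30):
    target = diverge(reference, level, rng)
    reads = simulate_reads(target, read_length=100, n_reads=5, rng=rng)
    print(level, p_distance(reference, target), len(reads))
```

Reads drawn from the diverged copy could then be assigned back to `reference` with blastn or a mapper to estimate the assignment rate at each divergence level.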
Combining de novo assembly with a related reference transcriptome for read assignment also addresses the bias and errors in contigs caused by relying on a related reference alone. Empirical data corroborate these findings for transcriptomes assembled from two non-model organisms: Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, out of the 31,944 genes known from Danio rerio, the guided and de novo assemblies recover 20,605 and 20,032 genes, respectively, but the guided assembly performs much better on both contiguity and completeness metrics. For the oak, out of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies perform similarly on these metrics, but the new guided approach detects 16,326 genes, whereas the de novo assembly detects only 9,385.
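The gene counts above are easier to compare as percentages of each reference gene set. The short calculation below does only that arithmetic on the reported numbers; the dictionary layout and function name are illustrative.

```python
# Gene counts as reported in the text above: (reference genes, recovered per method).
REPORTED = {
    ("Parachondrostoma toxostoma", "Danio rerio"): (31_944, {"guided": 20_605, "de novo": 20_032}),
    ("Quercus pubescens", "Vitis vinifera"): (29_971, {"guided": 16_326, "de novo": 9_385}),
}


def recovery_percent(recovered: int, known: int) -> float:
    """Percentage of reference genes recovered by an assembly."""
    return 100.0 * recovered / known


for (species, reference), (known, by_method) in REPORTED.items():
    for method, recovered in by_method.items():
        print(f"{species} ({reference} reference), {method}: "
              f"{recovery_percent(recovered, known):.1f}%")
```

This prints 64.5% (guided) and 62.7% (de novo) for the fish, and 54.5% (guided) and 31.3% (de novo) for the oak.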
Thecosomata are a group of marine zooplankton that play an important role in the oceanic carbonate cycle owing to their shell composition. There are substantial discrepancies among previous morphology-based taxonomies, and consequently in the understanding of the evolutionary history of Thecosomata. In this study, the extensive plankton sampling of the TARA Oceans expedition, combined with a set of other missions, allowed us to assess the phylogenetic relationships of Thecosomata using morphological and molecular data (28S and COI genes). The two gene trees showed incongruities (e.g. Hyalocylis, Cavolinia), whereas the morphological and 28S trees were highly congruent (e.g. monophyly of Euthecosomata). The monophyly of straight-shelled species led us to revive the Orthoconcha, and the split of Limacinidae led us to revive Embolus inflata in place of Limacina inflata. The results also call into question Euthecosomata families defined on plesiomorphic character states, as in the case of Creseidae, which was not monophyletic. Estimated divergence times suggest that the evolutionary history of Thecosomata was characterized by four major diversification events. Drawing on palaeontological knowledge, we propose a new evolutionary scenario in which macroevolution involving morphological innovations was paced by climatic changes and associated species turnover from the Eocene to the Miocene, and was shaped principally by predation and shell buoyancy.
Understanding the impact of non-native species on native species is a major challenge in molecular ecology, particularly for genetically compatible fish species. Invasions are generally difficult to study because their effects may be confused with those of environmental or human disturbances. Colonized ecosystems are impacted differently by human activities, resulting in diverse responses and interactions between native and non-native species. We studied the dynamics between two cyprinid species (the invasive Chondrostoma nasus and the endemic Parachondrostoma toxostoma) and their hybrids in 16 populations (ranging from allopatric to sympatric situations and from slightly to highly fragmented areas), comprising 2,256 specimens. Each specimen was assigned to a species or to a hybrid pool using molecular identification (cytochrome b and 41 microsatellites). We carried out an ecomorphological analysis based on size, age, body shape and diet (gut vacuity and molecular analysis of fecal contents). Our results contradicted our initial assumptions about the pattern of invasion and the rate of introgression. There was no sign of underperformance by the endemic species in areas where hybridisation occurred. In the unfragmented zone, the introduced species was found mostly downstream, with body shapes similar to those in allopatric populations, while both species were more insectivorous than the reference populations. However, a high level of hybridisation was detected, suggesting interactions between the two species during spawning and/or the existence of a hybrid swarm. In the disturbed zone, introgression was less frequent, and a slender body shape was associated with diatomivorous behaviour, smaller size (juvenile characteristics) and greater gut vacuity. The results suggest that habitat degradation induced similar ecomorphological trait changes in the two species and their hybrids (i.e. a transition towards a pedomorphic state), with the invasive species more affected than the native species. This study therefore reveals a diversity of relationships between two genetically compatible species and emphasizes constraints on the invasion process in disturbed areas.
Interspecific hybridization is widespread, occurring in a taxonomically diverse array of species. The Cyprinidae family, which displays more than 30% hybridization, is a good candidate for studies of processes underlying isolation and speciation, such as genetic exchange between previously isolated lineages. This is particularly relevant in the case of recent hybridization between an invasive species, Chondrostoma nasus nasus (from Eastern Europe), and C. toxostoma toxostoma (a threatened species endemic to southern France), in which bidirectional introgressive hybridization has been demonstrated.
We studied 128 specimens from reference populations and 1495 hybrid zone specimens (sampled over two years at four stations), using five molecular markers (one mitochondrial gene and four nuclear introns), morphology (meristic and plastic characters) and life history traits (weight, size, coefficient of condition, sex, age, shoaling). We identified 65 hybrid combinations and visualized spatial and temporal changes in composition. The direction of mitochondrial introgression was density-dependent, favoring the rarer species, and we show that the sexual selection hypothesis is the predominant explanation for the asymmetry of introgression. Despite genomic evolution in the hybrid zone, body shape and coefficient of condition converged, indicating changes in foraging behavior relative to the reference populations that reflect strong environmental pressure.
The complex rules of hybrid zone dynamics are established very early in the contact zone. We propose "inheritance from the rare species" as a new evolutionary hypothesis for animal models. The endemic species was not assimilated by the invasive species. Survival rates for this species were highest in the middle of the river (the warmest part), owing to a trade-off between food availability and fecundity. The environment-independent hybrid combination may result from nuclear-mitochondrial interactions involving the Tpi1b gene or a gene linked to it (chromosome 16). This genomic region is also responsible for shoaling behavior in Danio rerio and is a promising region for studying changes in population dynamics and for advancing integrated studies of hybrid zones.
In Figure 7, there is an error in the correspondence between the schematized shell morphologies and genera. Please see the corrected Figure 7. Citation: Corse E, Rampal J, Cuoc C, Pech N, Perez Y, Gilles A (2013) Correction: Phylogenetic Analysis of Thecosomata Blainville, 1824 (Holoplanktonic Opisthobranchia) Using Morphological and Molecular Data.