The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating these genome sequences are urgently needed.
We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes are identified using tRNAscan and ARAGORN, and inverted repeats (IR) are identified using vmatch. Third, it calculates summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results, in GFF3 format, can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS repeatedly for update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while offering several superior functionalities.
CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.
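The summary-statistics step described above can be illustrated on GFF3 output. The following is a minimal sketch, not CPGAVAS's own code: it assumes a standard nine-column, tab-separated GFF3 file, and the function name `summarize_gff3` and the example records are hypothetical.

```python
from collections import Counter


def summarize_gff3(text):
    """Count features by type and sum their lengths from GFF3 text.

    Hypothetical helper for illustration; it is not part of CPGAVAS.
    """
    counts = Counter()
    lengths = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        feature_type = fields[2]
        start, end = int(fields[3]), int(fields[4])
        counts[feature_type] += 1
        # GFF3 coordinates are 1-based and inclusive.
        lengths[feature_type] += end - start + 1
    return counts, lengths


# Made-up annotation records, for illustration only.
example = "\n".join([
    "##gff-version 3",
    "cp1\texample\tgene\t100\t1161\t.\t+\t.\tID=gene1;Name=psbA",
    "cp1\texample\tCDS\t100\t1161\t.\t+\t0\tID=cds1;Parent=gene1",
    "cp1\texample\tgene\t2000\t2072\t.\t-\t.\tID=gene2;Name=trnH",
    "cp1\texample\ttRNA\t2000\t2072\t.\t-\t.\tID=trna1;Parent=gene2",
])

counts, lengths = summarize_gff3(example)
for feature_type in sorted(counts):
    print(feature_type, counts[feature_type], lengths[feature_type])
```

Because GFF3 is plain tab-separated text, edited annotations can be re-summarized this way before being uploaded again.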
Numerous adverse reactions have arisen following the use of inaccurately identified medicinal plant ingredients, resulting in conditions such as aristolochic acid nephropathy and herb-induced poisoning. This problem has prompted increased global concern over the safety of herbal medicines. DNA barcoding, a technique that aims to detect species-specific differences in a short region of DNA, provides a powerful new tool for addressing this problem. A preliminary system for DNA barcoding herbal materials has been established based on a two-locus combination of ITS2+psbA–trnH barcodes. The system contains 78,847 sequences belonging to 23,262 species, covering more than 95% of the crude herbal drugs listed in pharmacopoeias such as those of China, Japan, Korea, India, the USA, and Europe. The system has been widely used in traditional herbal medicine enterprises. This review summarizes recent key advances in the DNA barcoding of medicinal plant ingredients (herbal materia medica) as a contribution towards safe and efficacious herbal medicines.
Background: The plant working group of the Consortium for the Barcode of Life recommended the two-locus combination of rbcL + matK as the plant barcode, yet the combination was shown to successfully discriminate among 907 samples from 550 species at the species level with a probability of only 72%. The group admits that the two-locus barcode is far from perfect owing to its low identification rate, and that the search is not over. Methodology/Principal Findings: Here, we compared seven candidate DNA barcodes (psbA-trnH, matK, rbcL, rpoC1, ycf5, ITS2, and ITS) from medicinal plant species. Our ranking criteria included PCR amplification efficiency, differential intra- and inter-specific divergences, and the DNA barcoding gap. Our data suggest that the second internal transcribed spacer (ITS2) of nuclear ribosomal DNA represents the most suitable region for DNA barcoding applications. Furthermore, we tested the discrimination ability of ITS2 in more than 6600 plant samples belonging to 4800 species from 753 distinct genera and found that the rate of successful identification with ITS2 was 92.7% at the species level. Conclusions: The ITS2 region can potentially be used as a standard DNA barcode to identify medicinal plants and their closely related species. We also propose that ITS2 can serve as a novel universal barcode for the identification of a broader range of plant taxa.
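The barcoding gap used as a ranking criterion above can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses the uncorrected p-distance on pre-aligned sequences and defines the gap as the smallest inter-specific distance minus the largest intra-specific distance. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcoding_gap(records):
    """Return (max intra-specific, min inter-specific, gap) distances.

    `records` is a list of (species_label, aligned_sequence) tuples.
    A positive gap means every between-species distance exceeds every
    within-species distance.
    """
    intra, inter = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        distance = p_distance(seq1, seq2)
        (intra if sp1 == sp2 else inter).append(distance)
    if not intra or not inter:
        raise ValueError("need both within- and between-species pairs")
    return max(intra), min(inter), min(inter) - max(intra)


# Toy aligned sequences, invented for illustration only.
records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACGAAC"),
    ("species_B", "ACGAACGAAC"),
]

max_intra, min_inter, gap = barcoding_gap(records)
print(max_intra, min_inter, gap)
```

Published comparisons often use model-corrected distances such as K2P rather than the raw p-distance; the simpler measure is used here only to keep the sketch short.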
Identifying plant, fungal, and animal ingredients in a specific mixture remains challenging owing to the limitations of PCR amplification and the low specificity of traditional methods. Genomic DNA was extracted from mock and pharmaceutical samples. Four types of DNA barcodes were generated from the shotgun sequencing dataset with the help of a local bioinformatic pipeline. Taxa for each barcode were assigned by BLAST searches against TCM-BOL, BOLD, and GenBank. Traditional methods, including microscopy, thin layer chromatography (TLC), and high-performance liquid chromatography (HPLC), were carried out according to the Chinese Pharmacopoeia. On average, 6.8 Gb of shotgun reads were sequenced from the genomic DNA of each sample. Then, 97, 11, 10, 14, and one operational taxonomic unit (OTU) were generated for ITS2, psbA-trnH, rbcL, matK, and COI, respectively. All the labeled ingredients, including eight plant, one fungal, and one animal species, were successfully detected in both the mock and pharmaceutical samples, of which Chebulae Fructus, Poria, and Fritillariae Thunbergii Bulbus were identified by mapping reads to organelle genomes. In addition, four unlabeled plant species were detected in the pharmaceutical samples, while 30 genera of fungi, such as Schwanniomyces, Diaporthe, and Fusarium, were detected in the mock and pharmaceutical samples. Furthermore, the microscopic, TLC, and HPLC analyses were all in accordance with the standards stipulated by the Chinese Pharmacopoeia. This study indicates that shotgun metabarcoding can simultaneously identify plant, fungal, and animal ingredients in herbal products and can serve as a valuable complement to traditional methods.
The trnH-psbA intergenic spacer region has been used in many DNA barcoding studies. However, a comprehensive evaluation with rigorous sequence preprocessing and statistical testing on the utility of trnH-psbA and its combinations as DNA barcodes is lacking.
Sequences were retrieved from GenBank for a meta-analysis of the usefulness of trnH-psbA and its combinations as DNA barcodes. After preprocessing, we constructed full and matching data sets that contained 17 983 trnH-psbA sequences and 2190 sets of trnH-psbA, matK, rbcL, and ITS2 sequences from the same sample, respectively. These data sets were used to analyze the ability of trnH-psbA and its combinations to discriminate species by the BLAST and BLAST+P methods. Fisher's exact test was used to evaluate the significance of performance differences. For the full data set, the identification success rates of trnH-psbA exceeded 70% in 18 families and 12 genera. For the matching data set, the identification rates of trnH-psbA were significantly higher than those of the other loci in two families and four genera. Similarly, the identification rates of trnH-psbA+ITS2 were significantly higher than those of matK+rbcL in 18 families and 21 genera. Conclusion/Significance: This study provides valuable information on the higher utility of trnH-psbA and its combinations. We found that the trnH-psbA+ITS2 combination performs better than or equally well as other combinations in most taxonomic groups investigated. This information will guide the optimal usage of trnH-psbA and its combinations for species identification.
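The significance test mentioned above can be sketched for a single comparison of two barcodes. This is a minimal illustration, not the authors' code: it implements the two-sided Fisher's exact test on a 2x2 table of identified versus unidentified samples using only the standard library. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be two barcodes; columns could be identified and
    unidentified samples. The p-value sums the hypergeometric
    probabilities of all tables with the same margins that are no more
    probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: barcode 1 identifies 8 of 10 samples,
# barcode 2 identifies 1 of 6 samples.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

An exact test is a natural choice when some families or genera contribute only a handful of samples, where a chi-squared approximation would be unreliable.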
Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that produces a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi.
Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines.
Accurate identification of the species composition of mixtures poses a significant challenge, especially in processed mixtures comprising multiple species, such as those found in food and pharmaceuticals. Therefore, we have attempted to utilize shotgun metabarcoding technology to tackle this issue. In this study, the method was initially established using two mock samples of the Mongolian compound preparation Gurigumu-7 (G-7) and was then applied to three pharmaceutical products and 12 hospital-made preparations. A total of 119.72 Gb of raw data was obtained through shotgun metagenomic sequencing. By combining ITS2, matK, and rbcL, all the labeled bio-ingredients specified in the G-7 prescription can be detected, although some species may not be detectable in all samples. The prevalent substitution of Akebia quinata can be found in all the pharmaceutical and hospital samples, except for YN02 and YN12. The toxic alternative to Akebia quinata, Aristolochia manshuriensis, was exclusively identified in the YN02 sample. To further confirm this result, we validated it in YN02 using HPLC and real-time PCR with TaqMan probes. The results showed that aristolochic acid A (AAA) was detected in YN02 using HPLC, and the ITS2 sequence of Aristolochia manshuriensis was validated in YN02 through qPCR with a TaqMan probe. This study confirms that shotgun metabarcoding can effectively identify the biological components in the Mongolian medicine compound preparation G-7. It also demonstrates the method’s potential to be utilized as a general identification technique for mixtures containing a variety of plants.
The freshwater leech Whitmania pigra Whitman (W. pigra; phylum Annelida) is a model organism for neurodevelopmental studies. However, molecular biology research on its embryonic development is still scarce. Here, we describe a series of developmental stages of W. pigra embryos and define five broad stages of embryogenesis: cleavage stages, blastocyst stage, gastrula stage, organogenesis and refinement, and juvenile. We obtained a total of 239.64 Gb of transcriptome data from eight representative developmental phases of embryos (from blastocyst stage to maturity), which was then assembled into 21,482 unigenes according to our reference genome, sequenced by single-molecule real-time (SMRT) long-read sequencing. We found 3114 genes differentially expressed during the eight phases with phase-specific expression patterns. Using a comprehensive transcriptome dataset, we demonstrated that 57, 49 and 77 DEGs were related to morphogenesis, signal pathways and neurogenesis, respectively. The 49 DEGs related to signal pathways included 30 wnt genes, 14 notch genes, and 5 hedgehog genes. In particular, we found a cluster consisting of 7 genes related to signal pathways as well as synapses, which were essential for regulating embryonic development. Eight genes cooperatively participated in regulating neurogenesis. Our results reveal the whole picture of the W. pigra developmental mechanism from the perspective of the transcriptome and provide new clues for organogenesis and neurodevelopmental studies of Annelida species.