Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of four microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy.
Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform.
All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.
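As a rough illustration of two of the metrics named above, GC content and the fraction of a genome left without coverage, the following Python sketch may help. The function names and the per-base depth list are assumptions made for illustration; they are not taken from the study's pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so N and other
    IUPAC ambiguity codes do not affect the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


def uncovered_fraction(depths: list[int]) -> float:
    """Return the fraction of positions with zero read depth.

    `depths` is a hypothetical per-base depth list, one entry per
    reference position.
    """
    if not depths:
        raise ValueError("depth list is empty")
    return sum(1 for d in depths if d == 0) / len(depths)
```

Applied to a whole genome, an `uncovered_fraction` value near 0.3 would correspond to the roughly 30% of positions without coverage described for the AT-rich genome on the PGM.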
•This is the first report on metabolic gene clusters in paper mulberry silage.
•Ensiling causes a shift in dominant bacteria from Gram-negative to Gram-positive.
•Enterobacter and Lactobacillus species determine silage fermentation quality.
•PacBio sequencing reveals microbial dynamics during silage fermentation.
•Paper mulberry can be used to prepare high-quality nutrient-rich silage.
To develop a new high-protein woody forage resource for livestock to alleviate feed shortages in the tropics, we applied PacBio single-molecule, real-time (SMRT) sequencing to explore the community structure, species diversity and metabolic gene clusters of natural microorganisms associated with paper mulberry (PM) silage fermentation. High levels of microbial diversity and abundance were observed in PM raw material, and these levels decreased with the progression of silage fermentation. During woody ensiling, the dominant bacteria shifted from pathogenic Gram-negative Proteobacteria to beneficial Gram-positive Firmicutes. Lactic acid bacteria became the most dominant bacteria that affected fermentation quality in terminal silages. Global and overview maps, carbohydrate metabolism and amino-acid metabolism were the important microbial metabolic pathways that impacted the final fermentation product of silage. PacBio SMRT sequencing revealed specific microbial-related information concerning silage. PM is rich in nutrients and macro mineral contents, which are preserved well during ensiling, indicating that PM silage can serve as a new woody resource suitable for ruminants.
The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.
Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor > 98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.
The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
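The contig N50 used above as a contiguity measure can be computed with a few lines of Python. This is a minimal sketch of the standard definition (the length of the shortest contig in the smallest set of longest contigs that together hold at least half of the assembled bases); the function name is an assumption, and real assemblies would normally be summarized with an established assembly-statistics tool.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

For example, contigs of lengths 2, 3, 4, 5 and 6 sum to 20; the two longest (6 and 5) already hold 11 bases, so the N50 is 5.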
We report the first whole genome sequence (WGS) assembly and annotation of a dwarf coconut variety, 'Catigan Green Dwarf' (CATD). The genome sequence was generated using the PacBio SMRT sequencing platform at 15X coverage of the expected genome size of 2.15 Gbp, which was corrected with assembled 50X Illumina paired-end MiSeq reads of the same genome. The draft genome was improved through Chicago sequencing to generate a scaffold assembly that results in a total genome size of 2.1 Gbp consisting of 7,998 scaffolds with N50 of 570,487 bp. The final assembly covers around 97.6% of the estimated genome size of coconut 'CATD' based on homozygous k-mer peak analysis. A total of 34,958 high-confidence gene models were predicted and functionally associated to various economically important traits, such as pest/disease resistance, drought tolerance, coconut oil biosynthesis, and putative transcription factors. The assembled genome was used to infer the evolutionary relationship within the palm family based on genomic variations and synteny of coding gene sequences. Data show that at least three (3) rounds of whole genome duplication occurred and are commonly shared by these members of the family. A total of 7,139 unique SSR markers were designed to be used as a resource in marker-based breeding. In addition, we discovered 58,503 variants in coconut by aligning the Hainan Tall (HAT) WGS reads to the non-repetitive regions of the assembled CATD genome. The gene markers and genome-wide SSR markers established here will facilitate the development of varieties with resilience to climate change, resistance to pests and diseases, and improved oil yield and quality.
Tripsacum dactyloides (2n = 4x = 72) and Zea perennis (2n = 4x = 40) are tertiary gene pools of Zea mays L. and exhibit many abiotic adaptations absent in modern maize, especially salt tolerance. A previously reported allopolyploid (hereafter referred to as MTP, 2n = 74) synthesized using Zea mays, Tripsacum dactyloides, and Zea perennis has even stronger salt tolerance than Z. perennis and T. dactyloides. This allopolyploid will be a powerful genetic bridge for the genetic improvement of maize. However, the molecular mechanisms underlying its salt tolerance, as well as the key genes involved in regulating its salt tolerance, remain unclear.
Single-molecule real-time sequencing and RNA sequencing were used to identify the genes involved in salt tolerance and reveal the underlying molecular mechanisms. Based on the SMRT-seq results, we obtained 227,375 reference unigenes with an average length of 2300 bp; most of the unigenes were annotated to Z. mays sequences (76.5%) in the NR database. Moreover, a total of 484 and 1053 differentially expressed genes (DEGs) were identified in the leaves and roots, respectively. Functional enrichment analysis of DEGs revealed that multiple pathways responded to salt stress, including "Flavonoid biosynthesis," "Oxidoreductase activity," and "Plant hormone signal transduction" in the leaves and roots, and "Iron ion binding," "Acetyl-CoA carboxylase activity," and "Serine-type carboxypeptidase activity" in the roots. Transcription factors, such as those in the WRKY, B3-ARF, and bHLH families, and negative regulators of cytokinin negatively regulated the salt stress response. According to the results of the short time series-expression miner analysis, proteins involved in "Spliceosome" and "MAPK signal pathway" dynamically responded to salt stress as salinity changed. Protein-protein interaction analysis revealed that heat shock proteins play a role in the large interaction network regulating salt tolerance.
Our results reveal the molecular mechanism underlying the regulation of MTP in the response to salt stress and abundant salt-tolerance-related unigenes. These findings will aid the retrieval of lost alleles in modern maize and provide a new approach for using T. dactyloides and Z. perennis to improve maize.
•A high-quality transcript data set related to ginkgo seed development was obtained.
•The diversity of TTLs regulation modes in ginkgo seed development is revealed.
•The structural and functional annotations of the ginkgo genome were further optimized.
Full-length transcriptome sequencing based on the PacBio sequencing platform could significantly optimize the annotation of gene structures. As an ancient relic gymnosperm in the monotypic order Ginkgoales, Ginkgo biloba L. contains rich terpenoids that are medicinally valuable. The seeds have abundant edible endosperm, which is delicious and of high nutritional value. However, existing molecular studies on the developmental process of ginkgo seeds are relatively weak, and the biosynthesis of terpenoids in seeds has received little attention. Therefore, single-molecule real-time (SMRT) technology and Illumina sequencing were combined to sequence six tissues related to the reproductive growth and development of ginkgo in order to generate a high-quality full-length transcription database. In total, 20.98 Gb of clean reads containing 178,548 full-length non-chimeric (FLNC) sequences were obtained. From these data, 4019 novel genes and 22,845 novel isoforms were predicted, 52.32% of the novel genes were annotated, and three novel isoforms were annotated in terpene synthesis related pathways. The enrichment analysis of differentially expressed genes (DEGs) showed that 95 genes were enriched in 21 categories related to seed development, and 47 DEGs were enriched in the skeletal pathway of terpene synthesis. Combined with real-time quantitative reverse transcription PCR (qRT-PCR), the results showed that the phosphosynthase family members synthesizing terpene precursors have diverse and complex expression trends during seed development. Our findings confirm the advantages of SMRT, which facilitated the construction of a rich transcript data set for research on the development of ginkgo seeds, enriching the annotation of the ginkgo genome, and enhancing our understanding of gene regulation of terpene biosynthesis in ginkgo seeds.
Introduction
The α‐globin fusion gene between the HBA2 and HBAP1 genes becomes clinically important in thalassemia screening because this fusion gene can cause severe hemoglobin (Hb) H disease when combined with α0‐thalassemia (α0‐thal). Due to its uncommon rearrangement in the α gene cluster without dosage changes, this fusion gene is undetectable by common molecular testing approaches used for α‐thal diagnosis.
Methods
In this study, we used the single‐molecule real‐time (SMRT) sequencing technique to detect this fusion gene in 23 carriers identified by next‐generation sequencing (NGS) among 16,504 screened individuals. Five primers for α and β thalassemia were utilized.
Results
According to the NGS results, the 23 carriers include 14 pure heterozygotes, eight compound heterozygotes with common α‐thal alleles, and one homozygote. By using SMRT, the fusion mutant was successfully detected in all 23 carriers. Furthermore, SMRT corrected the diagnosis in two “pure” heterozygotes: one was a compound heterozygote with anti‐3.7 triplication, and the other was a homozygote.
Conclusion
Our results indicate that SMRT is superior to NGS in detecting the α fusion gene, owing to its efficiency, accuracy, and one‐step workflow.
Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species.
A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146Mbp of additional transposable element sequence is now assembled, a large proportion of which consists of recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9Mbp XY sex determination region on LG1 in O. niloticus, and a ~50Mbp WZ sex determination region on LG3 in the related species O. aureus.
This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.
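The contig NG50 quoted above differs from N50 in that the halfway point is taken against an estimated genome size rather than the total assembled length. A minimal Python sketch of that definition follows; the function name and the choice to return `None` when the assembly covers less than half of the genome estimate are assumptions made for illustration, not the authors' implementation.

```python
from typing import Optional


def ng50(lengths: list[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of contig lengths for a given genome size estimate.

    NG50 is the length of the contig at which the cumulative length of
    contigs, taken from longest to shortest, first reaches half of
    `genome_size`. Returns None if the contigs never reach that point.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return None
```

Because the denominator is fixed by the genome estimate, NG50 values can be compared across candidate assemblies of different total length, which is convenient when many candidate assemblies of the same genome are ranked.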
It is widely known that transcriptional diversity contributes greatly to biological regulation in eukaryotes. With the development of next-generation sequencing (NGS) technologies, several studies on RNA sequencing have considerably improved our understanding of transcriptome complexity. However, obtaining full-length (FL) transcripts remains a considerable challenge because of difficulties in short read-based assembly. In the present study, single-molecule real-time (SMRT) sequencing and NGS were combined to generate the complete and FL transcriptome of Manis javanica. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of the M. javanica genome. We obtained 45,530 high-confidence transcripts from 19,109 genic loci, of which 8014 genes have not yet been annotated within the M. javanica genome. Furthermore, we revealed 8824 long-chain noncoding RNAs (lncRNAs). A total of 30,199 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events were identified in the sequencing data. The structure and expression level of 59 digestive enzyme genes, including 13 carbohydrase genes, 28 lipase genes and 18 protease genes, were analyzed, which might provide original data for further research on M. javanica.
•This is the first report of single-molecule real-time (SMRT) sequencing of Manis javanica.
•SMRT sequencing was combined with next-generation sequencing for the first time to generate the complete and FL transcriptome of Manis javanica.
•We analyzed the expression level and structure of the digestive enzyme genes in the stomach, salivary glands, pancreas, large intestine and liver of Manis javanica.
Posttranscriptional processing of precursor mRNAs contributes to transcriptome and protein diversity and gene regulatory mechanisms in eukaryotes. However, this posttranscriptional mechanism has not been studied in the marine macroalga Gracilariopsis lemaneiformis, which is the most cultivated red seaweed species in China.
In the present study, third-generation sequencing (Pacific Biosciences single-molecule real-time long-read sequencing, SMRT-Seq) was used to sequence the full-length transcriptome of G. lemaneiformis to identify alternatively spliced transcripts and alternative polyadenylation (APA) sites in this species. RNAs were isolated from G. lemaneiformis under various treatments including abiotic stresses and exogenous phytohormones, and then equally pooled for SMRT-Seq. In summary, 346,544 full-length nonchimeric reads were generated, from which 13,630 unique full-length transcripts were obtained in G. lemaneiformis. Compared with the known splicing events in the gene models, more than 3000 new alternative splicing (AS) events were identified in the SMRT-Seq reads. Additionally, 810 genes were found to have poly(A) sites, and 91 microRNAs (miRNAs), 961 long noncoding RNAs and 1721 novel genes were identified in G. lemaneiformis. Moreover, validation experiments showed that abiotic stresses and phytohormones could induce some specific AS events, especially intron-retention isoforms, cause some alterations to the relative ratios of transcripts annotated to the same gene, and generate novel 3' ends because of differential APA. The growth of G. lemaneiformis was inhibited by Cu stress, while this inhibition was alleviated by ACC treatment. RNA-Seq analysis further revealed that 211 and 142 differential alternative splicing (DAS) events were obtained in CK vs Cu and Cu vs Cu + ACC, respectively, suggesting that AS of functional genes could be regulated by Cu stress and ACC. Compared with Cu stress, the expression of transcripts with DAS events, mainly those involved in the carbon fixation in photosynthetic organisms and oxidative phosphorylation pathways, was upregulated in the Cu + ACC treatment, suggesting that ACC alleviated the growth inhibition caused by Cu stress by increasing carbon fixation and oxidative phosphorylation.
Our results provide the first comprehensive picture of the full-length transcriptome and posttranscriptional mechanism in red macroalgae, including transcripts that appeared in the presence of common abiotic stresses and phytohormones, which will improve the gene annotations of Gracilariopsis and contribute to the study of gene regulation in this important cultivated seaweed.