Long nanopore reads are advantageous in de novo genome assembly. However, nanopore reads usually have a broad error distribution and high-error-rate subsequences. Existing error correction tools cannot correct nanopore reads efficiently and effectively. Most methods trim high-error-rate subsequences during error correction, which reduces both the length of the reads and the contiguity of the final assembly. Here, we develop an error correction and de novo assembly tool designed to overcome complex errors in nanopore reads. We propose an adaptive read selection and two-step progressive method to quickly correct nanopore reads to high accuracy. We introduce a two-stage assembler to utilize the full length of nanopore reads. Our tool achieves superior performance in both error correction and de novo assembly of nanopore reads. It requires only 8122 hours to assemble a 35X coverage human genome and achieves a 2.47-fold improvement in NG50. Furthermore, our assembly of the human WERI cell line shows an NG50 of 22 Mbp. The high-quality assembly of nanopore reads can significantly reduce false positives in structural variation detection.
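For readers unfamiliar with the NG50 metric quoted above, the following is a minimal sketch of how it is conventionally computed: NG50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size (N50 uses the total assembly length instead). The function names and the toy contig lengths are illustrative assumptions and are not taken from the tool described here.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if the contigs
    cover less than half of the (estimated) genome size."""
    half = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None


def n50(contig_lengths):
    """N50 is NG50 measured against the total assembly length."""
    return ng50(contig_lengths, sum(contig_lengths))


# Hypothetical contig lengths (arbitrary units), for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))        # half of 290 is reached at the 70-unit contig
print(ng50(contigs, 400))  # half of 400 is reached at the 50-unit contig
```

Because NG50 is normalized by genome size rather than assembly size, it allows assemblies of different total length to be compared on the same scale.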
We present a tool that combines fast mapping, error correction, and de novo assembly (MECAT; accessible at https://github.com/xiaochuanle/MECAT) for processing single-molecule sequencing (SMS) reads. MECAT's computing efficiency is superior to that of current tools, while the results MECAT produces are comparable or improved. MECAT enables reference mapping or de novo assembly of large genomes using SMS reads on a single computer.
Calamansi or Philippine lime (Citrofortunella macrocarpa) is an important crop for the local economy of Hainan Island. No study has yet addressed Calamansi germplasm evaluation and cultivar development. In this study, data were collected from 151 Calamansi seedling trees, and 37 phenotypic traits were analyzed to investigate their genetic diversity. Cluster analysis and principal component analysis were conducted to provide a theoretical basis for Calamansi genetic improvement. The diversity analysis revealed the following. (1) The diversity indexes for qualitative traits ranged from 0.46 to 1.39, and the traits with the highest genetic diversity were fruit shape and pulp color (H' > 1.20); the diversity indexes for quantitative traits ranged from 0.67 to 2.10, with lower values for fruit juice rate (1.08) and number of petals (0.67). (2) Cluster analysis of phenotypic traits arranged the samples into 4 groups: the first group was characterized by a lower flesh segment number per fruit (SNF) and a higher oil cell number (OCN); the second group had 7 samples, all characterized by larger crown breadth (CB), higher yield per tree (YPT), larger leaves, higher ascorbic acid (AA) content, and lower seed number per fruit (SNPF); the third group had 25 samples characterized by smaller tree foot diameter (TFD), smaller fruit shape index (FSI), and higher total soluble solids (TSS) content; the fourth group had 87 samples characterized by shorter petiole length (PEL), larger fruit, higher juice ratio (JR), higher stamen number (SN), and longer pistil length (PIL). (3) Principal component analysis showed that the values for the characteristic vectors of the first 9 principal components were all greater than 3 and that their cumulative contribution rate reached 72.20%; these components included traits such as single fruit weight, fruit diameter, tree height, and tree canopy width.
Finally, based on the comprehensive principal component scores of all samples, the Calamansi individuals with higher scores were selected for further observation. This study concludes that Calamansi seedling populations on Hainan Island hold great genetic diversity in various traits and can be useful for Calamansi variety improvement.
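The diversity index H' reported above is not defined in the abstract. As a sketch only, and assuming it is the Shannon-Weaver index that is commonly used in germplasm studies, H' for a qualitative trait can be computed from the frequency of each phenotypic class as H' = -sum(p_i * ln p_i). The function name and the sample observations below are hypothetical.

```python
import math
from collections import Counter


def shannon_index(observations):
    """Shannon-Weaver diversity index H' = -sum(p_i * ln(p_i)),
    where p_i is the proportion of individuals in phenotypic class i.
    ASSUMPTION: the abstract does not state which index was used."""
    counts = Counter(observations)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical fruit-shape scores for eight trees (illustration only).
fruit_shape = ["round", "round", "oblate", "oblate",
               "ellipsoid", "ellipsoid", "obovoid", "obovoid"]
print(round(shannon_index(fruit_shape), 2))
```

Under this definition the index is zero when every tree falls into the same class and grows as the classes become more numerous and more evenly represented.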
Translational control is crucial in the regulation of gene expression, and deregulation of translation is associated with a wide range of cancers and human diseases. Ribosome profiling is a technique that provides genome-wide information on mRNA in translation based on deep sequencing of ribosome-protected mRNA fragments (RPF). RPFdb is a comprehensive resource for hosting, analyzing and visualizing RPF data, available at www.rpfdb.org or http://sysbio.sysu.edu.cn/rpfdb/index.html. The current version of the database contains 777 samples from 82 studies in 8 species, processed and reanalyzed by a unified pipeline. There are two ways to query the database: by keywords of studies or by genes. The outputs are presented at three levels: (i) study level, including meta information of studies and reprocessed data for gene expression of translated mRNAs; (ii) sample level, including a global perspective of translated mRNA and a list of the most translated mRNAs of each sample from a study; (iii) gene level, including normalized sequence counts of translated mRNA at different genomic locations of a gene from multiple samples and studies. To explore the rich information provided by RPF, RPFdb also provides a genome browser to query and visualize context-specific translated mRNA. Overall, our database provides a simple way to search, analyze, compare, visualize and download RPF data sets.
Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools, which demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method, and it can accelerate the burgeoning field of long-read single-cell transcriptomics.
The protein-coding genes and pseudogenes of Cuscuta australis have contributed in different ways to the formation and evolution of parasitism. Analysis of the codon usage patterns of these two types of genes can help in understanding gene transcription and translation. In this study, we systematically analyzed the codon usage patterns of protein-coding sequences and pseudogene sequences in C. australis. The results showed that the high-frequency codons of protein-coding sequences and pseudogenes had the same A/U bias at the third position. However, the two types of sequences had opposite biases at the third base of optimal codons: the protein-coding sequences preferred G/C-ending codons, while pseudogene sequences preferred A/U-ending codons. Neutrality plots and effective number of codons plots revealed that natural selection played a more important role than mutation pressure in the codon usage bias of both types of sequences. Furthermore, gene expression level had a significant positive correlation with codon usage bias in C. australis. Highly expressed protein-coding genes exhibited a higher codon bias than lowly expressed genes. Meanwhile, highly expressed genes tended to use G/C-ending synonymous codons. This result further verified the optimal codon usage bias and its correlation with gene expression in C. australis.
• The codon usage bias between protein-coding sequences and pseudogenes was compared in C. australis.
• The protein-coding sequences preferred G/C-ending codons while pseudogene sequences preferred A/U-ending codons.
• The highly expressed protein-coding genes exhibited a higher codon bias than lowly expressed genes.
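Two quantities commonly used in this kind of codon usage analysis are relative synonymous codon usage (RSCU) and the G/C content at the third codon position (GC3). The following is a minimal sketch of both, assuming the standard genetic code; it is not the pipeline used in the study, and the function names and the toy sequence are hypothetical. For simplicity, GC3 here counts all sense codons, including the single-codon amino acids Met and Trp, which some definitions exclude.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into codons (RNA or DNA)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(seq):
    """RSCU = observed codon count / (family total / family size),
    computed for amino acids encoded by more than one codon."""
    counts = Counter(c for c in codons(seq) if c in CODON_TABLE)
    result = {}
    for aa in set(CODON_TABLE.values()) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous choice
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result


def gc3(seq):
    """Fraction of sense codons ending in G or C."""
    third = [c[2] for c in codons(seq)
             if CODON_TABLE.get(c, "*") != "*"]
    return sum(b in "GC" for b in third) / len(third) if third else 0.0


toy = "GCTGCTGCCGCA"  # four alanine codons, illustration only
print(rscu(toy)["GCT"], gc3(toy))
```

An RSCU above 1 indicates that a codon is used more often than expected under equal synonymous usage, so a G/C-ending preference would appear as RSCU values above 1 for G/C-ending codons together with a high GC3.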
Some plant growth-promoting rhizobacteria (PGPR) regulate plant growth and elicit plant basal immunity through volatiles. The mechanism by which plants respond to Bacillus amyloliquefaciens volatiles has not been well studied. We conducted global gene expression profiling in Arabidopsis after treatment with Bacillus amyloliquefaciens FZB42 volatiles by Illumina Digital Gene Expression (DGE) profiling of different growth stages (seedling and mature) and tissues (leaves and roots). Compared with the control, 1,507 and 820 differentially expressed genes (DEGs) were identified in leaves and roots at the seedling stage, respectively, while 1,512 and 367 DEGs were identified in leaves and roots at the mature stage. Seventeen genes with different regulatory patterns were validated using quantitative RT-PCR. Numerous DEGs were enriched for plant hormones, cell wall modifications, and protection against stress situations, which suggests that volatiles have effects on plant growth and immunity. Moreover, analyses of transcriptome differences between tissues and growth stages using DGE profiling showed that the plant response might be tissue-specific and/or growth stage-specific. For example, genes involved in flavonoid biosynthesis were downregulated in leaves and upregulated in roots, indicating tissue-specific responses to volatiles. Genes related to photosynthesis were downregulated at the seedling stage and upregulated at the mature stage, suggesting growth stage-specific responses. In addition, bacterial volatiles significantly induced the "killing of cells of other organism" pathway, with upregulated genes in leaves, while three other pathways with upregulated genes (defense response to nematode, cell morphogenesis involved in differentiation, and trichoblast differentiation) were significantly enriched in roots.
Interestingly, some important alterations in the expression of growth-related genes, metabolic pathways, defense responses to biotic stress, and hormone-related genes were found for the first time in response to FZB42 volatiles.
Background
Phalaenopsis is an important ornamental plant that has great economic value in the world flower market as one of the most popular flower resources.
Objective
To investigate flower color formation in Phalaenopsis at the transcriptional level, genes involved in flower color formation were identified from RNA-seq data.
Methods
White and purple petals of Phalaenopsis were collected and analyzed to obtain (1) differentially expressed genes (DEGs) between white and purple flowers and (2) the association between single nucleotide polymorphism (SNP) mutations and DEGs at the transcriptome level.
Results
A total of 1,175 DEGs were identified, of which 718 were up-regulated and 457 were down-regulated. Gene Ontology and pathway enrichment analyses showed that the biosynthesis of secondary metabolites pathway was key to color formation and that 12 crucial genes (C4H, CCoAOMT, F3’H, UA3’5’GT, PAL, 4CL, CCR, CAD, CALDH, bglx, SGTase, and E1.11.17) are involved in the regulation of flower color in Phalaenopsis.
Conclusion
This study reports the association between SNP mutations and DEGs for color formation at the RNA level and provides new insight for further investigation of gene expression and its relationship with genetic variants from RNA-seq data in other species.
Understanding the genetic basis of forage quality-related traits, including crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), hemicellulose (HC), and cellulose (CL) contents, is essential for the identification of forage quality genes and the selection of effective molecular markers in sorghum. In this study, we genotyped 245 sorghum accessions with 85,585 single-nucleotide polymorphisms (SNPs) and obtained phenotypic data from four environments. The SNPs and phenotypic data were applied to multi-locus genome-wide association studies (GWAS) with the mrMLM software. A total of 42 SNPs were identified as associated with the five forage quality-related traits. Moreover, three and two quantitative trait nucleotides (QTNs) were simultaneously detected among them by three and two multi-locus methods, respectively. One QTN on chromosome 5 was found to be associated simultaneously with CP, NDF, and ADF. Furthermore, 3, 2, 2, 5, and 2 candidate genes were identified as responsible for CP, NDF, ADF, HC, and CL contents, respectively. These results provide useful insight into the forage quality-related traits and should facilitate the genetic improvement of sorghum forage quality in the future.