Alternative splicing (AS) is an important mechanism of posttranscriptional modification and dynamically regulates multiple physiological processes in plants, including fruit ripening. However, little is known about alternative splicing during fruit development in fleshy fruits. We studied alternative splicing at the immature and ripe stages of fruit development in cucumber, melon, papaya and peach. We found that 14.96-17.48% of multiexon genes exhibited alternative splicing. Intron retention was not always the most frequent event, indicating that alternative splicing patterns differ among developmental processes. Alternative splicing was significantly more prevalent at the ripe stage than at the immature stage in cucumber and melon, while the opposite trend was shown in papaya and peach, implying that developmental stages adopt different alternative splicing strategies for their specific functions. Some genes involved in fruit ripening underwent stage-specific alternative splicing, indicating that alternative splicing regulates fruit ripening. Conserved alternative splicing events did not appear to be stage-specific. Clustering fruit developmental stages across the four species based on alternative splicing profiles resulted in species-specific clustering, suggesting that diversification of alternative splicing contributes to lineage-specific evolution in fleshy fruits. We obtained high-quality transcriptomes and alternative splicing events during fruit development across the four species. Dynamic and nonconserved alternative splicing was discovered. The candidate stage-specific AS genes involved in fruit ripening will provide valuable insight into the roles of alternative splicing during the developmental processes of fleshy fruits.
Abstract
Background
With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have been produced, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating its information remains a substantial challenge.
Results
We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure.
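As a rough illustration of the tri-tuple idea described above, a coordinate can be pictured as a small immutable record holding the three components. This is only a sketch: the class name, the field names (`offset`, `bubble_id`, `samples`) and the choice of a string identifier for the topological component are assumptions made for illustration, not the representation used by the cSupB program.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TriTupleCoordinate:
    """Hypothetical tri-tuple coordinate: offset, topology, sample information."""

    offset: int                # assumed: base offset inside the bubble's path
    bubble_id: str             # assumed: identifier of the enclosing superbubble
    samples: FrozenSet[str]    # assumed: samples (colors) whose paths pass here

    def covers(self, sample: str) -> bool:
        """Return True if the given sample traverses this position."""
        return sample in self.samples


# Illustrative values only; they do not come from any real graph.
coord = TriTupleCoordinate(offset=12, bubble_id="bubble_7",
                           samples=frozenset({"sample_A", "sample_B"}))
print(coord.covers("sample_A"))
```

Keeping the record frozen makes coordinates hashable, so they could serve as dictionary keys when grouping positions by bubble or by sample.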
Conclusions
Although the solution is made suitable for increasingly complex genome graphs by relaxing the directed acyclic graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, which is available at https://github.com/eggleader/cSupB.
Structural variants (SVs) play important roles in adaptive evolution and species diversification. In plants especially, many phenotypes related to environmental response have been found to be associated with SVs. Despite the prevalence and significance of SVs, long insertions remain poorly detected and studied in all but model species. We used whole-genome resequencing of paired reads from 80 Asian butternuts to detect long insertions and further analyse their characteristics and potential functional effects. By combining mapping-based and de novo assembly-based methods, we obtained a pangenome of multiple related species representing higher taxonomic groups. We obtained 89,312 distinct contigs totaling 147,773,999 base pairs (bp) of new sequences, of which 347 were putative long insertions placed in the reference genome. Most of the putative long insertions appeared in multiple species; in contrast, only 62 putative long insertions appeared in one species, which may be involved in the response to the environment. A total of 65 putative long insertions fell into 61 distinct protein-coding genes involved in plant development, and 105 putative long insertions fell upstream of 106 distinct protein-coding genes involved in cellular respiration. In addition, 3,367 genes were annotated in 2,606 contigs. We propose PLAINS (https://github.com/CMB-BNU/PLAINS.git), a streamlined, comprehensive pipeline for the prediction and analysis of long insertions using whole-genome resequencing. Our study lays an important foundation for further whole-genome long insertion studies, allowing their effects to be investigated experimentally.
Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single-cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbp, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to their postulated sister Dikarya.
Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply treat terms at the same level of the ontology as equally specific nodes, a weakness that can be offset by using information content (IC).
Here, we used the IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false ones. HRSS values were divided into four different levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure, simGIC, are superior in correlation with sequence and Pfam similarities. Because different measures are best suited for different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. HRSS was implemented in the C programming language and is freely available from http://cmb.bnu.edu.cn/hrss.
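The two pairwise strategies compared above can be sketched in a few lines. The example assumes the term-to-term similarities between the annotations of two gene products have already been computed (by HRSS or any other measure) and are given as a matrix; the function names are illustrative, and the best-match average shown here is one common formulation (the mean of the two directional averages), which may differ in detail from the variant used in the study.

```python
from typing import List, Sequence


def max_strategy(sim: Sequence[Sequence[float]]) -> float:
    """Gene-level similarity as the single best term pair."""
    return max(max(row) for row in sim)


def best_match_average(sim: Sequence[Sequence[float]]) -> float:
    """Gene-level similarity as the mean of the two directional best-match means.

    Rows index the terms of gene product A, columns the terms of gene product B.
    """
    row_best: List[float] = [max(row) for row in sim]        # best partner for each A term
    col_best: List[float] = [max(col) for col in zip(*sim)]  # best partner for each B term
    mean_rows = sum(row_best) / len(row_best)
    mean_cols = sum(col_best) / len(col_best)
    return 0.5 * (mean_rows + mean_cols)


# Made-up similarity values between two terms of A and two terms of B.
example = [
    [0.9, 0.1],
    [0.2, 0.4],
]
print(max_strategy(example), best_match_average(example))
```

The maximum rewards any single shared function, which fits the observation that it worked better for physical interactions, whereas the best-match average reflects agreement across all annotations.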
Population-specific, positive selection promotes the diversity of populations, and drives local adaptations in the population. However, little is known about population-specific, recent positive selection in the populations of cultivated cucumber (Cucumis sativus L.). Based on a genomic variation map of individuals worldwide, we implemented a Fisher's combination method by combining four haplotype-based approaches: integrated haplotype score (iHS), number of segregating sites by length (nSL), cross-population extended haplotype homozygosity (XP-EHH), and Rsb. Overall, we detected 331, 2,147, and 3,772 population-specific, recent positive selective sites in the East Asian, Eurasian, and Xishuangbanna populations, respectively. Moreover, we found these sites were related to processes for reproduction, response to abiotic and biotic stress, and regulation of developmental processes, indicating adaptations to their microenvironments. Meanwhile, selective genes associated with fruit traits were also observed, such as the gene related to shorter fruit length in the Eurasian population and the gene controlling flesh thickness in the Xishuangbanna population. In addition, we noticed that soft sweeps were common in the East Asian and Xishuangbanna populations. Genes involved in hard or soft sweeps were related to developmental regulation and abiotic and biotic stress resistance. Our study offers a comprehensive candidate dataset of population-specific, selective signatures in cultivated cucumber populations. Our methods provide guidance for the analysis of population-specific, positive selection. These findings will help explore the biological mechanisms of adaptation and domestication of cucumber.
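For readers unfamiliar with Fisher's combination method, the sketch below shows the textbook form: the statistic X = -2 Σ ln(p_i) is compared with a chi-square distribution with 2k degrees of freedom, where k is the number of p-values. It assumes each haplotype-based score has already been converted to a p-value per site and that the tests are independent; how the study derived and combined its p-values is not described in the abstract, so the function name and the example values are purely illustrative.

```python
import math
from typing import Sequence, Tuple


def fisher_combine(p_values: Sequence[float]) -> Tuple[float, float]:
    """Combine p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals statistic / 2

    # For an even number of degrees of freedom (2k), the chi-square survival
    # function has the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return 2.0 * half_stat, math.exp(-half_stat) * total


# Hypothetical per-site p-values from four tests (e.g. iHS, nSL, XP-EHH, Rsb).
stat, p_combined = fisher_combine([0.01, 0.02, 0.03, 0.04])
print(stat, p_combined)
```

Because haplotype-based statistics computed on the same data are usually correlated, the independence assumption is only approximate in practice, and the combined p-value should be read with that in mind.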
The mechanisms underlying the organization and evolution of the telencephalic pallium are not yet clear. To address this issue, we first performed comparative analysis of genes critical for the development of the pallium (Emx1/2 and Pax6) and subpallium (Dlx2 and Nkx1/2) among 500 vertebrate species. We found that these genes have no obvious variations in chromosomal duplication/loss, gene locus synteny or Darwinian selection. However, there is an additional fragment of approximately 20 amino acids in mammalian Emx1 and a poly-(Ala) in Emx2. Lentiviruses expressing mouse or chick Emx2 (m-Emx2 or c-Emx2 Lv) were injected into the ventricle of the chick telencephalon at embryonic Day 3 (E3), and the embryos were allowed to develop to E12-14 or to the posthatching stage. After transfection with m-Emx2 Lv, the cells expressing Reelin, Vimentin or GABA increased, and neurogenesis of calbindin cells changed towards the mammalian inside-out pattern in the dorsal pallium and mesopallium. In addition, a behavior test for posthatched chicks indicated that the passive avoidance ratio increased significantly. The study suggests that the acquisition of an additional fragment in mammalian Emx2 is associated with the organization and evolution of the mammalian pallium.
Alternative splicing (AS) plays a critical regulatory role in modulating transcriptome and proteome diversity. In particular, it increases the functional diversity of proteins. Recent genome-wide analysis of AS using RNA-Seq has revealed that AS is highly pervasive in plants. Furthermore, it has been suggested that most AS events are subject to tissue-specific regulation.
To reveal the functional characteristics induced by AS and tissue-specific splicing events, a database for exploring these characteristics is needed, especially in plants. To address these goals, we constructed a database of annotated transcripts generated by alternative splicing in cucumbers (CuAS: http://cmb.bnu.edu.cn/alt_iso/index.php) that integrates genomic annotations, isoform-level functions, isoform-level features, and tissue-specific AS events among multiple tissues. CuAS supports a retrieval system that identifies unique IDs (gene ID, isoform ID, UniProt ID, and gene name), chromosomal positions, and gene families, and a browser for visualization of each gene.
We believe that CuAS could be helpful for revealing the novel functional characteristics induced by AS and tissue-specific AS events in cucumbers. CuAS is freely available at http://cmb.bnu.edu.cn/alt_iso/index.php.
Alternative splicing (AS) is an important post-transcriptional process. It has been suggested that most AS events are subject to tissue-specific regulation. However, the global dynamics of AS in different tissues are poorly explored.
To analyse global changes in AS in multiple tissues, we identified the AS events and constructed a comprehensive catalogue of AS events within each tissue based on the genome-wide RNA-seq reads from ten tissues in cucumber. First, we found that 58% of the multi-exon genes underwent AS. We further obtained 565 genes with significantly more AS events compared with random genes. These genes were significantly enriched in biological processes related to the regulation of actin filament length. Second, significantly different AS event profiles among ten tissues were found. The tissues with the same origin of development are more likely to have a relatively similar AS profile. Moreover, 7370 genes showed tissue-specific AS events and were highly enriched in biological processes related to the positive regulation of cellular component organization. Root-specific AS genes were related to the cellular response to DNA damage stimulus. Third, the genes with different intron retention (IR) patterns among the ten tissues showed significant differences in the GC percentage of the retained intron, the number of exons, and the FPKM of the major transcripts.
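One simple way to think about tissue-specific AS events is as events detected in exactly one tissue of the catalogue. The sketch below applies that definition to a mapping from tissue name to a set of event identifiers. The definition, the function name and the toy identifiers are assumptions for illustration; the study's own criterion for tissue specificity may be stricter or based on expression thresholds.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def tissue_specific_events(events_by_tissue: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return, for each tissue, the events seen in that tissue and in no other."""
    as_sets = {tissue: set(events) for tissue, events in events_by_tissue.items()}
    # Count in how many tissues each event occurs.
    tissue_count = Counter(event for events in as_sets.values() for event in events)
    return {
        tissue: {event for event in events if tissue_count[event] == 1}
        for tissue, events in as_sets.items()
    }


# Toy catalogue with made-up event identifiers.
catalogue = {
    "root": {"ev1", "ev2", "ev3"},
    "leaf": {"ev2", "ev4"},
    "fruit": {"ev2", "ev3"},
}
print(tissue_specific_events(catalogue))
```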
Our study provided a comprehensive view of AS in multiple tissues. We revealed novel insights into the patterns of AS in multiple tissues and the tissue-specific AS in cucumber.
Alternative splicing is crucial for a wide range of biological processes. However, limited by the availability of reference genomes, genome-wide patterns of alternative splicing remain unknown in most nonmodel organisms. We present an attention-based convolutional neural network model, DeepASmRNA, for predicting alternative splicing events using only transcriptomic data. DeepASmRNA consists of two parts, identification of alternatively spliced transcripts and classification of alternative splicing events, and it outperformed the state-of-the-art method, AStrap, and other deep learning models. Then, we utilize transfer learning to increase the performance in species with limited training data and use an interpretation method to decipher splicing codes. Finally, when applied to Amborella, DeepASmRNA can identify more AS events than AStrap while maintaining the same level of precision, suggesting that DeepASmRNA has superior sensitivity to identify alternative splicing events. In summary, DeepASmRNA is scalable and interpretable for detecting genome-wide patterns of alternative splicing in species without a reference genome.
• DeepASmRNA uses only the transcriptome to predict alternative splicing events
• DeepASmRNA identifies adjacent HSPs to greatly improve the recall
• DeepASmRNA uses attention-based convolutional neural network to classify AS events
• Transfer learning is used to increase the predictive power of a target species
Biological sciences; Molecular biology; Molecular biology experimental approach; Artificial intelligence; Artificial intelligence applications