Abstract
Motivation
Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage this unprecedented collection of RNA-seq data with new computational methods that extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly or annotated poly(A) sites to determine transcript 3′ ends. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage across more than two poly(A) sites.
Results
We developed an approach called APAtrap, based on a mean squared error model, to identify and quantify APA sites from RNA-seq data. APAtrap can identify novel 3′ UTRs and 3′ UTR extensions, which helps locate potential poly(A) sites in previously overlooked regions and improve genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using various RNA-seq datasets from simulation studies, human, and Arabidopsis demonstrate the efficacy of APAtrap and its flexibility for any organism with an annotated genome.
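APAtrap is described above as using a mean squared error model. The following is a minimal Python sketch of the general idea only, not APAtrap's implementation: per-base read coverage along a 3′ UTR is fitted with piecewise-constant segments, and each split point that most reduces the mean squared error is treated as a candidate poly(A) site, where coverage drops. The function names and the `min_len` and `min_gain` thresholds are hypothetical, and recursive binary segmentation is a stand-in strategy for finding more than two sites.

```python
def sse(values):
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_breakpoint(coverage, min_len=1):
    """Find the split that minimises the MSE of a two-segment constant fit.

    Returns (index, mse), where index is the first position of the right-hand
    segment, or (None, baseline_mse) if no split improves on one segment.
    """
    n = len(coverage)
    best_index, best_mse = None, sse(coverage) / n
    for b in range(min_len, n - min_len + 1):
        mse = (sse(coverage[:b]) + sse(coverage[b:])) / n
        if mse < best_mse:
            best_index, best_mse = b, mse
    return best_index, best_mse


def segment(coverage, start=0, min_len=5, min_gain=0.2):
    """Recursively split a coverage profile; returns candidate site offsets.

    A split is kept only if it lowers the segment's MSE by at least the
    (hypothetical) fraction min_gain and leaves >= min_len bases per side.
    """
    if len(coverage) < 2 * min_len:
        return []
    base = sse(coverage) / len(coverage)
    b, mse = best_breakpoint(coverage, min_len)
    if b is None or base == 0 or (base - mse) / base < min_gain:
        return []
    return (segment(coverage[:b], start, min_len, min_gain)
            + [start + b]
            + segment(coverage[b:], start + b, min_len, min_gain))
```

On a toy profile with two coverage drops, `segment([10]*20 + [4]*20 + [1]*20)` returns the split positions 20 and 40. The exhaustive scan is quadratic per split, which is fine for illustration but not for genome-scale data.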
Availability and implementation
Freely available for download at https://apatrap.sourceforge.io.
Supplementary information
Supplementary data are available at Bioinformatics online.
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identifying poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the mechanisms of APA-mediated gene regulation. A number of computational tools have been developed to predict pAs from diverse genomic data. Here we provide a comprehensive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. In particular, we examine several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and offer practical suggestions on how to assess the reliability of pAs predicted by different tools. We also propose guidelines for choosing methods appropriate to diverse scenarios. Moreover, we discuss in depth the challenges in improving pA prediction performance and in benchmarking different methods. Finally, we highlight outstanding challenges and opportunities offered by new machine learning and integrative multi-omics techniques, and provide our perspective on how computational methodologies might evolve for non-3′ untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
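One common way to gauge how reliable predicted pAs are is to compare them with a reference set of known sites within a distance tolerance. The Python sketch below assumes all positions lie on the same chromosome and strand; the 50-nt window and the greedy one-to-one matching rule are illustrative assumptions, not recommendations taken from the review.

```python
from bisect import bisect_left


def match_sites(predicted, reference, window=50):
    """Greedily pair each predicted pA with an unused reference pA within +/- window nt.

    All positions are assumed to be on the same chromosome and strand.
    Returns (matched, precision, recall).
    """
    ref = sorted(reference)
    used = set()
    matched = 0
    for p in sorted(predicted):
        i = bisect_left(ref, p - window)  # first reference site >= p - window
        while i < len(ref) and ref[i] <= p + window:
            if i not in used:
                used.add(i)
                matched += 1
                break
            i += 1
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(ref) if ref else 0.0
    return matched, precision, recall
```

For example, `match_sites([100, 180, 500], [120, 160, 900])` pairs two of the three predictions, giving a precision and recall of 2/3 each. Greedy matching takes the first free reference site in the window rather than the nearest one, which keeps the sketch short.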
Historically, sharing T cell receptors (TCRs) between individuals was speculated to be impossible, given the dramatic discrepancy between the potential enormity of the TCR repertoire and the limited number of T cells generated in each individual. However, public T cell responses, in which multiple individuals share identical TCRs in response to the same antigenic epitope, have been extensively observed in a variety of immune responses across many species. Public T cell responses enable individuals within a population to generate similar antigen-specific TCRs against certain ubiquitous pathogens, leading to favorable biological outcomes. However, the relatively concentrated nature of the TCR repertoire may limit T cell responses in a population to some other pathogens. It could greatly benefit human health if public T cell responses could be manipulated. Therefore, mechanistic insight into public TCR generation is important. Recently, high-throughput DNA sequencing has revolutionized the study of immune receptor repertoires, allowing a much better understanding of the factors that determine the overlap of TCR repertoires among individuals. Here, we summarize current knowledge on public T cell responses and discuss future challenges in this field.
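Repertoire overlap of the kind discussed above can be quantified once each TCR is reduced to a comparable key. In this hypothetical Python sketch a clonotype is a (V gene, CDR3 amino acid sequence) pair, which is an assumed definition; a clonotype counts as public when it occurs in at least `min_individuals` repertoires, and pairwise overlap is measured with the Jaccard index. The donor data are toy values.

```python
from collections import Counter


def public_clonotypes(repertoires, min_individuals=2):
    """Return clonotypes present in at least min_individuals repertoires.

    repertoires maps an individual ID to an iterable of clonotype keys;
    here a key is assumed to be a (V gene, CDR3 amino acid) tuple.
    """
    counts = Counter()
    for clonotypes in repertoires.values():
        counts.update(set(clonotypes))  # count each individual at most once
    return {c for c, k in counts.items() if k >= min_individuals}


def jaccard(a, b):
    """Jaccard overlap between two repertoires' clonotype sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy repertoires (not real data).
repertoires = {
    "donor1": [("V1", "CASSA"), ("V2", "CASSB")],
    "donor2": [("V1", "CASSA"), ("V3", "CASSC")],
    "donor3": [("V2", "CASSB")],
}
print(sorted(public_clonotypes(repertoires)))
print(jaccard(repertoires["donor1"], repertoires["donor2"]))
```

Here `("V1", "CASSA")` and `("V2", "CASSB")` are each shared by two donors and are therefore reported as public.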
Inverted repeats are abundant in both prokaryotic and eukaryotic genomes and can form DNA secondary structures, namely hairpins and cruciforms, that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are desirable, because existing tools are often inaccurate and time-consuming, and some cannot handle genome-scale input data. Here, we present a MATLAB-based program called detectIR for perfect and imperfect inverted repeat detection that uses complex numbers and vector calculation and accepts genome-scale data inputs. detectIR adopts a novel algorithm that converts the conventional sequence string comparison in inverted repeat detection into vector calculation of complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gaps) in the middle of inverted repeats. Compared with popular existing tools, our program achieves significantly higher accuracy and efficiency. Using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays for comparison, detectIR finds many inverted repeats missed by existing tools, whose outputs often contain many invalid cases. detectIR is open source and its source code is freely available at: https://sourceforge.net/projects/detectir.
Vivipary is a rare sexual reproduction phenomenon in which embryos germinate directly on the maternal plant. However, it is a common trait of woody mangroves in the Rhizophoraceae family. The ecological benefits of vivipary in mangroves include nurturing seedlings in harsh coastal and saline environments, but the genetic and molecular mechanisms of vivipary remain unclear. Here we investigate the viviparous embryo development and germination processes in mangrove
using a transcriptomic approach. Many key biological pathways and functional genes were enriched in different tissues and stages, contributing to vivipary. Reduced production of abscisic acid establishes a non-dormant state that allows the embryo to germinate directly. Genes involved in the metabolism of and response to other phytohormones (gibberellic acid, brassinosteroids, cytokinin, and auxin) are expressed precociously in the axis at non-viviparous stages, thus promoting embryo growth through the seed coat. Network analysis of these genes identified the central regulatory roles of
and
, which maintain embryo identity in Arabidopsis. Moreover, photosynthesis-related pathways were significantly up-regulated in viviparous embryos, and substance transporter genes were highly expressed in the seed coat, suggesting partial self-provisioning and maternal nursing. We conclude that vivipary results from a combination of precocious loss of dormancy and enhanced germination potential during viviparous seed development. These results shed light on the relationship between seed development and germination, in which continuous embryo growth replaces the biphasic pattern until a mature propagule is established.
Restoration through planting is the dominant strategy for conserving mangrove ecosystems. However, many plantations fail to survive. Site and seedling selection matter for planting. Combining phenotypic analyses and next-generation sequencing, we found phenotypic discrepancies among individuals from different populations grown in a common garden, as well as genetic differentiation among populations. The central population, with abundant genetic diversity and high phenotypic plasticity, had a wide plantable range. However, its biomass was reduced after transfer to other latitudes. RNA-seq revealed suppressed expression of lignin biosynthesis genes, which was responsible for this biomass reduction. Moreover, using whole-genome bisulfite sequencing, we observed modified DNA methylation of MADS-box genes involved in regulating flowering time, which might contribute to adaptation to new environments. By combining classical ecological experiments with multi-omics analyses, our work revealed morphological differences and genetic differentiation among populations of K. obovata, offered scientific advice for developing restoration strategies with long-term efficacy, and explored the phenotypic, transcriptional, and epigenetic responses of plants to transplantation between latitudes.
Miniature inverted repeat transposable elements (MITEs) are prevalent in eukaryotic genomes. They are known to critically influence genome evolution and to play a role in gene regulation. In the first study focused on the transposition activities of MITEs among different ecotype accessions within a species, we conducted a genome-wide comparative analysis by characterizing and comparing MITEs in 19 Arabidopsis thaliana accessions. A total of 343,485 putative MITE sequences, including canonical, diverse and partial ones, were delineated from all 19 accessions. Of these, 80.7% were previously unclassified MITEs, which showed a different genomic distribution and functionality from the classified MITEs. The interactions between MITEs and homologous genes across the 19 accessions provided a valuable resource for analyzing MITE transposition activities and their impacts on genome evolution. Moreover, a significant proportion of MITEs were located in the last exon of genes, in addition to their usual intronic locations, and thus potentially modify gene ends. Finally, analysis of the impact of MITEs on gene expression suggests that MITE migration has no detectable effect on host gene expression levels across accessions.
Transcriptional networks are tightly controlled in plant development and stress responses. Alternative polyadenylation (APA) has been found to regulate gene expression under abiotic stress by increasing the heterogeneity at mRNA 3' ends. Heavy metals such as cadmium pollute water and soil as a result of mining and industrial applications. How plants cope with heavy metal stress remains an open question. The Arabidopsis root hair was chosen as a single-cell model to investigate the functional role of APA in the cadmium stress response. Inhibition of primary root growth and defective root hair morphotypes were observed. Poly(A) tag (PAT) libraries from single cell types, i.e., root hair cells and non-hair epidermal cells, and from whole root tips under cadmium stress were prepared and sequenced. Interestingly, root hair cell type-specific gene expression was detected under short-term cadmium exposure but not under prolonged treatment. Differentially expressed poly(A) sites were identified; they contributed substantially to the altered gene expression and were enriched in the pentose and glucuronate interconversion and phenylpropanoid biosynthesis pathways. Numerous genes showed poly(A) site switching, particularly those functioning in cell wall modification, root epidermal differentiation, and root hair tip growth. Our findings suggest that APA plays a functional role as a potential stress modulator in root hair cells under cadmium treatment.
Rapid growth of single-cell sequencing techniques enables researchers to investigate up to millions of cells with diverse properties in a single experiment. Meanwhile, it also presents great challenges for selecting representative samples from massive single-cell populations for further experimental characterization, which requires robust and compact sampling that balances diverse properties of different priority levels. Conventional sampling methods fail to generate representative and generalizable subsets from a massive single-cell population or more complicated ensembles. Here, we present a toolkit called Cookie, which efficiently selects the most representative samples from a massive single-cell population with diverse properties. This method vectorizes all given properties and quantifies the relationships/similarities among samples using their Manhattan distances, then determines an appropriate sample size by evaluating the coverage of key properties across multiple candidate sizes, followed by k-medoids clustering to group samples into clusters, and selects the center of each cluster as its most representative sample. Comparison of Cookie with conventional sampling methods using a single-cell atlas dataset, epidemiology surveillance data, and a simulated dataset shows the high efficacy, efficiency, and flexibility of Cookie. The Cookie toolkit is implemented in R and is freely available at https://wilsonimmunologylab.github.io/Cookie/.
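The selection procedure described above (Manhattan distances over vectorized properties, a sample size chosen by coverage of key properties, then k-medoids) can be sketched in Python as follows. Cookie itself is an R toolkit and this is not its code. Properties are assumed to be already encoded as numeric vectors, coverage is measured on a single hypothetical categorical key property, and the k-medoids step uses simple alternating (Voronoi) iteration with a deterministic greedy start, which may differ from Cookie's own initialization.

```python
def manhattan(a, b):
    """L1 distance between two numeric property vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medoids(points, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids; returns sorted medoid indices."""
    n = len(points)
    dist = [[manhattan(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Deterministic greedy start: most central point, then farthest-first.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < min(k, n):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assign every sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # New medoid = member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
            if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def coverage(selected, labels):
    """Fraction of distinct key-property values represented among selected samples."""
    return len({labels[i] for i in selected}) / len(set(labels))


def choose_representatives(points, labels, candidate_sizes, target=1.0):
    """Pick the smallest candidate size whose medoids reach the coverage target."""
    for k in sorted(candidate_sizes):
        medoids = k_medoids(points, k)
        if coverage(medoids, labels) >= target:
            return medoids
    return k_medoids(points, max(candidate_sizes))
```

With two well-separated toy groups, such as points near (0, 0) labelled "a" and points near (10, 10) labelled "b", a single medoid covers only one label, so `choose_representatives` moves on to k = 2 and returns one medoid from each group.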
Alternative polyadenylation (APA) occurs during mRNA maturation when a poly(A) tail is added at different locations, resulting in increased diversity of mRNA isoforms and contributing to the complexity of gene regulatory networks. Benefiting from the development of high-throughput sequencing technologies, we can now delineate the APA profiles of transcriptomes at an unprecedented pace. In particular, single-cell RNA sequencing (scRNA-seq) technologies provide opportunities to interrogate the biological details of diverse and rare cell types. Despite increasing evidence that APA is involved in cell type-specific regulation and function, efficient and specific laboratory methods for capturing poly(A) sites at single-cell resolution remain underdeveloped. In this review, we summarize existing experimental and computational methods for identifying APA dynamics in diverse single cell types. We also provide a perspective on future directions.