With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions. Towards ...this goal, a series of sequence-based methods have been proposed and applied to analyze various character-unknown DNA/RNA sequences in order for in-depth understanding their action mechanisms and processes. Compared with the classical sequence-based methods, the pseudo nucleotide composition or PseKNC approach developed very recently has the following advantages: (1) it can convert length-different DNA/RNA sequences into dimension-fixed digital vectors that can be directly handled by all the existing machine-learning algorithms or operation engines; (2) it can contain the desired features and properties according to the selection or definition of users; (3) it can cover considerable sequence pattern information, both local and global. This minireview is focused on the concept of pseudo nucleotide composition, its development and applications.
Generalized TAP Free Energy Chen, Wei‐Kuo; Panchenko, Dmitry; Subag, Eliran
Communications on pure and applied mathematics,
July 2023, 2023-07-00, 20230701, Letnik:
76, Številka:
7
Journal Article
There are many different types of RNA modifications, which are essential for numerous biological processes. Knowledge about the occurrence sites of RNA modifications in its sequence is a key for ...in-depth understanding of their biological functions and mechanism. Unfortunately, it is both time-consuming and laborious to determine these sites purely by experiments alone. Although some computational methods were developed in this regard, each one could only be used to deal with some type of modification individually. To our knowledge, no method has thus far been developed that can identify the occurrence sites for several different types of RNA modifications with one seamless package or platform. To address such a challenge, a novel platform called “iRNA-PseColl” has been developed. It was formed by incorporating both the individual and collective features of the sequence elements into the general pseudo K-tuple nucleotide composition (PseKNC) of RNA via the chemicophysical properties and density distribution of its constituent nucleotides. Rigorous cross-validations have indicated that the anticipated success rates achieved by the proposed platform are quite high. To maximize the convenience for most experimental biologists, the platform’s web-server has been provided at http://lin.uestc.edu.cn/server/iRNA-PseColl along with a step-by-step user guide that will allow users to easily achieve their desired results without the need to go through the mathematical details involved in this paper.
Lipids perform multiple biological functions and reflect the physiology and pathology of cells, tissues, and organs. Here, we sought to understand lipid content in relation to tumor pathology by ...characterizing phospholipids and sphingolipids in the orthotopic mouse glioma using MALDI MS imaging (MSI) and LC-MS/MS. Unsupervised clustering analysis of the MALDI-MSI data segmented the coronal tumoral brain section into 10 histopathologically salient regions. Heterogeneous decrease of the common saturated phosphatidylcholines (PCs) in the tumor was accompanied by the increase of analogous PCs with one or two additional fatty acyl double bonds and increased lyso-PCs. Polyunsaturated fatty acyl-PCs and ether PCs highlighted the striatal tumor margins, whereas the distributions of other PCs differentiated the cortical and striatal tumor parenchyma. We detected a reduction of SM d18:1/18:0 and the heterogeneous mild increase of SM d18:1/16:0 in the tumor, whereas ceramides accumulated only in a small patch deep in the tumoral parenchyma. LC-MS/MS analyses of phospholipids and sphingolipids complemented the MALDI-MSI observation, providing a snapshot of these lipids in the tumor. Finally, the proposed mechanisms responsible for the tumoral lipid changes were contrasted with our interrogation of gene expression in human glioma. Together, these lipidomic results unveil the aberrant and heterogeneous lipid metabolism in mouse glioma where multiple lipid-associated signaling pathways underline the tumor features, promote the survival, growth, proliferation, and invasion of different tumor cell populations, and implicate the management strategy of a multiple-target approach for glioma and related brain malignancies.
RNA modifications are additions of chemical groups to nucleotides or their local structural changes. Knowledge about the occurrence sites of these modifications is essential for in-depth ...understanding of the biological functions and mechanisms and for treating some genomic diseases as well. With the avalanche of RNA sequences generated in the post-genomic age, many computational methods have been proposed for identifying various types of RNA modifications one by one. However, so far no method whatsoever has been developed for simultaneously identifying several different types of RNA modifications. To address such a challenge, we developed a predictor called “iRNA-3typeA,” by which we can simultaneously identify the occurrence sites of the following three most frequently observed modifications in RNA: (1) N1-methyladenosine (m1A), (2) N6-methyladenosine (m6A), and (3) adenosine to inosine (A-to-I). It has been shown via rigorous cross-validations for the RNA sequences from Homo sapiens and Mus musculus transcriptomes that the success rates achieved by the powerful new predictor are quite high. For the convenience of broad experimental scientists, a user-friendly web server for iRNA-3typeA has been established at http://lin-group.cn/server/iRNA-3typeA/. It is anticipated that iRNA-3typeA may become a useful high throughput tool for genome analysis.
DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in ...the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating the features into the general form of pseudo amino acid composition that were extracted from protein sequences via the "grey model" and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA binding proteins based on their amino acid sequences information alone. The overall success rate by iDNA-Prot was 83.96% that was obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has ≥25% pairwise sequence identity to any other in a same subset. In addition to achieving high success rate, the computational time for iDNA-Prot is remarkably shorter in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at the web-site on http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results.
Heat shock proteins (HSPs) are a type of functionally related proteins present in all living organisms, both prokaryotes and eukaryotes. They play essential roles in protein–protein interactions such ...as folding and assisting in the establishment of proper protein conformation and prevention of unwanted protein aggregation. Their dysfunction may cause various life-threatening disorders, such as Parkinson’s, Alzheimer’s, and cardiovascular diseases. Based on their functions, HSPs are usually classified into six families: (i) HSP20 or sHSP, (ii) HSP40 or J-class proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Although considerable progress has been achieved in discriminating HSPs from other proteins, it is still a big challenge to identify HSPs among their six different functional types according to their sequence information alone. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop a high-throughput computational tool in this regard. To take up such a challenge, a predictor called iHSP-PseRAAAC has been developed by incorporating the reduced amino acid alphabet information into the general form of pseudo amino acid composition. One of the remarkable advantages of introducing the reduced amino acid alphabet is being able to avoid the notorious dimension disaster or overfitting problem in statistical prediction. It was observed that the overall success rate achieved by iHSP-PseRAAAC in identifying the functional types of HSPs among the aforementioned six types was more than 87%, which was derived by the jackknife test on a stringent benchmark dataset in which none of HSPs included has ⩾40% pairwise sequence identity to any other in the same subset. It has not escaped our notice that the reduced amino acid alphabet approach can also be used to investigate other protein classification problems. As a user-friendly web server, iHSP-PseRAAAC is accessible to the public at http://lin.uestc.edu.cn/server/iHSP-PseRAAAC.