MicroRNAs (miRNAs) are small non-coding RNAs of approximately 22 nucleotides, which negatively regulate the gene expression at the post-transcriptional level. This study describes an update of the ...miRTarBase (http://miRTarBase.mbc.nctu.edu.tw/) that provides information about experimentally validated miRNA-target interactions (MTIs). The latest update of the miRTarBase expanded it to identify systematically Argonaute-miRNA-RNA interactions from 138 crosslinking and immunoprecipitation sequencing (CLIP-seq) data sets that were generated by 21 independent studies. The database contains 4966 articles, 7439 strongly validated MTIs (using reporter assays or western blots) and 348 007 MTIs from CLIP-seq. The number of MTIs in the miRTarBase has increased around 7-fold since the 2014 miRTarBase update. The miRNA and gene expression profiles from The Cancer Genome Atlas (TCGA) are integrated to provide an effective overview of this exponential growth in the miRNA experimental data. These improvements make the miRTarBase one of the more comprehensively annotated, experimentally validated miRNA-target interactions databases and motivate additional miRNA research efforts.
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs (typically consisting of 18–25 nucleotides) that negatively control expression of target genes at the post-transcriptional level. Owing to the ...biological significance of miRNAs, miRTarBase was developed to provide comprehensive information on experimentally validated miRNA–target interactions (MTIs). To date, the database has accumulated >13,404 validated MTIs from 11,021 articles from manual curations. In this update, a text-mining system was incorporated to enhance the recognition of MTI-related articles by adopting a scoring system. In addition, a variety of biological databases were integrated to provide information on the regulatory network of miRNAs and its expression in blood. Not only targets of miRNAs but also regulators of miRNAs are provided to users for investigating the up- and downstream regulations of miRNAs. Moreover, the number of MTIs with high-throughput experimental evidence increased remarkably (validated by CLIP-seq technology). In conclusion, these improvements promote the miRTarBase as one of the most comprehensively annotated and experimentally validated miRNA–target interaction databases. The updated version of miRTarBase is now available at http://miRTarBase.cuhk.edu.cn/.
Personal genomics and comparative genomics are becoming more important in clinical practice and genome research. Both fields require sequence alignment to discover sequence conservation and ...variation. Though many methods have been developed, some are designed for small genome comparison while some are not efficient for large genome comparison. Moreover, most existing genome comparison tools have not been evaluated the correctness of sequence alignments systematically. A wrong sequence alignment would produce false sequence variants.
In this study, we present GSAlign that handles large genome sequence alignment efficiently and identifies sequence variants from the alignment result. GSAlign is an efficient sequence alignment tool for intra-species genomes. It identifies sequence variations from the sequence alignments. We estimate performance by measuring the correctness of predicted sequence variations. The experiment results demonstrated that GSAlign is not only faster than most existing state-of-the-art methods, but also identifies sequence variants with high accuracy.
As more genome sequences become available, the demand for genome comparison is increasing. Therefore an efficient and robust algorithm is most desirable. We believe GSAlign can be a useful tool. It exhibits the abilities of ultra-fast alignment as well as high accuracy and sensitivity for detecting sequence variations.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
The programmable one-pot oligosaccharide synthesis method was designed to enable the rapid synthesis of a large number of oligosaccharides, using the software Optimer to search Building BLocks (BBLs) ...with defined relative reactivity values (RRVs) to be used sequentially in the one-pot reaction. However, there were only about 50 BBLs with measured RRVs in the original library and the method could only synthesize small oligosaccharides due to the RRV ordering requirement. Here, we increase the library to include 154 validated BBLs and more than 50,000 virtual BBLs with predicted RRVs by machine learning. We also develop the software Auto-CHO to accommodate more data handling and support hierarchical one-pot synthesis using fragments as BBLs generated by the one-pot synthesis. This advanced programmable one-pot method provides potential synthetic solutions for complex glycans with four successful examples demonstrated in this work.
Abstract
Background
Antimicrobial peptides (AMPs) are oligopeptides that act as crucial components of innate immunity, naturally occur in all multicellular organisms, and are involved in the first ...line of defense function. Recent studies showed that AMPs perpetuate great potential that is not limited to antimicrobial activity. They are also crucial regulators of host immune responses that can modulate a wide range of activities, such as immune regulation, wound healing, and apoptosis. However, a microorganism's ability to adapt and to resist existing antibiotics triggered the scientific community to develop alternatives to conventional antibiotics. Therefore, to address this issue, we proposed Co-AMPpred, an in silico-aided AMP prediction method based on compositional features of amino acid residues to classify AMPs and non-AMPs.
Results
In our study, we developed a prediction method that incorporates composition-based sequence and physicochemical features into various machine-learning algorithms. Then, the boruta feature-selection algorithm was used to identify discriminative biological features. Furthermore, we only used discriminative biological features to develop our model. Additionally, we performed a stratified tenfold cross-validation technique to validate the predictive performance of our AMP prediction model and evaluated on the independent holdout test dataset. A benchmark dataset was collected from previous studies to evaluate the predictive performance of our model.
Conclusions
Experimental results show that combining composition-based and physicochemical features outperformed existing methods on both the benchmark training dataset and a reduced training dataset. Finally, our proposed method achieved 80.8% accuracies and 0.871 area under the receiver operating characteristic curve by evaluating on independent test set. Our code and datasets are available at
https://github.com/onkarS23/CoAMPpred
.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Natural Language Processing techniques are constantly being advanced to accommodate the influx of data as well as to provide exhaustive and structured knowledge dissemination. Within the biomedical ...domain, relation detection between bio-entities known as the Bio-Entity Relation Extraction (BRE) task has a critical function in knowledge structuring. Although recent advances in deep learning-based biomedical domain embedding have improved BRE predictive analytics, these works are often task selective or use external knowledge-based pre-/post-processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, i.e. LBERT, which is a Lexically aware Transformer-based Bidirectional Encoder Representation model, and which explores both local and global contexts representations for sentence-level classification tasks.
This article presents one of the most exhaustive BRE studies ever conducted over five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein-protein interaction (PPI), drug-drug interaction and protein-bio-entity relation classification tasks by 0.02%, 11.2% and 41.4%, respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relation for large corpora like PPI. Our ablation studies clearly indicate the contribution of the lexical features and distance-adjusted attention in improving prediction performance by learning additional local semantic context along with bi-directionally learned global context.
Github. https://github.com/warikoone/LBERT.
Supplementary data are available at Bioinformatics online.
N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, ...glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them evaluate their performance at every asparagine in protein sequences, not confined to asparagine in the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously-constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm trained on both glycoproteins and non-glycoproteins to predict a score for a protein to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites by utilizing features of gapped dipeptides, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE's final predictions are derived from a weight adjustment of the second-stage prediction results based on the first-stage prediction score. Evaluated on N-X-S/T sequons of an independent dataset comprised of 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy and MCC of 0.740 and 0.499, respectively, outperforming the compared tools. The N-GlyDE web server is available at http://bioapp.iis.sinica.edu.tw/N-GlyDE/ .
Carbohydrates make up one of the four major classes of biomolecules, often conjugated with proteins as glycoproteins or with lipids as glycolipids, and participate in many important biochemical ...functions in living species. However, glycoproteins or glycolipids often exist as mixtures, and as a consequence, it is difficult to isolate individual glycoproteins or glycolipids as pure forms to understand the role carbohydrates play in the glycoconjugate. Currently, the only feasible way to obtain pure glycoconjugates is through synthesis, and of the many methods developed for the synthesis of oligosaccharides, those with automatic and programmable potential are considered to be more effective for addressing the issues of carbohydrate diversity and related functions. In this Perspective, we describe how data science, including algorithm and machine learning, can be used to assist the chemical synthesis of oligosaccharide in a programmable and one-pot manner and how the programmable method can be used to accelerate the construction of diverse oligosaccharides to facilitate our understanding of glycosylation in biology.
Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Due to the huge amount of data, NGS applications require very fast and ...accurate alignment algorithms. Most existing algorithms for read mapping basically adopt seed-and-extend strategy, which is sequential in nature and takes much longer time on longer reads.
We develop a divide-and-conquer algorithm, called Kart, which can process long reads as fast as short reads by dividing a read into small fragments that can be aligned independently. Our experiment result indicates that the average size of fragments requiring the more time-consuming gapped alignment is around 20 bp regardless of the original read length. Furthermore, it can tolerate much higher error rates. The experiments show that Kart spends much less time on longer reads than other aligners and still produce reliable alignments even when the error rate is as high as 15%.
Kart is available at https://github.com/hsinnan75/Kart/ .
hsu@iis.sinica.edu.tw.
Supplementary data are available at Bioinformatics online.