Presently, numerous bioinformatics databases are available on different websites. Although RDF was proposed as a standard format for the web, these databases are still available in various formats. With the increasing popularity of semantic web technologies and the ever-growing number of databases in bioinformatics, there is a pressing need to develop mashup systems to help the process of bioinformatics knowledge integration. Bio2RDF is such a system, built from rdfizer programs written in JSP, the Sesame open source triplestore technology and an OWL ontology. With Bio2RDF, documents from public bioinformatics databases such as KEGG, PDB, MGI, HGNC and several of NCBI’s databases can now be made available in RDF format through a unique URL in the form of http://bio2rdf.org/namespace:id. The Bio2RDF project has successfully applied semantic web technology to publicly available databases by creating a knowledge space of RDF documents linked together with normalized URIs and sharing a common ontology. Bio2RDF is based on a three-step approach to building mashups of bioinformatics data. The present article details this new approach and illustrates the building of a mashup used to explore the implication of four transcription factor genes in Parkinson’s disease. The Bio2RDF repository can be queried at http://bio2rdf.org.
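The normalized URL pattern http://bio2rdf.org/namespace:id can be illustrated with a minimal sketch. The function names, the lowercasing of the namespace, the allowed-character check, and the example identifier are assumptions made for illustration; the abstract specifies only the general form of the URL.

```python
import re

BASE = "http://bio2rdf.org"  # base address given in the abstract


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI of the form http://bio2rdf.org/namespace:id.

    Lowercasing the namespace and restricting its characters are
    assumptions of this sketch, not rules stated in the abstract.
    """
    ns = namespace.strip().lower()
    ident = identifier.strip()
    if not ident:
        raise ValueError("identifier must not be empty")
    if not re.fullmatch(r"[a-z0-9_.-]+", ns):
        raise ValueError(f"unexpected namespace: {namespace!r}")
    return f"{BASE}/{ns}:{ident}"


def parse_bio2rdf_uri(uri: str) -> tuple[str, str]:
    """Split a URI of the above form back into (namespace, id)."""
    prefix = BASE + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a {BASE} URI: {uri!r}")
    ns, sep, ident = uri[len(prefix):].partition(":")
    if not sep or not ns or not ident:
        raise ValueError(f"expected namespace:id, got {uri!r}")
    return ns, ident


if __name__ == "__main__":
    # "1234" is a made-up identifier used only to show the shape of the URI.
    uri = bio2rdf_uri("HGNC", "1234")
    print(uri)
    print(parse_bio2rdf_uri(uri))
```

Because every record is addressed by the same namespace:id scheme, two RDF documents that mention the same record end up pointing at the same URI, which is what allows documents from different source databases to be linked.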
Circular RNAs (circRNAs) play important roles in regulating gene expression by binding miRNAs and RNA-binding proteins. Genetic variation in circRNAs may affect complex traits and diseases by changing their binding efficiency to target miRNAs and proteins. There is a growing demand for investigations of the functions of genetic changes using large-scale experimental evidence. However, there is no online genetic resource for circRNA genes.
We performed extensive genetic annotation of 295,526 circRNAs integrated from circBase, circNet and circRNAdb. All pre-computed genetic variants are presented in our online resource, circVAR, with data browsing and search functionality. We explored the chromosome-based distribution of circRNAs and their associated variants. Based on mapping to the 1000 Genomes and ClinVar databases, we found that chromosome 17 has a relatively large number of circRNAs and associated common and health-related genetic variants. Following the annotation of genome-wide association study (GWAS)-based circRNA variants, we found many non-coding variants within circRNAs, suggesting novel mechanisms for common diseases reported in GWAS. For cancer-based somatic variants, we found that chromosome 7 has many highly complex mutations that have been overlooked in previous research.
We used the circVAR database to collect SNPs and small insertions and deletions (INDELs) in putative circRNA regions and to identify their potential phenotypic information. To provide a reusable resource for the circRNA research community, we have published all the pre-computed genetic data concerning circRNAs and associated genes, together with data query and browsing functions, at http://soft.bioinfo-minzhao.org/circvar.
Introduction:
Treatment of rheumatoid arthritis (RA) has advanced with the introduction of biological disease-modifying antirheumatic drugs. However, more than 20% of patients with RA still have moderate or severe disease activity. Hence, novel antirheumatic drugs are required. Recently, drug repurposing, a process of identifying new indications for existing drugs, has received great attention. Furthermore, a few reports have shown that antipsychotics are capable of affecting several cytokines that are also modulated by existing antirheumatic drugs. Therefore, we investigated the association between antipsychotics and RA by data mining using real-world data and bioinformatics databases.
Methods:
Disproportionality and sequence symmetry analyses were employed to identify the associations between the investigational drugs and RA using the US Food and Drug Administration Adverse Event Reporting System (2004–2016) and JMDC administrative claims database (January 2005–April 2017; JMDC Inc., Tokyo, Japan), respectively. The reporting odds ratio (ROR) and information component (IC) were used in the disproportionality analysis to indicate a signal. The adjusted sequence ratio (SR) was used in the sequence symmetry analysis to indicate a signal. The bioinformatics analysis suite, BaseSpace Correlation Engine (Illumina, CA, USA) was employed to explore the molecular mechanisms associated with the potential candidates identified by the drug-repurposing approach.
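The signal measures named above can be sketched from a 2x2 table of report counts. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and example counts are hypothetical, the IC uses a simple shrinkage form (adding 0.5 to the observed and expected counts), and only the crude sequence ratio is shown because the adjustment for prescribing trends is not described in the abstract.

```python
import math


def reporting_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Return (ROR, lower, upper) for a 2x2 table of spontaneous reports.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    The interval is the usual normal approximation on the log scale.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


def information_component(a: int, b: int, c: int, d: int) -> float:
    """Shrunken log2 observed-to-expected ratio (simplified IC, an assumption)."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2((a + 0.5) / (expected + 0.5))


def crude_sequence_ratio(n_drug_first: int, n_event_first: int) -> float:
    """Crude SR: patients who start the drug before the marker of RA,
    divided by patients with the reverse order."""
    if n_event_first <= 0:
        raise ValueError("n_event_first must be positive")
    return n_drug_first / n_event_first


if __name__ == "__main__":
    # Made-up counts, chosen only to show an inverse pattern.
    ror, lo, hi = reporting_odds_ratio(10, 990, 200, 8800)
    print(f"ROR={ror:.3f} (95% CI {lo:.3f}-{hi:.3f})")
    print(f"IC={information_component(10, 990, 200, 8800):.3f}")
    print(f"crude SR={crude_sequence_ratio(30, 60):.2f}")
```

In this sketch an inverse signal would correspond to an ROR whose upper confidence bound lies below 1, a negative IC, and a sequence ratio below 1; the exact thresholds used in the study are not given in the abstract.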
Results:
We found a potential inverse association between the antipsychotic haloperidol and RA, with significant inverse signals for the ROR, IC, and adjusted SR. Furthermore, the results suggested that haloperidol may exert antirheumatic effects by modulating various signaling pathways, including cytokine and chemokine signaling, major histocompatibility complex class-II antigen presentation, and Toll-like receptor cascade pathways.
Conclusion:
Our drug-repurposing approach using data mining techniques identified haloperidol as a potential antirheumatic drug candidate.
Protein phosphorylation is one of the most pervasive protein post-translational modification events in plant cells. It is involved in many plant biological processes, such as plant growth, organ development, and plant immunology, by regulating or switching signaling and metabolic pathways. High-throughput experimental methods like mass spectrometry can easily characterize hundreds to thousands of phosphorylation events in a single experiment. With the increasing volume of these data sets, the Plant Protein Phosphorylation DataBase (P3DB, http://p3db.org) provides a comprehensive, systematic, and interactive online platform to deposit, query, analyze, and visualize these phosphorylation events in many plant species. It stores the protein phosphorylation sites in the context of identified mass spectra, phosphopeptides, and phosphoproteins contributed from various plant proteome studies. In addition, P3DB associates these plant phosphorylation sites with protein physicochemical information in the protein charts and tertiary structures, while various protein annotations from hierarchical kinase phosphatase families, protein domains, and gene ontology are also added to the database. P3DB not only provides rich information, but also interconnects the data and visualizes them as networks in a systems biology context. Currently, P3DB includes the KiC (Kinase Client) assay network, the protein-protein interaction network, the kinase-substrate network, the phosphatase-substrate network, and the protein domain co-occurrence network. All of these can be queried and used to visualize existing phosphorylation events. Although P3DB only hosts experimentally identified phosphorylation data, it provides a plant phosphorylation prediction model for any unknown queries on the fly. P3DB is an entry point for the plant phosphorylation community to deposit and visualize any customized data sets within this systems biology framework. P3DB has become one of the major bioinformatics platforms for protein phosphorylation in plant biology.
New Challenges for Biological Text-Mining in the Next Decade
Dai, Hong-Jie; Chang, Yen-Ching; Tzong-Han Tsai, Richard ...
Journal of Computer Science and Technology, 2010, Volume 25, Issue 1
Journal Article, peer reviewed
The massive flow of scholarly publications from traditional paper journals to online outlets has benefited biologists because of its ease of access. However, due to the sheer volume of available biological literature, researchers are finding it increasingly difficult to locate needed information. As a result, recent biology contests, notably JNLPBA and BioCreAtIvE, have focused on evaluating various methods by which the literature may be navigated. Among these methods, text-mining technology has shown the most promise. With recent advances in text-mining technology and the fact that publishers are now making the full texts of articles available in XML format, text-mining systems (TMSs) can be adapted to accelerate literature curation, maintain the integrity of information, and ensure proper linkage of data to other resources. Even so, several new challenges have emerged in relation to full-text analysis, life-science terminology, complex relation extraction, and information fusion. These challenges must be overcome for text-mining to be more effective. In this paper, we identify the challenges, discuss how they might be overcome, and consider the resources that may be helpful in achieving that goal.
Optimizing multiple seeds for protein homology search
Brown, D.G.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, January-March 2005, Volume 2, Issue 1
Journal Article, peer reviewed, open access
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model picking a set of seed models as an integer programming problem and give algorithms to choose such a set of seeds. While the problem is NP-hard, and quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds we have chosen allows four to five times fewer false positive hits, while preserving essentially the same sensitivity as BLASTP.
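To make the idea of a collection of vector seeds concrete, the sketch below assumes the usual formulation in which a vector seed is a pair (v, T) and an ungapped alignment is hit at position p when the weighted sum of its per-position scores, taken over the span of v, reaches the threshold T. The scoring function, the seed values, and all function names are hypothetical; a real aligner would use a substitution matrix such as BLOSUM62, and the seed-selection step by integer programming is not shown.

```python
from typing import Callable, List, Sequence, Tuple

# A vector seed is modeled as (weights, threshold).
VectorSeed = Tuple[Tuple[float, ...], float]


def toy_score(x: str, y: str) -> float:
    """Hypothetical scoring: +5 for identical residues, -1 otherwise."""
    return 5.0 if x == y else -1.0


def ungapped_scores(a: str, b: str,
                    score: Callable[[str, str], float] = toy_score) -> List[float]:
    """Per-position scores of two sequences aligned without gaps."""
    return [score(x, y) for x, y in zip(a, b)]


def seed_hits(scores: Sequence[float], seed: VectorSeed) -> List[int]:
    """Positions p where sum(v[i] * scores[p + i]) >= T."""
    weights, threshold = seed
    span = len(weights)
    hits = []
    for p in range(len(scores) - span + 1):
        total = sum(w * scores[p + i] for i, w in enumerate(weights))
        if total >= threshold:
            hits.append(p)
    return hits


def multi_seed_hits(scores: Sequence[float],
                    seeds: Sequence[VectorSeed]) -> List[int]:
    """A position is a hit if at least one seed in the collection hits it."""
    found = set()
    for seed in seeds:
        found.update(seed_hits(scores, seed))
    return sorted(found)


if __name__ == "__main__":
    scores = ungapped_scores("ACDEFG", "ACDXFG")
    contiguous = ((1.0, 1.0, 1.0), 15.0)  # three consecutive positions
    spaced = ((1.0, 0.0, 1.0), 10.0)      # ignores the middle position
    print(seed_hits(scores, contiguous))
    print(multi_seed_hits(scores, [contiguous, spaced]))
```

A zero weight acts as a "don't care" position, so different seeds in the collection can catch alignments that a single contiguous seed would miss; choosing which seeds to include is the optimization problem the abstract describes.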
Traditional fish information database websites offer various resources but have several disadvantages, including inconsistent data storage structures, scattered resource distribution, and low query and utilization efficiency, which are not conducive to the analysis of and research on big data for fish species. Therefore, the Chinese freshwater fish information database (CFFID) was constructed on the MySQL data management system. Using web crawling, the rfishbase and dismo packages in R, and Python, information on freshwater fish in China was obtained from multiple databases and stored in the CFFID, including species Latin name, ChineseName, maxlength, environment and so on. The distribution range of Vanmanenia pingchowensis was predicted from the collected information on freshwater fish in China, which further supports the value of the constructed CFFID for studying the distribution and evolution of freshwater fish.
Gastric cancer (GC) is the third most common cause of cancer-related deaths in the world. Because clear symptoms are lacking in the early stages, it is diagnosed at advanced stages in the majority of patients and causes high mortality. Early recognition of GC significantly raises the chances of successful treatment. The molecular mechanisms of GC are still poorly understood. MiRNAs are small non-coding RNAs that regulate gene expression at the post-transcriptional level. In cancer cells, miRNAs have been found to be severely dysregulated. Using high-throughput (HTP) technologies such as RNA-Seq, the effects of miRNAs on cancers can be investigated. In this study, we retrieved miRNAs obtained by HTP methods from the OncoLnc database. Subsequently, the retrieved miRNAs were compared against literature-based databases such as PubMed. As a result, two lists were produced, one of experimentally validated miRNAs and one of predicted miRNAs. We found 28 predicted miRNAs that had not so far been experimentally validated in GC. Next, further bioinformatics analyses were performed to obtain the expression profiles of both validated and predicted miRNAs in tumor and normal tissues. In addition, the roles of the predicted miRNAs in other cancers and their possible targets in apoptosis, metastasis and angiogenesis were retrieved from related databases. Once the miRNAs involved in the initiation and progression of GC are recognized, they may be considered as potential biomarkers for early diagnosis or targeted treatment of GC and may lead to novel therapeutic strategies. We introduce 28 predicted miRNAs involved in GC pathogenesis by in silico analysis.