Enzyme annotation in UniProtKB using Rhea
Morgat, Anne; Lombardot, Thierry; Coudert, Elisabeth ...
Bioinformatics, 2020-Mar-01, Volume 36, Issue 6
Journal Article
Peer reviewed
Open access
Abstract
Motivation
To provide high-quality, computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology.
Results
We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide.
Availability and implementation
UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.
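As an illustration of the kind of query the SPARQL endpoint supports, the following minimal sketch builds a query for UniProtKB entries annotated with a given Rhea reaction and encodes it as a SPARQL Protocol GET request. It is not taken from the article: the `/sparql` path, the `up:` terms and the Rhea IRI pattern are assumptions about the UniProt RDF schema, and the helper names are hypothetical. No request is sent.

```python
from urllib.parse import urlencode

# Assumption: the query service sits at /sparql under the endpoint
# named in the abstract (https://sparql.uniprot.org/).
SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_rhea_query(rhea_id: int, limit: int = 10) -> str:
    """Return a SPARQL query for proteins annotated with one Rhea reaction.

    The class and property names (up:Protein, up:annotation,
    up:Catalytic_Activity_Annotation, up:catalyticActivity,
    up:catalyzedReaction) are assumed from the public UniProt core
    vocabulary and should be checked against the current schema.
    """
    if rhea_id <= 0 or limit <= 0:
        raise ValueError("rhea_id and limit must be positive integers")
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "SELECT ?protein\n"
        "WHERE {\n"
        "  ?protein a up:Protein ;\n"
        "           up:annotation ?annotation .\n"
        "  ?annotation a up:Catalytic_Activity_Annotation ;\n"
        "              up:catalyticActivity ?activity .\n"
        f"  ?activity up:catalyzedReaction <http://rdf.rhea-db.org/{rhea_id}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a SPARQL Protocol GET URL (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # 10000 is an illustrative Rhea identifier, not one cited in the article.
    print(build_request_url(build_rhea_query(10000, limit=5)))
```

The resulting URL can be passed to any HTTP client together with an `Accept` header for the desired result format.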
Motivation: Resource description framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI.
Availability:
http://www.ebi.ac.uk/rdf
Contact:
jupp@ebi.ac.uk
Abstract
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper we present the 11 databases which currently offer semantically enriched data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as the Swiss Personalized Health Network initiative (SPHN) which also employs this enrichment. The semantic enrichment facilitates the manipulation of large data sets from public databases and private data sets. Examples are provided to illustrate that the data from the SIB databases can not only be queried using precise criteria individually, but also across multiple databases, including a variety of non-SIB databases. Data manipulation, be it exploration, extraction, annotation, combination, or publication, is possible using the SPARQL query language. Providing documentation, tutorials and sample queries makes it easier to navigate this web of semantic data. Through this paper, the reader will discover how the existing SIB knowledge graphs can be leveraged to tackle the complex biological or clinical questions that are being addressed today.
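Querying across multiple databases is typically done in SPARQL 1.1 with the `SERVICE` keyword, which delegates part of a graph pattern to a remote endpoint. The sketch below assembles such a federated query as a string; it is an illustration rather than one of the paper's examples. The Rhea endpoint URL, the `up:` and `rh:` terms, and the helper name `build_federated_query` are assumptions to be checked against each resource's documentation.

```python
def build_federated_query(
    local_pattern: str,
    remote_endpoint: str,
    remote_pattern: str,
    variables: list[str],
    prefixes: dict[str, str],
    limit: int = 10,
) -> str:
    """Compose a SPARQL 1.1 federated query.

    local_pattern is evaluated by the endpoint that receives the query;
    remote_pattern is wrapped in a SERVICE clause and evaluated by
    remote_endpoint. Shared variable names join the two result sets.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    select_line = "SELECT " + " ".join(f"?{v}" for v in variables)
    body = [
        "WHERE {",
        f"  {local_pattern}",
        f"  SERVICE <{remote_endpoint}> {{",
        f"    {remote_pattern}",
        "  }",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(prefix_lines + [select_line] + body)


if __name__ == "__main__":
    # Assumed vocabulary: UniProt core (up:) and Rhea (rh:) terms.
    query = build_federated_query(
        local_pattern=(
            "?protein up:annotation ?annotation . "
            "?annotation up:catalyticActivity ?activity . "
            "?activity up:catalyzedReaction ?reaction ."
        ),
        remote_endpoint="https://sparql.rhea-db.org/sparql",  # assumed URL
        remote_pattern="?reaction rh:equation ?equation .",
        variables=["protein", "reaction", "equation"],
        prefixes={
            "up": "http://purl.uniprot.org/core/",
            "rh": "http://rdf.rhea-db.org/",
        },
        limit=5,
    )
    print(query)
```

Here `?reaction` appears in both patterns, so it acts as the join key between the local and the remote graph.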
Graphical Abstract
ABSTRACT
During the last few years, next‐generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease‐associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss‐Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss‐Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss‐Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
UniProtKB/Swiss‐Prot is a freely accessible, expertly curated knowledgebase that offers reliable information on proteins. This manuscript describes the manual curation of genetic variants and diseases. The variant/disease‐specific annotations are presented against the background of current physiological knowledge. This juxtaposition of physiological and pathological data can provide researchers with a tool to elucidate the mechanisms leading from a molecular defect to a disease phenotype.
Abstract
Background
Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.
Results
Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline.
Conclusions
HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
The principles of the cavitation criteria for rubber particles in polymeric matrices are briefly reviewed. Although these criteria are based on a linear elastic analysis, it is shown that it is possible to extend them to take into account the elastic-plastic behaviour of the matrix. To this end, the representative volume element of a periodic material was meshed and computations were performed using a finite element method. The results reported in this paper focus mainly on cavitation under uniaxial tension and examine the influence on the hydrostatic stress in the rubber particles of different parameters such as the volume fraction of rubber, the plastic behaviour of the matrix or the ratio of the elastic moduli. In all cases, plastic yielding in the matrix leads to saturation of the hydrostatic stress in the rubber phase. It is also shown that the history of cavitation barely influences the progression of plasticity in the matrix.
The UniProt Knowledgebase (UniProtKB) is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
The equivalent inclusion method (EIM) assuming linear elasticity is used to calculate the mechanical interactions between spherical rubber particles in an amorphous matrix, as in a rubber toughened polymer. The influences of the various calculation parameters are examined and it is shown that the method can provide reliable results with regard to the level of hydrostatic stress in the particles. Damage of the material is simulated by replacing the most stressed particles by voids. Numerical simulations for several hundred interacting particles give information on the kinetics and spatial organisation of the damage. It appears that, as the volume fraction of particles increases from 10 to 20%, the spatial configuration of the damage evolves from a localised to a diffuse mode. These results are discussed in relation to the efficiency of rubber toughening.
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references to external resources, that is freely available to the scientific community. To enable users of the knowledgebase to accurately assess the reliability of the information contained in this resource, the evidence for and provenance of the information must be recorded. This paper discusses the user requirements for this kind of metadata and the manner in which UniProtKB records it.