The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open ...Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.
We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publically accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query ...and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data, our methods for transformation to SBPkb, and finally, we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate "promoter" parts that are known to be both negatively and positively regulated. This method provides new web based data access to perform searches for parts that are not currently possible.
Abstract
Motivation
Annotations of biochemical models provide details of chemical species, documentation of chemical reactions, and other essential information. Unfortunately, the vast majority of ...biochemical models have few, if any, annotations, or the annotations provide insufficient detail to understand the limitations of the model. The quality and quantity of annotations can be improved by developing tools that recommend annotations. For example, recommender tools have been developed for annotations of genes. Although annotating genes is conceptually similar to annotating biochemical models, there are important technical differences that make it difficult to directly apply this prior work.
Results
We present AMAS, a system that predicts annotations for elements of models represented in the Systems Biology Markup Language (SBML) community standard. We provide a general framework for predicting model annotations for a query element based on a database of annotated reference elements and a match score function that calculates the similarity between the query element and reference elements. The framework is instantiated to specific element types (e.g. species, reactions) by specifying the reference database (e.g. ChEBI for species) and the match score function (e.g. string similarity). We analyze the computational efficiency and prediction quality of AMAS for species and reactions in BiGG and BioModels and find that it has subsecond response times and accuracy between 80% and 95% depending on specifics of what is predicted. We have incorporated AMAS into an open-source, pip-installable Python package that can run as a command-line tool that predicts and adds annotations to species and reactions to an SBML model.
Availability and implementation
Our project is hosted at https://github.com/sys-bio/AMAS, where we provide examples, documentation, and source code files. Our source code is licensed under the MIT open-source license.
Abstract
Summary
As the number and complexity of biosimulation models grows, so do demands for tools that can help users better understand models and make those models more findable, shareable and ...reproducible. Consistent model annotation is a step toward these goals. Both models and tools are written in a variety of different languages; thus, the community has recognized the need for standard, language-independent methods for annotation. Based on the Computational Modeling in Biology Network community consensus, we introduce an open-source, cross-platform software library for semantic annotation of models.
Availability and implementation
libOmexMeta is freely available at https://github.com/sys-bio/libOmexMeta under the Apache License 2.0. A live demonstration is at github.com/sys-bio/pyomexmeta-binder-notebook.
Supplementary information
Supplementary data are available at Bioinformatics online.
Mathematics and Phy sics-based simulation models have the potential to help interpret and encapsulate biological phenomena in a computable and reproducible form. Similarly, comprehensive descriptions ...of such models help to ensure that such models are accessible, discoverable, and reusable. To this end, researchers have developed tools and standards to encode mathematical models of biological systems enabling reproducibility and reuse, tools and guidelines to facilitate semantic description of mathematical models, and repositories in which to archive, share, and discover models. Scientists can leverage these resources to investigate specific questions and hypotheses in a more efficient manner.
We have comprehensively annotated a cohort of models with biological semantics. These annotated models are freely available in the Physiome Model Repository (PMR). To demonstrate the benefits of this approach, we have developed a web-based tool which enables users to discover models relevant to their work, with a particular focus on epithelial transport. Based on a semantic query, this tool will help users discover relevant models, suggesting similar or alternative models that the user may wish to explore or use.
The semantic annotation and the web tool we have developed is a new contribution enabling scientists to discover relevant models in the PMR as candidates for reuse in their own scientific endeavours. This approach demonstrates how semantic web technologies and methodologies can contribute to biomedical and clinical research. The source code and links to the web tool are available at https://github.com/dewancse/model-discovery-tool.
The Protégé project has come a long way since Mark Musen first built the Protégé meta-tool for knowledge-based systems in 1987. The original tool was a small application, aimed at building ...knowledge-acquisition tools for a few specialized programs in medical planning. From this initial tool, the Protégé system has evolved into a durable, extensible platform for knowledge-based systems development and research. The current version, Protégé-2000, can be run on a variety of platforms, supports customized user-interface extensions, incorporates the Open Knowledge-Base Connectivity (OKBC) knowledge model, interacts with standard storage formats such as relational databases, XML, and RDF, and has been used by hundreds of individuals and research groups. In this paper, we follow the evolution of the Protégé project through three distinct re-implementations. We describe our overall methodology, our design decisions, and the lessons we have learned over the duration of the project. We believe that our success is one of infrastructure: Protégé is a flexible, well-supported, and robust development environment. Using Protégé, developers and domain experts can easily build effective knowledge-based systems, and researchers can explore ideas in a variety of knowledge-based domains.
Abstract
Motivation
Developing biochemical models in systems biology is a complex, knowledge-intensive activity. Some modelers (especially novices) benefit from model development tools with a ...graphical user interface. However, as with the development of complex software, text-based representations of models provide many benefits for advanced model development. At present, the tools for text-based model development are limited, typically just a textual editor that provides features such as copy, paste, find, and replace. Since these tools are not “model aware,” they do not provide features for: (i) model building such as autocompletion of species names; (ii) model analysis such as hover messages that provide information about chemical species; and (iii) model translation to convert between model representations. We refer to these as BAT features.
Results
We present VSCode-Antimony, a tool for building, analyzing, and translating models written in the Antimony modeling language, a human readable representation of Systems Biology Markup Language (SBML) models. VSCode-Antimony is a source editor, a tool with language-aware features. For example, there is autocompletion of variable names to assist with model building, hover messages that aid in model analysis, and translation between XML and Antimony representations of SBML models. These features result from making VSCode-Antimony model-aware by incorporating several sophisticated capabilities: analysis of the Antimony grammar (e.g. to identify model symbols and their types); a query system for accessing knowledge sources for chemical species and reactions; and automatic conversion between different model representations (e.g. between Antimony and SBML).
Availability and implementation
VSCode-Antimony is available as an open source extension in the VSCode Marketplace https://marketplace.visualstudio.com/items?itemName=stevem.vscode-antimony. Source code can be found at https://github.com/sys-bio/vscode-antimony.
The purpose of this study is to design and develop a probabilistic network for detecting errors in radiotherapy plans for use at the time of initial plan verification. Our group has initiated a ...multi-pronged approach to reduce these errors. We report on our development of Bayesian models of radiotherapy plans. Bayesian networks consist of joint probability distributions that define the probability of one event, given some set of other known information. Using the networks, we find the probability of obtaining certain radiotherapy parameters, given a set of initial clinical information. A low probability in a propagated network then corresponds to potential errors to be flagged for investigation. To build our networks we first interviewed medical physicists and other domain experts to identify the relevant radiotherapy concepts and their associated interdependencies and to construct a network topology. Next, to populate the network's conditional probability tables, we used the Hugin Expert software to learn parameter distributions from a subset of de-identified data derived from a radiation oncology based clinical information database system. These data represent 4990 unique prescription cases over a 5 year period. Under test case scenarios with approximately 1.5% introduced error rates, network performance produced areas under the ROC curve of 0.88, 0.98, and 0.89 for the lung, brain and female breast cancer error detection networks, respectively. Comparison of the brain network to human experts performance (AUC of 0.90 ± 0.01) shows the Bayes network model performs better than domain experts under the same test conditions. Our results demonstrate the feasibility and effectiveness of comprehensive probabilistic models as part of decision support systems for improved detection of errors in initial radiotherapy plan verification procedures.
...biological information from a new experiment may raise the importance of a variable hidden within a component. ...it becomes difficult to anticipate what information in a component should be ...exposed or hidden by an interface.\n For example, while one annotator may refer to the concept "D-glucose" from the ChEBI knowledge base 19, another may refer to the term "Glucose" from the Systematized Nomenclature of Medicine--Clinical Terms resource (SNOMED-CT 20). ...critical challenges for SAIM modeling include incentivizing model curators to annotate at this level of detail and developing intuitive software that minimizes annotation costs.
Semantics-based model composition is an approach for generating complex biosimulation models from existing components that relies on capturing the biological meaning of model elements in a ...machine-readable fashion. This approach allows the user to work at the biological rather than computational level of abstraction and helps minimize the amount of manual effort required for model composition. To support this compositional approach, we have developed the SemGen software, and here report on SemGen's semantics-based merging capabilities using real-world modeling use cases. We successfully reproduced a large, manually-encoded, multi-model merge: the "Pandit-Hinch-Niederer" (PHN) cardiomyocyte excitation-contraction model, previously developed using CellML. We describe our approach for annotating the three component models used in the PHN composition and for merging them at the biological level of abstraction within SemGen. We demonstrate that we were able to reproduce the original PHN model results in a semi-automated, semantics-based fashion and also rapidly generate a second, novel cardiomyocyte model composed using an alternative, independently-developed tension generation component. We discuss the time-saving features of our compositional approach in the context of these merging exercises, the limitations we encountered, and potential solutions for enhancing the approach.