As bioinformatics datasets grow ever larger and analyses become increasingly complex, data-handling infrastructures need to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to meet the computational requirements of analysing high-throughput datasets. We present an approach for writing new applications, or wrapping existing ones, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow that executed simultaneously in two different Amazon EC2 data centres and on the Newcastle University Condor Grid. The system performed several CPU years’ worth of computational work in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.
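The abstract does not describe Microbase2.0's internal job model. The following minimal sketch shows one general way a workflow can run on several compute resources at once: every resource pulls work from a single shared queue. The pool names, worker counts and `wrapped_application` function are hypothetical, and threads stand in for remote machines.

```python
import queue
import threading
from collections import Counter

# Hypothetical compute pools and worker counts (illustrative only).
POOLS = {"ec2-region-a": 2, "ec2-region-b": 2, "condor-pool": 4}


def wrapped_application(job_id: int) -> str:
    """Hypothetical stand-in for an existing analysis tool wrapped as a job."""
    return f"result-{job_id}"


def worker(pool_name, jobs, results, lock):
    """Pull jobs from the shared queue until it is empty."""
    while True:
        try:
            job_id = jobs.get_nowait()
        except queue.Empty:
            return
        output = wrapped_application(job_id)
        with lock:
            results[job_id] = (pool_name, output)


def run(job_ids, pools):
    """Run every job exactly once, spread across all workers in all pools."""
    jobs = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(name, jobs, results, lock))
        for name, n_workers in pools.items()
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run(range(100), POOLS)
load = Counter(pool for pool, _ in results.values())  # jobs completed per pool
```

Because workers pull jobs rather than receiving a fixed share, faster or larger pools complete more jobs, and no central scheduler needs to know each resource's speed in advance.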
The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.
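The two usage modes can be contrasted with a small sketch. The class names below loosely echo FuGE's generic protocol constructs, but they are simplified hypothetical stand-ins, not the FuGE model or the STK API, and the centrifugation example is purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class GenericParameter:
    name: str   # in practice usually annotated with an ontology term
    value: float
    unit: str


@dataclass
class GenericProtocol:
    """Mode (1): generic constructs carry method-specific concepts as data."""
    name: str
    parameters: list[GenericParameter] = field(default_factory=list)

    def parameter(self, name: str) -> GenericParameter:
        return next(p for p in self.parameters if p.name == name)


@dataclass
class CentrifugationProtocol(GenericProtocol):
    """Mode (2): a hypothetical method-specific extension with typed fields."""
    speed_rpm: float = 0.0
    duration_min: float = 0.0


# The same information expressed both ways.
direct = GenericProtocol(
    "centrifugation",
    [GenericParameter("speed", 3000.0, "rpm"), GenericParameter("duration", 10.0, "min")],
)
extended = CentrifugationProtocol("centrifugation", speed_rpm=3000.0, duration_min=10.0)
```

The direct form needs no new classes but relies on conventions (parameter names, units) to be interpretable. The extension makes those conventions explicit in the schema at the cost of a new model. This is the kind of trade-off that usage guidelines are meant to settle.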
A central strategy of synthetic biology is to understand the basic processes of living creatures by engineering organisms from the same building blocks. Biological machines described in terms of parts can be studied by computer simulation in any of several languages, or assembled robotically in vitro. In this paper we present a language, the Genetic Circuit Description Language (GCDL), and a compiler, the Genetic Circuit Compiler (GCC). The language describes genetic circuits at a level of granularity appropriate both for automated assembly in the laboratory and for deriving simulation code. GCDL follows Semantic Web practice, and the compiler makes novel use of the logical inference facilities that this practice makes available. We present the GCDL and the compiler structure as a study of a tool for generating κ-language simulations from semantic descriptions of genetic circuits.
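To illustrate how inference can help a compiler of this kind, the sketch below stores a hypothetical circuit as (subject, predicate, object) triples. It applies one RDFS-style rule: if x has type C and C is a subclass of D, then x has type D. It then emits a rule only for parts inferred to be promoters. A hand-rolled fixed-point loop stands in for a real OWL/RDF reasoner. The part names, rate and rule template are assumptions: the output loosely follows Kappa's `'label' lhs -> rhs @ rate` form and is not GCC's actual output.

```python
# Hypothetical circuit description as (subject, predicate, object) triples.
TRIPLES = {
    ("ConstitutivePromoter", "subClassOf", "Promoter"),
    ("pConst", "type", "ConstitutivePromoter"),
    ("gfp", "type", "CodingSequence"),
    ("pConst", "drives", "gfp"),
}


def infer_types(triples):
    """Close 'type' facts under 'subClassOf' until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for sub, p2, sup in list(inferred):
                if p2 == "subClassOf" and sub == o and (s, "type", sup) not in inferred:
                    inferred.add((s, "type", sup))
                    changed = True
    return inferred


def to_kappa(triples, rate=1.0):
    """Emit a Kappa-style expression rule for each promoter driving a part."""
    facts = infer_types(triples)
    rules = []
    for s, p, o in sorted(facts):
        # pConst is never declared a Promoter directly; inference supplies it.
        if p == "drives" and (s, "type", "Promoter") in facts:
            rules.append(f"'express_{o}' {s}() -> {s}(), {o}() @ {rate}")
    return rules
```

A real compiler would also emit agent declarations and initial conditions, and should check rule syntax against the Kappa version in use.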
The nucleotide sequence of clpX, which is located between the tig (trigger factor) and lon (ATP-dependent protease) genes at 245° on the standard Bacillus subtilis (Bs) genetic map, was determined. The putative clpX gene encodes a 46-kDa protein of 421 amino acid (aa) residues. A comparison of the deduced aa sequence with those of the recently described bacterial clpX gene products from Synechocystis sp., Escherichia coli (Ec), Haemophilus influenzae and Azotobacter vinelandii revealed strong similarities. However, in contrast to Ec, clpX and clpP of Bs are located at different loci on the chromosome and are transcribed as monocistronic genes. A heat-inducible σA-like promoter was mapped upstream of the clpX structural gene, but no CIRCE element, characteristic of class-I heat-shock genes (e.g., groESL and dnaK), was found between the transcriptional and translational start sites. Although the majority of the heat-inducible general stress genes in Bs are under the control of the alternative sigma factor σB, the heat induction of clpX appears to be σB-independent. This indicates that clpX belongs to the class-III heat-inducible genes.
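The reported protein size can be checked against the residue count. A common rule of thumb puts the average amino acid residue mass at about 110 Da. This is a general approximation, not a value from the study:

```latex
M \approx n \times 110\ \text{Da} = 421 \times 110\ \text{Da} = 46\,310\ \text{Da} \approx 46.3\ \text{kDa}
```

The result agrees with the 46-kDa figure given for the deduced ClpX protein.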
Engineering bacterial populations for pattern formation
Sutantyo, Daniel; Walker, Christopher; deBono, Nicholas ...
2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB),
2016-Oct.
Conference Proceeding
The automated design of synthetic biological circuits is an active area of research. A particularly promising direction is the engineering of populations of communicating bacteria, which can produce behaviour more complex than is possible by engineering individual bacteria. We present a multi-level computational approach to engineering communicating bacterial populations. Circuits are designed at a high level of abstraction using an evolutionary algorithm and an agent-based model. Evolved agents can then be mapped onto previously defined, lower-level components such as Standard Virtual Parts. We apply this approach to the evolution of a two-dimensional pattern, the French Flag.
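For readers unfamiliar with the French Flag target, the toy sketch below shows an evolutionary algorithm at work on it. It is not the authors' agent-based model. Each cell on a grid reads a morphogen that decays with distance from the left edge and picks blue, white or red by comparing the level with two evolved thresholds. Fitness is the fraction of cells matching three equal vertical stripes. The morphogen-threshold genome, grid size, decay length and GA settings are all assumptions made for illustration.

```python
import math
import random

WIDTH, HEIGHT = 30, 10  # hypothetical grid size
DECAY = 10.0            # hypothetical morphogen decay length


def target_colour(x: int) -> str:
    """Target French Flag: three equal vertical stripes."""
    if x < WIDTH // 3:
        return "blue"
    if x < 2 * WIDTH // 3:
        return "white"
    return "red"


def cell_colour(x: int, genome: tuple[float, float]) -> str:
    """A cell compares the local morphogen level with two thresholds."""
    high, low = genome
    level = math.exp(-x / DECAY)  # morphogen secreted at x = 0
    if level > high:
        return "blue"
    if level > low:
        return "white"
    return "red"


def fitness(genome) -> float:
    """Fraction of grid cells whose colour matches the target flag."""
    correct = sum(
        cell_colour(x, genome) == target_colour(x)
        for x in range(WIDTH)
        for _ in range(HEIGHT)
    )
    return correct / (WIDTH * HEIGHT)


def mutate(genome, rng, sigma=0.05):
    """Gaussian mutation of each threshold, clipped to [0, 1]."""
    return tuple(min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genome)


def evolve(pop_size=40, generations=100, seed=1):
    """Elitist GA: keep the best half, refill with mutated copies of it."""
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elites = population[: pop_size // 2]
        children = [mutate(rng.choice(elites), rng) for _ in range(pop_size - len(elites))]
        population = elites + children
    return max(population, key=fitness), history
```

Because the best half survives each generation unchanged, the best fitness never decreases. Running `best, history = evolve()` returns the fittest thresholds and the per-generation best scores. In the paper's multi-level approach, the evolved behaviour would then be mapped onto concrete parts rather than left as abstract thresholds.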
Genome comparison and analysis can reveal the structures and junctions of genome sequences from different species. As more genomes are sequenced, genomic data are accumulating so rapidly that their analysis is beyond the processing capabilities of most research institutes. Grid computing offers a powerful way to support large-scale genomic data processing and genome analysis. This paper presents the Microbase project, which is developing a grid-based system for genome comparison and analysis, and discusses the first implementation of the system, MicrobaseLite. MicrobaseLite uses a scalable computing environment to support computationally intensive microbial genome comparison and analysis, employing state-of-the-art Web services, notification, comparative genomics and parallel computing technologies. Microbase will support not only system-defined genome comparison and analysis but also user-defined, remotely conceived genome analyses.
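Part of why all-against-all genome comparison suits a grid is that n genomes yield n(n-1)/2 independent pairwise tasks. The number of tasks grows quadratically, but any task can run on any node. The sketch below shows that decomposition. It substitutes k-mer Jaccard similarity for the alignment-based comparisons a real system would run, and a local thread pool stands in for grid nodes. The function names, `k` and the toy sequences are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def kmers(seq: str, k: int = 4) -> set[str]:
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_all(genomes: dict[str, str], k: int = 4, max_workers: int = 4):
    """Score every unordered genome pair; each pair is an independent task."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    pairs = list(combinations(sorted(genomes), 2))  # n * (n - 1) / 2 tasks

    def task(pair):
        a, b = pair
        return pair, jaccard(profiles[a], profiles[b])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, pairs))
```

With three toy genomes, `compare_all({"g1": "ACGTACGTAC", "g2": "ACGTACGTAC", "g3": "TTTTTTTTTT"})` produces three pair scores. Adding one more genome adds n new tasks rather than requiring the existing ones to be recomputed.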
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for the coronavirus disease 2019 (COVID-19) pandemic, represents an unprecedented global health challenge. Consequently, a large amount of research into the disease's pathogenesis and potential treatments has been carried out in a short time. However, developing novel drugs is a costly and lengthy process and is unlikely to deliver a timely treatment for the pandemic. Drug repurposing, by contrast, is an attractive alternative, because existing drugs have already met many of the regulatory requirements. In this work we used a combination of network algorithms and human curation to search integrated knowledge graphs and identify drug repurposing opportunities for COVID-19. We demonstrate the value of this approach by reporting eight potential repurposing opportunities that it identified, and discuss how it could be incorporated into future studies.
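The abstract does not name the network algorithms used. One simple heuristic of this general kind treats the knowledge graph as undirected and ranks drugs by their shortest-path distance to viral proteins. The resulting shortlist is then passed to human curators. The sketch below uses breadth-first search on a tiny graph of placeholder nodes. The nodes, edges and hop limit are invented for illustration and do not come from the study.

```python
from collections import deque

# Placeholder knowledge-graph edges (not real drug-target data).
EDGES = [
    ("drug_A", "host_protein_1"),
    ("host_protein_1", "viral_protein_1"),
    ("drug_B", "host_protein_2"),
    ("host_protein_2", "host_protein_3"),
    ("host_protein_3", "viral_protein_1"),
    ("drug_C", "host_protein_4"),
]


def build_graph(edges):
    """Undirected adjacency sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance_to_targets(graph, start, targets):
    """Breadth-first search; hops to the nearest target, or None if unreachable."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node in targets:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def rank_candidates(edges, drugs, targets, max_hops=3):
    """Drugs within max_hops of any target, closest first, for human review."""
    graph = build_graph(edges)
    ranked = []
    for drug in drugs:
        dist = distance_to_targets(graph, drug, targets)
        if dist is not None and dist <= max_hops:
            ranked.append((dist, drug))
    return [drug for _, drug in sorted(ranked)]
```

Path length alone is a crude signal: integrated graphs contain many short but biologically meaningless paths. This is one reason a human curation step is needed before any candidate is treated as a repurposing opportunity.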
The rate at which entire microbial genomes are being sequenced has accelerated rapidly over the past two years, promising to revolutionise our understanding of microbial molecular biology and genetics. The Bacillus subtilis genome sequence is the first complete genome of a free-living soil and rhizosphere bacterium. Data derived from the genome sequence and the systematic functional analysis programme, together with the wealth of knowledge already available for this organism, open up new opportunities to study the behaviour and ecology of this soil and plant growth-promoting rhizobacterium at the molecular level. In this review we examine the Bacillus subtilis 168 genome sequence in the light of clues it might provide for the role of this species in natural environments and discuss suitable methods for applying the available data and resources to the study of this and related organisms in natural systems.