What is systems biology? Breitling, Rainer
Frontiers in physiology,
01/2010, Letnik:
1
Journal Article
Recenzirano
Odprti dostop
Systems biology is increasingly popular, but to many biologists it remains unclear what this new discipline actually encompasses. This brief personal perspective starts by outlining the asthetic ...qualities that motivate systems biologists, discusses which activities do not belong to the core of systems biology, and finally explores the crucial link with synthetic biology. It concludes by attempting to define systems biology as the research endeavor that aims at providing the scientific foundation for successful synthetic biology.
Motivation: The proliferation of public data repositories creates a need for meta-analysis methods to efficiently evaluate, integrate and validate related datasets produced by independent groups. A ...t-based approach has been proposed to integrate effect size from multiple studies by modeling both intra- and between-study variation. Recently, a non-parametric ‘rank product’ method, which is derived based on biological reasoning of fold-change criteria, has been applied to directly combine multiple datasets into one meta study. Fisher's Inverse χ2 method, which only depends on P-values from individual analyses of each dataset, has been used in a couple of medical studies. While these methods address the question from different angles, it is not clear how they compare with each other. Results: We comparatively evaluate the three methods; t-based hierarchical modeling, rank products and Fisher's Inverse χ2 test with P-values from either the t-based or the rank product method. A simulation study shows that the rank product method, in general, has higher sensitivity and selectivity than the t-based method in both individual and meta-analysis, especially in the setting of small sample size and/or large between-study variation. Not surprisingly, Fisher's χ2 method highly depends on the method used in the individual analysis. Application to real datasets demonstrates that meta-analysis achieves more reliable identification than an individual analysis, and rank products are more robust in gene ranking, which leads to a much higher reproducibility among independent studies. Though t-based meta-analysis greatly improves over the individual analysis, it suffers from a potentially large amount of false positives when P-values serve as threshold. We conclude that careful meta-analysis is a powerful tool for integrating multiple array studies. Contact: fxhong@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To ...find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.
The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics ...studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable.
We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P -value estimation methods have been replaced by exact methods, providing faster and more accurate results.
RankProd 2.0 is available at Bioconductor ( https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html ) and as part of the mzMatch pipeline ( http://www.mzmatch.sourceforge.net ).
rainer.breitling@manchester.ac.uk.
Supplementary data are available at Bioinformatics online.
Large microarray datasets have enabled gene regulation to be studied through coexpression analysis. While numerous methods have been developed for identifying differentially expressed genes between ...two conditions, the field of differential coexpression analysis is still relatively new. More specifically, there is so far no sensitive and untargeted method to identify gene modules (also known as gene sets or clusters) that are differentially coexpressed between two conditions. Here, sensitive and untargeted means that the method should be able to construct de novo modules by grouping genes based on shared, but subtle, differential correlation patterns.
We present DiffCoEx, a novel method for identifying correlation pattern changes, which builds on the commonly used Weighted Gene Coexpression Network Analysis (WGCNA) framework for coexpression analysis. We demonstrate its usefulness by identifying biologically relevant, differentially coexpressed modules in a rat cancer dataset.
DiffCoEx is a simple and sensitive method to identify gene coexpression differences between multiple conditions.
As the field of synthetic biology is developing, the prospects for de novo design of biosynthetic pathways are becoming more and more realistic. Hence, there is an increasing need for computational ...tools that can support these efforts. A range of algorithms has been developed that can be used to identify all possible metabolic pathways and their corresponding enzymatic parts. These can then be ranked according to various properties and modelled in an organism-specific context. Finally, design software can aid the biologist in the integration of a selected pathway into smartly regulated transcriptional units. Here, we review key existing tools and offer suggestions for how informatics can help to shape the future of synthetic microbiology.
One of the main objectives in the analysis of microarray experiments is the identification of genes that are differentially expressed under two experimental conditions. This task is complicated by ...the noisiness of the data and the large number of genes that are examined simultaneously. Here, we present a novel technique for identifying differentially expressed genes that does not originate from a sophisticated statistical model but rather from an analysis of biological reasoning. The new technique, which is based on calculating rank products (RP) from replicate experiments, is fast and simple. At the same time, it provides a straightforward and statistically stringent way to determine the significance level for each gene and allows for the flexible control of the false-detection rate and familywise error rate in the multiple testing situation of a microarray experiment. We use the RP technique on three biological data sets and show that in each case it performs more reliably and consistently than the non-parametric
t-test variant implemented in Tusher et al.'s significance analysis of microarrays (SAM). We also show that the RP results are reliable in highly noisy data. An analysis of the physiological function of the identified genes indicates that the RP approach is powerful for identifying biologically relevant expression changes. In addition, using RP can lead to a sharp reduction in the number of replicate experiments needed to obtain reproducible results.