A series of genome-scale algorithms and high-performance implementations is described and shown to be useful in the genetic analysis of gene transcription. With them it is possible to address common questions such as: “are the sets of genes co-expressed under one type of conditions the same as those sets co-expressed under another?” A new noise-adaptive graph algorithm, dubbed “paraclique,” is introduced and analyzed for use in biological hypothesis testing. A notion of vertex coverage is also devised, based on vertex-disjoint paths within correlation graphs, and used to determine the identity, proportion and number of transcripts connected to individual phenotypes and quantitative trait loci (QTL) regulatory models. A major goal is to identify which, among a set of candidate genes, are the most likely regulators of trait variation. These methods are applied in an effort to identify multiple-QTL regulatory models for large groups of genetically co-expressed genes, and to extrapolate the consequences of this genetic variation on phenotypes observed across levels of biological scale through the evaluation of vertex coverage. This approach is furthermore applied to definitions of homology-based gene sets, and the incorporation of categorical data such as known gene pathways. In all these tasks discrete mathematics and combinatorial algorithms form organizing principles upon which methods and implementations are based.
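The abstract above does not spell out how paraclique works, so the following is only a minimal sketch of one plausible reading: seed with a clique in the correlation graph, then absorb vertices that are missing at most a small number of edges to the current members. The function names, the `glom` parameter, the greedy seed (a stand-in for an exact maximum-clique solver) and the toy graph are all assumptions made for illustration, not the authors' implementation.

```python
def build_adj(edges):
    """Build a symmetric adjacency map {vertex: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def greedy_clique(adj):
    """Grow a maximal clique greedily, visiting vertices by decreasing degree.

    Assumption: this heuristic replaces an exact maximum-clique search.
    """
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    clique = set()
    for v in order:
        # v may join only if it is adjacent to every current member.
        if clique <= adj[v]:
            clique.add(v)
    return clique


def paraclique(adj, glom=1):
    """Extend a seed clique with vertices missing at most `glom` edges
    to the current member set (hypothetical noise-tolerance parameter)."""
    members = greedy_clique(adj)
    changed = True
    while changed:
        changed = False
        for v in sorted(set(adj) - members):
            missing = members - adj[v]  # members that v is NOT adjacent to
            if len(missing) <= glom:
                members.add(v)
                changed = True
    return members


# Toy graph: {a, b, c, d} is a clique; e lacks only the edge to d; f touches only a.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("e", "a"), ("e", "b"), ("e", "c"), ("f", "a")]
adj = build_adj(edges)
strict = paraclique(adj, glom=0)
relaxed = paraclique(adj, glom=1)
```

With `glom=0` the sketch returns the seed clique unchanged; with `glom=1` it also absorbs a vertex that lacks a single edge, which is the sense in which such a procedure tolerates a noisy, missing correlation.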
Biologists hope to address grand scientific challenges by exploring the abundance of data made available through modern microarray technology and other high-throughput techniques. The impact of this data, however, is limited unless researchers can effectively assimilate such complex information and integrate it into their daily research; interactive visualization tools are called for to support the effort. Specifically, typical studies of gene co-expression require novel visualization tools that enable the dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These tools should allow biologists to develop an intuitive understanding of the structure of biological networks and discover genes residing in critical positions in networks and pathways. By using a graph as a universal representation of correlation in gene expression, our system employs several techniques that, when used in an integrated manner, provide innovative analytical capabilities. Our tool for interacting with gene co-expression data integrates techniques such as: graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by compound queries, dynamic level-of-detail abstraction, and template-based fuzzy classification. We demonstrate our system using a real-world workflow from a large-scale, systems genetics study of mammalian gene co-expression.
A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad array of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray data that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Clusters are scored using Jaccard similarity coefficients for the analysis of the positive match of clusters to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
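The Jaccard-based scoring mentioned above can be illustrated with a short sketch. It assumes clusters and pathways are both given as sets of gene identifiers, matches each cluster to its most similar pathway, and averages those best-match scores. The aggregation by mean, the function names and the gene and pathway labels are hypothetical; the abstract does not state how the study combined per-cluster scores.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_pathway_match(cluster, pathways):
    """Return (pathway_name, score) for the pathway most similar to the cluster."""
    return max(
        ((name, jaccard(cluster, genes)) for name, genes in pathways.items()),
        key=lambda pair: pair[1],
    )


def score_clustering(clusters, pathways):
    """Mean best-match Jaccard score over all clusters (assumed aggregation)."""
    scores = [best_pathway_match(c, pathways)[1] for c in clusters]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical pathway annotations and one candidate clustering.
pathways = {
    "pathway_A": {"g1", "g2", "g3", "g4"},
    "pathway_B": {"g5", "g6", "g7"},
}
clusters = [{"g1", "g2", "g3"}, {"g5", "g6", "g8"}]
overall = score_clustering(clusters, pathways)
```

Computing `overall` for the output of each clustering method on the same pathway annotations gives one number per method, which is one simple way to obtain the kind of ranking the abstract describes.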
ISBRA Topics of Interest: gene expression analysis, software tools and applications.
With the goal of discovering genes that contribute to late-onset neurological and ocular disorders and also genes that extend the healthy life span in mammals, we are phenotyping mice carrying new mutations induced by the chemical N-ethyl-N-nitrosourea (ENU). The phenotyping plan includes basic behavioral, neurohistological, and vision testing in sibling cohorts of mice aged to 18 months, and then evaluation for markers of growth trajectory and stress response in these same cohorts aged up to 28 months. Statistical outliers are identified by comparison to test results of similarly aged cohorts, and potential mutants are recovered for re-aging to confirm heritability of the phenotype.