Gene expression time course data can be used not only to detect differentially expressed genes but also to find temporal associations among genes. This work addresses the problem of reconstructing generalized logical networks that account for temporal dependencies among genes and environmental stimuli from transcriptomic data. A network reconstruction algorithm was developed that uses statistical significance as the criterion for network selection, in order to avoid false-positive interactions arising from pure chance. The multinomial hypothesis testing-based network reconstruction allows explicit specification of the false-positive rate, which distinguishes it from all extant network inference algorithms. In a simulation study, the method outperformed dynamic Bayesian network modeling. The method was then applied to temporal gene expression data from the brains of alcohol-treated mice to analyze the molecular response to alcohol. Genes from major neuronal pathways are identified as putative components of the alcohol response mechanism. Nine of these genes have associations with alcohol reported in the literature. Several other potentially relevant genes, compatible with independent results from literature mining, may play a role in the response to alcohol. Additional, previously unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.
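The idea of selecting interactions by statistical significance can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes expression levels have already been discretized to two levels (0/1), tests a single candidate parent at a time with a one-step time lag, and uses a plain Pearson chi-square test with one degree of freedom. The function names `lagged_chi_square` and `select_parents` and the threshold `alpha` are hypothetical, introduced only for illustration. With short time courses the chi-square approximation is rough, so the p-values should be read as indicative only.

```python
import math
from collections import Counter


def lagged_chi_square(parent, child):
    """Pearson chi-square test of independence between parent[t] and child[t+1].

    Both inputs are equal-length sequences of 0/1 values (an assumption of
    this sketch). Returns (statistic, p_value) for one degree of freedom.
    """
    pairs = list(zip(parent[:-1], child[1:]))  # one-step temporal lag
    n = len(pairs)
    observed = Counter(pairs)
    row = Counter(p for p, _ in pairs)  # marginal counts of the parent level
    col = Counter(c for _, c in pairs)  # marginal counts of the child level
    stat = 0.0
    for p in (0, 1):
        for c in (0, 1):
            expected = row[p] * col[c] / n
            if expected > 0:  # skip cells that cannot occur
                stat += (observed[(p, c)] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def select_parents(series, child, candidates, alpha=0.05):
    """Keep candidates whose lagged association with `child` has p < alpha."""
    selected = []
    for name in candidates:
        _, p_value = lagged_chi_square(series[name], series[child])
        if p_value < alpha:
            selected.append(name)
    return selected
```

Here `alpha` plays the role of the explicitly specified false-positive rate for each individual test; a full reconstruction would also need to handle multi-level variables, multiple parents, and multiple testing, none of which this sketch attempts.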
Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this, it will be essential to characterize the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first-line phenotyping data on novel mutations, data on the normal features of inbred lines) at many sites worldwide. To enable the most efficient use of these data sets, we have initiated a process to develop standards for the description of phenotypes (using ontologies) and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing and needs the support of the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.
Functional genomics, the effort to understand the role of genomic elements in biological processes, has led to an avalanche of diverse experimental and semantic information defining associations between genes and various biological concepts across species and experimental paradigms. Integrating this rapidly expanding wealth of heterogeneous data, and finding consensus among so many diverse sources for specific research questions, require highly sophisticated big data structures and algorithms for harmonization and scalable analysis. In this context, multipartite graphs can often serve as useful structures for representing questions about the role of genes in multiple, frequently occurring disease processes. The main focus of this paper is on finding and analyzing efficient algorithms for dense subgraph enumeration in such graphs. An O(3^(n/3))-time procedure was devised to enumerate all maximal k-partite cliques in a k-partite graph, where k ≥ 3. The maximum number of such cliques is also shown to obey this bound, and thus this procedure obtains the best possible asymptotic performance. Empirical testing on both real and synthetic data is conducted. Concrete applications to biological data are described, as are scalability issues in the context of big data analysis.
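One straightforward way to enumerate maximal k-partite cliques, not necessarily the procedure devised in the paper, is to add every edge inside each partite set and then enumerate maximal cliques of the resulting ordinary graph with Bron-Kerbosch and pivoting, keeping only cliques that contain at least one vertex from every partite set. The sketch below assumes that a k-partite clique must touch all k partite sets; the function name `maximal_kpartite_cliques` and its input format are hypothetical.

```python
def maximal_kpartite_cliques(parts, edges):
    """Enumerate maximal k-partite cliques of a k-partite graph.

    parts: list of disjoint vertex sets (the partite sets).
    edges: iterable of (u, v) pairs joining vertices in different parts.
    Returns a list of frozensets, each touching every partite set.
    """
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    adj = {v: set() for v in part_of}
    for u, v in edges:
        if part_of[u] != part_of[v]:  # ignore anything that is not interpartite
            adj[u].add(v)
            adj[v].add(u)
    # Add all intrapartite edges, so that a clique of the augmented graph is
    # exactly a vertex set whose cross-part pairs are all adjacent.
    for part in parts:
        for v in part:
            adj[v] |= set(part) - {v}

    k = len(parts)
    results = []

    def expand(r, p, x):
        # r: current clique, p: candidate vertices, x: already-explored vertices
        if not p and not x:
            if len({part_of[v] for v in r}) == k:  # must touch every part
                results.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return results
```

For example, with parts `[{"a1", "a2"}, {"b1", "b2"}, {"c1"}]` and edges a1-b1, a1-c1, b1-c1, a2-b1, a2-c1, b2-c1, the only maximal 3-partite clique is {a1, a2, b1, c1}; the set {b1, b2, c1} is a maximal clique of the augmented graph but is discarded because it misses the first partite set.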