Biological ontologies are used to organize, curate and interpret the vast quantities of data arising from biological experiments. While this works well when using a single ontology, integrating multiple ontologies can be problematic, as they are developed independently, which can lead to incompatibilities. The Open Biological and Biomedical Ontologies (OBO) Foundry was created to address this by facilitating the development, harmonization, application and sharing of ontologies, guided by a set of overarching principles. One challenge in reaching these goals was that the OBO principles were not originally encoded in a precise fashion, and interpretation was subjective. Here, we show how we have addressed this by formally encoding the OBO principles as operational rules and implementing a suite of automated validation checks and a dashboard for objectively evaluating each ontology's compliance with each principle. This entailed a substantial effort to curate metadata across all ontologies and to coordinate with individual stakeholders. We have applied these checks across the full OBO suite of ontologies, revealing areas where individual ontologies require changes to conform to our principles. Our work demonstrates how a sizable, federated community can be organized and evaluated on objective criteria that help improve overall quality and interoperability, which is vital for the sustenance of the OBO project and towards the overall goals of making data Findable, Accessible, Interoperable, and Reusable (FAIR). Database URL: http://obofoundry.org/
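As an illustration of what "encoding principles as operational rules" can look like, the following minimal sketch evaluates ontology metadata records against two rules and builds a dashboard-style pass/fail table. It is not the OBO Foundry's implementation: the metadata field names (`id`, `license`, `contact`), the rule names, and the sample records are all hypothetical.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]

def has_license(meta: Metadata) -> bool:
    """Hypothetical check: the record declares a non-empty license."""
    return bool(meta.get("license"))

def has_contact(meta: Metadata) -> bool:
    """Hypothetical check: the record names a contact with an email."""
    contact = meta.get("contact")
    return isinstance(contact, dict) and bool(contact.get("email"))

# Each rule maps a principle-like label to a predicate over one record.
RULES: Dict[str, Callable[[Metadata], bool]] = {
    "open": has_license,
    "locus_of_authority": has_contact,
}

def evaluate(ontologies: List[Metadata]) -> Dict[str, Dict[str, str]]:
    """Return {ontology id: {rule name: 'PASS' or 'FAIL'}}."""
    return {
        str(meta["id"]): {
            name: "PASS" if rule(meta) else "FAIL"
            for name, rule in RULES.items()
        }
        for meta in ontologies
    }

if __name__ == "__main__":
    records = [
        {"id": "xo", "license": "CC-BY", "contact": {"email": "a@example.org"}},
        {"id": "yo"},  # missing both fields
    ]
    for ont_id, row in evaluate(records).items():
        print(ont_id, row)
```

Keeping each rule as a small predicate makes the criteria explicit and repeatable, which is the property the abstract contrasts with subjective interpretation.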
ToxoDB (http://ToxoDB.org) is a genome and functional genomic database for the protozoan parasite Toxoplasma gondii. It incorporates the sequence and annotation of the T. gondii ME49 strain, as well as genome sequences for the GT1, VEG and RH (Chr Ia, Chr Ib) strains. Sequence information is integrated with various other genomic-scale data, including community annotation, ESTs, gene expression and proteomics data. ToxoDB has matured significantly since its initial release. Here we outline the numerous updates with respect to the data and increased functionality available on the website.
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
PlasmoDB (http://PlasmoDB.org) is a functional genomic database for Plasmodium spp. that provides a resource for data analysis and visualization on a gene-by-gene or genome-wide scale. PlasmoDB belongs to a family of genomic resources that are housed under the EuPathDB (http://EuPathDB.org) Bioinformatics Resource Center (BRC) umbrella. The latest release, PlasmoDB 5.5, contains numerous new data types from several broad categories: annotated genomes, evidence of transcription, proteomics evidence, protein function evidence, population biology and evolution. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop-down menus. Various results can then be combined with each other on the query history page. Search results can be downloaded with associated functional data, and registered users can store their query history for future retrieval or analysis.
FungiDB (fungidb.org) is a free online resource for data mining and functional genomics analysis for fungal and oomycete species. FungiDB is part of the Eukaryotic Pathogen Genomics Database Resource (EuPathDB, eupathdb.org) platform that integrates genomic, transcriptomic, proteomic, and phenotypic datasets, and other types of data for pathogenic and nonpathogenic, free-living and parasitic organisms. FungiDB is one of the largest EuPathDB databases, containing nearly 100 genomes obtained from GenBank,
Genome Database (AspGD), The Broad Institute, Joint Genome Institute (JGI), Ensembl, and other sources. FungiDB offers a user-friendly web interface with embedded bioinformatics tools that support custom in silico experiments that leverage FungiDB-integrated data. In addition, a Galaxy-based workspace enables users to generate custom pipelines for large-scale data analysis (e.g., RNA-Seq, variant calling, etc.). This review provides an introduction to the FungiDB resources and focuses on available features, tools, and queries and how they can be used to mine data across a diverse range of integrated FungiDB datasets and records.
Endothelial function and dysfunction are central to the focal origin and regional development of atherosclerosis; however, an in vivo endothelial phenotypic footprint of susceptibility to atherosclerosis preceding pathological change remains elusive.
To conduct a comparative multi-site genomics study of arterial endothelial phenotype in atherosusceptible and atheroprotected regions.
Transcript profiles of freshly isolated endothelial cells from 7 discrete arterial regions in normal swine were analyzed to determine the steady state in vivo endothelial phenotypes in regions of varying susceptibilities to atherosclerosis. The most abundant common feature of the endothelium of all atherosusceptible regions was the upregulation of genes associated with endoplasmic reticulum (ER) stress. The unfolded protein response pathway, induced by ER stress, was therefore investigated in detail in endothelium of the atherosusceptible aortic arch and was found to be partially activated. ER transmembrane signal transducers IRE1alpha and ATF6alpha and their downstream effectors, but not PERK, were activated concomitant with a higher transcript expression of protein folding enzymes and chaperones, indicative of ER stress in vivo.
The findings demonstrate the prevalence of chronic endothelial ER stress and activated unfolded protein response in vivo at atherosusceptible arterial sites. We propose that chronic localized biological stress is linked to spatial susceptibility of the endothelium to the initiation of atherosclerosis.
The OrthoMCL database (http://orthomcl.cbil.upenn.edu) houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba), 4 plants/algae and 7 apicomplexan parasites. OrthoMCL software was used to cluster proteins based on sequence similarity, using an all-against-all BLAST search of each species' proteome, followed by normalization of inter-species differences, and Markov clustering. A total of 511 797 proteins (81.6% of the total dataset) were clustered into 70 388 ortholog groups. The ortholog database may be queried based on protein or group accession numbers, keyword descriptions or BLAST similarity. Ortholog groups exhibiting specific phyletic patterns may also be identified, using either a graphical interface or a text-based Phyletic Pattern Expression grammar. Information for ortholog groups includes the phyletic profile, the list of member proteins and a multiple sequence alignment, a statistical summary and graphical view of similarities, and a graphical representation of domain architecture. OrthoMCL software, the entire FASTA dataset employed and clustering results are available for download. OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.
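The Markov clustering step named above can be sketched as follows. This is a minimal pure-Python illustration of the general Markov Cluster (MCL) idea, alternating expansion and inflation on a column-stochastic similarity matrix. It is not the OrthoMCL software: the toy similarity matrix stands in for normalized BLAST scores, and the function name, `inflation` default, and thresholds are assumptions chosen for the example.

```python
from typing import List

Matrix = List[List[float]]

def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(similarity: Matrix, inflation: float = 2.0,
        max_iter: int = 50, tol: float = 1e-9) -> List[List[int]]:
    """Cluster node indices from a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Self-loops keep every column sum positive and stabilize the iteration.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flow is boosted, weak flow decays
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain flow act as attractors; their non-zero columns
    # are the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

if __name__ == "__main__":
    # Toy input: proteins 0-2 are mutually similar, as are proteins 3-4.
    toy = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(mcl(toy))
```

In this sketch the inflation exponent controls cluster granularity: larger values tend to split the graph into more, tighter groups.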
The transcriptional mechanisms by which temporary exposure to developmental signals instigates adipocyte differentiation are unknown. During early adipogenesis, we find transient enrichment of the glucocorticoid receptor (GR), CCAAT/enhancer-binding protein beta (CEBPbeta), p300, mediator subunit 1, and histone H3 acetylation near genes involved in cell proliferation, development, and differentiation, including the gene encoding the master regulator of adipocyte differentiation, peroxisome proliferator-activated receptor gamma2 (PPARgamma2). Occupancy and enhancer function are triggered by adipogenic signals, and diminish upon their removal. GR, which is important for adipogenesis but need not be active in the mature adipocyte, functions transiently with other enhancer proteins to propagate a new program of gene expression that includes induction of PPARgamma2, thereby providing a memory of the earlier adipogenic signal. Thus, the conversion of preadipocyte to adipocyte involves the formation of an epigenomic transition state that is not observed in cells at the beginning or end of the differentiation process.
In the arterial circulation, regions of disturbed flow (DF), which are characterized by flow separation and transient vortices, are susceptible to atherogenesis, whereas regions of undisturbed laminar flow (UF) appear protected. Coordinated regulation of gene expression by endothelial cells (EC) may result in differing regional phenotypes that either favor or inhibit atherogenesis. Linearly amplified RNA from freshly isolated EC of DF (inner aortic arch) and UF (descending thoracic aorta) regions of normal adult pigs was used to profile differential gene expression reflecting the steady state in vivo. By using human cDNA arrays, ≈2,000 putatively differentially expressed genes were identified through false-discovery-rate statistical methods. A sampling of these genes was validated by quantitative real-time PCR and/or immunostaining en face. Biological pathway analysis revealed that in DF there was up-regulation of several broad-acting inflammatory cytokines and receptors, in addition to elements of the NF-κB system, which is consistent with a proinflammatory phenotype. However, the NF-κB complex was predominantly cytoplasmic (inactive) in both regions, and no significant differences were observed in the expression of key adhesion molecules for inflammatory cells associated with early atherogenesis. Furthermore, there was no histological evidence of inflammation. Protective profiles were observed in DF regions, notably an enhanced antioxidative gene expression. This study provides a public database of regional EC gene expression in a normal animal, implicates hemodynamics as a contributory mechanism to athero-susceptibility, and reveals the coexistence of pro- and antiatherosclerotic transcript profiles in susceptible regions. The introduction of additional risk factors may shift this balance to favor lesion development.
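The abstract above does not name the specific false-discovery-rate procedure used. As a hedged illustration of how FDR control selects genes from per-gene p-values, the sketch below implements the Benjamini-Hochberg step-up procedure, which is assumed here only because it is a common choice; the function name, the `alpha` level, and the p-values are hypothetical.

```python
from typing import List

def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Step-up rule: reject every hypothesis ranked at or below max_k,
    # even if its own p-value missed its own threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

if __name__ == "__main__":
    # Hypothetical per-gene p-values from a differential expression test.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5]))
```

In the usage example the two smallest p-values miss their own thresholds (0.0125 and 0.025), but the third passes its threshold (0.0375), so the step-up rule rejects all three.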
Serum response factor (SRF) binds a 1216-fold degenerate cis element known as the CArG box. CArG boxes are found primarily in muscle- and growth-factor-associated genes although the full spectrum of functional CArG elements in the genome (the CArGome) has yet to be defined. Here we describe a genome-wide screen to further define the functional mammalian CArGome. A computational approach involving comparative genomic analyses of human and mouse orthologous genes uncovered >100 hypothetical SRF-dependent genes, including 10 previously identified SRF targets, harboring a conserved CArG element within 4000 bp of the annotated transcription start site (TSS). We PCR-cloned 89 hypothetical SRF targets and subjected each of them to at least two of several validations including luciferase reporter, gel shift, chromatin immunoprecipitation, and mRNA expression following RNAi knockdown of SRF; 60/89 (67%) of the targets were validated. Interestingly, 26 of the validated SRF target genes encode cytoskeletal/contractile or adhesion proteins. RNAi knockdown of SRF diminishes expression of several SRF-dependent cytoskeletal genes and elicits an attending perturbation in the cytoarchitecture of both human and rodent cells. These data illustrate the power of integrating existing algorithms to interrogate the genome in a relatively unbiased fashion for cis-regulatory element discovery. In this manner, we have further expanded the mammalian CArGome with the discovery of an array of cyto-contractile genes that coordinate normal cytoskeletal homeostasis. We suggest one function of SRF is that of an ancient master regulator of the actin cytoskeleton.
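The abstract does not spell out how the 1216-fold degeneracy is counted. One counting that reproduces the figure, assumed here for illustration, takes the 10-bp consensus CC(A/T)6GG and adds every sequence that differs from a consensus sequence by a single base substitution. The short enumeration below checks that arithmetic; the function name is hypothetical.

```python
from itertools import product
from typing import Set, Tuple

def carg_variants() -> Tuple[Set[str], Set[str]]:
    """Return (consensus set, consensus plus single-substitution variants)."""
    # Consensus CArG boxes: CC, six A/T positions, GG (2**6 = 64 sequences).
    consensus = {"CC" + "".join(core) + "GG" for core in product("AT", repeat=6)}
    variants = set(consensus)
    for seq in consensus:
        for pos in range(len(seq)):
            for base in "ACGT":
                if base != seq[pos]:
                    # A set removes duplicates, e.g. an A->T change in the
                    # core simply yields another consensus sequence.
                    variants.add(seq[:pos] + base + seq[pos + 1:])
    return consensus, variants

if __name__ == "__main__":
    consensus, variants = carg_variants()
    print(len(consensus), len(variants))
```

Under this assumption the total decomposes as 64 consensus sequences, 384 with one G or C in the A/T core (6 positions x 2 bases x 32 remaining cores), and 768 with one substitution in the CC or GG flanks (4 positions x 3 bases x 64 cores).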