Interacting proteins can contact each other at three different levels: a domain binding to another domain, a domain binding to a short protein motif, or a motif binding to another motif. In our previous work, we proposed an approach to predict motif-motif binding sites in the yeast interactome by contrasting high-quality positive interactions with high-quality non-interactions using a simple statistical analysis. Here, we extend this idea to infer binding sites more comprehensively, including domain-domain, domain-motif, and motif-motif interactions. In this study, we integrated 2854 yeast proteins involved in 13 531 high-quality interactions and 3491 yeast proteins involved in 578 459 high-quality non-interactions. Overall, we found 6315 significant binding site pairs involving 2371 domains and 637 motifs. When benchmarked against iPfam, DIP CORE, and MIPS, our inferred results proved reliable. Interestingly, some of our predicted binding site pairs may, at least in the yeast genome, guide researchers in assaying novel protein-protein interactions by mutagenesis or other experiments. Our work demonstrates that inferring significant protein-protein binding sites at an aggregate level, combining the domain-domain, domain-motif, and motif-motif levels on the basis of high-quality positive and negative datasets, may make it possible to identify the binding site pairs that mediate protein-protein interactions.
Here, we tried to infer protein binding sites at an aggregate level combining the domain-domain, domain-motif, and motif-motif levels.
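The abstract says only that significant binding site pairs were found by contrasting positive interactions with non-interactions through "a simple statistical analysis." As a rough illustration of that contrast, the sketch below counts how often each unordered binding-site pair occurs in positive versus negative interactions and scores it with a one-sided hypergeometric (Fisher-style) test. The choice of test, the `alpha` threshold, and all protein and site names are hypothetical assumptions for illustration, not the published procedure.

```python
from collections import Counter
from itertools import product
from math import comb


def site_pairs(sites_a, sites_b):
    """Unordered binding-site pairs formed between two proteins' site sets."""
    return {tuple(sorted((x, y))) for x, y in product(sites_a, sites_b)}


def count_pairs(interactions, sites):
    """Number of interactions in which each site pair occurs at least once."""
    counts = Counter()
    for p, q in interactions:
        counts.update(site_pairs(sites.get(p, ()), sites.get(q, ())))
    return counts


def hypergeom_upper_tail(a, k, n_pos, n_neg):
    """P(X >= a) for X ~ Hypergeometric(population n_pos + n_neg, n_pos successes, k draws)."""
    hits = sum(comb(n_pos, x) * comb(n_neg, k - x) for x in range(a, min(k, n_pos) + 1))
    return hits / comb(n_pos + n_neg, k)


def enriched_pairs(positives, negatives, sites, alpha=0.05):
    """Site pairs over-represented in positive interactions (no multiple-testing correction)."""
    pos_counts = count_pairs(positives, sites)
    neg_counts = count_pairs(negatives, sites)
    results = []
    for pair, a in pos_counts.items():
        b = neg_counts.get(pair, 0)
        p = hypergeom_upper_tail(a, a + b, len(positives), len(negatives))
        if p < alpha:
            results.append((pair, a, b, p))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy data: binding sites (domains or motifs) per protein.
SITES = {
    "P1": {"SH3"}, "P3": {"SH3"},
    "P2": {"PxxP"}, "P4": {"PxxP"}, "P6": {"PxxP"},
    "P5": {"Kinase"}, "P7": {"Kinase"},
}
POSITIVES = [("P1", "P2"), ("P3", "P4"), ("P1", "P4"), ("P3", "P2")]
NEGATIVES = [("P5", "P6"), ("P5", "P1"), ("P7", "P2"),
             ("P7", "P3"), ("P5", "P7"), ("P6", "P2")]

hits = enriched_pairs(POSITIVES, NEGATIVES, SITES)
for pair, a, b, p in hits:
    print(pair, a, b, f"{p:.4g}")
```

With this toy data, the SH3-PxxP pair appears in all four positive interactions and in none of the six negatives, giving p = 1/210. An analysis over hundreds of thousands of interactions would also need a multiple-testing correction, which this sketch omits.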
Abstract
How have genes evolved within a well-known genome phylogeny? Many protein-coding genes are expected to have evolved as a whole at the gene level, whereas some have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, we used de novo-identified modules (subgene units that together consecutively cover the proteins of a set of closely related species) and applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classification, offered a more reasonable view of homology, and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In a case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications, mainly generating single-module genes through partial gene duplication. This suggests that subgene rearrangement might be pervasive in the evolution of protein-coding gene families. Using a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide a more comprehensive picture of how genes evolve within a well-known genome phylogeny.
Domain-domain interactions are a critical mechanism mediating protein-protein interactions (PPIs). For a given protein domain, its ability to combine with distinct domains is usually referred to as promiscuity or versatility. Interestingly, a previous study reported that a domain's promiscuity may reflect its ability to interact with other domains in human proteins. In this work, we first identified promiscuous domains in the yeast genome and then sought to determine what roles they might play in the PPI network. Mapping the promiscuous domains onto the proteins in this network revealed that, consistent with previous knowledge, hub proteins were significantly enriched with promiscuous domains. We also found that the set of hub proteins was not the same as the set of proteins with promiscuous domains, although the two overlapped. Analysis of the topological properties of this yeast PPI network showed that the characteristic path length of the network increased significantly after proteins with promiscuous domains were deleted. This indicated that communication paths between proteins became longer and that network stability decreased. These observations suggest that, like hub proteins, proteins with promiscuous domains might help maintain network stability. In addition, functional analysis revealed that proteins with promiscuous domains mainly participated in the "Folding, Sorting, and Degradation" and "Replication and Repair" biological pathways, and that they were significantly enriched in key molecular functions, such as "nucleoside-triphosphatase activity (GO:0017111)."
An attempt to illuminate the roles that promiscuous domains may play in the yeast protein-protein interaction network.
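The deletion analysis described in the abstract above can be illustrated with a small sketch that computes the characteristic path length (the mean shortest-path length over connected node pairs) of an unweighted, undirected network before and after removing a chosen set of nodes. The toy network and the node X, standing in for a protein with promiscuous domains, are hypothetical; the study's actual network, deletion procedure, and significance test are not reproduced here.

```python
from collections import deque


def from_edges(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected, unordered node pairs."""
    nodes = sorted(adj)
    total = pairs = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # skip pairs that lie in different components
                total += dist[v]
                pairs += 1
    return total / pairs if pairs else float("inf")


def delete_nodes(adj, removed):
    """Copy of the network with the given nodes and their edges removed."""
    removed = set(removed)
    return {u: nbrs - removed for u, nbrs in adj.items() if u not in removed}


# Hypothetical toy network: a chain A-B-C-D-E plus a shortcut node X.
network = from_edges([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
                      ("X", "A"), ("X", "C"), ("X", "E")])
before = characteristic_path_length(network)
after = characteristic_path_length(delete_nodes(network, {"X"}))
print(f"before: {before:.3f}, after deleting X: {after:.3f}")
```

Here, removing X raises the characteristic path length from 5/3 to 2. Pairs that become disconnected after a deletion are simply skipped in this sketch; a real analysis would have to decide how to treat them.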
Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse because of the fungi's asexual nature and the absence of a single-cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection, and transmission of the genetic makeup to subsequent generations. Although this concept has become textbook knowledge, it is based on studying only a few loci, including 45S rDNA. To provide more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing to individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding clarifies a long-standing hypothesis about the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. The assembly is 141 Mbp long and contains over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to their postulated sister group, Dikarya.
Background Explicit comparisons based on the semantic similarity of Gene Ontology (GO) terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research through integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply treat terms at the same level of the ontology as equally specific, weaknesses that information content (IC) could help offset. Results and Conclusions Here, we used IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false ones. HRSS values were divided into four levels of confidence for protein interactions. In addition, HRSS performed statistically best, obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure simGIC are superior in their correlation with sequence and Pfam similarities. Because different measures are best suited to different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. HRSS was implemented in C and is freely available from http://cmb.bnu.edu.cn/hrss.
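The two pairwise strategies compared in the abstract above combine term-level similarities into a single gene-product score. A minimal sketch follows, assuming the term-by-term similarity matrix has already been computed with some measure (HRSS or any other); the matrix values are hypothetical. The maximum strategy keeps the single best-matching term pair, while the best-match average finds, for each term on either side, its best match in the other set and averages the two directional means.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def max_strategy(sim: Matrix) -> float:
    """Highest similarity between any term of product 1 and any term of product 2."""
    return max(max(row) for row in sim)


def bma_strategy(sim: Matrix) -> float:
    """Best-match average: mean of row-wise and column-wise best matches, averaged."""
    row_best = [max(row) for row in sim]        # best match for each term of product 1
    col_best = [max(col) for col in zip(*sim)]  # best match for each term of product 2
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2


# Hypothetical term-level similarities: rows are GO terms of gene product 1,
# columns are GO terms of gene product 2.
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.5],
]
print("max:", max_strategy(sim))
print("BMA:", round(bma_strategy(sim), 4))
```

For this matrix the maximum strategy returns 0.9, driven by one strongly matching term pair, whereas the best-match average returns 0.65 because it also reflects the weaker matches. The maximum thus rewards a single shared function, while the best-match average summarizes the overall functional overlap.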
Escherichia coli lab strains K-12 GM4792 Lac(+) and GM4792 Lac(-) carry opposite lactose markers, which are useful for distinguishing evolved lines because they produce colonies of different colors. The two closely related strains were chosen as ancestors for our ongoing studies of experimental evolution. Here, we describe the genome sequences, annotation, and features of GM4792 Lac(+) and GM4792 Lac(-). GM4792 Lac(+) has a 4,622,342-bp chromosome with 4,061 protein-coding genes and 83 RNA genes. Similarly, the genome of GM4792 Lac(-) consists of a 4,621,656-bp chromosome containing 4,043 protein-coding genes and 74 RNA genes. Genome comparison reveals that the differences between GM4792 Lac(+) and GM4792 Lac(-) are minimal and limited to the targeted lac region. Moreover, a previous study using competition experiments indicated that the two strains are identical or nearly identical in survivability except for lactose utilization in a nitrogen-limited environment. Therefore, at both the genetic and phenotypic levels, GM4792 Lac(+) and GM4792 Lac(-), with opposite neutral markers, are ideal systems for future experimental evolution studies.