Abstract
Motivation
Protein–protein interactions (PPIs) play a key role in diverse biological processes, but only a small subset of the interactions has been experimentally identified. Additionally, high-throughput experimental techniques that detect PPIs are known to suffer from various limitations, such as high false-positive and false-negative rates. The semantic similarity derived from Gene Ontology (GO) annotations is regarded as one of the most powerful indicators of protein interactions. However, while computational approaches for the prediction of PPIs have gained popularity in recent years, most methods fail to capture the specificity of GO terms.
Results
We propose TransformerGO, a model that is capable of capturing the semantic similarity between GO sets dynamically using an attention mechanism. We generate dense graph embeddings for GO terms using an algorithmic framework for learning continuous representations of nodes in networks called node2vec. TransformerGO learns deep semantic relations between annotated terms and can distinguish between negative and positive interactions with high accuracy. TransformerGO outperforms classic semantic similarity measures on gold standard PPI datasets and state-of-the-art machine-learning-based approaches on large datasets from Saccharomyces cerevisiae and Homo sapiens. We show how the neural attention mechanism embedded in the transformer architecture detects relevant functional terms when predicting interactions.
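To illustrate how attention can weigh the GO terms of one protein against those of another, here is a minimal sketch of scaled dot-product attention over two small sets of term embeddings. It is not the TransformerGO implementation: the term names, the 3-dimensional vectors standing in for node2vec embeddings, and the function names are all assumptions made for illustration, and the learned projections and multiple heads of a real transformer are omitted.

```python
import math

# Hypothetical embeddings standing in for node2vec vectors of GO terms.
# Term names and values are invented for illustration only.
protein_a_terms = {"term_a1": [1.0, 0.0, 0.0], "term_a2": [0.0, 1.0, 0.0]}
protein_b_terms = {
    "term_b1": [0.9, 0.1, 0.0],
    "term_b2": [0.0, 0.0, 1.0],
    "term_b3": [0.1, 0.9, 0.0],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return one row of attention weights per query vector.

    Each row says how strongly a term of protein A attends to each
    term of protein B (scaled dot product followed by softmax).
    """
    scale = math.sqrt(len(keys[0]))
    rows = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        rows.append(softmax(scores))
    return rows


weights = attention_weights(
    list(protein_a_terms.values()), list(protein_b_terms.values())
)
for name, row in zip(protein_a_terms, weights):
    print(name, [round(w, 3) for w in row])
```

In this toy setting each term of protein A puts the most weight on the term of protein B whose embedding points in the most similar direction, which is the sense in which attention can highlight relevant functional terms.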
Availability and implementation
https://github.com/Ieremie/TransformerGO.
Supplementary information
Supplementary data are available at Bioinformatics online.
Abstract
Motivation
Protein language models (PLMs), which borrow ideas for modelling and inference from natural language processing, have demonstrated the ability to extract meaningful representations in an unsupervised way. This has led to significant performance improvements in several downstream tasks. Clustering amino acids based on their physical–chemical properties to obtain reduced alphabets has been of interest in past research, but the application of such alphabets to PLMs or folding models is unexplored.
Results
Here, we investigate the efficacy of PLMs trained on reduced amino acid alphabets in capturing evolutionary information, and we explore how the loss of protein sequence information impacts learned representations and downstream task performance. Our empirical work shows that PLMs trained on the full alphabet and a large number of sequences capture fine details that are lost in alphabet reduction methods. We further show the ability of a structure prediction model (ESMFold) to fold CASP14 protein sequences translated using a reduced alphabet. For 10 proteins out of the 50 targets, reduced alphabets improve structural predictions with LDDT-Cα differences of up to 19%.
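As a rough illustration of what translating a sequence into a reduced alphabet means, the sketch below maps each of the 20 standard amino acids to a representative letter of its group. The seven-group clustering shown here is a hypothetical grouping by broad physicochemical class, not one of the alphabets evaluated in the paper, and the function name is likewise an assumption.

```python
# Hypothetical grouping by broad physicochemical class (an assumption,
# not an alphabet taken from the paper). Key = representative letter.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "G",       # glycine kept on its own
    "P": "P",       # proline kept on its own
}

# Invert the grouping into a residue -> representative lookup table.
RESIDUE_TO_GROUP = {
    residue: representative
    for representative, members in GROUPS.items()
    for residue in members
}


def reduce_sequence(sequence, unknown="X"):
    """Translate a protein sequence into the reduced alphabet.

    Residues outside the 20 standard amino acids map to `unknown`.
    """
    return "".join(RESIDUE_TO_GROUP.get(res, unknown) for res in sequence.upper())


print(reduce_sequence("MKTAYIAKQR"))
```

The translation is many-to-one, so distinct full-alphabet sequences can collapse to the same reduced sequence; that is the loss of sequence information the abstract refers to.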
Availability and implementation
Trained models and code are available at github.com/Ieremie/reduced-alph-PLM.
Endometriosis is a frequently occurring disease in women, which seriously affects their quality of life. However, its etiology and pathogenesis are still unclear.
To identify key genes/pathways involved in the pathogenesis of endometriosis, we retrieved three raw microarray datasets (GSE11691, GSE7305, and GSE12768) from the Gene Expression Omnibus (GEO) database, which contain endometriosis tissues and normal endometrial tissues. We then performed in-depth bioinformatic analysis to determine differentially expressed genes (DEGs), followed by gene ontology (GO), Hallmark pathway enrichment and protein-protein interaction (PPI) network analysis. The findings were further validated by immunohistochemistry (IHC) staining in endometrial tissues from endometriosis or control patients.
We identified 186 DEGs, of which 118 were up-regulated and 68 were down-regulated. The most enriched DEGs in GO functional analysis were mainly associated with cell adhesion, inflammatory response, and extracellular exosome. We found that epithelial-mesenchymal transition (EMT) ranked first in the Hallmark pathway enrichment. EMT may potentially be induced by inflammatory cytokines such as CXCL12. IHC confirmed the down-regulation of E-cadherin (CDH1) and up-regulation of CXCL12 in endometriosis tissues.
Utilizing bioinformatics and patient samples, we provide evidence of EMT in endometriosis. Elucidating the role of EMT will improve the understanding of the molecular mechanisms involved in the development of endometriosis.
Phosphatase and tensin homolog (PTEN) is a tumor suppressor gene and has a role in inhibiting the oncogenic AKT signaling pathway by dephosphorylating phosphatidylinositol 3,4,5-trisphosphate (PIP3) into phosphatidylinositol 4,5-bisphosphate (PIP2). The function of PTEN is regulated by different mechanisms, and inactive PTEN results in an aggressive tumor phenotype and tumorigenesis. Identifying targeted therapies for inactive tumor suppressor genes such as PTEN has been challenging, as it is difficult to restore the tumor suppressor functions. Therefore, focusing on the downstream signaling pathways to discover a targeted therapy for inactive tumor suppressor genes has highlighted the importance of synthetic lethality studies. This review focuses on the potential synthetic lethality genes discovered in PTEN-inactive cancer types. These genes could be potential therapeutic targets for PTEN-inactive cancer types and may improve the treatment response rates for aggressive types of cancer.
Genetic P300/CBP-associated factor (PCAF) variation affects restenosis risk in patients. PCAF has lysine acetyltransferase activity and promotes nuclear factor kappa B (NFκB)-mediated inflammation, which drives post-interventional intimal hyperplasia development. We studied the contributing role of PCAF in post-interventional intimal hyperplasia.
The contribution of PCAF to inflammation and intimal hyperplasia was assessed in leukocytes, macrophages and vascular smooth muscle cells (vSMCs) in vitro and in a mouse model for intimal hyperplasia, in which a cuff is placed around the femoral artery. PCAF deficiency downregulated CCL2, IL-6 and TNF-alpha expression, as demonstrated in cultured vSMCs, leukocytes and macrophages. PCAF KO mice showed a 71.8% reduction of vSMC-rich intimal hyperplasia, a 73.4% reduction of intima/media ratio and a 63.7% reduction of luminal stenosis after femoral artery cuff placement compared to wild type (WT) mice. The association of PCAF and vascular inflammation was further investigated using the potent natural PCAF inhibitor garcinol. Garcinol treatment reduced CCL2 and TNF-alpha expression, as demonstrated in cultured vSMCs and leukocytes. To assess the effect of garcinol treatment on vascular inflammation, we used hypercholesterolemic ApoE*3-Leiden mice. After cuff placement, garcinol treatment resulted in reduced arterial leukocyte and macrophage adherence and infiltration after three days compared to untreated animals.
These results identify a vital role for the lysine acetyltransferase PCAF in the regulation of local inflammation after arterial injury and likely the subsequent vSMC proliferation, responsible for intimal hyperplasia.
Ubiquitin-specific protease 7 (USP7), also known as Herpesvirus-associated ubiquitin-specific protease (HAUSP), is a deubiquitinase. There has been significant recent attention on USP7 following the discovery that USP7 is a key regulator of the p53-MDM2 pathway. The USP7 protein is 130 kDa in size and has multiple domains which bind to a diverse set of proteins. These interactions mediate key developmental and homeostatic processes including the cell cycle, immune response, and modulation of transcription factor and epigenetic regulator activity and localization. USP7 also promotes carcinogenesis through aberrant activation of the Wnt signalling pathway and stabilization of HIF-1α. These findings have shown that USP7 may induce tumour progression and be a therapeutic target. Together with interest in developing USP7 as a target, several studies have defined new protein interactions and the regulatory networks within which USP7 functions. In this review, we focus on the protein interactions of USP7 that are most important for its cancer-associated roles.
PIK3CA, which encodes the p110α catalytic subunit of phosphatidylinositol 3-kinase α, is frequently mutated in human cancers. Most of these mutations occur at two hot-spots: E545K and H1047R, located in the helical domain and the kinase domain, respectively. Here, we report that p110α E545K, but not p110α H1047R, gains the ability to associate with IRS1 independent of the p85 regulatory subunit, thereby rewiring this oncogenic signaling pathway. Disruption of the IRS1-p110α E545K interaction destabilizes the p110α protein, reduces AKT phosphorylation, and slows xenograft tumor growth of a cancer cell line expressing p110α E545K. Moreover, a hydrocarbon-stapled peptide that disrupts this interaction inhibits the growth of tumors expressing p110α E545K.
► p110α E545K helical domain mutant protein directly interacts with IRS1
► IRS1-p110α E545K interaction stabilizes p110α E545K and brings it to the membrane
► IRS1 mutants that do not interact with p110α E545K reduce oncogenicity
► A peptide that disrupts IRS1-p110α E545K interaction inhibits tumor growth in vivo
High-throughput molecular interaction data have been used effectively to prioritize candidate genes that are linked to a disease, based on the observation that the products of genes associated with similar diseases are likely to interact with each other heavily in a network of protein-protein interactions (PPIs). An important challenge for these applications, however, is the incomplete and noisy nature of PPI data. Information-flow-based methods alleviate these problems to a certain extent by considering indirect interactions and multiplicity of paths.
We demonstrate that existing methods are likely to favor highly connected genes, making prioritization sensitive to the skewed degree distribution of PPI networks, as well as ascertainment bias in available interaction and disease association data. Motivated by this observation, we propose several statistical adjustment methods to account for the degree distribution of known disease and candidate genes, using a PPI network with associated confidence scores for interactions. We show that the proposed methods can detect loosely connected disease genes that are missed by existing approaches; however, this improvement might come at the price of more false negatives for highly connected genes. Consequently, we develop a suite called DADA, which includes different uniform prioritization methods that effectively integrate existing approaches with the proposed statistical adjustment strategies. Comprehensive experimental results on the Online Mendelian Inheritance in Man (OMIM) database show that DADA outperforms existing methods in prioritizing candidate disease genes.
These results demonstrate the importance of employing accurate statistical models and associated adjustment methods in network-based disease gene prioritization, as well as other network-based functional inference applications. DADA is implemented in MATLAB and is freely available at http://compbio.case.edu/dada/.
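The degree bias described above can be made concrete with a small sketch. It assumes one generic form of adjustment, standardizing each candidate's raw network score against candidates of similar degree, which is not necessarily any of the adjustment methods implemented in DADA. The gene names, scores, degrees and the function name are all invented for illustration.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev


def degree_adjusted_scores(raw_scores, degrees):
    """Z-score each gene's raw score within its log2-degree bin.

    Genes are compared only with genes of similar connectivity, so a
    hub no longer wins merely because hubs score high in general.
    """
    bins = defaultdict(list)
    for gene in raw_scores:
        bins[int(math.log2(max(degrees[gene], 1)))].append(gene)

    adjusted = {}
    for genes in bins.values():
        values = [raw_scores[g] for g in genes]
        mean, spread = fmean(values), pstdev(values)
        for g in genes:
            # A bin with no spread carries no ranking information.
            adjusted[g] = (raw_scores[g] - mean) / spread if spread > 0 else 0.0
    return adjusted


# Toy data (hypothetical): three hubs and three loosely connected genes.
raw = {"hub1": 0.90, "hub2": 0.80, "hub3": 0.85,
       "leaf1": 0.30, "leaf2": 0.05, "leaf3": 0.04}
degree = {"hub1": 120, "hub2": 100, "hub3": 110,
          "leaf1": 2, "leaf2": 3, "leaf3": 2}

adjusted = degree_adjusted_scores(raw, degree)
print(max(raw, key=raw.get), max(adjusted, key=adjusted.get))
```

In this toy data the raw ranking is led by a hub, while the adjusted ranking is led by a loosely connected gene whose score is unusually high for its degree. The same adjustment can demote genuinely relevant hubs, which mirrors the trade-off the abstract notes for highly connected genes.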