Link prediction is a crucial aspect of graph machine learning, with applications as diverse as disease prediction, social network recommendations, and drug discovery. It involves the prediction of potential new links between nodes within a network. Despite its importance, current models for link prediction exhibit notable limitations. Graph Convolutional Networks have shown high efficiency in link prediction across various datasets. However, they face significant challenges when applied to short-path networks and ego networks, resulting in poor performance. This issue represents a critical area of concern that our work seeks to address. This paper introduces the Node Centrality and Similarity-based Parameterised Model (NCSM), a novel method for link prediction tasks. NCSM uniquely integrates node centrality and similarity measures as edge features in a customised Graph Neural Network (GNN) layer, effectively leveraging the topological information of large networks. This model represents the first parameterised GNN-based link prediction model that considers topological information. The proposed model was evaluated on five benchmark graph datasets, each comprising thousands of nodes and edges. Experimental results highlight NCSM's superiority over existing state-of-the-art models like Graph Convolutional Networks and Variational Graph Autoencoder, as it outperforms them across various metrics and datasets. This exceptional performance can be attributed to NCSM's innovative integration of node centrality, similarity measures, and its efficient use of topological information.
•Introduced the Node Centrality and Similarity-based Parameterised Model (NCSM) for link prediction.
•NCSM uses node centrality and similarity as edge features in a custom Graph Neural Network.
•Evaluated NCSM on five benchmark datasets, surpassing models like Graph Convolutional Networks and Variational Graph Autoencoder.
•NCSM excels by uniquely combining node measures and effectively utilising network topology.
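To make the idea of centrality and similarity as edge features concrete, the following is a minimal sketch, not the NCSM implementation. It assumes degree centrality and Jaccard similarity as the two measures (the abstract does not say which measures NCSM uses), and the toy graph and all function names are hypothetical.

```python
from itertools import combinations

def degree_centrality(adj):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def jaccard(adj, u, v):
    """Jaccard similarity of the neighbour sets of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def edge_features(adj, u, v, centrality):
    """Feature vector for a candidate link (u, v): two centralities and one similarity."""
    return [centrality[u], centrality[v], jaccard(adj, u, v)]

# Toy undirected graph as an adjacency map (hypothetical data).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
centrality = degree_centrality(adj)

# One feature vector per unordered node pair, i.e. per candidate link.
features = {
    (u, v): edge_features(adj, u, v, centrality)
    for u, v in combinations(sorted(adj), 2)
}
```

In a full model, vectors like these would be passed to a GNN layer alongside the node features; how NCSM parameterises that step is not described in the abstract.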
Unsupervised multi-view bipartite graph clustering (MVBGC) is a fast-growing research area, owing to its promising scalability in large-scale tasks. Although many variants have been proposed using various strategies, a common design is to construct the bipartite graph directly from the input data, i.e., to consider only the unidirectional "encoding" process. However, the "encoding-decoding" mechanism is a popular design in deep learning, the most representative example being the auto-encoder (AE). Inspired by this, this paper rethinks existing MVBGC paradigms, transfers the "encoding-decoding" design into graph machine learning, and proposes a novel framework termed auto-encoding multi-view bipartite graph clustering (BGAE), which integrates encoding, bipartite graph construction, and decoding modules in a self-supervised learning manner. The encoding module extracts a latent joint representation from the input data, the bipartite graph construction module learns a bipartite graph with a connectivity constraint in the latent semantic space, and the decoding module recreates the input data via the bipartite graph. Therefore, our novel BGAE combines representation learning, bipartite graph learning, reconstruction learning, and label inference into a unified framework. All the modules are seamlessly integrated and mutually reinforcing for clustering-friendly purposes. Extensive experiments verify the superiority of our novel design and the significance of the "decoding" process. To the best of our knowledge, this is the first attempt to explore the "encoding-decoding" design in traditional MVBGC.
Recently, deep clustering utilizing Graph Neural Networks has shown good performance in graph clustering. However, the structure information of graphs has been underused in existing deep clustering methods; in particular, little attention has been paid to mining different types of structure information simultaneously. To tackle this problem, this paper proposes a Graph Clustering Network with Structure Embedding Enhanced (GC-SEE), which extracts node importance-based and attribute importance-based structure information via a feature attention fusion graph convolution module and a graph attention encoder module, respectively. Additionally, it captures structure information of different orders through multi-scale feature fusion. Finally, a self-supervised learning module is designed to integrate the different types of structure information and guide the updates of GC-SEE. Comprehensive experiments on commonly used benchmark datasets demonstrate the superiority of GC-SEE. The results showcase the effectiveness of GC-SEE in exploiting multiple types of structure for deep clustering.
•The focus on structural information learned by GCN and GAE varies.
•We propose a Graph Clustering Network with Structure Embedding Enhanced (GC-SEE).
•GC-SEE integrates different types of structure information to enrich the embedding.
•A self-supervised loss is designed to achieve clear boundaries and high accuracy.
•GC-SEE outperforms the methods using single structure information.
With recent advancements, graph neural networks (GNNs) have shown considerable potential for various graph-related tasks, and their applications have gained considerable attention. However, adversarial attacks can significantly degrade the performance of GNNs, hindering their deployment in critical real-world tasks. GNNs must be robust against adversarial attacks, in which imperceptible adversarial perturbations are introduced to induce serious security issues. To achieve this goal, we propose a robust graph convolutional network, ERGCN, for node classification via data enhancement. ERGCN simultaneously utilizes properties from the “data domain” and “model space” as guidance. Based on the feature smoothness assumption, a graph structure enhancement (GSE) mechanism is proposed to improve the structural reliability of input graphs. Moreover, inspired by self-training methods that assign pseudo-labels to unlabeled training samples and use them to optimize the target model iteratively, a reliable node selection metric, model boundary distance (MBD), is defined based on the distance from training samples to model decision boundary. Finally, a self-training-based robust graph convolutional network is proposed for node classification. Extensive experiments on three public datasets demonstrate the superiority of our model over existing state-of-the-art methods. Our study provides a solution for trustworthy graph machine learning systems in adversarial environments. The code is available at https://github.com/star4455/ERGCN.
Numerous studies of emerging species have identified genomic “islands” of elevated differentiation against a background of relative homogeneity. The causes of these islands remain unclear, however, with some signs pointing toward “speciation genes” that locally restrict gene flow and others suggesting selective sweeps that have occurred within nascent species after speciation. Here, we examine this question through the lens of genome sequence data for five species of southern capuchino seedeaters, finch-like birds from South America that have undergone a species radiation during the last ∼50,000 generations. By applying newly developed statistical methods for ancestral recombination graph inference and machine-learning methods for the prediction of selective sweeps, we show that previously identified islands of differentiation in these birds appear to be generally associated with relatively recent, species-specific selective sweeps, most of which are predicted to be soft sweeps acting on standing genetic variation. Many of these sweeps coincide with genes associated with melanin-based variation in plumage, suggesting a prominent role for sexual selection. At the same time, a few loci also exhibit indications of possible selection against gene flow. These observations shed light on the complex manner in which natural selection shapes genome sequences during speciation.
Graph Neural Networks (GNNs) have been drawing significant attention to representation learning on graphs. Recent works developed frameworks to train very deep GNNs and showed impressive results in tasks like point cloud learning and protein interaction prediction. In this work, we study the performance of such deep models in large-scale graphs. In particular, we look at the effect of adequately choosing an aggregation function on deep models. We find that GNNs are very sensitive to the choice of aggregation functions (e.g. mean, max, and sum) when applied to different datasets. We systematically study and propose to alleviate this issue by introducing a novel class of aggregation functions named Generalized Aggregation Functions. The proposed functions extend beyond commonly used aggregation functions to a wide range of new permutation-invariant functions. Generalized Aggregation Functions are fully differentiable, where their parameters can be learned in an end-to-end fashion to yield a suitable aggregation function for each task. We show that equipped with the proposed aggregation functions, deep residual GNNs outperform state-of-the-art in several benchmarks from Open Graph Benchmark (OGB) across tasks and domains. The code and models for reproducing our experiments are available at https://github.com/lightaime/deep_gcns_torch/tree/master/examples/ogb.
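As an illustration of how a single parameterised function can span several common aggregators, the sketch below uses a softmax-weighted sum with an inverse temperature `beta`. The abstract does not give the functional form of the Generalized Aggregation Functions, so this particular form, the name `softmax_aggregate`, and the scalar messages are assumptions made for illustration.

```python
import math

def softmax_aggregate(messages, beta):
    """Softmax-weighted sum of scalar neighbour messages with inverse temperature beta."""
    peak = max(beta * m for m in messages)  # subtract the peak for numerical stability
    weights = [math.exp(beta * m - peak) for m in messages]
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, messages)) / total

messages = [1.0, 2.0, 6.0]  # hypothetical messages arriving at one node
mean_like = softmax_aggregate(messages, beta=0.0)   # equal weights: reduces to the mean
max_like = softmax_aggregate(messages, beta=50.0)   # weight concentrates on the largest message
```

Because the result depends only on the multiset of messages, the function is permutation-invariant, and because it is smooth in `beta`, that parameter could in principle be learned by gradient descent in an autodiff framework.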
The ecological security pattern (ESP) provides a solution for balancing ecosystems and urban development. The modifiable areal unit problem (MAUP) is a well-known geographical issue that affects the understanding of ecological processes and the results of spatial modeling. However, only a few studies have explicitly clarified the effects of MAUP on ESP. Herein, we propose new methods for identifying ecological sources and mapping the resistance surface, as well as a framework for analyzing the effects of MAUP on ESP. Our results indicate that (1) the datasets used for identifying ecological sources should have a grain size smaller than 300 m × 300 m; (2) the resistance surface directly influences the resulting ecological corridors and nodes; and (3) MAUP can generate uncertainties in ESP policymaking. Additionally, this study developed comprehensive network analysis tools based on graph machine learning to analyze the effect of MAUP on ESP policymaking, which can provide important guidance for landscape planning. The network analysis approaches may help policymakers gain a comprehensive understanding of the ESP network to improve the credibility of policies.
•Identifying ecological sources considering ecological quality and human disturbance.
•Comprehensive network analysis tools were developed using graph machine learning.
•Analysing the effects of modifiable areal unit problem on ecological security pattern.
•Modifiable areal unit problem creates uncertainty in ecological planning policy.
Accessing unstructured information through BIM-based platforms is essential for achieving integrated analytics, especially for facilities management, where a wide range of unstructured data is required for effective decision-making. Previous research has explored linking textual data with BIM through the use of relational schemas, concept mapping, and ontologies. However, these methods are structured and static, failing to match the non-parametric and evolutionary nature of unstructured data. This study proposes an alternative approach where concept networks are used to represent the IFC data model. Using graph theory and natural language processing, a classifier is trained to assign text documents to their relevant IFC classes based on their conceptual network distances. Given that the classifier is trained on conceptual distances rather than the concepts themselves, it has the potential to be generalized to unseen classes with unseen concepts. Both the performance and the generalizability of the approach are evaluated in a case study.
•A new approach for linking text to BIM is proposed.
•The approach is based on concept networks and text classification.
•The concept networks can dynamically change and capture unstructured data patterns.
•The classifier uses graph-distances as features to learn relatedness.
•The classifier can be generalized to unseen classes without retraining.
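The use of graph distances as classifier features can be sketched as follows. This is a simplified illustration under assumptions: the concept network is unweighted, distance is measured in hops by breadth-first search, and the concept names, the `unreachable` placeholder value, and the summary statistics are all hypothetical rather than taken from the study.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop counts from source to every reachable node in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def distance_features(graph, doc_concepts, class_concept, unreachable=99):
    """Distances from a class concept to each concept found in a document, plus min and mean."""
    dist = bfs_distances(graph, class_concept)
    hops = [dist.get(c, unreachable) for c in doc_concepts]
    return {"min": min(hops), "mean": sum(hops) / len(hops), "hops": hops}

# Hypothetical concept network with two disconnected groups of concepts.
graph = {
    "pump": {"flow", "motor"},
    "flow": {"pump", "valve"},
    "motor": {"pump"},
    "valve": {"flow"},
    "wall": {"structure"},
    "structure": {"wall"},
}
doc_concepts = ["valve", "motor"]  # concepts extracted from one text document
pump_features = distance_features(graph, doc_concepts, "pump")
wall_features = distance_features(graph, doc_concepts, "wall")
```

Since the features describe how far a document's concepts lie from a class, not which concepts they are, the same feature layout applies to a class that was never seen during training, which is the property the abstract relies on for generalization.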
•Automated algorithms have been proposed for retrofitting HENs.
•The relation between internal heat recovery and graph metrics has been studied.
•The Kalina cycle's thermal efficiency increased by 9.7% compared to the basic cycle.
•The verification of this method has been conducted.
Unsupervised graph machine learning provides a powerful framework for modeling and analyzing heat exchanger networks (HENs). This paper proposes a graph-based thermally guided path search (TGPS) method that systematically identifies and evaluates retrofit options to enhance the thermal performance of HENs. The method represents the HEN as a bipartite graph and uses automated algorithms to search for feasible heat integration paths. Thermodynamically inconsistent paths are filtered out based on temperature feasibility rules. The resulting retrofit options are evaluated using graph metrics such as betweenness centrality and clustering coefficients, as well as thermal performance indicators. A re-routing technique is introduced to address temperature mismatch issues for serial heat exchanger connections. When applied to a Kalina power cycle, the thermal efficiency of the optimum configuration is increased by 9.7%. This method is compared with both pinch analysis and the Energy Transfer Diagram approach, and it is thoroughly tested and verified for an ammonia-water absorption refrigeration cycle as well. This methodology provides an effective tool for retrofitting existing HENs while adhering to thermodynamic principles.
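The filtering of thermodynamically inconsistent matches on a bipartite hot/cold graph might look like the sketch below. The abstract does not state the TGPS feasibility rules, so the check used here (a minimum approach temperature `dt_min` at both ends of a counter-current exchanger) is an assumed stand-in, and the `Stream` type and all temperatures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stream:
    name: str
    t_in: float   # inlet temperature, degrees C
    t_out: float  # outlet temperature, degrees C

def feasible(hot, cold, dt_min=10.0):
    """Counter-current check: hot side stays at least dt_min above cold side at both ends."""
    return hot.t_in - cold.t_out >= dt_min and hot.t_out - cold.t_in >= dt_min

def feasible_matches(hot_streams, cold_streams, dt_min=10.0):
    """Edges of the bipartite hot/cold graph that pass the temperature check."""
    return [
        (h.name, c.name)
        for h in hot_streams
        for c in cold_streams
        if feasible(h, c, dt_min)
    ]

# Hypothetical stream data: hot streams cool down, cold streams heat up.
hot_streams = [Stream("H1", 180.0, 80.0), Stream("H2", 120.0, 60.0)]
cold_streams = [Stream("C1", 30.0, 100.0), Stream("C2", 70.0, 150.0)]
matches = feasible_matches(hot_streams, cold_streams)
```

Here the pair H2 and C2 is rejected because the hot inlet (120 °C) is below the cold outlet (150 °C). The check treats each pair as exchanging heat over the streams' full temperature ranges, which is a simplification; a path search would apply such a test to each exchanger along a candidate path.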
With the recent COVID-19 outbreak, we have witnessed the development of new epidemic models and the application of existing methodologies to predict the spread of the virus and to analyze how different lockdown strategies can influence epidemic diffusion. In this paper, we propose a novel machine learning-based framework able to estimate the parameters of any epidemiological model, such as contact rates and recovery rates, based on static and dynamic features of places. In particular, we model mobility data through a graph series whose spatial and temporal features are investigated by combining Graph Convolutional Neural Networks (GCNs) and Long Short-Term Memory networks (LSTMs) in order to infer the parameters of SIR and SIRD models. We evaluate the proposed approach using data related to the COVID-19 dynamics in Italy, and we compare the forecasts of the trained model with available data about the epidemic spread.
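For context on what the inferred parameters control, the sketch below integrates the standard SIR equations with a forward-Euler scheme, where `beta` is the contact rate and `gamma` the recovery rate. It shows only the epidemiological model that consumes those parameters, not the GCN and LSTM estimation framework; the parameter values, population size, and step size are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations; returns daily (S, I, R) tuples."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I flow over one step
            new_recoveries = gamma * i * dt         # I -> R flow over one step
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical parameters: beta / gamma = 3, so the outbreak initially grows.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=60)
```

A learned model would supply `beta` and `gamma` (possibly varying over time and place), and the resulting trajectory could then be compared with observed case counts. An SIRD variant would add a fourth compartment and a mortality rate.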