Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error correction, local search, and linear programming to approximate GED via upper or lower bounds. In this paper, we provide a systematic overview of the most important heuristics. Moreover, we empirically evaluate all compared heuristics within an integrated implementation.
We introduce a nonlocal discrete regularization framework on weighted graphs of arbitrary topologies for image and manifold processing. The approach considers the problem as a variational one, which consists of minimizing a weighted sum of two energy terms: a regularization term that uses a discrete weighted p-Dirichlet energy and an approximation term. This is the discrete analogue of recent continuous Euclidean nonlocal regularization functionals. The proposed formulation leads to a family of simple and fast nonlinear processing methods based on the weighted p-Laplace operator, parameterized by the degree of regularity, the graph structure and the graph weight function. These discrete processing methods provide a graph-based version of recently proposed semi-local or nonlocal processing methods used in image and mesh processing, such as the bilateral filter, the TV digital filter or the nonlocal means filter. The approach works with equal ease on regular 2-D and 3-D images, manifolds or any data. We illustrate the abilities of the approach by applying it to various types of images, meshes, manifolds, and data represented as graphs.
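As an illustration only, the sketch below covers the quadratic special case p = 2, where the minimizer of a regularization-plus-fidelity energy can be approached by a simple Jacobi-type averaging iteration. The exact form of the energy (stated in the docstring), the function name regularize_p2, the parameter names, and the toy path graph are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def regularize_p2(W, f0, lam=1.0, n_iter=200):
    """Jacobi iteration for the assumed energy
        E(f) = 0.5 * f^T (D - W) f + 0.5 * lam * ||f - f0||^2,
    where W is a symmetric non-negative weight matrix and D its degree matrix.
    The fixed point satisfies (lam * I + D - W) f = lam * f0.
    """
    W = np.asarray(W, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    degree = W.sum(axis=1)
    f = f0.copy()
    for _ in range(n_iter):
        # Each node moves to a weighted average of its own initial value
        # and the current values of its graph neighbors.
        f = (lam * f0 + W @ f) / (lam + degree)
    return f

# Toy example (hypothetical data): a path graph with 5 nodes and a noisy step signal.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
f0 = np.array([0.0, 0.2, -0.1, 1.1, 0.9])
f = regularize_p2(W, f0, lam=1.0)
```

Changing the graph (for example, connecting pixels by patch similarity instead of spatial adjacency) changes the filter without changing the iteration, which is the kind of flexibility the abstract attributes to the graph structure and weight function.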
•Definition of the equivalence between assignments and edit paths.
•Graph edit distance formulation as a quadratic assignment problem.
•New quadratic cost function for computing graph edit distance.
•Improvement of the accuracy of the approximation of graph edit distance.
•Approximation computable in reasonable time.
The Graph Edit Distance (GED) is a flexible measure of dissimilarity between graphs which arises in error-correcting graph matching. It is defined from an optimal sequence of edit operations (edit path) transforming one graph into another. Unfortunately, the exact computation of this measure is NP-hard. In the last decade, several approaches were proposed to approximate the GED in polynomial time, mainly by solving linear programming problems. Among them, the bipartite GED received much attention. It is deduced from a linear sum assignment of the nodes of the two graphs, which can be efficiently computed by Hungarian-type algorithms. However, edit operations on nodes and edges are not handled simultaneously, which limits the accuracy of the approximation. To overcome this limitation, we propose to extend the linear assignment model to a quadratic one. This is achieved through the definition of a family of edit paths induced by assignments between nodes. We formally show that the GED, restricted to the paths in this family, is equivalent to a quadratic assignment problem. Since this problem is NP-hard, we propose to compute an approximate solution by adapting two algorithms: the Integer Projected Fixed Point method and the Graduated Non Convexity and Concavity Procedure. Experiments show that the proposed approach is generally able to reach a more accurate approximation of the exact GED than the bipartite GED, with a computational cost that is still affordable for graphs of nontrivial sizes.
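To make the notion of an edit path induced by a node assignment concrete, the following sketch evaluates the cost of the edit path implied by a given node map. The edge term depends on pairs of node assignments, which is what makes the objective quadratic rather than linear. The function name induced_edit_cost, the uniform costs, the restriction to simple undirected graphs with unlabeled edges, and the toy graphs are all assumptions made for this example; they are not the cost function defined in the paper.

```python
def induced_edit_cost(labels1, edges1, labels2, edges2, phi,
                      c_node_sub=1.0, c_node_indel=1.0, c_edge_indel=1.0):
    """Cost of the edit path induced by an injective node map phi.

    phi[i] is the index of the node of graph 2 assigned to node i of graph 1,
    or None if node i is deleted. Nodes of graph 2 that are not hit are inserted.
    Assumes simple undirected graphs (no self-loops) with unlabeled edges.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}

    # Linear part: node substitutions, deletions, and insertions.
    cost = 0.0
    for i, k in enumerate(phi):
        if k is None:
            cost += c_node_indel
        elif labels1[i] != labels2[k]:
            cost += c_node_sub
    mapped = sum(k is not None for k in phi)
    cost += c_node_indel * (len(labels2) - mapped)

    # Quadratic part: an edge is preserved only if BOTH endpoints are mapped
    # and their images are adjacent in graph 2.
    preserved = 0
    for e in e1:
        i, j = tuple(e)
        if (phi[i] is not None and phi[j] is not None
                and frozenset((phi[i], phi[j])) in e2):
            preserved += 1
    cost += c_edge_indel * ((len(e1) - preserved) + (len(e2) - preserved))
    return cost

# Toy graphs (hypothetical): a path C-C-O and a single edge C-O.
labels1, edges1 = ["C", "C", "O"], [(0, 1), (1, 2)]
labels2, edges2 = ["C", "O"], [(0, 1)]

cost_good = induced_edit_cost(labels1, edges1, labels2, edges2, [None, 0, 1])  # 2.0
cost_bad = induced_edit_cost(labels1, edges1, labels2, edges2, [0, 1, None])   # 3.0
```

Every node map yields an upper bound on the GED under the chosen costs, so minimizing this function over node maps tightens the bound; the two maps above show that the choice of assignment matters even on tiny graphs.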
Matchings between objects from two datasets, domains, or ontologies have to be computed in various application scenarios. One frequently used meta-approach, which we call bipartite data matching, is to leverage domain knowledge for defining costs between the objects that should be matched, and to then use the classical Hungarian algorithm to compute a minimum cost bipartite matching. In this paper, we introduce and study the problem of enumerating K dissimilar minimum cost bipartite matchings. We formalize this problem, prove that it is NP-hard, and present heuristics based on greedy dynamic programming. The presented enumeration techniques are not only interesting in themselves, but also mitigate an often overlooked shortcoming of bipartite data matching, namely, that it is sensitive w.r.t. the storage order of the input data. Extensive experiments show that our enumeration heuristics clearly outperform existing algorithms in terms of dissimilarity of the obtained matchings, that they are effective at rendering bipartite data matching approaches more robust w.r.t. random storage order, and that they significantly improve the upper bounds of state-of-the-art algorithms for graph edit distance computation that are based on bipartite data matching.
Improved local search for graph edit distance. Boria, Nicolas; Blumenthal, David B.; Bougleux, Sébastien ...
Pattern Recognition Letters, Volume 129, January 2020. Journal article, peer reviewed, open access.
•We present K-REFINE, a new local search algorithm for upper bounding the graph edit distance.
•We present RANDPOST, a framework that generates good initial solutions for local search.
•We experimentally show that K-REFINE and RANDPOST perform excellently in practice.
The graph edit distance (GED) measures the dissimilarity between two graphs as the minimal cost of a sequence of elementary operations transforming one graph into another. This measure is fundamental in many areas such as structural pattern recognition or classification. However, exactly computing GED is NP-hard. Among different classes of heuristic algorithms that were proposed to compute approximate solutions, local search based algorithms provide the tightest upper bounds for GED. In this paper, we present K-REFINE and RANDPOST. K-REFINE generalizes and improves an existing local search algorithm and performs particularly well on small graphs. RANDPOST is a general warm start framework that stochastically generates promising initial solutions to be used by any local search based GED algorithm. It is particularly efficient on large graphs. An extensive empirical evaluation demonstrates that both K-REFINE and RANDPOST perform excellently in practice.
•The context of Graph Edit Distance Contest (GDC), organized during ICPR2016, is presented.
•Eight methods from three research groups are evaluated.
•The evaluation metrics, methods and datasets of GDC are described in detail.
•A crystal clear picture of the accuracy and speed of each method is provided.
•Future challenges and possible tracks in graph edit distance are highlighted.
The Graph Distance Contest (GDC) was organized in the context of ICPR 2016. Its main goal was to inspect and report the performance and effectiveness of exact and approximate graph edit distance methods by comparison with a ground truth. This paper presents the context of this competition, the metrics and datasets used for evaluation, and the results obtained by the eight submitted methods. Results are analyzed and discussed in terms of computation time and accuracy. We also highlight the future challenges in graph edit distance regarding both future methods and evaluation metrics. The contest was supported by the Technical Committee on Graph-Based Representations in Pattern Recognition (TC-15) of the International Association for Pattern Recognition (IAPR).
•We propose an efficient procedure for solving the linear sum assignment problem with error-correction.
•In contrast to most efficient competitors, our procedure does not impose constraints on the costs.
•Our procedure is more stable than existing competitors.
•Our procedure can be implemented more easily than existing competitors.
We propose an algorithm that efficiently solves the linear sum assignment problem with error-correction and no cost constraints. This problem is encountered for instance in the approximation of the graph edit distance. The fastest currently available solvers for the linear sum assignment problem require the pairwise costs to respect the triangle inequality. Our algorithm is as fast as these algorithms, but manages to drop the cost constraint. The main technical ingredient of our algorithm is a cost-dependent factorization of the node substitutions.
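For readers unfamiliar with the problem itself, the sketch below solves a small instance of the linear sum assignment problem with error correction through the classical reduction to a square (n+m) x (n+m) assignment problem. This is a baseline reduction, not the algorithm proposed in the paper. The function name solve_lsape, the argument layout, the assumption of non-negative costs, and the toy cost values are choices made for this example; the only library call relied on is SciPy's linear_sum_assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def solve_lsape(sub, dele, ins):
    """Solve an assignment problem with substitutions, deletions, and insertions.

    sub  : (n, m) array, sub[i, j] = cost of substituting element i by element j
    dele : length-n sequence, cost of deleting element i
    ins  : length-m sequence, cost of inserting element j
    Assumes all costs are non-negative.
    """
    sub = np.asarray(sub, dtype=float)
    n, m = sub.shape
    # A finite value larger than any feasible total cost forbids invalid pairings.
    big = sub.sum() + np.sum(dele) + np.sum(ins) + 1.0

    C = np.zeros((n + m, n + m))
    C[:n, :m] = sub                                # substitutions
    C[:n, m:] = big
    C[np.arange(n), m + np.arange(n)] = dele       # deletions on the diagonal
    C[n:, :m] = big
    C[n + np.arange(m), np.arange(m)] = ins        # insertions on the diagonal
    # The bottom-right (m, n) block stays zero: dummy-to-dummy pairs are free.

    rows, cols = linear_sum_assignment(C)
    pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
    total = float(C[rows, cols].sum())
    substitutions = [(r, c) for r, c in pairs if r < n and c < m]
    deletions = [r for r, c in pairs if r < n and c >= m]
    insertions = [c for r, c in pairs if r >= n and c < m]
    return total, substitutions, deletions, insertions

# Toy instance (hypothetical costs): only the pair (0, 0) is cheap to substitute.
total, substitutions, deletions, insertions = solve_lsape(
    sub=[[1, 9], [9, 9]], dele=[2, 2], ins=[2, 2]
)
```

On this instance the optimum substitutes element 0 by element 0 and handles the remaining two elements by one deletion and one insertion, for a total cost of 5. The padded matrix grows to (n+m) x (n+m), which is the overhead that dedicated solvers for this problem aim to avoid.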
Minimum cost paths have been extensively studied theoretical tools for interactive image segmentation. The existing geodesically linked active contour (GLAC) model, which basically consists of a set of vertices connected by paths of minimal cost, blends the benefits of minimal paths and region-based active contours. This results in a closed piecewise-smooth curve, over which an edge or region energy functional can be formulated. As an important shortcoming, the GLAC in its initial formulation does not guarantee the curve to be simple, consistent with respect to the purpose of segmentation. In this paper, we draw our inspiration from the GLAC and other boundary-based interactive segmentation algorithms, in the sense that we aim to extract a contour given a set of user-provided points, by connecting these points using paths. The key idea is to select a combination among a set of possible paths, such that the resulting structure represents a relevant closed curve. Instead of considering minimal paths only, we switch to a more general formulation, which we refer to as admissible paths. These basically correspond to the roads travelling along the bottom of distinct valleys between given endpoints. We introduce a novel term to favor the simplicity of the generated contour, as well as a local search method to choose the best combination among possible paths.
In this paper, we present GMG-BCU, a local search algorithm based on block coordinate update for estimating a generalized median graph for a given collection of labeled or unlabeled input graphs. Unlike all competitors, GMG-BCU is designed for both discrete and continuous label spaces and can be configured to run in linear time w.r.t. the size of the graph collection whenever median node and edge labels are computable in linear time. These properties make GMG-BCU usable for applications such as differential microbiome data analysis, graph classification, clustering, and indexing. We also prove theoretical properties of generalized median graphs, namely, that they exist under reasonable assumptions which are met in almost all application scenarios, that they are in general non-unique, that they are NP-hard to compute and APX-hard to approximate, and that no polynomial α-approximation exists for any α unless the graph isomorphism problem is in P. Extensive experiments on six different datasets show that our heuristic GMG-BCU always outperforms the state of the art in terms of runtime or quality (on most datasets, both w.r.t. runtime and quality), that it is the only available heuristic which can cope with collections containing several thousands of graphs, and that it shows very promising potential when used for the aforementioned applications. GMG-BCU is freely available on GitHub: https://github.com/dbblumenthal/gedlib/.