Towards continuous consistency axiom Kłopotek, Mieczysław A.; Kłopotek, Robert A.
Applied intelligence (Dordrecht, Netherlands),
03/2023, Letnik:
53, Številka:
5
Journal Article
Recenzirano
Odprti dostop
It is shown for the first time in this paper, that Kleinberg’s (
2002
) (self-contradictory) axiomatic system for distance-based clustering fails (that is one of the data transforming axioms, ...consistency axiom, turns out to be identity transformation) in fixed-dimensional Euclidean space due to the consistency axiom limitations and that its replacement with inner-consistency or outer consistency does not help if continuous data transformations are required. Therefore we formulate a new, sound axiomatic framework for cluster analysis in the fixed dimensional Euclidean space, suitable for
k
-means like algorithms. The system incorporates centric consistency axiom and motion consistency axiom which induce clustering preserving transformations useful e.g. for deriving new labelled sets for testing clustering procedures. It is suitable for continuous data transformations so that labelled data with small perturbations can be derived. Unlike Kleinberg’s consistency, the new axioms do not lead the data outside of Euclidean space nor cause increase in data dimensionality. Our cluster preserving transformations have linear complexity in data transformation and checking. They are in practice less restrictive, less rigid than Kleinberg’s consistency as they do not enforce inter-cluster distance increase and inner cluster distance decrease when performing clustering preserving transformation.
Kleinberg introduced an axiomatic system for clustering functions. Out of three axioms, he proposed, two (scale invariance and consistency) are concerned with data transformations that should produce ...the same clustering under the same clustering function. The so-called consistency axiom provides the broadest range of transformations of the data set. Kleinberg claims that one of the most popular clustering algorithms,
k
-means does not have the property of consistency. We challenge this claim by pointing at invalid assumptions of his proof (infinite dimensionality) and show that in one dimension in Euclidean space the
k
-means algorithm has the consistency property. We also prove that in higher dimensional space,
k
-means is, in fact, inconsistent. This result is of practical importance when choosing testbeds for implementation of clustering algorithms while it tells under which circumstances clustering after consistency transformation shall return the same clusters. Two types of remedy are proposed: gravitational consistency property and dataset consistency property which both hold for
k
-means and hence are suitable when developing the mentioned testbeds.
In a former paper 1 the concept of Bipartite PageRank was introduced and a theorem on the limit of authority flowing between nodes for personalized PageRank has been generalized. In this paper we ...want to extend those results to multimodal networks. In particular we deal with a hypergraph type that may be used for describing multimodal network where a hyperlink connects nodes from each of the modalities. We introduce a generalisation of PageRank for such graphs and define the respective random walk model that can be used for computations. We state and prove theorems on the limit of outflow of authority for cases where individual modalities have identical and distinct damping factors.
This paper poses the question of whether or not the usage of the kernel trick is justified. We investigate it for the special case of its usage in the kernel
-means algorithm. Kernel-
-means is a ...clustering algorithm, allowing clustering data in a similar way to
-means when an embedding of data points into Euclidean space is not provided and instead a matrix of “distances” (dissimilarities) or similarities is available. The kernel trick allows us to by-pass the need of finding an embedding into Euclidean space. We show that the algorithm returns wrong results if the embedding actually does not exist. This means that the embedding must be found prior to the usage of the algorithm. If it is found, then the kernel trick is pointless. If it is not found, the distance matrix needs to be repaired. But the reparation methods require the construction of an embedding, which first makes the kernel trick pointless, because it is not needed, and second, the kernel-
-means may return different clusterings prior to repairing and after repairing so that the value of the clustering is questioned. In the paper, we identify a distance repairing method that produces the same clustering prior to its application and afterwards and does not need to be performed explicitly, so that the embedding does not need to be constructed explicitly. This renders the kernel trick applicable for kernel-
-means.
This paper performs an investigation of Kleinberg’s axioms (from both an intuitive and formal standpoint) as they relate to the well-known
k
-mean clustering method. The axioms, as well as a novel ...variations thereof, are analyzed in Euclidean space. A few natural properties are proposed, resulting in
k
-means satisfying the intuition behind Kleinberg’s axioms (or, rather, a small, and natural variation on that intuition). In particular, two variations of Kleinberg’s consistency property are proposed, called centric consistency and motion consistency. It is shown that these variations of consistency are satisfied by k-means.
Model selection and variable importance assessment in high-dimensional regression are among the most important tasks in contemporary applied statistics. In our procedure, implemented in the package ...regRSM, the Random Subspace Method (RSM) is used to construct a variable importance measure. The variables are ordered with respect to the measures computed in the first step using the RSM and then, from the hierarchical list of models given by the ordering, the final subset of variables is chosen using information criteria or validation set. Modifications of the original method such as the weighted Random Subspace Method and the version with initial screening of redundant variables are discussed. We developed parallel implementations which enable to reduce the computation time significantly. In this paper, we give a brief overview of the methodology, demonstrate the package’s functionality and present a comparative study of the proposed algorithm and the competitive methods like lasso or CAR scores. In the performance tests the computational times for parallel implementations are compared.
The paper points at the grieving problems implied by the richness axiom in the Kleinberg's axiomatic system and suggests resolutions. The richness induces learnability problem in general and leads to ...conflicts with consistency axiom. As a resolution, learnability constraints and usage of centric consistency or restriction of the domain of considered clusterings to super-ball-clusterings is proposed.
This paper describes the modeling of social networks subject to a recommendation. The Cold Start User-Item Model (CSUIM) of a bipartite graph is considered, which simulates bipartite graph growth ...based on several parameters. An algorithm is proposed to compute parameters of this model with desired properties. The primary desired property is that the generated graph has similar graph metrics. The next is a change in our graph growth process due to recommendations. The meaning of CSUI model parameters in the recommendation process is described. We make several simulations generating networks from the CSUI model to verify theoretical properties. Also, proposed methods are tested on real-life networks. We prove that the CSUIM model of bipartite graphs is very flexible and can be applied to many different problems. We also show that the parameters of this model can be easily obtained from an unknown bipartite graph.
This paper describes the modeling of social networks subject to a recommendation. The Cold Start User-Item Model (CSUIM) of a bipartite graph is considered, which simulates bipartite graph growth ...based on several parameters. An algorithm is proposed to compute parameters of this model with desired properties. The primary desired property is that the generated graph has similar graph metrics. The next is a change in our graph growth process due to recommendations. The meaning of CSUI model parameters in the recommendation process is described. We make several simulations generating networks from the CSUI model to verify theoretical properties. Also, proposed methods are tested on real-life networks. We prove that the CSUIM model of bipartite graphs is very flexible and can be applied to many different problems. We also show that the parameters of this model can be easily obtained from an unknown bipartite graph.