Several conventional clustering methods use the squared L2-norm as the dissimilarity. The squared L2-norm is calculated from the object coordinates alone and yields linear cluster boundaries. To extract meaningful cluster partitions from a massive set of objects, it is necessary to obtain cluster partitions that consist of complex cluster boundaries. In this study, a JS-divergence-based k-medoids (JSKMdd) is proposed. In the proposed method, JS-divergence, which is calculated from the object distribution, is used as the dissimilarity. The object distribution is estimated by kernel density estimation, so that the dissimilarity reflects both the object coordinates and their neighbors. Numerical experiments were conducted using five artificial datasets to verify the effectiveness of the proposed method. In the numerical experiments, the proposed method was compared with k-means clustering, k-medoids clustering, and spectral clustering. The results show that the proposed method yields better clustering performance than the conventional methods.
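The idea of a distribution-based dissimilarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how each object's distribution is represented, so the function `object_distribution` below (normalized Gaussian kernel weights over all objects, with a hypothetical bandwidth `h`) is an assumption made only for illustration. The JS-divergence itself follows the standard definition for discrete distributions.

```python
import math

def object_distribution(i, points, h=1.0):
    """Hypothetical per-object distribution: normalized Gaussian kernel
    weights of object i over all objects (an assumption, not the paper's)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], x)) / (2 * h * h))
        for x in points
    ]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 are 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Under these assumptions, two objects that share the same neighborhood get similar distributions and hence a small dissimilarity, even when a coordinate-only distance would not separate clusters with curved boundaries. A k-medoids procedure only needs the resulting pairwise dissimilarity matrix, so it can consume `js_divergence` values directly.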
Fuzzified Even-Sized Clustering Based on Optimization
Kitajima, Kei; Endo, Yasunori; Hamasuna, Yukihiro
Journal of advanced computational intelligence and intelligent informatics, 07/2018, Volume 22, Issue 4
Journal Article; Peer reviewed; Open access
Clustering is a method of data analysis without the use of supervised data. Even-sized clustering based on optimization (ECBO) is a clustering algorithm that focuses on cluster size, with the constraint that cluster sizes must be the same. However, this constraint makes ECBO inconvenient to apply in cases where a certain margin of cluster size is allowed. It is believed that this issue can be overcome by applying a fuzzy clustering method, because fuzzy clustering can represent the membership of data to clusters more flexibly. In this paper, we propose a new even-sized clustering algorithm based on fuzzy clustering and verify its effectiveness through numerical examples.
The Louvain method is a method of agglomerative hierarchical clustering (AHC) that uses modularity as the merging criterion. Modularity is an evaluation measure for network partitions. Cluster validity measures are also used to evaluate cluster partitions and to determine the optimal number of clusters. Several cluster validity measures are constructed by considering the geometric features of clusters. These measures and modularity can be regarded as the same concept from the viewpoint of evaluating cluster partitions. In this paper, cluster-validity-measure-based agglomerative hierarchical clustering (CVAHC) is proposed as a novel clustering method for network data. In the proposed method, the cluster validity measures are used both as a merging criterion and as an evaluation measure for network data. Numerical experiments show that Dunn’s and Xie-Beni’s indices for network partitions are useful for network clustering.
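As a concrete reference for the modularity mentioned above, the following Python sketch computes the standard Newman modularity of a partition of an undirected, unweighted network. The function name `modularity` and the input layout (an edge list plus a node-to-community mapping) are illustrative choices, not taken from the paper.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where
    m is the number of edges, L_c the number of edges inside community c,
    and d_c the total degree of the nodes in c.
    edges: list of (u, v) pairs; community: dict mapping node -> label."""
    m = len(edges)
    intra = {}   # edges with both endpoints in the same community
    degree = {}  # summed node degree per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge, splitting the network into the two triangles gives Q = 5/14, while placing all nodes in one community gives Q = 0, which is why a modularity-driven merge would stop at the two-triangle partition.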
Two-Stage Clustering Based on Cluster Validity Measures
Hamasuna, Yukihiro; Ozaki, Ryo; Endo, Yasunori
Journal of advanced computational intelligence and intelligent informatics, 01/2018, Volume 22, Issue 1
Journal Article; Peer reviewed; Open access
To handle large-scale data, a two-stage clustering method has been previously proposed. The method generates a large number of clusters during the first stage and merges clusters during the second stage. In this paper, a novel two-stage clustering method is proposed by introducing cluster validity measures as the merging criterion during the second stage. The significant cluster validity measures, which are used to evaluate cluster partitions and determine a suitable number of clusters, act as the criteria for merging clusters. The performance of the proposed method with six typical indices is compared on eight artificial datasets. These experiments show that the trace of the fuzzy covariance matrix, W_tr, and its kernelization, KW_tr, are quite effective when applied in the proposed method, and obtain better results than the other indices.
Cluster Validity Measures for Network Data
Hamasuna, Yukihiro; Kobayashi, Daiki; Ozaki, Ryo ...
Journal of advanced computational intelligence and intelligent informatics, 07/2018, Volume 22, Issue 4
Journal Article; Peer reviewed; Open access
Modularity is one of the evaluation measures for network partitions and is used as the merging criterion in the Louvain method. To construct useful cluster validity measures and clustering methods for network data, network cluster validity measures are proposed based on the traditional indices. The effectiveness of the proposed measures is compared, and the measures are applied to determine the optimal number of clusters. The network cluster partitions of various network data generated from the Polaris dataset are obtained by k-medoids with Dijkstra’s algorithm and evaluated by the proposed measures as well as by modularity. Our numerical experiments show that the measures based on Dunn’s index and Xie-Beni’s index are more effective for network partitions than the other indices.
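One way to picture a validity index on network data is to replace Euclidean distance with shortest-path distance. The Python sketch below does this for Dunn's index (minimum between-cluster distance divided by maximum cluster diameter) using Dijkstra's algorithm. This is only an assumed adaptation for illustration: the abstract does not give the authors' exact definitions, and the names `dijkstra` and `dunn_index` as well as the adjacency-list layout are hypothetical. The sketch assumes a connected, undirected graph with non-negative weights, at least two clusters, and at least one cluster containing two or more nodes.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps node -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def dunn_index(adj, clusters):
    """Dunn's index with shortest-path distances: larger means more compact,
    better separated clusters. clusters is a list of lists of nodes."""
    dist = {u: dijkstra(adj, u) for u in adj}
    # Largest within-cluster shortest-path distance (cluster diameter).
    diameter = max(dist[u][v] for c in clusters for u in c for v in c)
    # Smallest shortest-path distance between nodes of different clusters.
    separation = min(
        dist[u][v]
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for u in a
        for v in b
    )
    return separation / diameter
```

Because the index is a ratio, it can be compared across partitions with different numbers of clusters, which is how such a measure could be used to pick the number of clusters.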
A clustering method referred to as K-member clustering classifies a dataset into clusters whose sizes are each more than a given constant K. Even-sized clustering, which classifies a dataset into even-sized clusters, is also considered along with K-member clustering. In our previous study, we proposed Even-sized Clustering Based on Optimization (ECBO) to output adequate results by formulating the even-sized clustering problem as a linear programming problem. In ECBO, the simplex method is used to calculate the belongingness of each object to the clusters. In this study, ECBO is extended by introducing ideas from K-means and fuzzy c-means to resolve the problems of initial-value dependence, robustness against outliers, calculation costs, and nonlinear cluster boundaries. We also reconsider the relation between the dataset size, the cluster number, and K in ECBO. Moreover, we verify the effectiveness of the variants of ECBO based on experimental results using synthetic datasets and a benchmark dataset.
On a Family of New Sequential Hard Clustering
Hamasuna, Yukihiro; Endo, Yasunori
Journal of advanced computational intelligence and intelligent informatics, 11/2015, Volume 19, Issue 6
Journal Article; Peer reviewed; Open access
This paper presents a new algorithm of sequential cluster extraction based on hard c-means and hard c-medoids clustering. Sequential cluster extraction means that the algorithm extracts ‘one cluster at a time.’ A characteristic parameter, called a noise parameter, is used in sequential clustering based on noise clustering. We propose a novel sequential clustering method, called new sequential clustering, which extracts an arbitrary number of objects as one cluster by treating the noise parameter as a variable to be optimized. Experimental results with four data sets confirm the effectiveness of our proposal. These results also show that classification results strongly depend on parameter ν and that our proposal is applicable to the first stage in a two-stage clustering algorithm.
Comparison of Cluster Validity Measures Based x-Means
Hamasuna, Yukihiro; Kinoshita, Naohiko; Endo, Yasunori
Journal of advanced computational intelligence and intelligent informatics, 09/2016, Volume 20, Issue 5
Journal Article; Peer reviewed
The x-means determines the suitable number of clusters automatically by executing k-means recursively. The Bayesian Information Criterion is applied to evaluate a cluster partition in the x-means. A novel type of x-means clustering is proposed by introducing cluster validity measures that are used to evaluate the cluster partition and determine the number of clusters instead of the information criterion. The proposed x-means uses cluster validity measures in the evaluation step, and an estimation of the particular probabilistic model is therefore not required. The performances of a conventional x-means and the proposed method are compared for crisp and fuzzy partitions using eight datasets. The comparison shows that the proposed method obtains better results than the conventional method, and that the cluster validity measures for a fuzzy partition are effective in the proposed method.
The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is very useful because the belongingness, or membership degree, of each datum to each cluster can be calculated directly from the dissimilarities between data, without using cluster centers. However, the original FNM cannot handle data with uncertainty. In this study, we refer to data with uncertainty as “uncertain data,” e.g., incomplete data or data that have errors. Previously, a method was proposed based on the concept of a tolerance vector for handling uncertain data, and some clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.