Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba, and Zadeh [33], in which they proposed a prototype clustering algorithm based on fuzzy sets theory. Starting from this paper, several uncertain clustering methods based on different theoretical approaches for modeling uncertainty have been proposed. This paper presents a systematic literature review of these clustering approaches. In particular, with respect to the Statistical Reasoning System, we first illustrate the connection between Information and Uncertainty from the perspective of the so-called Informational Paradigm, according to which Information is constituted by “Informational ingredients”, specifically the “Empirical Information”, represented by statistical data, and the “Theoretical Information”, consisting of background knowledge and basic modeling assumptions. We then describe the different kinds of uncertainty affecting the Information. Focusing on the uncertainty associated with a particular statistical methodology, i.e. Cluster Analysis, and adopting the Informational Paradigm as the theoretical platform, we review the different uncertainty-based clustering approaches, namely Fuzzy clustering, Possibilistic clustering, Shadowed clustering, Rough sets-based clustering, Intuitionistic fuzzy clustering, Evidential clustering, Credibilistic clustering, Type-2 fuzzy clustering, Neutrosophic clustering, Hesitant fuzzy clustering, Interval-based fuzzy clustering, and Picture fuzzy clustering. We thus show how all these clustering approaches are capable of managing, in different ways, the uncertainty associated with the two components of the Informational Paradigm, i.e. the Empirical and Theoretical Information.
This study presents a new algorithm called Semi-supervised Multiple Kernel Fuzzy Clustering based on Entropy and Relative entropy (SMKFC-ER), which focuses on the external knowledge carried by the labeled data. In the proposed method, an entropy coefficient replaces the fuzzifier in the unsupervised section, and a relative entropy divergence measure replaces the geometric distance measure in the semi-supervised section, with the two sections combined explicitly. The use of relative entropy and entropy in the objective function results in sharing more consistent concepts for the semi-supervised section, controlling the fuzziness of the extracted clusters, and determining the kernel weights regularly for the unsupervised section. Finally, using relative entropy and entropy together yields a closed-form solution. The performance and superiority of the proposed method on non-spherical synthetic and real-world datasets are demonstrated through comparison with unsupervised and semi-supervised fuzzy clustering methods.
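How an entropy term can take over the role of the fuzzifier may be illustrated with the classic entropy-regularized membership update, which has a closed form. The sketch below is not SMKFC-ER: it omits the kernels, the relative entropy term, and the labeled data, and the names `entropy_memberships` and `lam` are hypothetical. It assumes the per-point objective sum_k u_k*d_k + lam*sum_k u_k*log(u_k) with memberships summing to one, whose minimizer is a softmax over -d_k/lam.

```python
import math

def entropy_memberships(sq_dists, lam):
    """Closed-form memberships for one point under entropy regularization.

    Minimizes sum_k u_k*d_k + lam*sum_k u_k*log(u_k) subject to sum_k u_k = 1,
    which gives u_k proportional to exp(-d_k / lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    shift = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - shift) / lam) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

# Squared distances from one point to three hypothetical cluster centers.
d = [0.5, 2.0, 4.5]
crisp = entropy_memberships(d, lam=0.1)   # small lam: nearly hard assignment
soft = entropy_memberships(d, lam=10.0)   # large lam: nearly uniform memberships
```

In this simplified setting the coefficient `lam` controls the fuzziness of the partition directly, which is the job the fuzzifier exponent does in standard fuzzy c-means.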
As an unsupervised learning method, clustering does not require prior knowledge of the datasets. Determining the optimal number of clusters is therefore an important way of judging the quality of clustering results. For fuzzy clustering algorithms, the introduction of a fuzzy partition makes the result more consistent with the structure of real datasets than hard clustering algorithms allow. Research on validity evaluation methods for fuzzy clustering is therefore necessary. At present, research on fuzzy clustering validity focuses mainly on the fuzzy clustering validity index (FCVI) and the combined fuzzy clustering validity evaluation method (CFCVE). From these two aspects, this paper reviews fuzzy clustering validity functions and combined fuzzy clustering validity evaluation methods. FCVI and CFCVE are then discussed in detail from different viewpoints on fuzzy clustering validity functions, and the research status and construction strategies of the different fuzzy clustering validity evaluation methods are analyzed. The accuracy and stability of each fuzzy clustering validity evaluation method are analyzed through comparative experiments. Finally, the paper summarizes the advantages and shortcomings of current research on fuzzy clustering validity and discusses future research directions and possible improvements to the evaluation methods.
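As a concrete illustration of what a fuzzy clustering validity index computes, the sketch below implements two classic indices that depend only on the membership matrix: Bezdek's partition coefficient and partition entropy. They are chosen here as familiar examples and are not taken from the abstract, which does not name specific indices. The matrix `U` is assumed to hold one row per sample and one column per cluster, with each row summing to one.

```python
import math

def partition_coefficient(U):
    """Mean of squared memberships: 1.0 for a crisp partition, 1/c when fully fuzzy."""
    return sum(u * u for row in U for u in row) / len(U)

def partition_entropy(U):
    """Mean membership entropy: 0.0 for a crisp partition, log(c) when fully fuzzy."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

# Two hypothetical partitions of two samples into two clusters.
crisp_U = [[1.0, 0.0], [0.0, 1.0]]
fuzzy_U = [[0.5, 0.5], [0.5, 0.5]]

pc_crisp, pe_crisp = partition_coefficient(crisp_U), partition_entropy(crisp_U)
pc_fuzzy, pe_fuzzy = partition_coefficient(fuzzy_U), partition_entropy(fuzzy_U)
```

A typical use is to run the clustering for several candidate numbers of clusters and compare the index values, keeping in mind that both indices ignore the geometry of the data.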
The fuzzy C-means (FCM) algorithm is one of the most famous fuzzy clustering algorithms, but it can get stuck in local optima. In addition, it requires the number of clusters to be specified in advance. By contrast, the density-based spatial clustering of applications with noise (DBSCAN) algorithm, a density-based clustering algorithm, does not need the number of clusters in advance. Provided the clusters are distinct, DBSCAN can determine the number of clusters itself. Another advantage of DBSCAN over FCM is its ability to cluster data of different shapes. In this paper, in order to overcome these limitations of FCM, a hybrid clustering approach is proposed that uses both the FCM and DBSCAN algorithms. In this method, the optimal number of clusters and the optimal locations of the cluster centers are determined from the data set in three phases, anticipating the problems of the FCM algorithm stated above. With this improvement, none of the initial parameters of the FCM algorithm is random: in the first phase, the random values are replaced with optimal ones, which has a significant effect on the convergence of the algorithm because it helps to reduce the number of iterations. The proposed method was evaluated on the Iris flower data set, and the results were compared with those of the basic FCM algorithm and another algorithm. The results show the better performance of the proposed method.
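For reference, the basic FCM iteration that such a hybrid builds on can be sketched as follows. This is the plain textbook algorithm, not the hybrid FCM and DBSCAN method of the abstract. The function name `fcm`, its parameters, and the toy data are hypothetical, and the initial centers are supplied by the caller instead of being drawn at random.

```python
def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Plain fuzzy c-means on lists of equal-length tuples.

    Returns the final centers and the membership matrix (one row per point)
    computed in the last iteration.
    """
    centers = [tuple(c) for c in centers]
    exponent = 1.0 / (m - 1.0)  # applied to inverse squared distances
    dim = len(points[0])
    U = []
    for _ in range(max_iter):
        # Membership update: u_ik is proportional to (1 / d_ik^2)^(1/(m-1)).
        U = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            if min(d2) == 0.0:
                # The point sits on a center: share membership among such centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                U.append([h / sum(hits) for h in hits])
            else:
                inv = [(1.0 / d) ** exponent for d in d2]
                U.append([w / sum(inv) for w in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k, old in enumerate(centers):
            w = [row[k] ** m for row in U]
            total = sum(w)
            if total == 0.0:
                new_centers.append(old)  # no point supports this center; keep it
                continue
            new_centers.append(tuple(
                sum(wi * x[j] for wi, x in zip(w, points)) / total
                for j in range(dim)))
        shift = max(sum((a - b) ** 2 for a, b in zip(c, n))
                    for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once no center moved appreciably
            break
    return centers, U

# Two well-separated toy groups and deliberately chosen initial centers.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, U = fcm(points, centers=[(0, 0), (10, 10)])
```

Because the result depends on the initial centers and on the chosen number of clusters, replacing random initialization with informed values, as the abstract describes, changes both the solution reached and the number of iterations needed.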
An Internet of Things (IoT) based method for plant diseased leaf segmentation and recognition is proposed, based on the fusion of super-pixel clustering, K-means clustering, and pyramid of histograms of orientation gradients (PHOG) algorithms. First, the color diseased leaf image is divided into a few compact super-pixels by a super-pixel clustering algorithm. Then the K-means clustering algorithm is employed to segment the lesion image from each super-pixel. Finally, PHOG features are extracted from the three color components of each segmented lesion image and from its grayscale image, and the four PHOG descriptors are concatenated into a single vector. The experimental results on two plant diseased leaf image databases indicate that the proposed method is effective. This paper provides a feasible solution for plant diseased leaf image segmentation and plant disease recognition.
Based on picture fuzzy set theory, picture fuzzy clustering has achieved good results on some data as more information is involved in the clustering process. However, current picture fuzzy clustering methods still suffer from two common weaknesses, i.e., the sensitivity to outliers and the neglect of the uncertainty caused by different fuzzy degrees, which influence their performance in practical applications like medical image segmentation. To solve these issues, we present two new picture fuzzy clustering methods in this paper. First, to improve immunity to outliers, we propose an outlier-robust picture fuzzy clustering method named ORPFC by using a robust distance measurement, which treats the data objects far away from cluster prototypes as outliers and limits their effects on the prototype update. Second, to handle the uncertainty caused by fuzzy degrees, we further present an interval type-2 enhanced method called IT2ORPFC by incorporating the interval type-2 fuzzy set theory into ORPFC. In each iteration, IT2ORPFC estimates positive memberships, neutral memberships, and refusal memberships according to different fuzzification coefficients and then conducts type reduction for reliable type-1 clustering results. In the experiments, the proposed methods obtain robust and reliable results on eleven datasets. Specifically, ORPFC and IT2ORPFC achieve rewarding performance on segmenting medical images with noise.
• This paper proposes an outlier-robust picture fuzzy clustering method named ORPFC.
• This paper further presents an interval type-2 enhanced method called IT2ORPFC.
• The superiority of the proposed methods is verified by comprehensive experiments.
A fuzzy clustering ensemble, which combines multiple fuzzy clustering results, can obtain more robust, novel, stable, and consistent clustering results. Research on fuzzy clustering ensembles is still at an early stage. Because of the special way in which the information is expressed, excellent clustering ideas have not been well exploited in fuzzy clustering ensembles, and their performance still has considerable room for improvement. In data clustering, prototype-based clustering is effective and efficient. Its main idea is to discover prototype samples that represent the clusters and to assign the remaining samples to the clusters they represent. In this paper, we bring the idea of prototype-based clustering to fuzzy clustering ensembles and address two problems: how to discover prototype samples from a set of fuzzy clustering results, and how to assign the samples without accessing the original data features. First, we propose a self co-association measure of a sample and uncover its natural ability to evaluate the sample's local density. The rationality of the prototype samples discovered through self co-association is analyzed theoretically and shown visually on eight artificial data sets. Then, we propose a prototype propagation method that assigns data samples gradually. The working mechanism of the proposed sample assignment method is shown visually in an image segmentation scenario. Finally, we develop a fuzzy clustering ensemble method based on self co-association and prototype propagation. The effectiveness of the proposed method is illustrated by comparing it with eight representative methods on benchmark data sets.
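To make the notion of co-association on fuzzy results concrete, the sketch below uses one common definition: the co-association of two samples is the inner product of their membership vectors, averaged over the base clusterings. Whether the paper defines its self co-association measure in exactly this way is an assumption, and the function name and toy matrices are hypothetical. Only membership matrices are used, never the original data features.

```python
def fuzzy_co_association(memberships, i, j):
    """Average, over base clusterings, of the inner product of the membership
    vectors of samples i and j. Passing i == j gives a self co-association."""
    total = 0.0
    for U in memberships:
        total += sum(a * b for a, b in zip(U[i], U[j]))
    return total / len(memberships)

# Two hypothetical base fuzzy clusterings of three samples (rows sum to one).
U1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
U2 = [[0.2, 0.8], [0.3, 0.7], [0.5, 0.5]]
ensemble = [U1, U2]

co_01 = fuzzy_co_association(ensemble, 0, 1)    # samples that tend to co-occur
co_02 = fuzzy_co_association(ensemble, 0, 2)    # samples that tend to separate
self_0 = fuzzy_co_association(ensemble, 0, 0)   # high when memberships are crisp
```

Under this definition the self co-association is larger for samples whose memberships are consistently concentrated on one cluster, which is one plausible reason such a quantity could serve as a cue for prototype selection.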
Time series analysis models, understands, and predicts phenomena from different domains such as meteorology, medicine, and economics. In this context, Fuzzy Time Series (FTS) has stood out for its capacity to use mathematical functions to represent linguistic variables, resulting in interpretable and more accurate models. Several studies aim at improving time series forecasting using fuzzy set theory; however, such efforts were made solely to obtain modeling improvements in the fuzzification stage, disregarding the stochastic and deterministic influences that compose a time series and could assist the modeling process. In an attempt to fill this gap, this manuscript employs Empirical Mode Decomposition (EMD) to extract the deterministic influences, which are then modeled using fuzzy clustering to improve time series forecasting. EMD reduces data imprecision and uncertainty, thus helping the fuzzy clustering stage, which automatically finds an adequate space partitioning to produce the fuzzy sets. We use Fuzzy C-means to generate fuzzy sets with different characteristics, contributing to improvements in the definition of the number of sets, which is usually barely explored in the fuzzification stage. Our method was assessed using different validation indices while forecasting time series, which confirmed promising results and supported the interpretability of data ranges over time.
• FTS stands out for using mathematical functions to represent linguistic variables.
• Our approach models deterministic influences using fuzzy clustering to improve TS forecasting.
• We use Fuzzy C-means to estimate the number of sets, which is barely explored in the fuzzification stage.
• Our experiments were designed using the same TS and setup proposed in the literature.
• Our approach outperforms studies from the literature while reducing noise influences.