The collaborative annealing power k-means++ (CAPKM++) clustering algorithm was recently proposed; it combines multiple modules and minimizes annealed power-mean functions. This paper presents an upgraded version of CAPKM++ called CAPKM++2.0. Unlike CAPKM++, where the anchor points of the surrogate functions that majorize the power-mean functions are re-initialized and minimized repeatedly after annealing, CAPKM++2.0 re-initializes the weights of the majorization function during annealing. In addition, whereas CAPKM++ minimizes the majorization function of the power-mean sum, CAPKM++2.0 adds an inner loop that minimizes the power-mean sum iteratively and locally at every annealing step. Ablation study results are discussed to justify the adoption of the power mean and the collaboration of multiple modules. Experimental results on sixteen benchmark datasets demonstrate the superior clustering performance of the upgraded algorithm compared with its predecessor and six other mainstream algorithms in terms of cluster validity indices and algorithmic complexity.
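To make the role of the power mean concrete, the following is a minimal sketch of a power-mean clustering objective. It is not the CAPKM++ or CAPKM++2.0 procedure (no majorization, modules, or weight re-initialization); it only assumes the standard power mean M_s(y_1, ..., y_k) = ((1/k) Σ y_i^s)^(1/s) applied to squared point-to-center distances. The function names are illustrative.

```python
def power_mean(values, s):
    """Power mean of strictly positive values for exponent s != 0."""
    k = len(values)
    return (sum(v ** s for v in values) / k) ** (1.0 / s)


def power_mean_objective(points, centers, s):
    """Sum over points of the power mean of squared distances to all centers.

    Assumes no point coincides exactly with a center (a zero distance
    raised to a negative power is undefined).
    """
    total = 0.0
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        total += power_mean(d2, s)
    return total


# As s decreases toward -infinity, the power mean approaches the minimum,
# so the objective approaches the ordinary k-means sum of squared errors.
points = [(0.0, 0.0), (3.0, 0.0)]
centers = [(1.0, 0.0), (4.0, 0.0)]
for s in (-1, -5, -50, -200):
    print(s, power_mean_objective(points, centers, s))
```

Annealing schemes of this kind start from a smooth objective (moderate negative s) and move gradually toward the k-means objective, which is harder to optimize directly.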
Recent advances in techniques for scientific data collection in the era of big data allow for the systematic accumulation of large quantities of data at various data-capturing sites. Similarly, exponential growth in the development of data analysis approaches has been reported in the literature, amongst which the K-means algorithm remains the most popular and straightforward clustering algorithm. The broad applicability of the algorithm in many clustering application areas can be attributed to its implementation simplicity and low computational complexity. However, the K-means algorithm has many challenges that negatively affect its clustering performance. In the algorithm's initialization process, users must specify the number of clusters in a given dataset a priori, while the initial cluster centers are randomly selected. Furthermore, the algorithm's performance is sensitive to the selection of these initial cluster centers, and for large datasets, determining the optimal number of clusters to start with is a very challenging task. Moreover, because of the algorithm's greedy nature, the random selection of the initial cluster centers sometimes results in convergence to a local minimum. A further limitation is that similarity between data objects is determined from their features using the Euclidean distance metric, which limits the algorithm's robustness in detecting clusters of other shapes and poses a great challenge in detecting overlapping clusters. Many research efforts to improve the K-means algorithm's performance and robustness have been reported in the literature. The current work presents an overview and taxonomy of the K-means clustering algorithm and its variants. The history of K-means, current trends, open issues and challenges, and recommended future research perspectives are also discussed.
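The initialization issues described above can be seen in a minimal sketch of the standard (Lloyd) K-means iteration. This is a plain-Python illustration, not code from the surveyed work; the function names, the `seed` parameter, and the toy data are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm with k user-chosen clusters and random initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged, possibly only to a local minimum
        centers = new_centers
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, sse = kmeans(data, k=2, seed=0)
print(labels, sse)
```

On harder data than this toy example, running `kmeans` with several different seeds and comparing the returned `sse` values is a simple way to observe the dependence on the initial centers.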
The first novel contribution of this paper is the Unified Form (UF) clustering algorithm, which treats the Fuzzy C-Means (FCM) and K-Means (KM) algorithms as a single configurable algorithm. The UF algorithm was designed to simplify the software implementation of FCM and KM by providing a single algorithm that can be configured to work as either FCM or KM. The second novel contribution is the Partitional Implementation of Unified Form (PIUF) algorithm, which is built upon the UF algorithm and designed to address both the challenge of processing large datasets sequentially and the scalability of the UF algorithm to datasets of any size. The PIUF algorithm overcomes the hardware limitations that can arise when large volumes of data must be stored, loaded in memory, and processed by a given computational system. It is formulated so that it can be used on a single machine when the dataset is too large to be loaded entirely in memory; at the same time, it can be scaled to multiple processing nodes to reduce the processing time required to find the optimal solution. The UF and PIUF algorithms are implemented and validated in the BigTim platform, a distributed platform developed by the authors that supports processing various datasets in parallel, but they can be implemented in any other data processing platform. The Iris dataset is modified to obtain datasets of different sizes in order to test the algorithm implementations in the BigTim platform in different configurations. The analysis of the PIUF algorithm and its comparison with the FCM, KM, and DBSCAN clustering algorithms are carried out using two performance indices; three performance indices are employed to evaluate the quality of the obtained clusters.
• A unified form (UF) to treat Fuzzy C-means and K-means algorithms is proposed.
• UF algorithm reduces the effort required for the software implementation.
• UF algorithm runs as a distributed algorithm.
• UF algorithm is implemented and validated using BigTim distributed platform.
• The results are analyzed and compared using several performance indices.
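One way to picture a single configurable routine covering both algorithms is the membership step below. This is only a hedged sketch using the textbook FCM membership formula and hard nearest-center assignment; it is not the UF formulation from the paper, and the function name and the convention `m=None` for K-means behavior are assumptions made for illustration.

```python
def memberships(points, centers, m=None):
    """Return one membership row per point.

    m=None -> hard (K-means style) one-hot assignment to the nearest center.
    m > 1  -> standard FCM memberships with fuzzifier m:
              u_ij = 1 / sum_k (d2_ij / d2_ik) ** (1 / (m - 1)),
              where d2 denotes squared Euclidean distance.
    """
    n_centers = len(centers)
    rows = []
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        nearest = min(range(n_centers), key=d2.__getitem__)
        if m is None or d2[nearest] == 0.0:
            # Hard assignment; also used when a point sits exactly on a center.
            rows.append([1.0 if j == nearest else 0.0 for j in range(n_centers)])
        else:
            e = 1.0 / (m - 1.0)
            rows.append([
                1.0 / sum((d2[j] / d2[k]) ** e for k in range(n_centers))
                for j in range(n_centers)
            ])
    return rows


pts = [(1.0, 0.0), (3.5, 0.0)]
ctr = [(0.0, 0.0), (4.0, 0.0)]
print(memberships(pts, ctr))         # hard assignment
print(memberships(pts, ctr, m=2.0))  # fuzzy memberships, each row sums to 1
```

With this shape, the center update can share one code path as well: a membership-weighted mean reduces to the ordinary cluster mean when the rows are one-hot.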
Many algorithms designed to accelerate the fuzzy c-means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, despite the impact subsample sizes have on speed and quality. This paper introduces two new accelerated algorithms, i.e., geometric progressive fuzzy c-means (GOFCM) and minimum sample estimate random fuzzy c-means (MSERFCM), that use a statistical method to estimate the subsample size. GOFCM, which is a variant of single-pass fuzzy c-means (SPFCM), also leverages progressive sampling. MSERFCM, which is a variant of random sampling plus extension fuzzy c-means, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared with FCM and four accelerated variants of FCM. GOFCM achieved a speedup of 4 to 47 times over FCM and was faster than SPFCM on each of the six datasets used in the experiments. For five of the datasets, its partitions were within 1% of those of FCM. MSERFCM achieved a speedup of 5 to 26 times over FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset, consisting of plankton images, exposed the strengths and weaknesses of many of the algorithms tested. It is shown that the new stopping criterion is effective in speeding up algorithms such as SPFCM and that the final partitions are very close to those of FCM.
The purpose of this Special Issue is to pay tribute to the significant contributions made by Professor Feng Qi in these fields and to provide some important recent advances in theory, methods, and applications.
Let us consider the second trigonometric mean $T$ defined by Seiffert and the hyperbolic mean $M$ defined by Neuman and Sándor. There are some known inequalities between these means and some power means $A_p$. We prove that the evaluations $A_{\ln 2/\ln(\ln(3+2\sqrt{2}))} < M < A_{4/3}$ and $A_{\ln 2/\ln(\pi/2)} < T < A_{5/3}$ are optimal. In some details of the proofs we have used the computer algebra system Mathematica.
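The two-sided bounds can be spot-checked numerically. The sketch below assumes the usual definitions, which the abstract does not restate: for a > b > 0, T(a, b) = (a - b) / (2 arctan((a - b)/(a + b))), M(a, b) = (a - b) / (2 arcsinh((a - b)/(a + b))), and A_p(a, b) = ((a^p + b^p)/2)^(1/p). It is a sanity check on a few sample pairs, not a proof, and the function names are illustrative.

```python
import math


def power_mean(a, b, p):
    """Power mean A_p(a, b) for p != 0."""
    return ((a ** p + b ** p) / 2.0) ** (1.0 / p)


def neuman_sandor(a, b):
    """Neuman-Sandor (hyperbolic) mean M(a, b), a != b."""
    return (a - b) / (2.0 * math.asinh((a - b) / (a + b)))


def seiffert_second(a, b):
    """Second Seiffert (trigonometric) mean T(a, b), a != b."""
    return (a - b) / (2.0 * math.atan((a - b) / (a + b)))


# Lower-bound exponents from the stated inequalities.
P_M_LOW = math.log(2) / math.log(math.log(3 + 2 * math.sqrt(2)))
P_T_LOW = math.log(2) / math.log(math.pi / 2)


def check_bounds(a, b):
    """True if both two-sided power-mean bounds hold at (a, b)."""
    m_ok = power_mean(a, b, P_M_LOW) < neuman_sandor(a, b) < power_mean(a, b, 4 / 3)
    t_ok = power_mean(a, b, P_T_LOW) < seiffert_second(a, b) < power_mean(a, b, 5 / 3)
    return m_ok and t_ok


for a, b in [(2.0, 1.0), (5.0, 1.0), (10.0, 3.0), (1.5, 1.0)]:
    print((a, b), check_bounds(a, b))
```

The lower exponents come from letting b tend to 0: there M tends to a / ln(3 + 2√2) and T tends to 2a/π, and matching these limits with A_p(a, 0) = a · 2^(-1/p) gives the two logarithmic expressions.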
Summary
Data mining (DM) is the process of discovering and extracting knowledge from very large amounts of data. Together with the healthcare industry, data mining has produced robust systems that draw on clinical and diagnostic data. Healthcare offers many services, including diagnosis, treatment, and prevention of diseases, illnesses, injuries, and other physical and mental disorders. Large-scale distributed processing applications in healthcare share the basic premise of operating on large amounts of data, and big data applications are a core part of healthcare operations. However, there is no comprehensive and systematic review that examines and evaluates the significant techniques in the field. This work considers the various studies in the field of healthcare in terms of their strategies, algorithms, and results. Big data objects have a large number of attributes, which poses a significant challenge for fuzzy C‐means (FCM) in real‐time big data clustering. An efficient FCM based on tensor canonical polyadic decomposition is presented for big data clustering. The proposed advanced and enhanced fuzzy C‐means algorithm for healthcare (AEFCH) is converted to a high‐order tensor FCM algorithm by means of a bijection function. Tensor canonical polyadic decomposition is used to reduce the attributes of the objects and thereby improve clustering efficiency. The results show significantly higher efficiency with a slight drop in clustering accuracy compared with the conventional algorithm, demonstrating the potential of the proposed scheme for mining smart data from the Internet of Things.
In this paper, using the monotone form of L'Hôspital's rule and the criterion for the monotonicity of the quotient of power series, exponential inequalities are established for two Seiffert-like means (the tangent mean and the hyperbolic sine mean) bounded by the arithmetic mean and the harmonic mean.
The k-means algorithm is generally the best-known and most widely used clustering method, and various extensions of k-means have been proposed in the literature. Although clustering is an unsupervised learning task in pattern recognition and machine learning, the k-means algorithm and its extensions are always influenced by initializations and require the number of clusters to be given a priori. That is, the k-means algorithm is not exactly an unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations and parameter selection and can simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed. Comparisons between the proposed U-k-means and other existing methods are made. Experimental results and comparisons demonstrate these good aspects of the proposed U-k-means clustering algorithm.