Clustering is a fundamental research field and plays an important role in data analysis. To better address the relationship between an element and a cluster, this paper proposes a three-way clustering method based on an improved DBSCAN algorithm (3W-DBSCAN). 3W-DBSCAN represents a cluster by a pair of nested sets, called the lower bound and the upper bound. The two bounds classify objects into three states: belonging to the cluster, not belonging to it, and ambiguous. Objects in the lower bound certainly belong to the cluster. Objects in the upper bound but not in the lower bound are ambiguous, because they lie in a boundary region and might belong to one or more clusters. Objects beyond the upper bound certainly do not belong to the cluster. This representation explains the clustering result well and is consistent with human cognitive thinking. An improved DBSCAN, built on an improved similarity calculation, is used to obtain the initial clustering results; three-way decision strategies are then used to acquire the positive and boundary regions of each cluster. Three evaluation metrics, accuracy (Acc), F-measure (F1), and normalized mutual information (NMI), and ten datasets (three synthetic datasets, three UCI datasets, and four shape datasets) are used in experiments to evaluate 3W-DBSCAN. The experimental results suggest that 3W-DBSCAN performs well and is effective in clustering.
• A three-way clustering method, 3W-DBSCAN, is proposed.
• The representation of clustering results is consistent with human cognitive thinking.
• Experiments show that 3W-DBSCAN performs well in clustering.
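The nested lower-bound/upper-bound representation can be illustrated with a minimal sketch. It assumes plain DBSCAN-style density connectivity with Euclidean distance: core points form the lower bound, and every point within `eps` of a core point joins the upper bound. It does not reproduce the paper's improved similarity calculation or its three-way decision strategies, and the function name `three_way_clusters` and its parameters are hypothetical.

```python
from math import dist

def three_way_clusters(points, eps, min_pts):
    """Return one (lower, upper) pair of index sets per cluster.

    Assumption: lower bound = density-connected core points,
    upper bound = lower bound plus all points within eps of a core point.
    """
    n = len(points)
    # eps-neighbourhoods; each point counts as its own neighbour.
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]

    clusters, seen = [], set()
    for i in range(n):
        if not core[i] or i in seen:
            continue
        # Grow the lower bound through chains of core points.
        lower, stack = set(), [i]
        while stack:
            p = stack.pop()
            if p in lower:
                continue
            lower.add(p)
            stack.extend(q for q in nbrs[p] if core[q] and q not in lower)
        seen |= lower
        # Upper bound: the core points plus everything within eps of one.
        upper = set(lower)
        for p in lower:
            upper.update(nbrs[p])
        clusters.append((lower, upper))
    return clusters

# Two dense squares, one point between them, and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (4, 0), (4, 1), (5, 0), (5, 1),
          (2.5, 0), (10, 10)]
for lower, upper in three_way_clusters(points, eps=1.6, min_pts=4):
    print("certain:", sorted(lower), "ambiguous:", sorted(upper - lower))
```

In this toy input the point at `(2.5, 0)` falls in the boundary region of both clusters, while `(10, 10)` lies outside every upper bound, which mirrors the three states described in the abstract.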
In this paper, we propose a real-time image superpixel segmentation method that runs at 50 frames/s, based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. To decrease the computational cost of superpixel algorithms, we adopt a fast two-step framework. In the first stage (clustering), the DBSCAN algorithm with color-similarity and geometric restrictions is used to rapidly cluster the pixels. In the second stage (merging), small clusters are merged into superpixels with their neighbors, using a distance measurement defined by color and spatial features. A robust and simple distance function is defined to obtain better superpixels in these two steps. The experimental results demonstrate that our real-time DBSCAN-based superpixel algorithm (50 frames/s) outperforms state-of-the-art superpixel segmentation methods in terms of both accuracy and efficiency.
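The merging stage can be sketched as follows. The abstract does not give the distance function, so the weighted sum of color distance and spatial distance used here, the `spatial_weight` parameter, and the function names are all assumptions made for illustration only.

```python
from math import dist

def feature_distance(a, b, spatial_weight=0.5):
    """Distance between two clusters, each given as (mean_color, centroid).

    Assumption: color distance plus a weighted spatial distance.
    The paper's actual distance function is not stated in the abstract.
    """
    return dist(a[0], b[0]) + spatial_weight * dist(a[1], b[1])

def merge_target(small, neighbours, spatial_weight=0.5):
    """Index of the neighbouring cluster that a small cluster merges into."""
    return min(range(len(neighbours)),
               key=lambda i: feature_distance(small, neighbours[i], spatial_weight))

# A small cluster and two adjacent clusters: (mean RGB color, centroid in pixels).
small = ((100, 100, 100), (5, 5))
neighbours = [((103, 104, 100), (5, 15)),   # similar color, farther away
              ((200, 100, 100), (5, 6))]    # very different color, adjacent
print(merge_target(small, neighbours))
```

With these made-up values the small cluster merges into the first neighbour, because color similarity outweighs the shorter spatial distance to the second one under the assumed weighting.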
• Two new sampling approaches for applying DBSCAN to large datasets are presented.
• One approach is an improvement of the Rough-DBSCAN algorithm.
• The other approach is a heuristic capable of quickly generating good results.
• The heuristic approach does not require tuning any additional parameter.
DBSCAN is a classic clustering method for identifying clusters of different shapes and isolating noisy patterns. Despite these qualities, many articles in the literature address the scalability problem of DBSCAN. This work presents two methods for generating a good sample for the DBSCAN algorithm. The execution time decreases because fewer patterns are presented to DBSCAN. The first method is an improvement of Rough-DBSCAN and produced consistently better results. The second is a new heuristic, called I-DBSCAN, capable of adapting and generating good results for all datasets without the need for any additional parameter.
Gas insulated switchgear (GIS) is widely used in power systems, and its reliability is very important for their safe operation. However, research on intelligent detection of the mechanical state of GIS is currently lacking, and a new method is urgently needed to improve the operability, effectiveness, and accuracy of fault detection in GIS. Targeting the abnormal vibration signals generated by GIS faults, this article presents a fault diagnosis method (GA-DBSCAN) that consists of a feature selection method based on a genetic algorithm (GA) and density-based spatial clustering of applications with noise (DBSCAN), and a fault diagnosis method based on DBSCAN. First, this article analyzes the excitation force of GIS and discusses the characteristic frequency of the response signal, taking into account the non-linear characteristics of a GIS system. Second, GA and DBSCAN are used to screen features for dimension reduction and obtain the optimized feature space, and DBSCAN-based classification is used to classify faults. Finally, the optimized feature space is verified to be superior to the original feature space using typical classification methods, and the superiority and reliability of the DBSCAN-based classification method under the optimized feature space are verified by comparison with other classification methods. The proposed GA-DBSCAN approach can substantially increase the performance of the fault diagnosis method, which indicates that it promotes the development of intelligent detection of the mechanical state of GIS.
The distributed state estimation problem for fault sensors with decreased detection accuracy is investigated. The biggest challenge is to reduce or even eliminate the influence of the fault sensors' local state estimates on the fused state estimate in consensus fusion. A clustering-based distributed cubature information filter (DCIF) is proposed to deal with this challenge. To improve the accuracy of distributed state estimation, density-based spatial clustering of applications with noise (DBSCAN) is used to distinguish the fault sensors with decreased detection accuracy from the normal sensors. After the normal sensors are selected by DBSCAN clustering, an improved consensus-based fusion method is designed to obtain distributed consensus estimates while keeping the communication topology of the wireless sensor network (WSN) unchanged. The proposed method can delete the local state estimates of the fault sensors and realize distributed consensus fusion. The boundedness of the estimation error is proved using stochastic stability theory. Finally, the effectiveness of the proposed clustering-based DCIF algorithm is shown by a simulation example.
• Our method automatically detects the correct number of clusters.
• Our method does not need any prior knowledge to choose the cluster centers.
• Our method retains only one input parameter to process the data.
• Our method has a robust performance on the DGF problem.
Recently, a delta-density based clustering (DDC) algorithm was proposed to cluster data efficiently by fast search of density peaks. The DDC method uses the density and a newly defined criterion, the delta-distance. Examples with anomalously large density-delta values are treated as cluster centers; each remaining example is then assigned the same cluster label as its neighbor with higher density. However, the DDC algorithm faces two challenges. First, no rules are available to judge whether density-delta values are “anomalously large” or not. Second, the decision graph might produce redundant examples with “anomalously large” density-delta values, which we define as the “decision graph fraud” problem. In this paper, an improved and automatic version of the DDC algorithm, named 3DC clustering, is proposed to overcome these difficulties. The 3DC algorithm is motivated by the divide-and-conquer strategy and the density-reachable concept in the DBSCAN framework. It can automatically find the correct number of clusters in a recursive way. Experiments on artificial and real-world data show that the 3DC clustering algorithm performs comparably to the supervised-clustering baselines and outperforms the unsupervised DDCs, which use novelty detection strategies to select the “anomalously large” density-delta examples as cluster centers.
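The density and delta-distance quantities behind the baseline DDC idea can be sketched as below. This is not the 3DC algorithm: the number of centers is supplied by hand through the hypothetical `n_centers` argument, which is exactly the manual step that 3DC is described as automating. The cutoff-based density, the product `rho * delta` as a ranking score, and the function name `density_peaks` are assumptions chosen for a compact illustration.

```python
from math import dist

def density_peaks(points, cutoff, n_centers):
    """Baseline density-peak sketch (assumes a non-empty input, n_centers >= 1).

    rho[i]   = number of other points closer than `cutoff`
    delta[i] = distance to the nearest point that ranks higher in density
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff)
           for i in range(n)]
    # Densest first; ties are broken by index so the ranking is strict.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta, parent = [0.0] * n, [-1] * n
    # Convention: the densest point gets its largest distance as delta.
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j

    # The densest point is always a center; the rest are ranked by rho * delta.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + rest[:max(n_centers - 1, 0)]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits the label of its higher-density neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, labels = density_peaks(points, cutoff=1.0, n_centers=2)
print(labels)
```

Processing points from densest to sparsest guarantees that a point's higher-density neighbour is labelled before the point itself, so a single pass suffices for the assignment step.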
Polyploidization plays a critical role in producing new gene functions and promoting species evolution. Effective identification of polyploid types can help in exploring the evolutionary mechanism. However, current methods for detecting polyploid types have major limitations, such as being time-consuming and highly subjective. To recognize collinearity fragments and polyploid types objectively and scientifically, we developed the PolyReco method, which can automatically label collinear regions and recognize polyploidy events based on the Ks dotplot. Combined with whole-genome collinearity analysis, PolyReco uses the DBSCAN clustering method to cluster Ks dots. According to the distance information between the categories in the x-axis and y-axis directions, the clustering results are merged based on certain rules to obtain the collinear regions, so that collinear fragments are recognized and labeled automatically. According to the information of the labeled collinear regions on the y-axis, the polyploidization recognition algorithm exhaustively combines the regions, obtains the genetic collinearity evaluation index of each combination, and then draws the genetic collinearity evaluation index graph. Based on the inflection point on the graph, polyploid types and related chromosomes with a polyploidy signal can be detected. The validation experiments showed that the conclusions of PolyReco were consistent with the previous study, which verified the effectiveness of this method. It is expected that this approach can become a reference architecture for other polyploid type classification methods.
• A new incremental clustering and density-based outlier detection method is proposed that simultaneously performs both clustering and outlier detection.
• To the best of our knowledge, this is the first study to combine the concepts of incremental DBSCAN (iDBSCAN) and iLOF to detect outliers from streaming data.
• To minimize the negative effects of the selection of parameters, iLDCBOF automatically adjusts its own hyperparameters for different real-time applications.
• To detect outliers from data streams and prevent their clustering, a newly developed core kNN (CkNN) concept is introduced.
• The incremental Mahalanobis metric is used in all distance computations to reduce the impact of the data dimensions in both iLOF and iDBSCAN.
In this paper, a novel, parameter-free, incremental local density and cluster-based outlier factor (iLDCBOF) method is presented that unifies incremental versions of local outlier factor (LOF) and density-based spatial clustering of applications with noise (DBSCAN) to detect outliers efficiently in data streams. The iLDCBOF method has several advantages over previously reported iLOF-based studies: (1) it is based on a newly developed core k-nearest neighbor (CkNN) concept to reliably and scalably detect outliers from data streams and prevent the clustering of outliers; (2) it uses a newly developed algorithm that automatically adjusts the value of the k (number of neighbors) parameter for different real-time applications; and (3) it uses the Mahalanobis distance metric, so its performance is not affected even for large amounts of data. The iLDCBOF method is well suited to different data stream applications because it requires no distribution assumptions, its parameters are determined automatically, and it is easy to implement. ROC-AUC and statistical test analysis results from extensive experiments performed on 16 different real-world datasets showed that the iLDCBOF method significantly outperformed benchmark methods.
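The effect of the Mahalanobis metric can be illustrated with a minimal batch sketch for two-dimensional data. The incremental update used by iLDCBOF is not described in the abstract and is not reproduced here; the function name `mahalanobis_2d`, the restriction to two dimensions, and the use of the sample covariance are assumptions made to keep the example self-contained.

```python
from math import sqrt

def mahalanobis_2d(x, data):
    """Mahalanobis distance from point x to the mean of 2-D `data`.

    Assumption: batch computation with the sample covariance (n - 1 divisor);
    the covariance matrix must be non-singular.
    """
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form with the closed-form inverse of the 2x2 covariance matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(q)

# The first feature spans [-10, 10], the second only [-1, 1].
data = [(-10, -1), (10, 1), (-10, 1), (10, -1)]
print(mahalanobis_2d((10, 0), data), mahalanobis_2d((0, 1), data))
```

In this toy data a displacement of 10 units along the wide feature and 1 unit along the narrow feature yield the same distance, which shows how the metric rescales features by their spread instead of letting the larger-valued dimension dominate.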