Peer-reviewed
  • KNN-BLOCK DBSCAN: Fast Clus...
    Chen, Yewang; Zhou, Lida; Pei, Songwen; Yu, Zhiwen; Chen, Yi; Liu, Xin; Du, Jixiang; Xiong, Naixue

    IEEE Transactions on Systems, Man, and Cybernetics: Systems, June 2021, Volume: 51, Issue: 6
    Journal Article

    Large-scale data clustering is essential to big data problems. However, no existing approach is "optimal" for big data because of high complexity, so it remains a great challenge. In this article, a simple but fast approximate DBSCAN, namely, KNN-BLOCK DBSCAN, is proposed based on two findings: 1) the problem of identifying whether a point is a core point or not is, in fact, a kNN problem and 2) a point has a density distribution similar to that of its neighbors, and neighboring points are highly likely to be of the same type (core point, border point, or noise). KNN-BLOCK DBSCAN uses a fast approximate kNN algorithm, namely, FLANN, to detect core-blocks (CBs), noncore-blocks, and noise-blocks, within each of which all points have the same type. A fast algorithm for merging CBs and assigning noncore points to the proper clusters is also devised to speed up the clustering process. The experimental results show that KNN-BLOCK DBSCAN is an effective approximate DBSCAN algorithm with high accuracy, and that it outperforms other current variants of DBSCAN, including ρ-approximate DBSCAN and AnyDBC.
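
The first finding in the abstract, that the core-point test is a kNN problem, can be illustrated with a minimal sketch. It rests on assumptions not stated in the record: a point counts as its own neighbor, the names `kth_neighbor_distance`, `is_core_point`, `eps`, and `min_pts` are hypothetical, and an exact brute-force search stands in for the approximate FLANN search that the article uses. This is not the authors' implementation.

```python
import math

def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor.

    Assumption: the point itself is counted as its first neighbor (distance 0).
    Brute-force search is used here in place of an approximate kNN index.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[k - 1]

def is_core_point(points, i, eps, min_pts):
    """DBSCAN core test expressed as a kNN query.

    A point is core when at least min_pts points lie within eps of it,
    which holds exactly when its min_pts-th nearest neighbor is within eps.
    """
    if min_pts > len(points):
        return False
    return kth_neighbor_distance(points, i, min_pts) <= eps

# Four points in a tight square plus one distant outlier (illustrative data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
core_flags = [is_core_point(points, i, eps=0.2, min_pts=3)
              for i in range(len(points))]
print(core_flags)
```

With these illustrative values the four clustered points pass the test and the outlier does not, because its third-nearest neighbor lies far outside `eps`.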