E-resources
Full text
Peer reviewed
  • Cost-effective hierarchical...
    Xie, Wen-Bo; Chen, Bin; Fu, Xun; Shi, Jun-Hao; Lee, Yan-Li; Wang, Xin

    Information Sciences, August 2024, Volume: 676
    Journal Article

    Hierarchical clustering plays a crucial role in real-world knowledge discovery and data mining applications. This powerful technique provides tree-shaped results that are typically considered data summaries. However, achieving well-organized outputs requires a challenging trade-off between computational complexity (both in time and space) and clustering accuracy, especially in big data scenarios. To address this challenge, we propose a novel agglomerative algorithm for hierarchical clustering. Our algorithm constructs tree-shaped subclusters using a nearest-neighbour chain search. Next, the proxy (root) for each subcluster is identified using a local density peak detection mechanism, which guides the subsequent aggregation. Additionally, we propose a non-parametric variant to facilitate the easy implementation of the algorithm in real-world applications. Comprehensive experimental studies on fourteen real-world and synthetic datasets demonstrate that our algorithm surpasses other benchmarks in terms of clustering accuracy, response time, and memory footprint in most cases. Notably, our proposed algorithm can handle up to two million data points on a personal computer, further verifying its cost-effectiveness.

    • A novel agglomerative clustering algorithm based on local density peaks is proposed.
    • A non-parametric variant based on multi-scope cutoff distances is proposed.
    • A probabilistic analysis is done to establish the theoretical correctness.
    • Extensive experiments on real-world datasets verify the advantage of our approach.
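
    The abstract does not give the algorithm's details, so the following is only a rough illustration of the general idea of tree-shaped subclusters rooted at local density peaks, not the authors' method. It assumes a simple neighbour-count density, a single user-chosen cutoff distance, and a brute-force O(n²) search. The names `local_density_peak_forest`, `root_of`, and `cutoff` are hypothetical.

```python
from math import dist


def local_density_peak_forest(points, cutoff):
    """Link each point to its nearest denser neighbour within `cutoff`.

    Illustrative assumption, not the published algorithm: density is the
    number of other points within `cutoff`. Points with no denser
    neighbour in range stay their own parent and act as subcluster roots.
    """
    n = len(points)
    # Density: count of other points within the cutoff distance.
    density = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) <= cutoff)
        for i in range(n)
    ]
    parent = list(range(n))
    for i in range(n):
        best, best_d = i, float("inf")
        for j in range(n):
            if j == i:
                continue
            # Equal densities are ordered by index so links cannot form a cycle.
            denser = (density[j], -j) > (density[i], -i)
            d = dist(points[i], points[j])
            if denser and d <= cutoff and d < best_d:
                best, best_d = j, d
        parent[i] = best
    roots = [i for i in range(n) if parent[i] == i]
    return parent, roots


def root_of(parent, i):
    """Follow parent links until reaching the root (local density peak)."""
    while parent[i] != i:
        i = parent[i]
    return i


# Two well-separated groups, each a unit square with a centre point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
parent, roots = local_density_peak_forest(points, cutoff=1.0)
labels = [root_of(parent, i) for i in range(len(points))]
```

    In this toy input the centre of each square has the most neighbours within the cutoff, so it becomes the root of its subcluster. A subsequent agglomerative step would then merge subclusters through their roots; that step is omitted here because the abstract does not describe it.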