NUK - logo
E-viri
Celotno besedilo
Recenzirano Odprti dostop
  • OneSketch: A Generic and Ac...
    Fan, Zhuochen; Wang, Ruixin; Cai, Yalun; Zhang, Ruwen; Yang, Tong; Wu, Yuhan; Cui, Bin; Uhlig, Steve

    IEEE transactions on knowledge and data engineering, 12/2023, Letnik: 35, Številka: 12
    Journal Article

    In this paper, we propose a generic sketch algorithm capable of achieving more accuracy in the following five tasks: finding top-<inline-formula><tex-math notation="LaTeX">k</tex-math></inline-formula> frequent items, finding heavy hitters, per-item frequency estimation, and heavy changes in the time and spatial dimension. The state-of-the-art (SOTA) sketch solution for multiple measurement tasks is ElasticSketch (ES). However, the accuracy of its frequency estimation has room for improvement. The reason for this is that ES suffers from overestimation errors in the light part, which introduces errors when querying both frequent and infrequent items. To address these problems, we propose a generic sketch, OneSketch, designed to minimize overestimation errors. To achieve the design goal, we propose four key techniques, which embrace hash collisions and minimize possible errors by handling highly recurrent item replacements well. Experimental results show that OneSketch clearly outperforms 12 SOTA schemes. For example, compared with ES, OneSketch achieves more than 10× lower Average Absolute Error on finding top-<inline-formula><tex-math notation="LaTeX">k</tex-math></inline-formula> frequent items and heavy hitters, as well as 48.3% and 38.4% higher F1 Scores on two heavy changes under 200KB memory, respectively.