Akademska digitalna zbirka SLovenije - logo
E-viri
Celotno besedilo
Recenzirano
  • Continuous kk-Regret Minimi...
    Zheng, Jiping; Ma, Wei; Wang, Yanhao; Wang, Xiaoyang

    IEEE transactions on knowledge and data engineering, 2023-June-1, Letnik: 35, Številka: 6
    Journal Article

    Finding a small set of representative tuples from a large database is an important functionality for supporting multi-criteria decision making. Top-<inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq2-3166835.gif"/> </inline-formula> queries and skyline queries are two widely studied queries to fulfill this task. However, both of them have some limitations: a top-<inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq3-3166835.gif"/> </inline-formula> query requires the user to provide her utility functions for finding the <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq4-3166835.gif"/> </inline-formula> tuples with the highest scores as the result; a skyline query does not need any user-specified utility function but cannot control the result size. To overcome their drawbacks, the <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq5-3166835.gif"/> </inline-formula>-regret minimization query was proposed and received much attention recently, since it does not require any user-specified utility function and returns a fixed-size result set. Specifically, it selects a set <inline-formula><tex-math notation="LaTeX">R</tex-math> <mml:math><mml:mi>R</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq6-3166835.gif"/> </inline-formula> of tuples with a pre-defined size <inline-formula><tex-math notation="LaTeX">r</tex-math> <mml:math><mml:mi>r</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq7-3166835.gif"/> </inline-formula> from a database <inline-formula><tex-math notation="LaTeX">D</tex-math> <mml:math><mml:mi>D</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq8-3166835.gif"/> </inline-formula> such that the maximum <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq9-3166835.gif"/> </inline-formula>-regret ratio , which captures how well the top-ranked tuple in <inline-formula><tex-math notation="LaTeX">R</tex-math> <mml:math><mml:mi>R</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq10-3166835.gif"/> </inline-formula> represents the top-<inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq11-3166835.gif"/> </inline-formula> tuples in <inline-formula><tex-math notation="LaTeX">D</tex-math> <mml:math><mml:mi>D</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq12-3166835.gif"/> </inline-formula> for any possible utility function, is minimized. Although there have been many methods for <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq13-3166835.gif"/> </inline-formula>-regret minimization query processing, most of them are designed for static databases without tuple insertions and deletions. The only known algorithm to process continuous <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq14-3166835.gif"/> </inline-formula>-regret minimization queries (C<inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq15-3166835.gif"/> </inline-formula>RMQ) in dynamic databases suffers from suboptimal approximation and high time complexity. In this paper, we propose a novel dynamic coreset-based approach, called DynCore , for C<inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq16-3166835.gif"/> </inline-formula>RMQ processing. It achieves the same (asymptotically optimal) upper bound on the maximum <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="wang-ieq17-3166835.gif"/> </inline-formula>-regret ratio as the best-known static algorithm. Meanwhile, its time complexity is sublinear to the database size, which is significantly lower than that of the existing dynamic algorithm. The efficiency and effectiveness of DynCore is confirmed by experimental results on real-world and synthetic datasets.