UNI-MB - logo
UMNIK - logo
 
E-resources
Full text
Peer reviewed
  • An Index for Set Intersecti...
    Wang, Ru; Lu, Shangqi; Tao, Yufei

    IEEE transactions on knowledge and data engineering, 07/2024, Volume: 36, Issue: 7
    Journal Article

    This paper studies how to design an index structure on a collection of sets <inline-formula><tex-math notation="LaTeX">S_{1}, S_{2},{\ldots }, S_{n}</tex-math> <mml:math><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>...</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math><inline-graphic xlink:href="tao-ieq1-3329145.gif"/> </inline-formula> to answer the following queries: given distinct set ids <inline-formula><tex-math notation="LaTeX">a, b \in 1, n</tex-math> <mml:math><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>∈</mml:mo><mml:mo></mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi><mml:mo></mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="tao-ieq2-3329145.gif"/> </inline-formula>, report <inline-formula><tex-math notation="LaTeX">F(S_{a} \cap S_{b})</tex-math> <mml:math><mml:mrow><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>a</mml:mi></mml:msub><mml:mo>∩</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="tao-ieq3-3329145.gif"/> </inline-formula> where <inline-formula><tex-math notation="LaTeX">F(.)</tex-math> <mml:math><mml:mrow><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:mo>.</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="tao-ieq4-3329145.gif"/> </inline-formula> is a filtering function. We present a solution that can support a great variety of filtering functions - range research, skyline, convex hull, nearest neighbor search, quantile (to name just a few) - with attractive performance guarantees. The guarantees are sensitive to the set collection's pseudoarboricity , a new notion for quantifying the density of <inline-formula><tex-math notation="LaTeX">\lbrace S_{1}, S_{2},{\ldots }, S_{n}\rbrace</tex-math> <mml:math><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>...</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="tao-ieq5-3329145.gif"/> </inline-formula>. Our index structures are simple to understand and implement.