E-resources
Full text
Peer reviewed
  • Explaining a bag of words w...
    Jiang, Haiyun; Xiao, Yanghua; Wang, Wei

    World wide web (Bussum), 05/2020, Volume: 23, Issue: 3
    Journal Article

    In natural language processing and information retrieval tasks, the bag-of-words model is widely used to represent the semantics of texts. However, without an explicit semantic explanation, it is difficult for machines to sufficiently understand a bag of words or the text it represents, which limits the power of the bag-of-words model in many scenarios. In this paper, we introduce the task of hierarchical conceptual labeling (HCL), which aims to generate a hierarchy of conceptual labels that explicitly explains the semantics of a bag of words. The candidate labels are selected from a large-scale knowledge base, namely Microsoft Concept Graph. To this end, we first propose a denoising algorithm that filters out the noise in a bag of words in advance. The hierarchical conceptual labels are then generated for the clean bag of words using a hierarchical clustering algorithm, namely Bayesian rose trees. We conduct extensive experiments and show that (1) the proposed denoising algorithm can effectively remove noise words from a bag of words, and (2) the algorithm based on Bayesian rose trees can generate hierarchical conceptual labels for a bag of words with high accuracy.
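
The two-stage pipeline described in the abstract (denoise first, then derive hierarchical labels) can be illustrated with a small sketch. This is not the authors' algorithm: the `TOY_CONCEPTS` dictionary is a hypothetical stand-in for a knowledge base such as Microsoft Concept Graph, the denoising rule (drop words that share no concept with any other word) is an assumed heuristic, and the hierarchy is built by simple set containment rather than Bayesian rose trees. All function names are illustrative.

```python
# Hypothetical toy knowledge base: word -> concepts it is an instance of.
# A real system would query a large resource instead of this dictionary.
TOY_CONCEPTS = {
    "apple": {"fruit", "company", "food"},
    "banana": {"fruit", "food"},
    "pear": {"fruit", "food"},
    "bread": {"food"},
    "google": {"company"},
    "microsoft": {"company"},
    "table": {"furniture"},
}


def denoise(words, concepts):
    """Keep only words that share at least one concept with another word.

    Assumed heuristic for illustration; not the paper's denoising algorithm.
    """
    kept = []
    for w in words:
        own = concepts.get(w, set())
        if any(own & concepts.get(o, set()) for o in words if o != w):
            kept.append(w)
    return kept


def concept_cover(words, concepts):
    """Map each concept to the set of words in the bag that it covers."""
    cover = {}
    for w in words:
        for c in concepts.get(w, set()):
            cover.setdefault(c, set()).add(w)
    return cover


def label_hierarchy(words, concepts):
    """Map each label to its parent label (None for top-level labels).

    A concept becomes the child of the smallest concept whose covered
    words strictly contain its own. This containment rule is a simple
    substitute for Bayesian rose trees, used here only for illustration.
    """
    cover = {
        c: ws for c, ws in concept_cover(words, concepts).items() if len(ws) > 1
    }
    parents = {}
    for c, ws in cover.items():
        supers = [p for p, pw in cover.items() if ws < pw]  # strict supersets
        parents[c] = (
            min(supers, key=lambda p: (len(cover[p]), p)) if supers else None
        )
    return parents


bag = ["apple", "banana", "pear", "bread", "google", "microsoft", "table"]
clean = denoise(bag, TOY_CONCEPTS)
hierarchy = label_hierarchy(clean, TOY_CONCEPTS)
print(clean)
print(hierarchy)
```

In this toy bag, "table" shares no concept with the other words and is dropped as noise, while "fruit" ends up nested under "food" because every word labelled "fruit" is also labelled "food".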