Akademska digitalna zbirka SLovenije - logo
E-viri
Celotno besedilo
Recenzirano Odprti dostop
  • Evaluating the construct va...
    Fang, Qixiang; Nguyen, Dong; Oberski, Daniel L.

    EPJ data science, 07/2022, Letnik: 11, Številka: 1
    Journal Article

    Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are high-quality representations of the information needed to be encoded. We view this quality evaluation problem from a measurement validity perspective, and propose the use of the classic construct validity framework to evaluate the quality of text embeddings. First, we describe how this framework can be adapted to the opaque and high-dimensional nature of text embeddings. Second, we apply our adapted framework to an example where we compare the validity of survey question representation across text embedding models.