  • Semi-supervised document categorization framework for database curation [Electronic resource]
    Kastrin, Andrej ; Povh, Janez, 1973-
    Text categorization has been used in biomedical informatics to identify documents containing relevant topics of interest. A controlled study using two classifiers, linear discriminant analysis (LDA) and logistic discrimination (LD), was conducted to examine the problem of assigning a MEDLINE® citation to the genetic or nongenetic domain. Performance evaluation was based on a bag-of-words representation of MEDLINE citations, using title and abstract words as prediction features. Validation was done on a set of 734 manually annotated MEDLINE citations. We achieved the best predictive accuracy of 0.92 (precision 0.86, recall 0.72) with the LD classifier using abstract words as the feature space. Our method could easily be reimplemented as a module in a general information extraction system and may thus be a powerful tool for the broader research community.
    Material type - conference contribution ; non-fiction for adults
    Year - 2011
    Language - English
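
The abstract names a bag-of-words representation, a logistic discrimination (LD) classifier, and accuracy, precision and recall as evaluation metrics. The following is a minimal sketch of such a pipeline, not the authors' implementation. It assumes plain logistic regression fitted by full-batch gradient descent as a stand-in for LD, uses raw term counts as features, and trains on six hypothetical toy sentences rather than MEDLINE citations. All function names, the learning rate and the epoch count are illustrative assumptions, and the LDA classifier is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs):
    """Map every distinct training word to a column index."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def vectorize(doc, vocab):
    """Bag-of-words vector of raw term counts; out-of-vocabulary words are dropped."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(doc)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by full-batch gradient descent on the log-loss.
    lr and epochs are arbitrary illustrative values (assumptions)."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # derivative of log-loss w.r.t. the score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Label 1 (genetic) when P(genetic | x) >= 0.5, else 0 (nongenetic)."""
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy "citations" (not real data): 1 = genetic, 0 = nongenetic.
train_docs = [
    "gene mutation linked to inherited disease",
    "allele variant and gene expression study",
    "genome wide association of polymorphism",
    "hospital costs of emergency surgery",
    "patient satisfaction with nursing care",
    "exercise therapy reduces back pain",
]
train_labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(train_docs)
X = [vectorize(d, vocab) for d in train_docs]
w, b = train_logistic(X, train_labels)

test_docs = ["gene variant in disease", "surgery and nursing care costs"]
test_preds = [predict(w, b, vectorize(d, vocab)) for d in test_docs]
print(test_preds)
print(evaluate(train_labels, [predict(w, b, x) for x in X]))
```

Precision and recall are computed with the genetic class as positive, which is one plausible reading of how the reported 0.86 and 0.72 were defined; the abstract does not state this explicitly.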
