ALL libraries (COBIB.SI union bibliographic/catalogue database)
  • Semi-supervised document categorization framework for database curation [Electronic resource]
    Kastrin, Andrej ; Povh, Janez, 1973-
    Text categorization has been used in biomedical informatics to identify documents containing relevant topics of interest. A controlled study using two classifiers, linear discriminant analysis (LDA) and logistic discrimination (LD), was conducted to examine the problem of assigning a MEDLINE® citation to the genetic or nongenetic domain. Performance evaluation was based on a bag-of-words representation of MEDLINE citations using title and abstract words as prediction features. Validation was done on a set of 734 manually annotated MEDLINE citations. We achieved a best predictive accuracy of 0.92, with 0.86 precision and 0.72 recall, for the LD classifier using abstract words as the feature space. Our method could be easily reimplemented as a module in a general information extraction system and may thus be a powerful tool for the broader research community.
    Type of material - conference contribution ; adult, serious
    Publish date - 2011
    Language - English
    COBISS.SI-ID - 1024337473
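
The abstract above describes a bag-of-words representation of citations and reports accuracy, precision and recall for a two-class (genetic vs. nongenetic) task. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names `bag_of_words` and `evaluate`, the tokenization rule, and the toy labels are all assumptions made for illustration, and the LDA and LD classifiers themselves are not shown.

```python
from collections import Counter


def bag_of_words(text):
    """Return token counts for a text (hypothetical, simplistic tokenizer).

    Splits on whitespace, strips common punctuation from token edges,
    and lowercases. Word order is discarded, as in any bag-of-words model.
    """
    tokens = (tok.strip(".,;:()[]\"'").lower() for tok in text.split())
    return Counter(tok for tok in tokens if tok)


def evaluate(y_true, y_pred, positive="genetic"):
    """Compute (accuracy, precision, recall) for a binary labelling.

    `positive` names the class treated as the positive one; treating
    "genetic" as positive is an assumption made for this example.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from the study.
    features = bag_of_words("Gene expression in the gene.")
    print(features["gene"])

    truth = ["genetic", "genetic", "nongenetic", "nongenetic", "genetic"]
    guess = ["genetic", "nongenetic", "nongenetic", "genetic", "genetic"]
    print(evaluate(truth, guess))
```

A high precision paired with a lower recall, as in the reported figures, means that citations assigned to the positive class are usually correct while a larger share of the true positives are missed.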