  • Document categorization based on OCR technology : an overview
    Zelenika, Darko ; Povh, Janez, 1973- ; Dobrovoljc, Andrej, 1967-
    This paper presents a brief overview of the document categorization process, with a focus on documents obtained by OCR (Optical Character Recognition) technology. Work of different authors from ... area of document categorization is described. Text obtained by OCR must be prepared so that categorization algorithms can use it and achieve better categorization accuracy, so methods for this preparation are introduced. A comparison of the results of different categorization algorithms is shown. Most authors obtained the best results with the SVM (Support Vector Machine) classifier. Two document categorization software programs, commercial and open source, are introduced. An invoice recognition project on which the authors of this paper are working is also introduced.
    Year - 2013
    Language - English
