Peer-reviewed, Open access
  • Out of vocabulary word dete...
    Jemni, Sana Khamekhem; Kessentini, Yousri; Kanoun, Slim

    Pattern recognition, 09/2019, Volume: 93
    Journal Article

    • A novel two-step OOV word detection and recovery method is proposed.
    • The proposed method is generic and independent of the recognition engine.
    • The proposed method uses various sub-lexical models to improve the detection step.
    • The recovery process relies on dynamic lexicons built from large text corpora.
    • The proposed method significantly improves the recognition results.

    Today's Arabic handwriting recognition systems can recognize arbitrary words over a large but finite vocabulary. Systems operating with a fixed vocabulary are bound to encounter so-called out-of-vocabulary (OOV) words. The aim of this research is to propose a two-step approach that tackles the problem of OOV words in Arabic handwriting. In the first step, the detection stage, we exploit different types of sub-word units to detect potential OOVs. In the second step, the recovery stage, a dynamic dictionary is built to extend the initial static word lexicon in order to cope with the detected OOVs. The recovery includes a selection step in which the best word candidates extracted from the external resource are kept. Experiments were conducted on the public benchmark databases KHATT and AHTID/MW. The results show that sub-word modeling can give cues for improving the detection, and that the use of a dynamic dictionary significantly improves recognition performance compared to one-step approaches based on either a large static dictionary or a combination of different sub-word units. We achieve state-of-the-art results on the KHATT dataset.
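The two-step idea in the abstract (detect potential OOVs, then recover them with a dynamic dictionary drawn from an external resource) can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`detect_oov`, `build_dynamic_lexicon`, `recover`) are hypothetical, detection is reduced to a simple lexicon lookup on an already decoded word sequence, candidate selection uses plain string similarity from Python's `difflib` as a stand-in for the paper's selection step, and English words replace Arabic text for readability.

```python
from difflib import SequenceMatcher


def detect_oov(hypothesis, static_lexicon):
    """Step 1 (toy stand-in): return indices of words missing from the static lexicon.

    Assumption: the hypothesis comes from a sub-word-level decoder, so it can
    contain character strings that are not valid lexicon entries.
    """
    return [i for i, word in enumerate(hypothesis) if word not in static_lexicon]


def build_dynamic_lexicon(candidate, external_words, top_k=3):
    """Step 2a (toy stand-in): keep the top_k external words most similar to the candidate.

    Assumption: string similarity replaces the selection criterion used in the paper.
    """
    ranked = sorted(
        external_words,
        key=lambda w: SequenceMatcher(None, candidate, w).ratio(),
        reverse=True,
    )
    return ranked[:top_k]


def recover(hypothesis, static_lexicon, external_words, top_k=3):
    """Step 2b (toy stand-in): replace each detected OOV with its best dynamic-lexicon entry.

    A real system would re-decode the image with the extended lexicon instead.
    """
    output = list(hypothesis)
    for i in detect_oov(hypothesis, static_lexicon):
        dynamic_lexicon = build_dynamic_lexicon(hypothesis[i], external_words, top_k)
        if dynamic_lexicon:
            output[i] = dynamic_lexicon[0]
    return output


if __name__ == "__main__":
    static_lexicon = {"the", "system"}
    external_words = ["handwriting", "handwritten", "hand", "system"]
    hypothesis = ["the", "handwritng", "system"]
    print(recover(hypothesis, static_lexicon, external_words))
```

The point of the design is that the static lexicon stays small, and only the words flagged in the first step trigger a lookup in the larger external resource.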