In this paper, we present a novel methodology for multiple script identification using the sequence-learning capabilities of Long Short-Term Memory (LSTM) networks. Our method is able to identify multiple scripts at text-line level, where two or more scripts are present in the same text-line. Unlike traditional techniques, where either shape features or bounding boxes of individual characters are extracted, the LSTM-based system learns a particular script in a supervised learning framework. Moreover, this system needs neither specific features nor preprocessing steps other than text-line extraction and text-line normalization. The proposed method works at text-line level, where it identifies each character as belonging to a particular script. We have developed a database consisting of English and Greek script, and our system achieved a script recognition accuracy of 98.186% on this dataset.
Identifying Cross-Depicted Historical Motifs Pondenkandath, Vinaychandran; Alberti, Michele; Eichenberger, Nicole ...
2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Conference Proceeding
Cross-depiction is the problem of identifying the same object even when it is depicted in a variety of manners. This is a common problem in handwritten historical document image analysis, for instance when the same letter or motif is depicted in several different ways. It is a simple task for humans, yet conventional computer vision methods struggle to cope with it. In this paper we address this problem using state-of-the-art deep learning techniques on a dataset of historical watermarks containing images created with different methods of reproduction, such as hand tracing, rubbing, and radiography. To study the robustness of deep learning based approaches to the cross-depiction problem, we measure their performance on two different tasks: classification and similarity ranking. For the former we achieve a classification accuracy of 96% using deep convolutional neural networks. For the latter we obtain a false positive rate of 0.11 at 95% recall. These results outperform state-of-the-art methods by a significant margin.
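The similarity-ranking figure above is a false positive rate measured at a fixed recall. A minimal sketch of how such a metric can be computed is given below. It is not the authors' evaluation code: the function name `fpr_at_recall`, the binary label convention (1 = matching pair, 0 = non-matching pair), and the handling of tied scores (ignored here) are assumptions made for illustration.

```python
def fpr_at_recall(scores, labels, target_recall=0.95):
    """False positive rate at the first ranking cut-off that reaches target_recall.

    scores: similarity scores, higher means "more likely a match" (assumption).
    labels: 1 for a true match, 0 for a non-match (assumption).
    """
    # Rank all pairs from most to least similar.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in ranked if y == 1)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    tp = fp = 0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Stop as soon as enough of the positives have been retrieved.
        if tp / n_pos >= target_recall:
            return fp / n_neg
    return 1.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(fpr_at_recall(scores, labels, target_recall=0.95))
```

A lower value is better: it is the fraction of non-matching items that are ranked above the cut-off needed to retrieve 95% of the true matches.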
Space Anomalies in OCRs for Arabic Like Scripts Ahmad, Riaz; Afzal, M. Zeshan; Faisal Rashid, S. ...
2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR),
2018-March
Conference Proceeding
This paper investigates and analyses the nature of errors occurring in Optical Character Recognition (OCR) for Arabic-like scripts. Existing research in the area of OCR for Arabic-like scripts often focuses on achieving the best performance in terms of character error rates. Little effort has been directed at analysing the nature of the errors (anomalies) that may occur. One such important anomaly is the Space Anomaly. This anomaly is due to the presence of breaker characters, which are an essential part of Arabic-like scripts. The spaces introduced by breaker characters are not depicted in the ground truth, making it hard for the OCR to generalize. The OCR model learns either to inhibit the original spaces or to generate extra spaces at places where they are not correct. Due to this confusion, the rendering looks sub-optimal. This paper analyses and removes space anomalies. We present a joint approach that not only performs OCR but also handles the space anomalies in a robust manner, hence significantly outperforming the state-of-the-art. Although the implication of the work is shown by an improved character recognition rate, the impact of this research is much higher in terms of the correctness of the OCR for useful purposes, especially for rendering. The claim is supported by empirical evaluation, and it is shown that the proposed approach achieved the best results.
In this paper, we extend a symbolic association framework to handle missing elements in multimodal sequences. The general scope of the work is the symbolic association of object-word mappings as it happens in language development in infants. In other words, two different representations of the same abstract concept can be associated in both directions. This scenario has long been of interest in Artificial Intelligence, Psychology, and Neuroscience. In this work, we extend a recent approach for multimodal sequences (visual and audio) to also cope with missing elements in one or both modalities. Our method uses two parallel Long Short-Term Memories (LSTMs) with a learning rule based on the EM algorithm. It aligns both LSTM outputs via Dynamic Time Warping (DTW). We propose to include an extra combination step using the max operation to exploit the common elements between both sequences. The motivation is that the combination acts as a condition selector for choosing the best representation from both LSTMs. We evaluated the proposed extension in the following scenarios: missing elements in one modality (visual or audio) and missing elements in both modalities (visual and audio). Our extension achieves better results than the original model and results similar to individual LSTMs trained on each modality.
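As a rough illustration of the alignment-and-combination idea described in this abstract, the sketch below aligns two sequences of per-frame output vectors with textbook Dynamic Time Warping and then takes an element-wise max over each aligned pair. This is not the authors' implementation: the L1 frame distance, the function names `dtw` and `combine_max`, and the toy two-class output vectors are all assumptions, and the LSTMs and the EM-based learning rule are left out entirely.

```python
def l1(u, v):
    """L1 distance between two equal-length vectors (assumed frame distance)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def dtw(a, b, dist=l1):
    """Classic DTW between non-empty sequences a and b.

    Returns (total_cost, path), where path is a list of aligned index pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrace from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


def combine_max(a, b, path):
    """Element-wise max of the two output vectors at every aligned position."""
    return [[max(x, y) for x, y in zip(a[i], b[j])] for i, j in path]


if __name__ == "__main__":
    visual = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy per-frame outputs
    audio = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    total, path = dtw(visual, audio)
    print(total, path)
    print(combine_max(visual, audio, path))
```

In this toy setting the max keeps, at each aligned step, the stronger activation per class from either sequence, which is one plausible reading of the "condition selector" role mentioned in the abstract.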
Many languages use Arabic script for written communication, either in basic or augmented form. These languages include Urdu, Pashto, Persian, etc. As the primary characters are shared among all these languages, it is possible to take advantage of the visual similarities for Optical Character Recognition (OCR). OCR models optimized for individual languages have been proposed. However, to the best of our knowledge, there has been no attempt to develop a single system for more than one language. The contributions of the presented work are as follows. First, it investigates the effect on recognition accuracy when different languages are combined (a pioneering study). Second, it introduces publicly available synthetic datasets for the Arabic and Pashto languages for experimental purposes. Third, it provides statistical analysis as clues for transfer learning concerning OCR systems for the Arabic, Urdu, and Pashto languages.
Current approaches for text-line segmentation are often either very specialized to specific domains or dependent on many parameters. More specifically, text-lines with large sizes, i.e., headings and titles in Arabic-like scripts, cannot be segmented correctly by state-of-the-art methods. In this work, we present a simple and robust text-line segmentation approach. The proposed method is tested on real scanned Pashto images, and it outperforms a recent text-independent state-of-the-art method with respect to both performance and time.
The contribution of this paper is a new strategy for integrating multiple recognition outputs of diverse recognizers. Such an integration can give higher performance and more accurate outputs than a single recognition system. The problem of aligning various Optical Character Recognition (OCR) results lies in the difficulty of finding the correspondence at character, word, line, and page level. These difficulties arise from segmentation and recognition errors produced by the OCRs. Therefore, alignment techniques are required for synchronizing the outputs in order to compare them. Most existing approaches fail when the same error occurs in multiple OCRs. If the correction does not appear in one of the OCR outputs, these approaches are unable to improve the results. We design a Line-to-Page alignment with edit rules using Weighted Finite-State Transducers (WFST). These edit rules are based on the edit operations insertion, deletion, and substitution. In addition, an approach is designed using Recurrent Neural Networks with Long Short-Term Memory (LSTM) to predict these types of errors. A Character-Epsilon alignment is designed to normalize the size of the strings for the LSTM alignment. The LSTM returns the best voting, especially when the heuristic approaches are unable to vote among the various OCR engines. The LSTM predicts the correct characters even if the OCRs could not produce those characters in their outputs. The approaches are evaluated on OCR output for the UWIII and historical German Fraktur datasets, obtained from state-of-the-art OCR systems. The experiments show that the LSTM approach achieves the best performance, with an error rate of around 0.40%, while the other approaches range between 1.26% and 2.31%.
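To make the idea of normalizing two OCR outputs to the same length concrete, the sketch below aligns two strings with the standard edit-distance recurrence (insertion, deletion, substitution) and pads the gaps with an epsilon symbol. It is only an illustration under stated assumptions: the function name `align_with_epsilon`, the choice of "ε" as padding character, and the unit edit costs are hypothetical, and the abstract's WFST edit rules and LSTM voting are not modelled here.

```python
EPS = "ε"  # hypothetical padding symbol; must not occur in the input strings


def align_with_epsilon(s, t, eps=EPS):
    """Align s and t by edit distance and pad gaps with eps.

    Returns (padded_s, padded_t, distance); the padded strings have equal length.
    """
    n, m = len(s), len(t)
    # d[i][j] = edit distance between s[:i] and t[:j] with unit costs.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (s[i - 1] != t[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Backtrace, emitting eps wherever one string has no counterpart.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            out_s.append(s[i - 1])  # match or substitution
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            out_s.append(s[i - 1])  # character missing in t
            out_t.append(eps)
            i -= 1
        else:
            out_s.append(eps)       # character missing in s
            out_t.append(t[j - 1])
            j -= 1
    return "".join(reversed(out_s)), "".join(reversed(out_t)), d[n][m]


if __name__ == "__main__":
    a, b, dist = align_with_epsilon("recognition", "recogniton")
    print(a)
    print(b)
    print(dist)
```

Once both outputs have equal length, they can be compared position by position, which is the precondition for any per-character voting or sequence-labelling step.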
The aim of this work is to investigate Long Short-Term Memory (LSTM) for finding the semantic associations between two parallel text lines of different instances of the same class sequence. In this work, we propose a new model called the class-less classifier, which is cognitively motivated by a simplified version of infant learning. The presented model not only learns the semantic association but also learns the relation between the labels and the classes. In addition, our model uses two parallel class-less LSTM networks, and the learning rule is based on the alignment of both networks. For testing purposes, a parallel sequence dataset is generated based on the MNIST dataset, which is a standard dataset for handwritten digit recognition. The results of our model were similar to those of the standard LSTM.
We introduce a fast and robust subspace-based approach to appearance-based object tracking. The core of our approach is based on Fast Robust Correlation (FRC), a recently proposed technique for the robust estimation of large translational displacements. We show how the basic principles of FRC can be naturally extended to formulate a robust version of Principal Component Analysis (PCA) which can be efficiently implemented incrementally and therefore is particularly suitable for robust real-time appearance-based object tracking. Our experimental results demonstrate that the proposed approach outperforms other state-of-the-art holistic appearance-based trackers on several popular video sequences.