•A Generative Adversarial Network for handwritten document image binarization.
•We perform document binarization while simultaneously ensuring text readability, by integrating a handwritten text recognition component within the proposed architecture.
•The proposed model enhances different forms of documents, independently of the text language.
•We achieve state-of-the-art performance on the public H-DIBCO datasets.
Handwritten document images can be severely affected by degradation for various reasons: paper ageing, daily-life scenarios (wrinkles, dust, etc.), poor scanning processes and so on. These artifacts raise many readability issues for current Handwritten Text Recognition (HTR) algorithms and severely degrade their performance. In this paper, we propose an end-to-end architecture based on Generative Adversarial Networks (GANs) to recover degraded documents into a clean and readable form. Unlike most well-known document binarization methods, which try to improve the visual quality of the degraded document, the proposed architecture integrates a handwritten text recognizer that encourages the generated document image to be more readable. To the best of our knowledge, this is the first work to use text information while binarizing handwritten documents. Extensive experiments conducted on degraded Arabic and Latin handwritten documents demonstrate the usefulness of integrating the recognizer within the GAN architecture, which improves both the visual quality and the readability of the degraded document images. Moreover, after fine-tuning our pre-trained model with synthetically degraded Latin handwritten images, we outperform the state of the art on the H-DIBCO challenges.
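The coupling described above, an adversarial objective augmented with a recognition objective, can be sketched as a generator loss of the following form (the weighting scheme and the choice of recognizer loss are illustrative assumptions, not details given in the abstract):

```latex
\mathcal{L}_G \;=\; \mathcal{L}_{\mathrm{adv}}\big(D(G(x))\big)
\;+\; \lambda_{1}\,\mathcal{L}_{\mathrm{pix}}\big(G(x),\, y\big)
\;+\; \lambda_{2}\,\mathcal{L}_{\mathrm{rec}}\big(R(G(x)),\, t\big)
```

where $G$ is the generator (binarizer), $D$ the discriminator, $R$ the handwritten text recognizer, $x$ the degraded image, $y$ the clean ground truth, and $t$ the transcription. The term $\mathcal{L}_{\mathrm{rec}}$ (e.g., a CTC loss) is what promotes readability of the generated image.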
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
In a classical paper, Chvátal introduced a rounding procedure for strengthening the polyhedral relaxation P of an integer program; applied recursively, the number of iterations needed to obtain the convex hull of the integer solutions in P is known as the Chvátal rank. Chvátal showed that this rank can be exponential in the input size L needed to describe P. We give a compact extended formulation of P, described by introducing binary variables, whose rank is polynomial in L.
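A single rounding step is easy to state: if $a^{\top}x \le b$ is valid for $P$ and $a$ is integral, then $a^{\top}x \le \lfloor b \rfloor$ is valid for every integer point of $P$ (a Chvátal-Gomory cut). A minimal illustrative example, not taken from the paper:

```latex
% Valid for the relaxation P:  x_1 + x_2 \le 3/2, \quad x_1, x_2 \ge 0.
% Rounding the right-hand side gives the Chvátal-Gomory cut
x_1 + x_2 \;\le\; \left\lfloor \tfrac{3}{2} \right\rfloor \;=\; 1,
% which every integer solution of P satisfies, while the fractional
% point (3/4, 3/4) \in P is cut off.
```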
Document image binarization classifies each pixel in an input document image as either foreground or background, under the assumption that the document is pseudo-binary in nature. However, noise introduced during acquisition, or due to ageing or handling of the document, can make binarization a challenging task. This paper presents a novel game-theory-inspired binarization technique for degraded document images. A two-player, non-zero-sum, non-cooperative game is designed at the pixel level to extract local information, which is then fed to a K-means algorithm to classify each pixel as foreground or background. We also present a preprocessing step that eliminates the intensity variation that often appears in the background, and a post-processing step that refines the results. The method is tested on seven publicly available datasets, namely DIBCO 2009-14 and 2016. The experimental results show that GiB (Game theory Inspired Binarization) outperforms competing state-of-the-art methods in most cases.
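The pipeline's final step, two-class K-means over per-pixel features, can be sketched as follows. This is a minimal one-dimensional illustration using raw intensities as the feature; the paper's actual features come from the pixel-level game, which the abstract does not specify.

```python
import numpy as np

def kmeans_two_class(features, iters=20):
    """Two-cluster K-means on a 1-D feature vector (a sketch, not the paper's code)."""
    # Initialize the two centroids at the feature extremes.
    c0, c1 = features.min(), features.max()
    for _ in range(iters):
        d0 = np.abs(features - c0)
        d1 = np.abs(features - c1)
        labels = (d1 < d0).astype(int)  # 0 = near dark centroid, 1 = near bright centroid
        if (labels == 0).any():
            c0 = features[labels == 0].mean()
        if (labels == 1).any():
            c1 = features[labels == 1].mean()
    return labels

# Toy example: dark (foreground) and bright (background) pixel intensities.
pixels = np.array([20, 25, 30, 200, 210, 220, 35, 205], dtype=float)
labels = kmeans_two_class(pixels)  # 0 = foreground, 1 = background
```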
•A new domain adaptation method for document image binarization.
•Modification of a state-of-the-art approach for unsupervised scenarios.
•Adaptation driven by a novel similarity measure between domains.
•Experiments with all pairwise combinations of five datasets.
•Improvement of over 42% with respect to the state of the art.
Binarization is a well-known image processing task whose objective is to separate the foreground of an image from the background. One of the many tasks for which it is useful is the preprocessing of document images in order to identify relevant information, such as text or symbols. The wide variety of document types, alphabets, and formats makes binarization challenging. There are multiple proposals with which to solve this problem, from classical manually-adjusted methods to more recent approaches based on machine learning. The latter techniques require a large amount of training data in order to obtain good results; however, labeling a portion of each existing collection of documents is not feasible in practice. This is a common problem in supervised learning, which can be addressed by using so-called Domain Adaptation (DA) techniques. These techniques take advantage of the knowledge learned in one domain, for which labeled data are available, to apply it to other domains for which there are no labeled data. This paper proposes a method that combines neural networks and DA in order to carry out unsupervised document binarization. However, when the source and target domains are very similar, this adaptation can be detrimental. Our methodology therefore first measures the similarity between domains in an innovative manner, in order to determine whether or not it is appropriate to apply the adaptation process. The experimental results, covering the 20 possible source-target combinations among five different domains, show that our proposal successfully deals with the binarization of new document domains without the need for labeled data.
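The gating idea (measure domain similarity first, adapt only when domains differ) can be sketched as below. The paper's actual similarity measure is not given in the abstract; histogram intersection of grey-level distributions is used here purely as a stand-in, and the threshold is an assumption.

```python
import numpy as np

def domain_similarity(feats_src, feats_tgt, bins=16):
    """Crude proxy for a domain-similarity measure: histogram intersection
    of per-pixel feature distributions, assumed to lie in [0, 1]."""
    h_s, _ = np.histogram(feats_src, bins=bins, range=(0.0, 1.0))
    h_t, _ = np.histogram(feats_tgt, bins=bins, range=(0.0, 1.0))
    h_s = h_s / h_s.sum()
    h_t = h_t / h_t.sum()
    return np.minimum(h_s, h_t).sum()  # 1.0 means identical distributions

def should_adapt(similarity, threshold=0.9):
    # Skip the (potentially detrimental) adaptation when domains
    # are already very similar; the threshold value is hypothetical.
    return similarity < threshold
```

For identical source and target samples, `domain_similarity` returns 1.0 and adaptation is skipped.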
This paper proposes a nonlinear edge-preserving diffusion equation with an adaptive source term for the binarization of degraded document images. The role of the nonlinear diffusion term is to smooth images while preserving text edges and corners, while the source term is responsible for the desired binarization. Unlike other binarization techniques (such as clustering-based and threshold-based ones), the idea behind the proposed method is that a sequence of gradually binarized images is obtained by solving the evolution equation, starting from the image to be binarized and tending at infinity to a slightly smoothed version of the desired binary image. A semi-implicit parallel splitting-up method is developed to solve the proposed model effectively. The model and algorithm are tested on the DIBCO (Document Image Binarization Competitions) series datasets. The results show that it generally has the best performance compared to four PDE (partial differential equation)-based binarization models and six recent benchmark binarization algorithms (non-PDE based).
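A generic form of such an evolution equation, sketched here since the abstract does not give the paper's exact diffusivity or source term, is:

```latex
\partial_t u \;=\; \operatorname{div}\!\big(g(|\nabla u|)\,\nabla u\big)
\;+\; \lambda\, s(u, u_0),
\qquad u(\cdot, 0) = u_0,
```

where $u_0$ is the degraded image, $g$ is an edge-stopping diffusivity (e.g., the Perona-Malik choice $g(r) = 1/(1 + r^2/K^2)$, which suppresses smoothing across strong text edges), and the source $s$ drives $u$ toward a binary state as $t \to \infty$.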
•We propose a strictly convex variational level set model for document binarization; it has a unique global minimizer.
•It is an initialization-flexible model, and the evolution termination condition can be set via the level set function.
•Our model can deal with many kinds of degradation, such as low contrast, bleed-through, faint characters and texture.
•Our model achieves comparable or better performance. Moreover, it is efficient and robust to noise to some extent.
Document image binarization is a significant stage in optical character recognition systems. Unlike previous binarization approaches, in this paper we propose a novel convex variational level set model for document image binarization. Our energy functional is comprised of two terms: a data term and a fidelity term. We prove that our model is strictly convex and has a unique global minimum, which enables us to set the evolution termination criterion via the normalized step-difference energy that measures the convergence state of the level set function. We experimentally demonstrate the advantage of the fidelity term and show the merit of robustness to level set initialization. In addition, the convergence of the alternating minimization algorithm used to solve our model is analyzed. Extensive experiments are conducted on the JM dataset, representative degraded document images and the DIBCO series datasets to evaluate our model qualitatively and quantitatively. The experimental results verify that our model deals effectively with most kinds of degraded images. Compared with four evolution-based and six non-evolution-based binarization methods, our model achieves better or competitive performance in terms of four metrics.
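An energy of the kind described, a data term plus a fidelity term over a level set function $\phi$, might look as follows; the concrete integrands are illustrative, since the abstract does not give them:

```latex
E(\phi) \;=\;
\underbrace{\int_{\Omega} f(x)\, H\big(\phi(x)\big)\, dx}_{\text{data term}}
\;+\;
\underbrace{\mu \int_{\Omega} \big(\phi(x) - \phi_0(x)\big)^{2}\, dx}_{\text{fidelity term}}
```

where $H$ is the Heaviside function, $f$ a per-pixel foreground cost, and $\phi_0$ a reference level set function. The strict convexity and unique global minimizer claimed in the paper follow from the specific terms chosen there, not from this generic sketch.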
Purpose: To compare the accuracy of manual and automated binarization techniques for the analysis of the choroidal vasculature. Methods: This retrospective study was performed on a total of 98 eyes of 60 healthy subjects. Fovea-centered swept-source optical coherence tomography (SS-OCT) scans were obtained, and the choroidal area was binarized using manual and automated image binarization techniques separately. Choroidal vessel visualization in the binarized scans was subjectively graded (grades 0-100) by two masked graders, who compared the binarized scans with the original OCT scan images. The subjective variability and repeatability were compared between the two binarization method groups. Intergrader and intragrader variability were estimated using a paired t-test. The degree of agreement between the grades for each observer and between the observers was evaluated using Bland-Altman plots. Results: The mean accuracy grades of the automatically binarized images were significantly (P < 0.001) higher (93.38% ± 1.70%) than those of the manually binarized images (78.06% ± 2.92%). There was statistically significant variability and poor agreement between the mean interobserver grades in the manual binarization arm. Conclusion: The automated image binarization technique is faster and appears to be more accurate than the manual method.
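The Bland-Altman agreement analysis used above amounts to computing the bias (mean difference) between two graders and the 95% limits of agreement (bias ± 1.96 SD). A minimal sketch with made-up grades (the study's actual data are not reproduced here):

```python
import numpy as np

def bland_altman_limits(grades_a, grades_b):
    """Bias and 95% limits of agreement between two graders' scores."""
    diff = np.asarray(grades_a, dtype=float) - np.asarray(grades_b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)          # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical accuracy grades from two masked graders.
grader_1 = [90, 92, 94, 95, 93]
grader_2 = [88, 93, 92, 96, 91]
bias, lo, hi = bland_altman_limits(grader_1, grader_2)
```

Agreement is good when the bias is near zero and the limits `[lo, hi]` are narrow relative to the grading scale.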
•We propose a supervised binarization method based on deep supervised networks.
•A multi-scale deep supervised network for binarization has not been reported before.
•A hierarchical architecture is designed to distinguish text from background noise.
•Different feature levels are handled by the multi-scale architecture.
•The performance results are considerably better than those of state-of-the-art methods.
The binarization of degraded document images is a challenging problem in document analysis. Binarization is a classification process in which the pixels of an image are assigned to one of two classes: foreground text and background. Most algorithms are built on low-level features in an unsupervised manner; the resulting inability to fully exploit input-domain knowledge considerably limits their capacity to distinguish background noise from the foreground. In this paper, a novel supervised binarization method is proposed, in which a hierarchical deep supervised network (DSN) architecture is learned to predict text pixels at different feature levels. With higher-level features, the network can differentiate text pixels from background noise, so that severe degradations occurring in document images can be managed. Meanwhile, the foreground maps predicted from lower-level features present a higher visual quality at boundary areas. Compared with those of traditional algorithms, the binary images generated by our architecture have a cleaner background and better-preserved strokes. The proposed approach achieves state-of-the-art results on widely used DIBCO datasets, revealing the robustness of the presented method.
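The multi-scale fusion idea, combining side outputs predicted at different feature levels, can be sketched as below. The per-scale predictor here is a fixed threshold standing in for the learned DSN side outputs, so this illustrates only the fusion structure, not the paper's network.

```python
import numpy as np

def downsample(img):
    # 2x average pooling (assumes even height and width)
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def upsample(img):
    # nearest-neighbour 2x upsampling back to the original resolution
    return img.repeat(2, axis=0).repeat(2, axis=1)

def predict(img, threshold=0.5):
    # stand-in for a learned per-scale side output: soft foreground score
    return (img < threshold).astype(float)

def multiscale_fuse(img):
    """Fuse foreground maps from two scales, mimicking deep supervision
    at multiple feature levels (the real model learns these maps)."""
    p_full = predict(img)
    p_half = upsample(predict(downsample(img)))
    return (p_full + p_half) / 2.0 >= 0.5

# Toy image: dark 2x2 text stroke on a bright background (values in [0, 1]).
img = np.array([[0.1, 0.1, 0.9, 0.9],
                [0.1, 0.1, 0.9, 0.9],
                [0.9, 0.9, 0.9, 0.9],
                [0.9, 0.9, 0.9, 0.9]])
mask = multiscale_fuse(img)  # True where both scales agree on foreground
```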