•We propose a novel iterative deep learning framework that improves the input image iteratively.
•We apply the proposed iterative deep learning to document enhancement and binarization in two possible ways: recurrent refinement and stacked refinement.
•Our proposed method provides a new, clean version of the degraded image that is suitable for visualization and shows promising results for binarization using Otsu's global threshold.
This paper presents a novel iterative deep learning framework and applies it to document enhancement and binarization. Unlike traditional methods that predict the binary label of each pixel of the input image, we train the neural network to learn the degradations in document images and produce uniform, enhanced versions of the degraded inputs, which in turn allows the network to refine its output iteratively. Two different iterative methods are studied in this paper: recurrent refinement (RR), which uses the same trained neural network in each iteration, and stacked refinement (SR), which uses a stack of different neural networks for iterative output refinement. Because the learned image is uniform and enhanced, the binarization map can be easily obtained with a global or local threshold. Experimental results on several public benchmark datasets show that our proposed method provides a new, clean version of the degraded image that is suitable for visualization and shows promising binarization results using Otsu's global threshold on the iteratively enhanced images.
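A minimal sketch of the recurrent refinement idea described above, assuming a trained image-to-image network `enhance` (a hypothetical stand-in for the paper's model) and using scikit-image's Otsu threshold; the stacked variant would simply swap in a different trained model per iteration:

import numpy as np
from skimage.filters import threshold_otsu

def recurrent_refinement(image, enhance, n_iter=3):
    """Iteratively feed the network's enhanced output back as input (RR).

    `enhance` is assumed to map a degraded grayscale image in [0, 1]
    to a cleaner image of the same shape.
    """
    out = image
    for _ in range(n_iter):
        out = enhance(out)  # same trained network reused each iteration
    return out

def binarize(enhanced):
    """Binarize the enhanced (near-uniform) image with Otsu's global threshold."""
    t = threshold_otsu(enhanced)
    return (enhanced > t).astype(np.uint8)  # 1 = bright paper, 0 = ink (convention varies)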
•A selectional autoencoder approach for document image binarization is studied.
•The neural network is devoted to learning an image-to-image binarization.
•Comprehensive experimentation with datasets of different typology is presented.
•Results demonstrate that the approach is able to outperform the state of the art.
Binarization plays a key role in the automatic retrieval of information from document images. This process is usually performed in the first stages of document analysis systems and serves as a basis for subsequent steps, so it has to be robust for the full analysis workflow to succeed. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown remarkable performance on many disparate computer vision tasks. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood of each pixel being foreground or background. Once trained, documents can therefore be binarized by passing them through the model and applying a global threshold. This approach has proven to outperform existing binarization strategies on a number of document types.
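A hedged sketch of the binarization step just described: `model` is a hypothetical stand-in for the trained convolutional auto-encoder (a Keras-style model is assumed), whose per-pixel activations in [0, 1] are read as foreground likelihoods and cut at a fixed global threshold:

import numpy as np

def binarize_with_autoencoder(image, model, threshold=0.5):
    """Pass a grayscale image through the selectional auto-encoder and
    threshold its activations; high-activation pixels become foreground."""
    activations = model.predict(image[np.newaxis, ..., np.newaxis])[0, ..., 0]
    return (activations >= threshold).astype(np.uint8)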
The intrinsic features of documents, such as paper color, texture, aging, translucency, and the kind of printing, typing, or handwriting, are important with regard to how their images are processed and enhanced. Image binarization is the process of producing a monochromatic image from a color input. It is a key step in the document processing pipeline. The recent Quality-Time Binarization Competitions for documents have shown that no single binarization algorithm is good for every kind of document image. This paper uses a sample of the texture of a scanned historical document as the main document feature to select which of 63 widely used algorithms, each applied to five different versions of the input image (315 document image binarization schemes in total), provides a reasonable quality-time trade-off.
•Hardware implementation of video compression and decompression processes.
•Design and FPGA implementation of an improved high-throughput HEVC CABAC binarizer.
•Design and FPGA implementation of an improved high-throughput HEVC CABAC de-binarizer.
•New architectures for CABAC binarization and de-binarization in the H.265 video codec.
The High Efficiency Video Coding (HEVC) video codec applies different techniques in order to achieve the high compression ratios and video quality that real-time applications require. One of the critical techniques in HEVC is Context-Adaptive Binary Arithmetic Coding (CABAC), a type of entropy coding. CABAC comes at the cost of increased computational complexity, especially for parallelizing and pipelining its blocks: binarization, context modeling, and binary arithmetic encoding. Binarization (BZ) and de-binarization (DBZ) are important techniques in the HEVC CABAC encoder and decoder, respectively. An important goal is therefore to achieve high throughput in hardware architectures for CABAC BZ and DBZ in order to support high-resolution applications. To the best of our knowledge, this work is the only one in the recent literature that focuses on the design and implementation of a full BZ and a full DBZ compatible with both H.265 and H.264. Hardware architectures for BZ and DBZ are designed and implemented in VHDL, targeting an FPGA Virtex-4 xc4vsx25-12ff668 board, and emulated with ModelSim. The resulting implementations of BZ and DBZ can process 2 bins/cycle for each syntax element when operated at 697.83 MHz and 789.26 MHz, respectively. The proposed designs exhibit an improved high throughput of 1395.66 Mbins/s for BZ and 1578.52 Mbins/s for DBZ. The obtained area efficiencies of the proposed BZ and DBZ are about 0.544 Mbins/s/slice and 0.606 Mbins/s/slice, respectively, better than many recent works.
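For context, the core binarization methods such a BZ block implements are defined in the H.265 specification; the sketch below gives straightforward software versions of the unary, truncated-unary, and k-th-order Exp-Golomb binarizations (the paper's hardware design parallelizes these, which this sketch does not attempt):

def unary(value: int) -> str:
    """Unary binarization: N one-bits followed by a terminating zero."""
    return "1" * value + "0"

def truncated_unary(value: int, c_max: int) -> str:
    """Truncated unary: the terminating zero is dropped when value == cMax."""
    return "1" * value if value == c_max else "1" * value + "0"

def exp_golomb(value: int, k: int = 0) -> str:
    """k-th order Exp-Golomb binarization, following the spec's encoding loop."""
    bits = []
    while value >= (1 << k):        # grow the unary prefix, one bit per order
        bits.append("1")
        value -= 1 << k
        k += 1
    bits.append("0")                # prefix terminator
    for i in reversed(range(k)):    # k-bit suffix, most significant bit first
        bits.append(str((value >> i) & 1))
    return "".join(bits)

# e.g. exp_golomb(0) == "0", exp_golomb(1) == "100", exp_golomb(3) == "11000"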
One of the most fundamental issues in image processing is thresholding (binarization). This method is generally used for segmenting regions with different homogeneity in grayscale images; in other words, it performs clustering based on the intensity levels of the pixels in an image histogram. This paper presents a new and effective approach to global thresholding of grayscale images. In the proposed method, alpha and beta regions are determined using the mean and standard deviation of the image histogram. The optimum threshold value is obtained by calculating the average of the gray-scale values of the alpha and beta regions. Experiments were carried out on three different image sets to demonstrate the effectiveness of the thresholding method. The experimental results show that the proposed method achieves promising performance, under various evaluation criteria, compared to many traditional and state-of-the-art thresholding and document binarization methods on the H-DIBCO'14 (Document Image Binarization Competition), human HT29 colon-cancer cell (BBBC008), and C. elegans live/dead assay (BBBC010) datasets.
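A rough sketch of the scheme just described. The abstract does not define the alpha and beta regions precisely, so the split below (pixels within one standard deviation below versus above the histogram mean) is purely an illustrative assumption; only the final step, averaging the mean gray levels of the two regions, is taken from the text:

import numpy as np

def alpha_beta_threshold(gray: np.ndarray) -> float:
    """Global threshold from two histogram regions (illustrative only).

    The alpha/beta region definitions here are assumptions; the paper
    derives them from the histogram mean and standard deviation.
    """
    mu, sigma = gray.mean(), gray.std()
    alpha = gray[(gray >= mu - sigma) & (gray < mu)]   # assumed "alpha" region
    beta = gray[(gray >= mu) & (gray <= mu + sigma)]   # assumed "beta" region
    return (alpha.mean() + beta.mean()) / 2.0          # average of the two regions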
Documents often exhibit various forms of degradation, which make them hard to read and substantially deteriorate the performance of OCR systems. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Network (DE-GAN) that uses conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this problem has not previously been studied within the context of generative adversarial deep networks. We demonstrate that, on different tasks (document clean-up, binarization, deblurring, and watermark removal), DE-GAN can produce a high-quality enhanced version of the degraded document. In addition, our approach provides consistent improvements over state-of-the-art methods on the widely used DIBCO 2013, DIBCO 2017, and H-DIBCO 2018 datasets, demonstrating its ability to restore a degraded document image to its ideal condition. The results obtained on a wide variety of degradations reveal the flexibility of the proposed model for other document enhancement problems.
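For reference, DE-GAN builds on the conditional GAN formulation, where a generator G maps the degraded image x toward the clean target y while a discriminator D, conditioned on x, tries to tell real pairs from generated ones. A pix2pix-style objective of this family (the exact losses and weighting in DE-GAN may differ) is

\min_G \max_D \; \mathbb{E}_{x,y}\left[\log D(x,y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right] + \lambda\,\mathbb{E}_{x,y}\left[\lVert y - G(x)\rVert_1\right],

where the \ell_1 term keeps the enhanced output close to the ground-truth clean document.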
•A novel document image binarization method based on structural symmetric pixels (SSPs) is presented.
•The adaptive gradient binarization algorithm for degraded document images is effective.
•The new iterative stroke-width estimation algorithm takes no parameters and is robust against noise.
•Multiple local thresholds are used to increase the binarization accuracy through a voting strategy.
•Extensive experiments on various datasets show that our method is robust and effective.
This paper presents an effective approach to local-threshold binarization of degraded document images. We use structural symmetric pixels (SSPs) to calculate local thresholds in a neighborhood, and a vote over multiple thresholds determines whether each pixel belongs to the foreground. SSPs are defined as the pixels around strokes whose gradient magnitudes are sufficiently large and whose orientations are symmetrically opposite. A compensated gradient map is used to extract the SSPs so as to weaken the influence of document degradations. To extract SSP candidates with large magnitudes while distinguishing faint characters from bleed-through background, we propose an adaptive global threshold selection algorithm. To further extract pixels with opposite orientations, an iterative stroke-width estimation algorithm ensures a properly sized neighborhood for the orientation judgement. Finally, we present a multiple-threshold voting framework to handle inaccurate SSP detections, as sketched below. Experimental results on seven public document image binarization datasets show that our method is accurate and robust compared with many traditional and state-of-the-art document binarization approaches under multiple evaluation measures.
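A loose sketch of the voting stage, assuming SSP locations have already been detected. Each window size yields one local threshold per block (taken here as the mean gray level of the SSPs in the block, an assumption for illustration; the paper's threshold estimation is more involved), and a per-pixel majority vote decides the foreground:

import numpy as np

def vote_binarize(gray, ssp_mask, window_sizes=(16, 32, 48)):
    """Majority vote over multiple SSP-derived local thresholds (illustrative).

    `ssp_mask` marks detected structural symmetric pixels.
    """
    h, w = gray.shape
    votes = np.zeros((h, w), dtype=int)
    for win in window_sizes:
        thresh = np.full((h, w), gray.mean())          # fallback: global mean
        for y in range(0, h, win):
            for x in range(0, w, win):
                block = (slice(y, y + win), slice(x, x + win))
                ssp_vals = gray[block][ssp_mask[block]]
                if ssp_vals.size:                      # SSPs found in this block
                    thresh[block] = ssp_vals.mean()
        votes += (gray < thresh).astype(int)           # darker than local threshold
    return (votes > len(window_sizes) // 2).astype(np.uint8)  # majority = foreground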
To compare the agreement between optical coherence tomography (OCT)-based choroidal vascularity markers measured by two previously reported image binarization techniques.
Spectral-domain OCT using enhanced-depth imaging was performed in 100 eyes of 52 normal subjects. Choroidal images were binarized into luminal and stromal areas using two different algorithms. The choroidal vascularity marker was defined as the ratio of luminal area to total choroidal area, termed either "luminal/choroidal area ratio (L/C ratio)" or "choroidal vascularity index (CVI)" depending on the algorithm. Agreement between the choroidal vascularity markers measured by the two techniques was assessed using the intraclass correlation coefficient (ICC) and Bland-Altman analysis.
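Both markers above are area ratios. Given a segmented choroid region and a binarized image in which dark pixels are taken as luminal area (a common convention in this literature, though the two algorithms compared here differ in how they binarize), the computation reduces to:

import numpy as np

def vascularity_index(binary_choroid: np.ndarray, choroid_mask: np.ndarray) -> float:
    """Ratio of luminal area to total choroidal area (CVI or L/C ratio).

    `binary_choroid`: True where the binarization marks luminal (dark) pixels.
    `choroid_mask`: True inside the segmented choroid boundary.
    """
    luminal_area = np.count_nonzero(binary_choroid & choroid_mask)
    total_area = np.count_nonzero(choroid_mask)
    return luminal_area / total_area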
The mean values of the choroidal vascularity markers were 70.12% (range, 56.76%-78.55%) for CVI and 67.44% (range, 51.09%-81.31%) for the L/C ratio. The low level of absolute agreement between the two binarization techniques was reflected by an adjusted ICC of 0.353, obtained using a linear mixed model with age, sex, and spherical equivalent as covariates.
There was a discrepancy between measurements of choroidal vascularity obtained with two commonly adopted image binarization techniques. It remains unclear what the true choroidal vascularity is and which binarization algorithm is more accurate. Future studies with enhanced image quality and improved image analysis algorithms are required to establish the ground truth for choroidal vascularity.