Deep learning on image denoising: An overview Tian, Chunwei; Fei, Lunke; Zheng, Wenxian ...
Neural networks,
November 2020, Volume: 131
Journal Article
Peer reviewed
Open access
Deep learning techniques have received much attention in the area of image denoising. However, there are substantial differences among the various types of deep learning methods dealing with image denoising. Specifically, discriminative learning based on deep learning can ably address the issue of Gaussian noise. Optimization models based on deep learning are effective in estimating the real noise. However, there has thus far been little related research to summarize the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep learning techniques in image denoising. We first classify the deep convolutional neural networks (CNNs) for additive white noisy images; the deep CNNs for real noisy images; the deep CNNs for blind denoising; and the deep CNNs for hybrid noisy images, which represent the combination of noisy, blurred and low-resolution images. Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare the state-of-the-art methods on public denoising datasets in terms of quantitative and qualitative analyses. Finally, we point out some potential challenges and directions of future research.
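The quantitative comparisons this survey refers to are typically reported as peak signal-to-noise ratio (PSNR) between the denoised and ground-truth images. A minimal sketch of that metric, with a synthetic image and Gaussian noise standing in for real test data:

```python
import numpy as np

def psnr(clean, distorted, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((clean - distorted) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# Synthetic example: a random "clean" image plus additive white Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 0.0, 1.0)
print(round(psnr(clean, noisy), 1))
```

A denoiser is then judged by how much it raises this value over the noisy input on benchmark datasets.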
Visible-infrared person re-identification (VI-ReID) is an emerging and challenging cross-modality image matching problem because of the explosive surveillance data in night-time surveillance applications. To handle the large modality gap, various generative adversarial network models have been developed to eliminate the cross-modality variations based on a cross-modal image generation framework. However, the lack of point-wise cross-modality ground-truths makes it extremely challenging to learn such a cross-modal image generator. To address these problems, we learn the correspondence between single-channel infrared images and three-channel visible images by generating intermediate grayscale images as auxiliary information to colorize the single-modality infrared images. We propose a grayscale enhancement colorization network (GECNet) to bridge the modality gap by retaining the structure of the colored image, which contains rich information. To simulate the infrared-to-visible transformation, the point-wise transformed grayscale images greatly enhance the colorization process. Our experiments conducted on two visible-infrared cross-modality person re-identification datasets demonstrate the superiority of the proposed method over the state-of-the-art methods.
Rainy weather is a challenge for many vision-oriented tasks (e.g., object detection and segmentation), which causes performance degradation. Image deraining is an effective solution to avoid the performance drop of downstream vision tasks. However, most existing deraining methods either fail to produce satisfactory restoration results or cost too much computation. In this work, considering both effectiveness and efficiency of image deraining, we propose a progressive coupled network (PCNet) to separate rain streaks effectively while preserving rain-free details. To this end, we investigate the blending correlations between rain streaks and rain-free content, and particularly devise a novel coupled representation module (CRM) to learn the joint features and the blending correlations. By cascading multiple CRMs, PCNet extracts the hierarchical features of multi-scale rain streaks, and separates the rain-free content and rain streaks progressively. To promote computation efficiency, we employ depth-wise separable convolutions and a U-shaped structure, and construct CRM in an asymmetric architecture to reduce model parameters and memory footprint. Extensive experiments are conducted to evaluate the efficacy of the proposed PCNet in two aspects: (1) image deraining on several synthetic and real-world rain datasets and (2) joint image deraining and downstream vision tasks (e.g., object detection and segmentation). Furthermore, we show that the proposed CRM can be easily adopted to similar image restoration tasks, including image dehazing and low-light enhancement, with competitive performance. The source code is available at https://github.com/kuijiang0802/PCNet .
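The efficiency gain from the depth-wise separable convolutions mentioned above comes from factoring one dense convolution into a per-channel spatial step plus a 1×1 channel-mixing step. A parameter-count sketch (the channel and kernel sizes here are illustrative, not PCNet's actual layer configuration):

```python
def standard_conv_params(c_in, c_out, k):
    # A dense conv learns a k x k kernel for every (input, output) channel pair.
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    # Depthwise part: one k x k kernel per input channel.
    # Pointwise part: a 1 x 1 conv that mixes channels.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(64, 64, 3)   # 3*3*64*64 = 36864
sep = separable_conv_params(64, 64, 3)  # 9*64 + 64*64 = 4672
print(std, sep, round(std / sep, 1))    # roughly an 8x parameter reduction
```

The same factorization also cuts the multiply-accumulate count by about the same ratio, which is why it is a common choice for lightweight restoration networks.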
Visible-infrared person re-identification (VI-ReID) is a cross-modality retrieval problem, which aims at matching the same pedestrian between the visible and infrared cameras. Due to the existence of pose variation, occlusion, and huge visual differences between the two modalities, previous studies mainly focus on learning image-level shared features. Since they usually learn a global representation or extract uniformly divided part features, these methods are sensitive to misalignments. In this paper, we propose a structure-aware positional transformer (SPOT) network to learn semantic-aware sharable modality features by utilizing the structural and positional information. It consists of two main components: attended structure representation (ASR) and transformer-based part interaction (TPI). Specifically, ASR models the modality-invariant structure feature for each modality and dynamically selects the discriminative appearance regions under the guidance of the structure information. TPI mines the part-level appearance and position relations with a transformer to learn discriminative part-level modality features. With a weighted combination of ASR and TPI, the proposed SPOT explores the rich contextual and structural information, effectively reducing cross-modality difference and enhancing the robustness against misalignments. Extensive experiments indicate that SPOT is superior to the state-of-the-art methods on two cross-modal datasets. Notably, the Rank-1/mAP value on the SYSU-MM01 dataset has improved by 8.43%/6.80%.
Rain removal from a video is a challenging problem and has been recently investigated extensively. Nevertheless, the problem of rain removal from a single image was rarely studied in the literature, where no temporal information among successive images can be exploited, making the problem very challenging. In this paper, we propose a single-image-based rain removal framework via properly formulating rain removal as an image decomposition problem based on morphological component analysis. Instead of directly applying a conventional image decomposition technique, the proposed method first decomposes an image into the low- and high-frequency (HF) parts using a bilateral filter. The HF part is then decomposed into a "rain component" and a "nonrain component" by performing dictionary learning and sparse coding. As a result, the rain component can be successfully removed from the image while preserving most original image details. Experimental results demonstrate the efficacy of the proposed algorithm.
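The first step of the pipeline above, splitting an image into low- and high-frequency parts with a bilateral filter, can be sketched directly. This is a naive numpy implementation (the paper's filter settings are not given here; radius and sigmas below are illustrative), and the dictionary-learning stage that would further decompose the HF part is omitted:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Naive bilateral filter on a 2-D float image in [0, 1]."""
    h, w = img.shape
    pad = np.pad(img, radius, mode="edge")
    # Spatial Gaussian weights over the (2r+1)^2 window.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma_s ** 2))
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range weights penalize intensity differences, preserving edges.
            range_w = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            weight = spatial * range_w
            out[i, j] = (weight * patch).sum() / weight.sum()
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))
low_freq = bilateral_filter(image)
high_freq = image - low_freq  # this residual is what goes to sparse coding
```

Because rain streaks are thin, high-contrast structures, they land almost entirely in `high_freq`, which is why the subsequent dictionary learning only has to separate rain from non-rain detail rather than from the whole image.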
Transformer-based methods have demonstrated promising performance in image super-resolution tasks, due to their long-range and global aggregation capability. However, the existing Transformer brings two critical challenges for applying it in large-area earth observation scenes: (1) redundant token representation due to most tokens being irrelevant; (2) single-scale representation which ignores scale correlation modeling of similar ground observation targets. To this end, this paper proposes to adaptively eliminate the interference of irrelevant tokens for a more compact self-attention calculation. Specifically, we devise a Residual Token Selective Group (RTSG) to grasp the most crucial tokens by dynamically selecting the top-k keys in terms of score ranking for each query. For better feature aggregation, a Multi-scale Feed-forward Layer (MFL) is developed to generate an enriched representation of multi-scale feature mixtures during the feed-forward process. Moreover, we also propose a Global Context Attention (GCA) to fully explore the most informative components, thus introducing more inductive bias to the RTSG for an accurate reconstruction. In particular, multiple cascaded RTSGs form our final Top-k Token Selective Transformer (TTST) to achieve progressive representation. Extensive experiments on simulated and real-world remote sensing datasets demonstrate that our TTST performs favorably against state-of-the-art CNN-based and Transformer-based methods, both qualitatively and quantitatively. In brief, TTST outperforms the state-of-the-art approach (HAT-L) in terms of PSNR by 0.14 dB on average, but only accounts for 47.26% and 46.97% of its computational cost and parameters. The code and pre-trained TTST will be available at https://github.com/XY-boy/TTST for validation.
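The core idea of selecting the top-k keys per query before the softmax can be sketched in a few lines of numpy. This is a simplified stand-in for the RTSG, not the paper's implementation (single head, no residual path, illustrative sizes):

```python
import numpy as np

def topk_attention(q, k, v, top_k):
    """Self-attention that keeps only the top_k highest-scoring keys per query."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_queries, n_keys)
    # Indices of the top_k scores in each row.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    # Mask every other score to -inf so it gets zero softmax weight.
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = rng.random((3, 8, 4))  # 8 tokens, dim 4
out = topk_attention(q, k, v, top_k=2)
```

Each query now aggregates only its two most relevant tokens, which is the "compact self-attention calculation" the abstract describes: irrelevant tokens contribute exactly zero weight instead of a small but nonzero one.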
Online image hashing aims to update hash functions on-the-fly along with newly arriving data streams, which has found broad applications in computer vision and beyond. To this end, most existing methods update hash functions simply using discrete labels or pairwise similarity to explore intra-class relationships, which, however, often deteriorates search performance when facing a domain gap or semantic shift. One reason is that they ignore the particular semantic relationships among different classes, which should be taken into account in updating hash functions. Besides, the common characteristics between the label vectors (which can be regarded as a sort of binary codes) and the to-be-learned binary hash codes have been left unexploited. In this paper, we present a novel online hashing method, termed Similarity Preserving Linkage Hashing (SPLH), which not only utilizes pairwise similarity to learn the intra-class relationships, but also fully exploits a latent linkage space to capture the inter-class relationships and the common characteristics between label vectors and to-be-learned hash codes. Specifically, SPLH first maps the independent discrete label vectors and binary hash codes into a linkage space, through which the relative semantic distance between data points can be assessed precisely. As a result, the pairwise similarities within the newly arriving data stream are exploited to learn the latent semantic space to benefit binary code learning. To learn the model parameters effectively, we further propose an alternating optimization algorithm. Extensive experiments conducted on three widely-used datasets demonstrate the superior performance of SPLH over several state-of-the-art online hashing methods.
Deep convolutional neural networks (CNNs) have been popularly adopted in image super-resolution (SR). However, deep CNNs for SR often suffer from the instability of training, resulting in poor image SR performance. Gathering complementary contextual information can effectively overcome this problem. Along this line, we propose a coarse-to-fine SR CNN (CFSRCNN) to recover a high-resolution (HR) image from its low-resolution version. The proposed CFSRCNN consists of a stack of feature extraction blocks (FEBs), an enhancement block (EB), a construction block (CB) and a feature refinement block (FRB) to learn a robust SR model. Specifically, the stack of FEBs learns the long- and short-path features, and then fuses the learned features by extending the effect of the shallower layers to the deeper layers to improve the representational power of learned features. A compression unit is then used in each FEB to distill important information of features so as to reduce the number of parameters. Subsequently, the EB utilizes residual learning to integrate the extracted features to prevent the loss of edge information due to repeated distillation operations. After that, the CB applies the global and local LR features to obtain coarse features, followed by the FRB to refine the features to reconstruct a high-resolution image. Extensive experiments demonstrate the high efficiency and good performance of our CFSRCNN model on benchmark datasets compared with state-of-the-art SR models. The code of CFSRCNN is accessible at https://github.com/hellloxiaotian/CFSRCNN .
Saliency detection plays an important role in many image processing applications, such as regions-of-interest extraction and image resizing. Existing saliency detection models are built in the uncompressed domain. Since most images on the Internet are typically stored in the compressed domain, such as Joint Photographic Experts Group (JPEG), we propose a novel saliency detection model in the compressed domain in this paper. The intensity, color, and texture features of the image are extracted from discrete cosine transform (DCT) coefficients in the JPEG bit-stream. The saliency value of each DCT block is obtained based on the Hausdorff distance calculation and feature map fusion. Based on the proposed saliency detection model, we further design an adaptive image retargeting algorithm in the compressed domain. The proposed image retargeting algorithm utilizes a multi-operator scheme comprising block-based seam carving and image scaling to resize images. A new definition of texture homogeneity is given to determine the number of block-based seams to remove. Thanks to the accurate saliency information derived directly from the compressed domain, the proposed image retargeting algorithm effectively preserves the visually important regions of images, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated with the in-depth analysis in the extensive experiments.
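Extracting features from DCT coefficients rather than decoded pixels is the key compressed-domain trick here. A minimal sketch of the first step, computing the orthonormal 8×8 block DCT and reading off the DC coefficient as a per-block intensity feature (the Hausdorff-distance contrast and fusion stages are omitted):

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis, as used for 8x8 JPEG blocks.
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] /= np.sqrt(2)
    return c

def block_dc_features(img):
    """DC coefficient of each 8x8 block of a grayscale image (dims divisible by 8)."""
    d = dct_matrix()
    h, w = img.shape
    feats = np.empty((h // 8, w // 8))
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            coeffs = d @ img[i:i + 8, j:j + 8] @ d.T  # 2-D DCT of the block
            feats[i // 8, j // 8] = coeffs[0, 0]      # DC term = 8 * block mean
    return feats

rng = np.random.default_rng(0)
img = rng.random((32, 32))
dc = block_dc_features(img)
```

In an actual JPEG bit-stream these coefficients are read out of the entropy-decoded blocks directly, so no inverse DCT to the pixel domain is ever needed; the same per-block layout also gives the AC coefficients used as texture features.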
A Video Saliency Detection Model in Compressed Domain YUMING FANG; WEISI LIN; ZHENZHONG CHEN ...
IEEE transactions on circuits and systems for video technology,
January 2014, Volume: 24, Issue: 1
Journal Article
Peer reviewed
Open access
Saliency detection is widely used to extract regions of interest in images for various image processing applications. Recently, many saliency detection models have been proposed for video in the uncompressed (pixel) domain. However, video on the Internet is typically stored in compressed formats, such as MPEG-2, H.264, and MPEG-4 Visual. In this paper, we propose a novel video saliency detection model based on feature contrast in the compressed domain. Four types of features, including luminance, color, texture, and motion, are extracted from the discrete cosine transform coefficients and motion vectors in the video bitstream. The static saliency map of unpredicted frames (I frames) is calculated on the basis of the luminance, color, and texture features, while the motion saliency map of predicted frames (P and B frames) is computed from the motion feature. A new fusion method is designed to combine the static saliency and motion saliency maps to get the final saliency map for each video frame. Due to the directly derived features in the compressed domain, the proposed model can predict the salient regions efficiently for video frames. Experimental results on a public database show superior performance of the proposed video saliency detection model in the compressed domain.
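The final step above combines a static and a motion saliency map into one map per frame. The paper designs its own fusion rule; the sketch below substitutes a simple normalized weighted sum just to show the shape of the operation (the `alpha` weight is an assumption, not the paper's parameter):

```python
import numpy as np

def fuse_saliency(static_map, motion_map, alpha=0.5):
    """Illustrative fusion of static and motion saliency maps via a
    normalized weighted sum; the paper's actual fusion rule differs."""
    def normalize(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m
    return alpha * normalize(static_map) + (1 - alpha) * normalize(motion_map)

rng = np.random.default_rng(0)
static_map = rng.random((18, 32))   # e.g. from I-frame DCT features
motion_map = rng.random((18, 32))   # e.g. from P/B-frame motion vectors
final = fuse_saliency(static_map, motion_map)
```

Normalizing each map before mixing matters because DCT-contrast scores and motion-vector magnitudes live on different numeric scales; without it one cue would dominate regardless of the weight.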