Low-light image enhancement methods based on the classic Retinex model attempt to manipulate the estimated illumination and to project it back to the corresponding reflectance. However, the model does not consider the noise that inevitably exists in images captured in low-light conditions. In this paper, we propose the robust Retinex model, which additionally considers a noise map compared with the conventional Retinex model, to improve the performance of enhancing low-light images accompanied by intensive noise. Based on the robust Retinex model, we present an optimization function that includes novel regularization terms for the illumination and reflectance. Specifically, we use the ℓ1 norm to constrain the piece-wise smoothness of the illumination, adopt a fidelity term for the gradients of the reflectance to reveal structure details in low-light images, and make the first attempt to estimate a noise map from the robust Retinex model. To solve the optimization problem effectively, we provide an augmented Lagrange multiplier based alternating direction minimization algorithm without logarithmic transformation. Experimental results demonstrate the effectiveness of the proposed method in low-light image enhancement. In addition, the proposed method can be generalized to handle a series of similar problems, such as enhancing underwater or remote sensing images and images captured in hazy or dusty conditions.
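To make the formulation concrete, here is a minimal sketch of the robust Retinex model and of an objective combining the three regularization terms described above; the trade-off weights α, β, γ and the gradient guidance map G are illustrative assumptions rather than the paper's exact formulation.

```latex
% Robust Retinex model with an explicit noise map N (\circ: element-wise product):
%   I = R \circ L + N
% A plausible objective with the terms described above (weights illustrative):
\min_{R,\,L,\,N}\;
    \| I - R \circ L - N \|_F^2          % data fidelity under the noisy model
    + \alpha \,\| \nabla L \|_1          % piece-wise smooth illumination
    + \beta  \,\| \nabla R - G \|_F^2    % fidelity of reflectance gradients to guidance G
    + \gamma \,\| N \|_F^2               % regularization of the estimated noise map
```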
In this paper, we present a systematic review and evaluation of existing single-image low-light enhancement algorithms. Besides the commonly used low-level vision oriented evaluations, we additionally consider measuring machine vision performance in the low-light condition via a face detection task, to explore the potential of jointly optimizing high-level and low-level vision enhancement. To this end, we first propose a large-scale low-light image dataset serving both low- and high-level vision, with diversified scenes and contents as well as the complex degradation found in real scenarios, called Vision Enhancement in the LOw-Light condition (VE-LOL). Beyond paired low/normal-light images without annotations, we additionally include an analysis resource related to humans, i.e., face images in the low-light condition with annotated face bounding boxes. Then, efforts are made on benchmarking from the perspectives of both human and machine vision. A rich variety of criteria is used for the low-level vision evaluation, including full-reference, no-reference, and semantic similarity metrics. We also measure the effects of low-light enhancement on face detection in the low-light condition, using state-of-the-art face detection methods. Furthermore, with the rich material of VE-LOL, we explore the novel problem of joint low-light enhancement and face detection. We develop an enhanced face detector that applies low-light enhancement and face detection jointly: the features extracted by the enhancement module are fed to the layers of the detection module with the same resolution. Thus, these features are intertwined to jointly learn useful information across the two phases, i.e., enhancement and detection. Experiments on VE-LOL provide a comparison of state-of-the-art low-light enhancement algorithms, point out their limitations, and suggest promising future directions. Our dataset supported the Track “Face Detection in Low Light Conditions” of the CVPR UG2+ Challenge (2019–2020) (http://cvpr2020.ug2challenge.org/).
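As a rough illustration of the intertwined enhancement-detection design, the PyTorch sketch below feeds enhancement features into detection layers of the same resolution; the module names, channel widths, and the toy detection head are assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class EnhanceDetectStub(nn.Module):
    """Toy joint model: enhancement features feed same-resolution detection layers."""
    def __init__(self):
        super().__init__()
        # enhancement branch: features at full and half resolution
        self.enh1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.enh2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # detection branch: consumes the image plus enhancement features
        self.det1 = nn.Sequential(nn.Conv2d(3 + 32, 32, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(nn.Conv2d(64 + 64, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, 5, 1)  # toy head: 4 box offsets + 1 score per cell

    def forward(self, x):
        e1 = self.enh1(x)                      # full-resolution enhancement features
        e2 = self.enh2(e1)                     # half-resolution enhancement features
        d1 = self.det1(torch.cat([x, e1], 1))  # intertwine at full resolution
        d2 = self.fuse(torch.cat([self.down(d1), e2], 1))  # intertwine at half resolution
        return self.head(d2)
```

Because the two branches share gradients through the concatenated features, enhancement and detection can be trained jointly, which is the joint optimization the benchmark motivates.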
Noise causes unpleasant visual effects in low-light image/video enhancement. In this paper, we aim to make the enhancement model and method aware of noise throughout the whole process. To deal with heavy noise, which is not handled by previous methods, we introduce a robust low-light enhancement approach that jointly enhances low-light images/videos and suppresses intensive noise. Our method is based on the proposed Low-Rank Regularized Retinex Model (LR3M), which is the first to inject a low-rank prior into the Retinex decomposition process to suppress noise in the reflectance map. Our method estimates a piece-wise smoothed illumination and a noise-suppressed reflectance sequentially, avoiding the residual noise in the illumination and reflectance maps that usually remains in alternating decomposition methods. After obtaining the estimated illumination and reflectance, we adjust the illumination layer and generate our enhancement result. Furthermore, we apply LR3M to video low-light enhancement. We consider the inter-frame coherence of illumination maps and find similar patches across the reflectance maps of successive frames to form the low-rank prior, thereby making use of temporal correspondence. Our method performs well on a wide variety of images and videos, and achieves better quality in both enhancement and denoising compared with state-of-the-art methods.
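The low-rank prior on the reflectance can be sketched with singular value thresholding on groups of similar patches; patch handling and the threshold tau below are illustrative assumptions, with NumPy standing in for the paper's optimization.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # soft-threshold the singular values
    return (U * s) @ Vt

def denoise_patch_group(patches, tau=0.1):
    # patches: list of flattened similar patches, each of shape (p*p,);
    # stacking them as columns makes the clean group approximately low-rank,
    # so soft-thresholding the spectrum suppresses (full-rank) noise.
    X = np.stack(patches, axis=1)
    return svt(X, tau)
```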
In this paper, we address the problem of video rain removal by considering rain occlusion regions, i.e., regions with very low light transmittance for rain streaks. Different from additive rain streaks, in such occlusion regions the details of the background are completely lost. Therefore, we propose a hybrid rain model to depict both rain streaks and occlusions. Integrating the hybrid model and useful motion segmentation context information, we present a Dynamic Routing Residue Recurrent Network (D3R-Net). D3R-Net first extracts spatial features with a residual network. Then, the spatial features are aggregated by recurrent units along the temporal axis. In the temporal fusion, context information is embedded into the network in a "dynamic routing" way: a heap of recurrent units takes responsibility for handling the temporal fusion in given contexts, e.g., rain or non-rain regions. In a given forward or backward pass, one of these recurrent units is mainly activated. A context selection gate then detects the context and selects one of the temporally fused features generated by these recurrent units as the final fused feature. Finally, this feature plays the role of a "residual feature": it is combined with the spatial feature and then used to reconstruct the negative rain streaks. In D3R-Net, we incorporate motion segmentation, which denotes whether a pixel belongs to fast-moving edges, and a rain type indicator, indicating whether a pixel belongs to rain streaks, rain occlusions, or non-rain regions, as the context variables. Extensive experiments on a series of synthetic and real rain videos verify not only the superiority of the proposed method over the state of the art but also the effectiveness of our network design and each of its components.
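A minimal sketch of the "dynamic routing" temporal fusion: several recurrent units fuse features in parallel and a context selection gate picks (here softly, via softmax) among their outputs. The GRU cells on per-location feature vectors and the linear gate are simplifications assumed for illustration, not the published D3R-Net layers.

```python
import torch
import torch.nn as nn

class DynamicRoutingFusion(nn.Module):
    def __init__(self, ch=64, num_contexts=3):
        super().__init__()
        # one recurrent unit per context, e.g. rain streaks / occlusions / non-rain
        self.units = nn.ModuleList(nn.GRUCell(ch, ch) for _ in range(num_contexts))
        self.gate = nn.Linear(ch, num_contexts)  # context selection gate

    def forward(self, feat, hidden):
        # feat, hidden: (batch, ch) feature vectors at one spatial location
        fused = torch.stack([u(feat, hidden) for u in self.units], dim=1)
        w = torch.softmax(self.gate(feat), dim=1)    # soft routing weights
        return (w.unsqueeze(-1) * fused).sum(dim=1)  # context-selected fusion
```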
In this paper, we consider the image super-resolution (SR) problem. The main challenge of image SR is to recover the high-frequency details of a low-resolution (LR) image that are important for human perception. To address this essentially ill-posed problem, we introduce a Deep Edge Guided REcurrent rEsidual (DEGREE) network to progressively recover the high-frequency details. Different from most existing methods, which aim at predicting high-resolution (HR) images directly, DEGREE investigates an alternative route: recovering the difference between a pair of LR and HR images by recurrent residual learning. DEGREE further augments the SR process with edge-preserving capability, namely the LR image and its edge map jointly infer the sharp edge details of the HR image during the recurrent recovery process. To speed up training convergence, by-pass connections across the multiple layers of DEGREE are constructed. In addition, we offer an understanding of DEGREE from the viewpoint of sub-band frequency decomposition of the image signal and experimentally demonstrate how DEGREE recovers different frequency bands separately. Extensive experiments on three benchmark data sets clearly demonstrate the superiority of DEGREE over well-established baselines, and DEGREE also sets new state-of-the-art results on these data sets. We also present additional experiments on JPEG artifact reduction to demonstrate the generality and flexibility of the proposed DEGREE network in handling other image processing tasks.
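The recurrent residual idea can be sketched as follows: rather than predicting the HR image directly, a weight-shared block with a by-pass connection refines features over several recurrences, each adding a high-frequency residual on top of the upsampled LR image under edge guidance. Layer sizes and the number of recurrences are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentResidualSR(nn.Module):
    def __init__(self, steps=4, ch=64):
        super().__init__()
        self.steps = steps
        self.inp = nn.Conv2d(3 + 1, ch, 3, padding=1)  # LR image + its edge map
        self.body = nn.Sequential(                     # weight-shared recurrent block
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, lr_up, edge):
        # lr_up: bicubically upsampled LR image; edge: its edge map
        h = torch.relu(self.inp(torch.cat([lr_up, edge], 1)))
        sr = lr_up
        for _ in range(self.steps):
            h = self.body(h) + h    # by-pass connection eases training
            sr = sr + self.out(h)   # progressively add high-frequency residual
        return sr
```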
In this paper, we address the problem of rain removal from a single image, even in the presence of large rain streaks and rain streak accumulation (where individual streaks cannot be seen and are thus visually similar to mist or fog). For rain streak removal, the mismatch between streak sizes in the training and testing phases leads to poor performance, especially when large streaks are present. To mitigate this problem, we embed a hierarchical wavelet-transform representation into a recurrent rain removal process: 1) rain removal on the low-frequency component, and 2) recurrent detail recovery on the high-frequency components under the guidance of the recovered low-frequency component. Benefiting from the recurrent multi-scale modeling of this wavelet transform-like design, the proposed network trained on streaks of one size can adapt to larger sizes, which significantly favors real rain streak removal. A dilated residual dense network is used as the basic model of the recurrent recovery process. The network includes multiple paths with different receptive fields, so it can make full use of multi-scale redundancy and utilize context information over large regions. Furthermore, to handle heavy rain with rain streak accumulation, we construct a detail-appearing rain accumulation removal module that not only improves visibility but also enhances details in dark regions. The evaluation on both synthetic and real images, particularly those containing large rain streaks and heavy accumulation, shows the effectiveness of our novel models, which significantly outperform the state-of-the-art methods.
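The wavelet-domain recurrence can be sketched as a recursion over scales: derain the low-frequency band first, then recover high-frequency detail under its guidance. Here `remove_rain` and `recover_detail` are shape-preserving stand-ins (assumed) for the paper's dilated residual dense networks, and PyWavelets supplies the transform.

```python
import pywt

def derain_multiscale(img, level, remove_rain, recover_detail):
    # img: 2-D array (one channel); remove_rain / recover_detail must
    # preserve the shape of their first argument.
    if level == 0:
        return remove_rain(img)                  # coarsest scale: derain directly
    low, (lh, hl, hh) = pywt.dwt2(img, "haar")   # split into sub-bands
    low_clean = derain_multiscale(low, level - 1, remove_rain, recover_detail)
    # recover high-frequency detail under the guidance of the cleaned low band
    lh, hl, hh = (recover_detail(b, low_clean) for b in (lh, hl, hh))
    return pywt.idwt2((low_clean, (lh, hl, hh)), "haar")
```

Because the same recovery networks are reused at every scale, a model trained on one streak size sees larger streaks as small ones at coarser levels, which is the size-adaptation argument above.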
Face recognition techniques have developed significantly in recent years. However, recognizing faces with partial occlusion remains challenging for existing face recognizers, a capability heavily desired in real-world applications concerning surveillance and security. Although much research effort has been devoted to developing face de-occlusion methods, most of them work well only under constrained conditions, such as when all faces come from a pre-defined closed set of subjects. In this paper, we propose a robust LSTM-Autoencoders (RLA) model to effectively restore partially occluded faces, even in the wild. The RLA model consists of two LSTM components, which aim at occlusion-robust face encoding and recurrent occlusion removal, respectively. The first, a multi-scale spatial LSTM encoder, reads facial patches of various scales sequentially to output a latent representation; occlusion-robustness is achieved because occlusion influences only some of the patches. Receiving the representation learned by the encoder, an LSTM decoder with a dual-channel architecture reconstructs the overall face and detects occlusion simultaneously; by virtue of the LSTM, the decoder breaks the face de-occlusion task into restoring the occluded part step by step. Moreover, to minimize identity information loss and guarantee face recognition accuracy over recovered faces, we introduce an identity-preserving adversarial training scheme to further improve RLA. Extensive experiments on both synthetic and real data sets of occluded faces clearly demonstrate the effectiveness of the proposed RLA in removing different types of facial occlusion at various locations. The proposed method also provides a significantly larger performance gain than other de-occlusion methods in promoting recognition performance on partially occluded faces.
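The occlusion-robust encoder can be sketched as an LSTM reading patch sequences at several scales, so an occluder corrupts only some steps of the sequence; patch size, scales, and feature widths are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePatchEncoder(nn.Module):
    def __init__(self, patch=16, feat=256):
        super().__init__()
        self.patch = patch
        self.proj = nn.Linear(3 * patch * patch, feat)   # patch -> token
        self.lstm = nn.LSTM(feat, feat, batch_first=True)

    def forward(self, face, scales=(1.0, 0.5)):
        tokens = []
        for s in scales:
            x = F.interpolate(face, scale_factor=s, mode="bilinear",
                              align_corners=False)
            # non-overlapping patches: (B, 3*p*p, num_patches)
            cols = F.unfold(x, self.patch, stride=self.patch)
            tokens.append(self.proj(cols.transpose(1, 2)))
        seq = torch.cat(tokens, dim=1)  # patches of all scales, read in order
        _, (h, _) = self.lstm(seq)
        return h[-1]                    # latent, occlusion-robust representation
```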
Video coding, which targets compressing and reconstructing the whole frame, and feature compression, which preserves and transmits only the most critical information, stand at two ends of a scale: one is compact and efficient to serve machine vision, and the other offers full fidelity, bowing to human perception. Recent endeavors in imminent trends of video compression, e.g., deep learning based coding tools and end-to-end image/video coding, and in MPEG-7 compact feature descriptor standards, i.e., Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, promote sustainable and fast development in their respective directions. In this paper, thanks to booming AI technology, e.g., prediction and generation models, we explore the new area of Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts. Toward collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. Aligning with the rising Analyze then Compress instance, Digital Retina, the definition, formulation, and paradigm of VCM are given first. Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides academic and industrial evidence for realizing the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we propose potential VCM solutions, and preliminary results demonstrate the performance and efficiency gains. Further directions are discussed as well.
Fractional interpolation is used to provide sub-pixel level references for motion compensation in the inter prediction of video coding, which attempts to remove temporal redundancy in video sequences. Traditional handcrafted fractional interpolation filters face the challenge of modeling discontinuous regions in videos, while existing deep learning-based methods are either designed for a single quantization parameter (QP), generate only half-pixel samples, or need to train a separate model for each sub-pixel position. In this paper, we present a one-for-all fractional interpolation method based on a grouped variation convolutional neural network (GVCNN). Our method can deal with video frames coded with different QPs and is capable of generating all sub-pixel positions at one sub-pixel level. Also, by predicting variations between integer-position pixels and sub-pixels, our network offers more expressive power. Moreover, we take specific measures in training data generation to simulate practical situations in video coding, including blurring the down-sampled sub-pixel samples to avoid aliasing effects and coding the integer pixels to simulate reconstruction errors. In addition, we theoretically analyze the impact of the blur kernel size. Experimental results verify the efficiency of GVCNN: compared with HEVC, our method achieves 2.2% bit savings on average and up to 5.2% under the low-delay P configuration.
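The variation formulation can be sketched as a residual prediction: the network outputs, for each sub-pixel position, the difference from the co-located integer pixel. The CNN body, the quarter-pel position count, and the omitted QP conditioning are simplified assumptions.

```python
import torch
import torch.nn as nn

class VariationInterpolator(nn.Module):
    def __init__(self, ch=64, subpel=15):  # e.g. 15 quarter-pel positions (4*4 - 1)
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, subpel, 3, padding=1),  # one variation map per position
        )

    def forward(self, integer_pixels):
        # integer_pixels: (B, 1, H, W); broadcasting adds the integer reference
        # to every predicted variation map, yielding all sub-pixel planes at once
        return integer_pixels + self.body(integer_pixels)
```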
In this paper, we address the problem of rain removal from a single image, even in the presence of heavy rain and rain streak accumulation. Our core ideas lie in a new rain image model and a new deep learning architecture. We add a binary map that provides rain streak locations to an existing model comprising a rain streak layer and a background layer. We further create a model consisting of a component representing rain streak accumulation (where individual streaks cannot be seen and are thus visually similar to mist or fog) and another component representing the various shapes and directions of overlapping rain streaks that usually occur in heavy rain. Based on this model, we develop a multi-task deep learning architecture that learns the binary rain streak map, the appearance of rain streaks, and the clean background, which is our ultimate output. The additional binary map is critically beneficial, since its loss function provides additional strong information to the network. To handle rain streak accumulation and the various shapes and directions of overlapping rain streaks, we propose a recurrent rain detection and removal network that removes rain streaks and clears up rain accumulation iteratively and progressively. In each recurrence of our method, a new contextualized dilated network is developed to exploit regional contextual information and produce better representations for rain detection. The evaluation on real images, particularly those with heavy rain, shows the effectiveness of our models and architecture.
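A sketch of the rain image model described above; the notation and the exact form of the heavy-rain extension are a hedged reconstruction, not copied from the paper.

```latex
% Streak model with a binary streak-location map R (\odot: element-wise product):
%   O = B + S \odot R
% Heavy-rain extension: overlapping streak layers S_t plus a global
% atmospheric (accumulation) term A with transmittance \alpha:
%   O = \alpha \Big( B + \sum_{t=1}^{T} S_t \odot R \Big) + (1 - \alpha)\, A
```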