Due to the unavailability of large-scale underwater depth-image datasets and the ill-posed nature of the problem, underwater single-image depth prediction is a challenging task. Unambiguous depth prediction for a single underwater image is an essential part of applications like underwater robotics, marine engineering, and so on. This article presents an end-to-end underwater generative adversarial network (UW-GAN) for depth estimation from a single underwater image. Initially, a coarse-level depth map is estimated using the underwater coarse-level generative network (UWC-Net). Then, a fine-level depth map is computed using the underwater fine-level network (UWF-Net), which takes as input the concatenation of the estimated coarse-level depth map and the input image. The proposed UWF-Net comprises spatial and channel-wise squeeze-and-excitation blocks for fine-level depth estimation. We also propose a synthetic underwater image generation approach for building a large-scale database. The proposed network is tested on real-world and synthetic underwater datasets for performance analysis. We also perform a complete evaluation of the proposed UW-GAN on underwater images with different color dominations, contrasts, and lighting conditions. The presented UW-GAN framework is also investigated for underwater single-image enhancement. Extensive result analysis proves the superiority of the proposed UW-GAN over state-of-the-art (SoTA) hand-crafted and learning-based approaches for underwater single-image depth estimation (USIDE) and enhancement.
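The spatial and channel-wise squeeze-and-excitation block mentioned above is commonly realized as two parallel recalibration paths whose outputs are fused. The sketch below is a minimal numpy illustration of that idea; the toy weights `w_ch` and `w_sp` stand in for the learned FC and 1x1-conv parameters and are not the authors' exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scse(feat, w_ch, w_sp):
    """Concurrent spatial and channel squeeze-and-excitation (scSE).

    feat : (C, H, W) feature map
    w_ch : (C, C) toy stand-in for the two FC layers of a real SE block
    w_sp : (C,)  1x1-conv weights for the spatial gate
    """
    # Channel squeeze: global average pool -> per-channel gate in (0, 1).
    z = feat.mean(axis=(1, 2))                                  # (C,)
    g_ch = sigmoid(w_ch @ z)                                    # (C,)
    cse = feat * g_ch[:, None, None]
    # Spatial squeeze: 1x1 projection across channels -> per-pixel gate.
    g_sp = sigmoid(np.tensordot(w_sp, feat, axes=([0], [0])))   # (H, W)
    sse = feat * g_sp[None, :, :]
    # Fuse both recalibrations (element-wise max is one common choice).
    return np.maximum(cse, sse)
```

The element-wise maximum is one of several fusion choices reported for scSE-style blocks (addition is another); the abstract does not specify which the paper uses.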
Unlike prevalent facial expressions, micro-expressions involve subtle, involuntary muscle movements that are short-lived in nature. These minute muscle movements reflect the true emotions of a person. Due to their short duration and low intensity, micro-expressions are very difficult to perceive and interpret correctly. In this paper, we propose a dynamic representation of micro-expressions that preserves the facial movement information of a video in a single frame. We also propose a Lateral Accretive Hybrid Network (LEARNet) to capture micro-level features of an expression in the facial region. The LEARNet refines the salient expression features in an accretive manner by incorporating accretion layers (AL) in the network. The response of the AL holds the hybrid feature maps generated by prior laterally connected convolution layers. Moreover, the LEARNet architecture incorporates a cross-decoupled relationship between convolution layers, which helps preserve the tiny but influential facial muscle change information. The visual responses of the proposed LEARNet depict the effectiveness of the system by preserving both high- and micro-level edge features of facial expressions. The effectiveness of the proposed LEARNet is evaluated on four benchmark datasets: CASME-I, CASME-II, CAS(ME)^2, and SMIC. The experimental results show significant improvements of 4.03%, 1.90%, 1.79%, and 2.82% over ResNet on the CASME-I, CASME-II, CAS(ME)^2, and SMIC datasets, respectively.
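One standard way to collapse a clip's motion into a single frame, as the dynamic representation above does, is rank-pooling-style weighted temporal averaging (the approximate "dynamic image" coefficients). Whether the paper uses these exact weights is an assumption; the sketch only illustrates the general idea.

```python
import numpy as np

def dynamic_frame(video):
    """Collapse a (T, H, W) clip into one motion-summarising frame.

    Uses the approximate rank-pooling coefficients alpha_t = 2t - T - 1,
    which weight later frames positively and earlier frames negatively,
    so static content cancels and motion survives.
    """
    T = video.shape[0]
    w = 2.0 * np.arange(1, T + 1) - T - 1.0       # (T,) weights, sum to 0
    return np.tensordot(w, video, axes=([0], [0]))  # (H, W)
```

Because the weights sum to zero, a perfectly static clip maps to an all-zero frame, which is exactly the behaviour that lets a single frame carry movement information.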
Underwater image restoration is a challenging problem due to multiple distortions. Degradation of the information is mainly due to 1) the light-scattering effect, 2) wavelength-dependent color attenuation, and 3) the object-blurriness effect. In this letter, we propose a novel end-to-end deep network for underwater image restoration. The proposed network is divided into two parts, viz., a channel-wise color feature extraction module and a dense-residual feature extraction module. A custom loss function is proposed that preserves the structural details and generates the true edge information in the restored underwater scene. Also, to train the proposed network for underwater image enhancement, a new synthetic underwater image database is proposed. Existing synthetic underwater database images are characterized by light-scattering and color-attenuation distortions; however, the object-blurriness effect is ignored. We, on the other hand, introduce the blurring effect along with the light-scattering and color-attenuation distortions. The proposed network is validated for the underwater image restoration task on real-world underwater images. Experimental analysis shows that the proposed network is superior to the existing state-of-the-art approaches for underwater image restoration.
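A loss that preserves structure and edges is typically built from a reconstruction term plus an image-gradient term. Since the abstract does not give the formulation, the sketch below is a generic stand-in: the weight `lam` and the finite-difference gradient operator are illustrative choices, not the authors' values.

```python
import numpy as np

def edge_preserving_loss(pred, target, lam=0.1):
    """Generic structure-plus-edge loss: L1 reconstruction plus an L1
    penalty on the difference of image gradients (edge maps)."""
    l1 = np.abs(pred - target).mean()
    # Horizontal / vertical finite differences approximate edge maps.
    gx = lambda im: np.abs(np.diff(im, axis=1))
    gy = lambda im: np.abs(np.diff(im, axis=0))
    edge = (np.abs(gx(pred) - gx(target)).mean()
            + np.abs(gy(pred) - gy(target)).mean())
    return l1 + lam * edge
```

The gradient term is what pushes the network to reproduce true edge locations rather than only matching average intensities.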
There has been a considerable gap between recent high-resolution display technologies and the shortage of corresponding high-resolution content. However, most of the existing restoration methods are restricted by local convolution operations and equal treatment of the diverse information in a degraded image. Being degradation-specific, these approaches employ the same rigid spatial processing across different images, ultimately resulting in high memory consumption. To overcome this limitation, we propose Con-Net, a network design capable of exploiting the non-uniformities of the degradations in the spatial domain with a limited number of parameters (656k). Our proposed Con-Net comprises two main components: (1) a spatial-degradation-aware network for extracting the diverse information inherent in any degraded image, and (2) a holistic attention refinement network for exploiting the knowledge from the degradation-aware network to selectively restore the degraded pixels. In a nutshell, our proposed method generalizes across three applications: image denoising, super-resolution, and real-world low-light enhancement. Extensive qualitative and quantitative comparisons with prior art on 8 benchmark datasets demonstrate the efficacy of our proposed Con-Net over existing state-of-the-art degradation-specific architectures, with a substantial reduction in parameters and FLOPs in all three tasks.
Image inpainting is one of the most important and widely used approaches in which the input image is synthesized at the missing regions. It has various applications like undesired object removal, virtual garment shopping, etc. Methods used for image inpainting may use knowledge of the hole locations to effectively regenerate contents in an image. Existing image inpainting methods give astonishing results with coarse-to-fine architectures or with the use of guided information like edges, structures, etc. The coarse-to-fine architectures require substantial resources, leading to a high computational cost. Other methods with edge or structural information depend on available models to generate the guiding information for inpainting. In this context, we propose a computationally efficient, lightweight network for image inpainting with very few parameters (0.97M) and without any guided information. The proposed architecture consists of a multi-encoder-level feature fusion module, a pseudo decoder, and a regeneration decoder. The multi-encoder-level feature fusion module extracts relevant information from each of the encoder levels to merge structural and textural information from various receptive fields. This information is then processed by the pseudo decoder, followed by a space-depth correlation module, to assist the regeneration decoder in the inpainting task. The experiments are performed with different types of masks and compared with state-of-the-art methods on three benchmark datasets, i.e., Paris Street View (PARIS_SV), Places2, and CelebA_HQ. Along with this, the proposed network is tested on high-resolution images (1024 × 1024 and 2048 × 2048) and compared with the existing methods. The extensive comparison with state-of-the-art methods, computational complexity analysis, and ablation study prove the effectiveness of the proposed framework for image inpainting.
Image inpainting is a reconstruction method in which a corrupted image containing holes is filled with the most relevant contents from the valid region of the image. To inpaint an image, we propose a lightweight cascaded architecture with 2.5M parameters, consisting of an encoder feature aggregation block (FAB) with a decoder feature sharing (DFS) inpainting network, followed by a refinement network. Initially, the FAB-with-DFS (inpainting) generator network is proposed, which comprises a multi-level feature aggregation mechanism and a feature-sharing decoder. The FAB makes use of multi-scale spatial channel-wise attention to fuse weighted features from all the encoder levels. The DFS reconstructs the inpainted image with multi-scale and multi-receptive feature sharing in order to inpaint smaller to larger hole regions effectively. Further, a refinement generator network is proposed for refining the inpainted image produced by the inpainting generator network. The effectiveness of the proposed architecture is verified on the CelebA-HQ, Paris Street View (PARIS_SV), and Places2 datasets corrupted using the publicly available NVIDIA mask dataset. Extensive result analysis with a detailed ablation study proves the robustness of the proposed architecture over state-of-the-art methods for image inpainting.
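The FAB's fusion of weighted features from all encoder levels can be illustrated, in much-simplified form, as resizing each level to the finest scale and combining with per-level weights. The real block learns per-channel, per-pixel attention, so the scalar weights and nearest-neighbour upsampling here are purely illustrative.

```python
import numpy as np

def aggregate_levels(feats, weights):
    """Toy multi-level feature aggregation.

    feats   : list of (C, h_i, w_i) encoder feature maps, coarsest last,
              each an integer down-scaling of the finest map
    weights : one scalar attention weight per level (illustrative)
    """
    H, W = feats[0].shape[1:]
    fused = np.zeros_like(feats[0])
    for f, w in zip(feats, weights):
        # Nearest-neighbour upsample each coarse level to (H, W).
        ry, rx = H // f.shape[1], W // f.shape[2]
        up = np.repeat(np.repeat(f, ry, axis=1), rx, axis=2)
        fused += w * up
    return fused
```

Fusing all levels at once is what lets structural cues from coarse (large-receptive-field) features and texture from fine features reach the decoder together.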
Depth prediction from a single image is a challenging task due to intra-scale ambiguity and the unavailability of prior information. The prediction of an unambiguous depth from a single RGB image is a very important aspect of computer vision applications. In this paper, an end-to-end sparse-to-dense network (S2DNet) is proposed for single image depth estimation (SIDE). The proposed network processes a single image along with additional sparse depth samples for depth estimation. The additional sparse depth samples are acquired either with a low-resolution depth sensor or calculated by visual simultaneous localization and mapping (SLAM) algorithms. In the first stage, the proposed S2DNet estimates a coarse-level depth map using the sparse-to-dense coarse network (S2DCNet). In the second stage, the estimated coarse-level depth map is concatenated with the input image and used as the input to the sparse-to-dense fine network (S2DFNet) for fine-level depth map estimation. The proposed S2DFNet comprises an attention-map architecture that helps to estimate the prominent depth information. The quantitative and qualitative performance evaluation of the proposed network has been carried out using standard error metrics. We perform a complete evaluation of S2DNet on four publicly available benchmark datasets, i.e., the NYU Depth-V2 indoor dataset, the KITTI odometry outdoor dataset, the KITTI depth completion test database, and the SUN-RGB database. Further, we have extended the proposed S2DNet to image de-hazing. The experimental analysis shows that the proposed S2DNet outperforms the existing state-of-the-art methods for both single image depth estimation and image de-hazing.
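The sparse-to-dense input described above is usually formed by stacking the RGB frame with a mostly-zero depth map as a fourth channel. The helper below sketches that common convention; the exact input encoding used by S2DNet is an assumption.

```python
import numpy as np

def make_s2d_input(rgb, sparse_depth):
    """Stack an RGB frame with a sparse depth map as one network input.

    rgb          : (3, H, W) image
    sparse_depth : (H, W) map that is zero everywhere except at the few
                   pixels carrying sensor / SLAM depth samples
    Returns a (4, H, W) tensor; unsampled pixels simply stay zero, the
    usual convention for sparse-to-dense depth networks.
    """
    assert rgb.shape[1:] == sparse_depth.shape
    return np.concatenate([rgb, sparse_depth[None]], axis=0)
```

The second stage concatenates the estimated coarse depth map with the image in exactly the same channel-stacking fashion.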
In this paper, a new image indexing and retrieval algorithm using local mesh patterns is proposed for biomedical image retrieval. The standard local binary pattern encodes the relationship between a referenced pixel and its surrounding neighbors, whereas the proposed method encodes the relationships among the surrounding neighbors for a given referenced pixel in an image. The possible relationships among the surrounding neighbors depend on the number of neighbors, P. In addition, the effectiveness of our algorithm is confirmed by combining it with the Gabor transform. To prove the effectiveness of our algorithm, three experiments have been carried out on three different biomedical image databases, of which two concern computed tomography (CT) and one magnetic resonance (MR) image retrieval. The databases considered for the three experiments are the OASIS-MRI database, the NEMA-CT database, and the VIA/I-ELCAP database, which includes region-of-interest CT images. The results show a significant improvement in terms of the evaluation measures as compared to LBP, LBP with the Gabor transform, and other spatial- and transform-domain methods.
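To make the distinction concrete: the standard LBP thresholds each neighbor against the center pixel, while a mesh-style pattern compares neighbors with each other. The `lmp_code` pairing below (neighbor i versus neighbor i+step) is a toy stand-in, not the paper's exact local mesh pattern definition.

```python
import numpy as np

def _neighbours(patch):
    """Clockwise 3x3 neighbours, starting at the top-left corner."""
    return [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
            patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]

def lbp_code(patch):
    """Standard LBP: threshold each of the P = 8 neighbours against the
    centre pixel and pack the bits into one 8-bit code."""
    c = patch[1, 1]
    return sum(1 << i for i, v in enumerate(_neighbours(patch)) if v >= c)

def lmp_code(patch, step=1):
    """Toy mesh-style pattern: each neighbour is compared with another
    neighbour (i vs i+step), so the centre pixel is not involved; the
    exact neighbour pairing of the paper's method is an assumption."""
    nbrs = _neighbours(patch)
    return sum(1 << i for i, v in enumerate(nbrs)
               if nbrs[(i + step) % 8] >= v)
```

Varying `step` gives different neighbour pairings, which is how the number of possible mesh relationships grows with the neighbour count P.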
Given a degraded low-resolution input image, super-resolution (SR) aims at restoring the lost textures and structures and generating high-resolution image content. Significant advances in image super-resolution have been made lately, dominated by convolutional neural networks (CNNs). The top-performing CNN-based SR networks typically employ very deep models to embrace the benefits of generating spatially precise results, but at the cost of losing long-range contextual information. Additionally, state-of-the-art (SOTA) methods generally fail to maintain the balance between spatial details and contextual information, which is a basic requirement for superior performance in the SR task. For a restoration application like SR, the network generally demands efficient preservation of low-frequency information and reconstruction of high-frequency details. Thus, our work presents a novel architecture with the holistic objective of maintaining a spatially precise representation by collecting contextual content and restoring multi-frequency information throughout the network. Our proposed model learns an enriched set of features that, besides combining contextual information from multiple scales, simultaneously preserves the high-resolution spatial details. The core of our approach is a novel non-local and local attention (NLLA) block, which focuses on (1) learning enriched features by collecting information from multiple scales, (2) simultaneously handling the different frequency information, and (3) effectively fusing the relevant low-frequency and high-frequency information while ignoring redundant features. Additionally, for effectively mapping the low-resolution features to high resolution, we propose a novel aggregated attentive up-sampler (AAU) block that attentively learns the weights to up-sample the refined low-resolution feature maps to the high-resolution output.
Extensive experiments on the benchmark SR datasets demonstrate that the proposed method achieves appealing performance, both qualitatively and quantitatively.
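A common choice for the final low-resolution-to-high-resolution mapping in SR networks is sub-pixel (depth-to-space) up-sampling; the AAU block described above additionally learns attention weights over this step, which the minimal sketch below omits.

```python
import numpy as np

def pixel_shuffle(feat, r):
    """Sub-pixel (depth-to-space) up-sampling by factor r.

    feat : (C * r * r, H, W) feature map
    Returns (C, H * r, W * r): each group of r*r channels is rearranged
    into an r x r spatial block, matching the standard PixelShuffle
    layout out[c, h*r+i, w*r+j] = feat[c*r*r + i*r + j, h, w].
    """
    Crr, H, W = feat.shape
    C = Crr // (r * r)
    out = feat.reshape(C, r, r, H, W).transpose(0, 3, 1, 4, 2)
    return out.reshape(C, H * r, W * r)
```

This rearrangement lets the network predict high-resolution detail as extra channels at low resolution, which is far cheaper than convolving at full resolution.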
The presence of rain artifacts severely degrades the overall visual quality of a video, and the artifacts tend to overlap with the useful information present in the video frames. Such degraded video affects the effectiveness of many automated applications like traffic monitoring, surveillance, etc. As video deraining is a pre-processing step for automated applications, a lightweight deraining module is highly desirable. Therefore, in this paper, a "Progressive Subtractive Recurrent Lightweight Network" is proposed for video deraining. Initially, the Multi-Kernel feature Sharing Residual Block (MKSRB) is designed to learn rain streaks of different sizes, which facilitates their complete removal through progressive subtractions. The MKSRB features are merged recurrently with the previous frame's output to maintain temporal consistency. Further, multi-receptive feature subtraction is performed through the Multi-scale Multi-Receptive Difference Block (MMRDB) to avoid loss of details and extract high-frequency information. Finally, the features progressively learned through the MKSRB and recurrent feature merging are aggregated with the fused MMRDB features to output the rain-free frame. Substantial experiments on prevailing synthetic datasets and real-world videos verify the superior performance of the proposed method over the existing state-of-the-art methods for video deraining.
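The progressive-subtraction idea can be sketched as repeatedly subtracting predicted streak layers from the running estimate; the pre-computed streak maps below stand in for the MKSRB's learned, per-stage predictions.

```python
import numpy as np

def progressive_subtract(frame, streak_preds):
    """Progressive rain-streak removal.

    frame        : (H, W) rainy frame with intensities in [0, 1]
    streak_preds : list of (H, W) predicted streak layers, one per stage
                   (stand-ins for the learned MKSRB outputs)
    Streaks of different sizes are peeled off stage by stage, clipping
    so intensities stay valid after each subtraction.
    """
    out = frame.copy()
    for streaks in streak_preds:
        out = np.clip(out - streaks, 0.0, 1.0)
    return out
```

Using several small subtractions, each targeting a different streak scale, is what allows a lightweight network to remove heavy rain that a single-pass prediction would miss.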