Convolutional neural networks (CNNs) are well known for their feature-learning capability and have made revolutionary achievements in many applications, such as scene recognition and target detection. In this paper, their capability of feature learning in hyperspectral images is explored by constructing a five-layer CNN for classification (C-CNN). The proposed C-CNN incorporates recent advances in deep learning, such as batch normalization, dropout, and the parametric rectified linear unit (PReLU) activation function. In addition, both spatial context and spectral information are elegantly integrated into the C-CNN so that spatial-spectral features are learned for hyperspectral images. A companion feature-learning CNN (FL-CNN) is constructed by extracting the fully connected feature layers of this C-CNN. Both supervised and unsupervised modes are designed for the proposed FL-CNN to learn sensor-specific spatial-spectral features. Extensive experimental results on four benchmark data sets from two well-known hyperspectral sensors, namely the airborne visible/infrared imaging spectrometer (AVIRIS) and the reflective optics system imaging spectrometer (ROSIS), demonstrate that the proposed C-CNN outperforms state-of-the-art CNN-based classification methods, and that its corresponding FL-CNN is very effective at extracting sensor-specific spatial-spectral features for hyperspectral applications in both supervised and unsupervised modes.
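As an illustrative aside (not taken from the paper), the two building blocks named above, batch normalization and PReLU, can be sketched in a few lines of NumPy; the array shapes and the negative slope value below are hypothetical:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature column to zero mean and unit variance over the batch."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for positive inputs, learned slope alpha for negatives."""
    return np.where(x > 0, x, alpha * x)

# toy batch of 2 samples with 2 features
x = np.array([[1.0, -2.0],
              [3.0, -4.0]])
h = prelu(batch_norm(x))
```

In a trained network, `alpha` would be a learned per-channel parameter rather than the fixed 0.25 shown here, and the normalization statistics would be tracked as running averages at test time.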
Light field (LF) photography is an emerging paradigm for capturing more immersive representations of the real world. However, owing to the inherent tradeoff between the angular and spatial dimensions, the spatial resolution of LF images captured by commercial micro-lens-based LF cameras is significantly constrained. In this paper, we propose effective and efficient end-to-end convolutional neural network models for spatially super-resolving LF images. Specifically, the proposed models have an hourglass shape, which allows feature extraction to be performed at the low-resolution level to save both computational and memory costs. To fully exploit the 4D structure of LF data in both the spatial and angular domains, we propose to use 4D convolution to characterize the relationship among pixels. Moreover, as an approximation of 4D convolution, we also propose spatial-angular separable (SAS) convolutions for more computationally and memory-efficient extraction of spatial-angular joint features. Extensive experimental results on 57 test LF images with various challenging natural scenes show significant advantages of the proposed models over state-of-the-art methods: an average PSNR gain of more than 3.0 dB and better visual quality are achieved, and our methods better preserve the LF structure of the super-resolved images, which is highly desirable for subsequent applications. In addition, the SAS convolution-based model achieves a three-fold speed-up with only a negligible decrease in reconstruction quality compared with the 4D convolution-based one. The source code of our method is available online.
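The efficiency argument behind the SAS approximation can be made concrete with a simple parameter count: a full 4D kernel spans both the spatial and angular dimensions, while the separable version factors it into a spatial convolution followed by an angular one. The exact channel arrangement in the paper may differ; the counts below assume one hypothetical factorization:

```python
def conv4d_params(k, c_in, c_out):
    # one full 4D kernel of size k x k (spatial) x k x k (angular)
    return c_in * c_out * k ** 4

def sas_params(k, c_in, c_out):
    # a k x k spatial convolution followed by a k x k angular convolution
    return c_in * c_out * k ** 2 + c_out * c_out * k ** 2

k, c = 3, 64
full = conv4d_params(k, c, c)   # 331,776 weights
sas = sas_params(k, c, c)       # 73,728 weights, a 4.5x reduction here
```

The same factorization also reduces the number of multiply-accumulate operations per output sample, which is consistent with the reported speed-up.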
Recent imaging technologies are rapidly evolving toward sampling richer and more immersive representations of the 3D world. One of these emerging technologies is the light field (LF) camera based on micro-lens arrays. To record the directional information of light rays, an LF image requires much larger storage space and transmission bandwidth than a conventional 2D image of similar spatial dimension. Hence, the compression of LF data becomes a vital part of its application. In this paper, we propose an LF codec with disparity-guided Sparse Coding over a learned perspective-shifted LF dictionary based on selected Structural Key Views (SC-SKV). The sparse coding is based on a limited number of optimally selected SKVs, yet the entire LF can be recovered from the coding coefficients. By keeping the approximation identical between encoder and decoder, only the residuals of the non-key views, the disparity map, and the SKVs need to be compressed into the bit stream. An optimized SKV selection method is proposed so that most LF spatial information can be preserved. To achieve optimum dictionary efficiency, the LF is divided into several coding regions, over which the reconstruction works individually. Experiments and comparisons carried out on a benchmark LF data set show that the proposed SC-SKV codec produces convincing compression results in terms of both rate-distortion performance and visual quality compared with the Joint Exploration Model: a 37.9% BD-rate reduction and a 1.17-dB BD-PSNR improvement are achieved on average, with up to 6 dB of improvement in low-bit-rate scenarios.
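For readers unfamiliar with the sparse coding step, it is typically solved with a greedy pursuit. The sketch below is a generic orthogonal matching pursuit over a toy dictionary, not the SC-SKV codec itself (which uses a learned, perspective-shifted dictionary and disparity guidance):

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy orthogonal matching pursuit: approximate y as a sparse
    combination of dictionary atoms (the columns of D)."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        support.append(idx)
        # least-squares refit over all selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

D = np.eye(6)                     # toy orthonormal dictionary
y = 2 * D[:, 1] + 3 * D[:, 4]     # a 2-sparse signal
c = omp(D, y, sparsity=2)         # recovers the two active coefficients
```

In the codec, only the coefficients (plus the key views, residuals, and disparity map) would enter the bit stream, since the decoder holds the same dictionary.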
Depth estimation is a fundamental problem for light field photography applications. Numerous methods have been proposed in recent years, which focus either on crafting cost terms for more robust matching or on analyzing the geometry of scene structures embedded in the epipolar-plane images. Significant improvements have been made in overall depth estimation error; however, current state-of-the-art methods still show limitations in handling intricate occluding structures and complex scenes with multiple occlusions. To address these challenging issues, we propose a very effective depth estimation framework which focuses on regularizing the initial label confidence map and edge strength weights. Specifically, we first detect partially occluded boundary regions (POBR) via superpixel-based regularization. A series of shrinkage/reinforcement operations is then applied to the label confidence map and edge strength weights over the POBR. We show that after these weight manipulations, even a low-complexity weighted least squares model can produce much better depth estimation than state-of-the-art methods in terms of average disparity error rate, occlusion boundary precision-recall rate, and the preservation of intricate visual features.
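To see how the confidence map and edge weights enter a weighted least squares model, here is a minimal 1-D analogue in NumPy. The roles are the same as in the abstract (per-pixel data confidence, per-edge smoothness weight), but the specific energy and the 2-D lattice of the paper are simplified to a toy setting:

```python
import numpy as np

def wls_smooth(g, data_w, edge_w, lam=1.0):
    """1-D weighted least squares: fit u to observations g, where data_w is a
    per-sample confidence and edge_w weights the smoothness penalty across
    neighboring samples. Solves (W + lam * D^T A D) u = W g."""
    n = len(g)
    D = np.diff(np.eye(n), axis=0)                    # forward differences
    A = np.diag(data_w) + lam * D.T @ np.diag(edge_w) @ D
    return np.linalg.solve(A, data_w * g)

# an outlier depth label with low confidence gets smoothed away
g = np.array([0.0, 0.0, 10.0, 0.0, 0.0])
w = np.array([1.0, 1.0, 0.01, 1.0, 1.0])
u = wls_smooth(g, w, np.ones(4), lam=1.0)
```

Shrinking the confidence (`data_w`) of unreliable labels and reinforcing edge weights around occlusion boundaries, as the paper does over the POBR, directly reshapes this linear system and hence the recovered depth.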
Coded aperture is a promising approach for capturing the 4-D light field (LF), in which the 4-D data are compressively modulated into 2-D coded measurements that are further decoded by reconstruction algorithms. The bottleneck lies in the reconstruction algorithms, which result in rather limited reconstruction quality. To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures. The proposed method elegantly incorporates the measurement observation into the deep learning framework to avoid relying entirely on data-driven priors for LF reconstruction. Specifically, we first formulate compressive LF reconstruction as an inverse problem with an implicit regularization term. Then, we construct the regularization term with a deep, efficient spatial-angular separable convolutional sub-network in the form of local and global residual learning to comprehensively explore the signal distribution, free from the limited representation ability and inefficiency of deterministic mathematical modeling. Furthermore, we extend this pipeline to LF denoising and spatial super-resolution, which can be considered variants of coded aperture imaging equipped with different degradation matrices. Extensive experimental results demonstrate that the proposed methods outperform state-of-the-art approaches to a significant extent both quantitatively and qualitatively, i.e., the reconstructed LFs not only achieve much higher PSNR/SSIM but also better preserve the LF parallax structure on both real and synthetic LF benchmarks. The code will be publicly available at https://github.com/MantangGuo/DRLF .
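The measurement-consistency idea can be illustrated with the simplest possible stand-in: replace the paper's learned deep regularizer with a quadratic prior pulling the solution toward an estimate r (which plays the role of the sub-network's output), so the inverse problem has a closed form. Phi, y, and lam below are all hypothetical:

```python
import numpy as np

def regularized_recon(Phi, y, r, lam=0.1):
    """Solve min_x ||Phi x - y||^2 + lam * ||x - r||^2 in closed form.
    Phi is the degradation/measurement matrix, y the coded measurements,
    and r a prior estimate standing in for a learned regularizer."""
    n = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(n)
    return np.linalg.solve(A, Phi.T @ y + lam * r)

Phi = np.eye(3)                     # trivial measurement operator for the demo
y = np.array([1.0, 2.0, 3.0])
x = regularized_recon(Phi, y, r=np.zeros(3), lam=1.0)   # balances data vs. prior
```

Swapping `Phi` for a coded-aperture, noise, or downsampling matrix is exactly the "different degradation matrices" extension mentioned above; the paper replaces the quadratic prior with its residual-learning sub-network and solves the problem iteratively rather than in closed form.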
The elderly population is increasing rapidly all over the world. One major risk for elderly people, especially those living alone, is fall accidents. In this paper, we propose a robust fall detection approach that analyzes the tracked key joints of the human body using a single depth camera. Compared with rival methods that rely on RGB inputs, the proposed scheme is independent of illumination conditions and can work even in a dark room. In our scheme, a pose-invariant randomized decision tree algorithm is proposed for key joint extraction, which requires low computational cost during both training and testing. A support vector machine classifier, whose input is the 3-D trajectory of the head joint, is then employed to determine whether a fall motion has occurred. The experimental results demonstrate that the proposed fall detection method is more accurate and robust than state-of-the-art methods.
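To give a feel for why the head-joint trajectory is informative, the sketch below extracts two simple cues (peak downward speed and total height drop) from a synthetic trajectory. This is only an illustration of the kind of signal an SVM would classify; the actual feature set, frame rate, and axis convention (y taken as vertical here) are assumptions, not the paper's:

```python
import numpy as np

def fall_features(head_xyz, fps=30.0):
    """Simple cues from an N x 3 head-joint trajectory:
    peak downward vertical speed (m/s) and total height drop (m)."""
    y = head_xyz[:, 1]                # assume the y-axis points up
    vel = np.diff(y) * fps            # frame-to-frame vertical velocity
    return float(-vel.min()), float(y[0] - y[-1])

# synthetic fall: head drops from 1.7 m to 0.3 m over half a second
t = np.linspace(0.0, 0.5, 16)
traj = np.stack([np.zeros_like(t), 1.7 - 1.4 * (t / 0.5), np.zeros_like(t)], axis=1)
peak_speed, drop = fall_features(traj, fps=30.0)
```

A fast, large drop of the head joint is characteristic of a fall, whereas sitting or bending produces a slower, smaller descent; the classifier's job is to separate these cases robustly.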
In this paper, an accurate and efficient full-reference image quality assessment (IQA) model using extracted Gabor features, called the Gabor feature-based model (GFM), is proposed for the objective evaluation of screen content images (SCIs). It is well known that Gabor filters are highly consistent with the response of the human visual system (HVS), and that the HVS is highly sensitive to edge information. Based on these facts, the imaginary part of the Gabor filter, which has odd symmetry and yields edge detection, is applied to the luminance of the reference and distorted SCIs to extract their Gabor features. The local similarities of the extracted Gabor features and of two chrominance components, recorded in the LMN color space, are then measured independently. Finally, a Gabor-feature pooling strategy is employed to combine these measurements and generate the final evaluation score. Experimental results obtained on two large SCI databases show that the proposed GFM model not only yields higher consistency with human perception in the assessment of SCIs but also requires lower computational complexity, compared with classical and state-of-the-art IQA models.
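The edge-detecting behavior of the odd-symmetric (imaginary, sine-phase) Gabor component is easy to verify in 1-D: it responds strongly to a step edge and not at all to a flat region, because the filter taps sum to zero. The GFM applies 2-D filters to luminance; the filter size, sigma, and frequency below are illustrative choices:

```python
import numpy as np

def gabor_imag_1d(size=9, sigma=2.0, freq=0.25):
    """Imaginary (odd-symmetric, sine-phase) part of a 1-D Gabor filter."""
    x = np.arange(size) - size // 2
    return np.exp(-x**2 / (2 * sigma**2)) * np.sin(2 * np.pi * freq * x)

g = gabor_imag_1d()
step = np.concatenate([np.zeros(16), np.ones(16)])   # an ideal edge
flat = np.ones(32)                                   # a uniform region
edge_resp = np.abs(np.convolve(step, g, mode="valid")).max()
flat_resp = np.abs(np.convolve(flat, g, mode="valid")).max()   # ~0
```

Because `g(-x) = -g(x)`, constant regions cancel exactly, so the filter isolates exactly the edge structure that SCIs (text, graphics) are rich in.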
3D point clouds associated with attributes are considered a promising data representation for immersive communication. The large amount of data, however, poses great challenges to the subsequent transmission and storage processes. In this letter, we propose a new compression scheme for the color attribute of static voxelized 3D point clouds. Specifically, we first partition the colors of a 3D point cloud into clusters by applying a k-d tree to the geometry information; the clusters are then successively encoded. To eliminate redundancy, we propose a novel prediction module, namely graph prediction, in which a small number of representative points selected from previously encoded clusters are used to predict the points to be encoded by exploiting the underlying graph structure constructed from the geometry information. Furthermore, the prediction residuals are transformed with the graph transform, and the resulting transform coefficients are finally uniformly quantized and entropy encoded. Experimental results show that the proposed compression scheme achieves better rate-distortion performance at a lower computational cost than state-of-the-art methods.
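The graph transform mentioned above is the eigenbasis of the graph Laplacian built from geometric adjacency; smooth residuals over the graph then compact their energy into the low-frequency coefficients, which is what makes quantization and entropy coding effective. A toy 4-point path graph with hypothetical unit weights:

```python
import numpy as np

def graph_transform_basis(W):
    """Graph transform basis: eigenvectors of the combinatorial Laplacian
    L = D - W, ordered from low to high graph frequency."""
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)       # eigh returns ascending eigenvalues
    return U

# 4 points connected in a chain by geometric proximity
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
U = graph_transform_basis(W)

residual = np.array([1.0, 1.1, 0.9, 1.0])   # a smooth prediction residual
coeffs = U.T @ residual                      # energy concentrates in coeffs[0]
```

The transform is orthonormal, so the decoder recovers the residual exactly (before quantization) as `U @ coeffs`.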
Point cloud data is a large collection of high-dimensional 3D points with 3D coordinates and attributes, and it has become one of the mainstream representations for emerging 3D applications, such as virtual reality, autonomous vehicles, and robotics. Due to the large-scale, unstructured, high-dimensional nature of point clouds, their processing, transmission, and analysis have been challenging issues in multimedia signal processing and communication. Deep learning is a powerful tool for learning statistical knowledge from massive data. Advances in artificial intelligence, especially deep learning models, are offering new opportunities for point cloud processing, compression, and analysis. This special issue aims at promoting cutting-edge research on deep learning-based point cloud processing, including object detection, segmentation, registration, compression, and visual quality assessment.
Light field (LF) cameras provide perspective information of scenes by taking directional measurements of the focusing light rays. The raw outputs are usually dark with additive camera noise, which impedes subsequent processing and applications. We propose a novel LF denoising framework based on anisotropic parallax analysis (APA). Two convolutional neural networks are jointly designed for the task: first, the structural parallax synthesis network predicts the parallax details for the entire LF from a set of anisotropic parallax features, which efficiently capture the high-frequency perspective components of an LF from noisy observations. Second, the view-dependent detail compensation network restores the non-Lambertian variation of each LF view by involving view-specific spatial energies. Extensive experiments show that the proposed APA LF denoiser achieves much better denoising performance than state-of-the-art methods in terms of both visual quality and the preservation of parallax details.