While being the de facto standard coordinate representation for human pose estimation, the heatmap has not been investigated in depth. This work fills this gap. For the first time, we find that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant to performance. We further probe the design limitations of the standard coordinate decoding method and propose a more principled distribution-aware decoding method. We also improve the standard coordinate encoding process (i.e., transforming ground-truth coordinates to heatmaps) by generating unbiased/accurate heatmaps. Taking the two together, we formulate a novel Distribution-Aware coordinate Representation of Keypoints (DARK) method. Serving as a model-agnostic plug-in, DARK brings a significant performance boost to existing human pose estimation models. Extensive experiments show that DARK yields the best results on two common benchmarks, MPII and COCO. In addition, DARK achieved the 2nd place in the ICCV 2019 COCO Keypoints Challenge. The code is available online.
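As background on why decoding matters, here is a minimal numpy sketch of distribution-aware decoding in the spirit of this abstract: assuming the predicted heatmap is approximately Gaussian, the integer argmax is refined with a second-order Taylor expansion of the log-heatmap. The function name and numerical details are illustrative, not the authors' released code.

```python
import numpy as np

def dark_decode(heatmap):
    """Refine the integer argmax of an (assumed Gaussian) heatmap to
    sub-pixel precision via a 2nd-order Taylor expansion of log P."""
    h = np.log(np.maximum(heatmap, 1e-10))
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    if 1 <= x < heatmap.shape[1] - 1 and 1 <= y < heatmap.shape[0] - 1:
        # gradient of log-heatmap (central differences)
        dx = 0.5 * (h[y, x + 1] - h[y, x - 1])
        dy = 0.5 * (h[y + 1, x] - h[y - 1, x])
        # Hessian of log-heatmap
        dxx = h[y, x + 1] - 2 * h[y, x] + h[y, x - 1]
        dyy = h[y + 1, x] - 2 * h[y, x] + h[y - 1, x]
        dxy = 0.25 * (h[y + 1, x + 1] - h[y + 1, x - 1]
                      - h[y - 1, x + 1] + h[y - 1, x - 1])
        hess = np.array([[dxx, dxy], [dxy, dyy]])
        if np.linalg.det(hess) != 0:
            # Newton step: mu = m - H^{-1} g
            off = -np.linalg.solve(hess, np.array([dx, dy]))
            x, y = x + off[0], y + off[1]
    return float(x), float(y)
```

For an exactly Gaussian heatmap the log is quadratic, so this recovers the true (continuous) mode rather than the nearest grid point.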
• We validate the hypothesis that multiple samples per subject might degrade the accuracy.
• To enhance the accuracy, a two-dimensional data selection approach is proposed.
• The proposed method selects samples and features simultaneously, and outperforms the existing methods.
Parkinson’s disease (PD) is a serious neurodegenerative disorder. It is reported that more than 90% of PD patients have voice impairments, and multiple types of voice recordings have been used for PD detection. Previous work indicates that using multiple types of samples per subject degrades PD detection accuracy. In this paper, we validate this observation and propose a two-dimensional data selection method for joint sample and feature selection. The proposed method ranks features using a chi-square statistical model, searches for an optimal subset of the ranked features, and iteratively selects samples. Experimental results show that the proposed method outperforms the state-of-the-art methods in terms of PD detection accuracy on multiple types of voice data.
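The chi-square ranking step can be illustrated in a few lines of numpy: a generic chi-square score between per-class sums of non-negative feature values and their class-prior expectations, with features ranked by descending score. Function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def chi2_rank(X, y):
    """Rank features (columns of non-negative X) by a chi-square score
    against class labels y; returns indices, most relevant first."""
    classes = np.unique(y)
    # observed: per-class sum of each feature
    observed = np.array([X[y == c].sum(axis=0) for c in classes])
    # expected under independence: class prior x total feature sum
    priors = np.array([(y == c).mean() for c in classes])
    expected = np.outer(priors, X.sum(axis=0))
    scores = ((observed - expected) ** 2 / expected).sum(axis=0)
    return np.argsort(scores)[::-1]
```

A feature whose per-class sums match the class priors gets score 0; a feature concentrated in one class gets a large score and is ranked first.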
The in-loop filter has been extensively studied for the High Efficiency Video Coding (HEVC) standard to reduce compression artifacts and thus improve coding efficiency. However, in the existing approaches, the in-loop filter is always applied to each single frame, without exploiting the content correlation among multiple frames. In this paper, we propose a multi-frame in-loop filter (MIF) for HEVC, which enhances the visual quality of each encoded frame by leveraging its adjacent frames. Specifically, we first construct a large-scale database containing encoded frames and their corresponding raw frames over a variety of content, which can be used to learn the in-loop filter in HEVC. Furthermore, we find that there usually exist a number of reference frames of higher quality and similar content for an encoded frame. Accordingly, a reference frame selector (RFS) is designed to identify these frames. Then, a deep neural network for MIF (known as MIF-Net) is developed to enhance the quality of each encoded frame by utilizing the spatial information of this frame and the temporal information of its neighboring higher-quality frames. MIF-Net is built on the recently developed DenseNet, benefiting from its improved generalization capacity and computational efficiency. In addition, a novel block-adaptive convolutional layer is designed and applied in MIF-Net to handle the artifacts influenced by the coding tree unit (CTU) structure in HEVC. Extensive experiments show that our MIF approach achieves on average an 11.621% saving of the Bjøntegaard delta bit-rate (BD-BR) on the standard test set, significantly outperforming the standard in-loop filter in HEVC and other state-of-the-art approaches.
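The reference-frame-selection idea can be pictured with a small sketch: among previously encoded frames, keep those of higher quality (here proxied by a lower QP) and prefer temporally close ones. The data layout and names below are assumptions for illustration, not the paper's RFS implementation.

```python
def select_reference_frames(frames, target_idx, max_refs=2):
    """Pick up to max_refs previously encoded frames whose quality
    (proxied by a lower stored QP) exceeds the target frame's,
    preferring temporally closer frames."""
    target = frames[target_idx]
    better = [f for f in frames[:target_idx] if f["qp"] < target["qp"]]
    better.sort(key=lambda f: abs(f["idx"] - target["idx"]))
    return better[:max_refs]
```

In hierarchical coding, lower-layer frames are encoded at smaller QPs, so this heuristic naturally picks the higher-quality anchors nearest the frame being filtered.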
By exploiting the image nonlocal self-similarity (NSS) prior through clustering similar patches into patch groups, recent studies have revealed that structural sparse representation (SSR) models can achieve promising performance in various image restoration tasks. However, most existing SSR methods only exploit the NSS prior from the input degraded (internal) image, and few methods utilize the NSS prior from an external corpus of clean images; how to jointly exploit the NSS priors of the internal image and an external clean image corpus remains an open problem. In this paper, we propose a novel approach for image restoration that simultaneously considers internal and external nonlocal self-similarity (SNSS) priors, which offer mutually complementary information. Specifically, we first group nonlocal similar patches from the images of a training corpus. Then a group-based Gaussian mixture model (GMM) learning algorithm is applied to learn an external NSS prior. We exploit the SSR model by integrating the NSS priors of both internal and external image data. An alternating minimization scheme with an adaptive parameter-adjusting strategy is developed to solve the proposed SNSS-based image restoration problems, which makes the entire algorithm more stable and practical. Experimental results on three image restoration applications, namely image denoising, deblocking and deblurring, demonstrate that the proposed SNSS produces superior results compared to many popular and state-of-the-art methods in both objective and perceptual quality measurements.
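The patch-grouping step that all NSS-based methods share can be sketched directly: for a reference patch, collect the k most similar patches (by Euclidean distance) within a local search window and stack them as columns of a patch-group matrix. This is a generic simplification; parameter names and defaults are illustrative.

```python
import numpy as np

def group_similar_patches(img, ref_xy, psize=6, search=10, k=8):
    """Return a (psize*psize, k) matrix whose columns are the k patches
    in a local window most similar to the reference patch at ref_xy."""
    H, W = img.shape
    ry, rx = ref_xy
    ref = img[ry:ry + psize, rx:rx + psize].ravel()
    cands = []
    for y in range(max(0, ry - search), min(H - psize, ry + search) + 1):
        for x in range(max(0, rx - search), min(W - psize, rx + search) + 1):
            p = img[y:y + psize, x:x + psize].ravel()
            cands.append((np.sum((p - ref) ** 2), p))
    cands.sort(key=lambda t: t[0])  # nearest patches first
    return np.stack([p for _, p in cands[:k]], axis=1)
```

The reference patch itself has distance zero, so it always occupies the first column; the resulting matrix is what group-based models (GMM learning, structured sparse coding) operate on.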
To provide an excellent visual experience, virtual reality (VR) sources require higher resolutions and better visual quality than traditional picture sequences. The content of a VR video can be mapped onto a sphere by playback devices to present a 360° scene, commonly called VR360 in industry. The most popular formats for VR360 sources are the equirectangular projection (ERP) and the cubemap projection (CMP). Both ERP and CMP pictures can be effectively projected onto a virtual three-dimensional spherical surface for rendering. This brings a new challenge to the compression of VR video sources: how to allocate bit-rate appropriately for the mainstream projection formats. The most intuitive way to address this challenge is to empirically assign a fixed quantization parameter (QP) to each coding unit according to its position, which lacks precision and rationality and thus degrades coding performance. This research proposes a new entropy equilibrium optimization (EEO) methodology to enhance the coding performance of VR360 videos. Specifically, we develop a spherical bit-rate equalization strategy to obtain a block-level Lagrange multiplier (λ) for the rate-distortion optimization process in video coding. The appropriate QP value for each block is then dynamically determined in accordance with its λ. Based on our EEO methodology, we develop two algorithms, EEOA-ERP and EEOA-CMP, to enhance compression efficiency for ERP and CMP pictures, respectively. Experimental results demonstrate that both algorithms achieve significant BD-Rate savings over the HM16.17 platform under the all-intra (AI), low-delay (LD) and random-access (RA) configurations. Concretely, compared with the state-of-the-art algorithm WSU-ERP, the proposed EEOA-ERP achieves a BD-Rate saving of 0.37% in the LD configuration.
Furthermore, the proposed EEOA-CMP gains 2.6% in objective quality in the RA configuration compared with HM16.17 VR CMP under the common test conditions.
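For context on the λ→QP step mentioned above, the HEVC reference software commonly ties the Lagrange multiplier to the QP through the empirical relation QP ≈ 4.2005 ln λ + 13.7122. A minimal sketch of a per-block mapping under that assumption (the clipping range and rounding are illustrative, not the paper's exact procedure):

```python
import math

def qp_from_lambda(lmbda):
    """Map a block-level Lagrange multiplier to an HEVC QP using the
    empirical HM relation QP ~ 4.2005*ln(lambda) + 13.7122,
    clipped to the valid HEVC QP range [0, 51]."""
    qp = 4.2005 * math.log(lmbda) + 13.7122
    return min(51, max(0, int(round(qp))))
```

Larger λ weights rate more heavily, so it maps to a coarser (larger) QP; a spherical bit-rate equalization strategy would feed a spatially varying λ into such a mapping.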
Tensor completion recovers missing entries of multiway data. Most current methods exploit the low-rank tensor structure for image completion applications. In this paper, we simultaneously exploit the globally multidimensional structure and locally piecewise smoothness to further enhance performance. In the proposed optimization model, low tensor-tree-rank minimization captures the global data structure, and total variation minimization captures the local structure. Two kinds of total variation functions are discussed. The optimization problem is split into several subproblems by the alternating direction method of multipliers (ADMM). The subproblem on low tensor-tree-rank minimization is solved by singular value thresholding, and the subproblem on total variation minimization is solved by soft thresholding. Numerical experiments on color images and light-field images demonstrate that the proposed method outperforms most state-of-the-art methods in terms of recovery accuracy and computational complexity.
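The two proximal operators the ADMM scheme alternates between are standard and can be written in a few lines: soft thresholding for the total variation subproblem, and singular value thresholding (applied to a tensor unfolding) for the low-rank subproblem. A generic sketch, not the paper's full solver:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear
    norm, applied here to a matrix unfolding of the tensor."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt
```

In the full algorithm, `svt` is applied to each unfolding appearing in the tensor-tree-rank term, and `soft_threshold` to the finite-difference variables of the TV term, with ADMM dual updates tying them to the observed entries.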
View synthesis with depth-image-based rendering (DIBR) has attracted great interest because it can provide a virtual image at any arbitrary viewpoint in 3-D video and free-viewpoint TV. An inherent problem in DIBR view synthesis is the occurrence of holes in a synthesized image, also known as the disocclusion problem. The disoccluded regions need to be handled properly in order to generate a synthesized view of good quality. This paper provides a fundamental examination of the hole generation mechanism in the DIBR-oriented view synthesis process. A necessary and sufficient condition for hole generation is first shown, and the corresponding hole location and length are obtained analytically. Furthermore, given that conventional hole filling algorithms may fail to fill a hole correctly when (adequate) visible background information is lacking, we propose utilizing the occluded (invisible) information to identify and locate the relevant background pixels around a hole. We then use the visible and invisible background information together to perform hole filling. Experimental results validate our hole generation model, demonstrating agreement with our analytical results, while our proposed hole filling approach shows superior performance in terms of the visual quality of synthesized views.
Sparse coding has achieved great success in various image processing tasks. However, a benchmark to measure the sparsity of an image patch/group is missing, since sparse coding is essentially an NP-hard problem. This work attempts to fill the gap from the perspective of rank minimization. We first design an adaptive dictionary to bridge the gap between group-based sparse coding (GSC) and rank minimization. Then, we show that under the designed dictionary, GSC and the rank minimization problem are equivalent, and therefore the sparse coefficients of each patch group can be measured by estimating the singular values of each patch group. We thus obtain a benchmark to measure the sparsity of each patch group, because the singular values of the original image patch groups can be easily computed by the singular value decomposition (SVD). This benchmark can be used to evaluate the performance of any norm minimization method in sparse coding by analyzing its corresponding rank minimization counterpart. To this end, we study four well-known rank minimization methods to measure the sparsity of each patch group, and weighted Schatten p-norm minimization (WSNM) is found to be the closest to the real singular values of each patch group. Inspired by the aforementioned equivalence between rank minimization and GSC, WSNM can be translated into a non-convex weighted ℓp-norm minimization problem in GSC. Using this benchmark, the weighted ℓp-norm minimization is expected to obtain better performance than the three other norm minimization methods, i.e., ℓ1-norm, ℓp-norm and weighted ℓ1-norm minimization. To verify the feasibility of the proposed benchmark, we compare the weighted ℓp-norm minimization against the three aforementioned norm minimization methods in sparse coding.
Experimental results on image restoration applications, namely image inpainting and image compressive sensing recovery, demonstrate that the proposed scheme is feasible and outperforms many state-of-the-art methods.
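The benchmark idea above, that under the adaptive dictionary the sparse coefficients of a patch group equal its singular values, reduces to an SVD, and the weighted ℓp shrinkage can be approximated with a generalized soft-thresholding fixed-point iteration. Both sketches below are hypothetical helpers under those assumptions, not the authors' solver.

```python
import numpy as np

def group_singular_values(patch_group):
    """Benchmark sparsity of a (patch_dim x n) patch-group matrix:
    its singular values, computed by SVD."""
    return np.linalg.svd(patch_group, compute_uv=False)

def weighted_lp_shrink(s, w, p=0.8, iters=20):
    """Generalized soft-thresholding iteration x <- s - w*p*x^(p-1)
    for the weighted lp proximal step on singular values s >= 0."""
    x = s.astype(float).copy()
    for _ in range(iters):
        x = np.maximum(s - w * p * np.maximum(x, 1e-12) ** (p - 1), 0.0)
    return x
```

Large singular values are shrunk only slightly while small ones are driven to zero, which is the qualitative behavior that makes the weighted ℓp (Schatten p) penalty track the true singular values more closely than ℓ1-style uniform shrinkage.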
Sparse representation has achieved great success in various image processing and computer vision tasks. For image processing, typical patch-based sparse representation (PSR) models often generate undesirable visual artifacts, while group-based sparse representation (GSR) models tend to produce over-smoothed results. In this paper, we propose a new sparse representation model, termed joint patch-group-based sparse representation (JPG-SR). Compared with existing sparse representation models, the proposed JPG-SR provides an effective mechanism to integrate the local sparsity and nonlocal self-similarity of images. We then apply JPG-SR to image restoration tasks, including image inpainting and image deblocking. An iterative algorithm based on the alternating direction method of multipliers (ADMM) framework is developed to solve the resulting JPG-SR based image restoration problems. Experimental results demonstrate that the proposed JPG-SR is effective and outperforms many state-of-the-art methods in both objective and perceptual quality.
The low-delay hierarchical coding structure (LD-HCS), one of the most important components in the latest High Efficiency Video Coding (HEVC) standard, greatly improves coding performance. It groups consecutive P/B frames into different layers and encodes them with different quantization parameters (QPs) and reference mechanisms in such a way that the temporal dependency among frames can be exploited. However, due to the varying characteristics of video content, the temporal dependency among coding units differs significantly both within and across layers, while a fixed LD-HCS scheme cannot take full advantage of this dependency, leading to a substantial loss in coding performance. This paper addresses the temporally dependent rate-distortion optimization (RDO) problem by attempting to exploit the varying temporal dependency of different units. First, the temporal relationship of different frames under the LD-HCS is examined, and hierarchical temporal propagation chains are constructed to represent the temporal dependency among coding units in different frames. Then, a hierarchical temporally dependent RDO scheme is developed specifically for the LD-HCS based on a source distortion propagation model. Experimental results show that our proposed scheme achieves 2.5% and 2.3% BD-rate gains on average over the HEVC codec under the same configuration of P and B frames, respectively, with a negligible increase in encoding time. Furthermore, coupled with QP adaptation, our proposed method can achieve higher coding gains: with multi-QP optimization, about 5.4% and 5.0% BD-rate savings on average over the HEVC codec under the same settings of P and B frames, respectively.