Recently, convolutional neural networks (CNNs) have attracted tremendous attention and achieved great success in many image processing tasks. In this paper, we focus on CNN technology combined with image restoration to improve video coding performance and propose content-aware CNN-based in-loop filtering for High Efficiency Video Coding (HEVC). In particular, we quantitatively analyze the structure of the proposed CNN model from multiple dimensions to make the model interpretable and optimal for CNN-based loop filtering. More specifically, each coding tree unit (CTU) is treated as an independent region for processing, such that the proposed content-aware multimodel filtering mechanism is realized by restoring different regions with different CNN models under the guidance of a discriminative network. To adapt to the image content, the discriminative neural network is trained to analyze the content characteristics of each region for the adaptive selection of the deep learning model. CTU-level control is also enabled in the sense of rate-distortion optimization. To learn the CNN models, an iterative training method is proposed that simultaneously labels filter categories at the CTU level and fine-tunes the CNN model parameters. The CNN-based in-loop filter is implemented after sample adaptive offset in HEVC, and extensive experiments show that the proposed approach significantly improves the coding performance and achieves up to 10.0% bit-rate reduction. On average, 4.1%, 6.0%, 4.7%, and 6.0% bit-rate reductions are obtained under the all intra, low delay, low delay P, and random access configurations, respectively.
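The CTU-level control described in this abstract can be illustrated with a minimal sketch. It assumes the usual Lagrangian cost J = D + λR, a sum-of-squared-errors distortion, and a one-bit on/off flag per CTU; the function names and the bit costs are hypothetical and are not taken from the paper.

```python
def sse(a, b):
    """Sum of squared errors between two equally sized sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ctu_filter_flag(original, reconstructed, filtered, lam,
                    bits_on=1.0, bits_off=1.0):
    """Return True if enabling the CNN filter lowers the RD cost of this CTU.

    original, reconstructed, filtered: flattened CTU samples (assumed layout).
    lam: Lagrange multiplier; bits_on / bits_off: assumed flag signalling cost.
    """
    cost_off = sse(original, reconstructed) + lam * bits_off
    cost_on = sse(original, filtered) + lam * bits_on
    return cost_on < cost_off


original = [10, 20, 30, 40]
reconstructed = [12, 18, 33, 37]   # distortion 26 without the filter
filtered = [11, 19, 31, 39]        # distortion 4 with the filter
print(ctu_filter_flag(original, reconstructed, filtered, lam=5.0))
```

With equal flag costs the decision reduces to a plain distortion comparison; unequal costs would matter if the flag were entropy coded with a skewed probability.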
In this paper, we propose an efficient inter prediction scheme by introducing the deep virtual reference frame (VRF), which serves as a better reference in the temporal redundancy removal process of video coding. In particular, the high-quality VRF is generated with a deep learning-based frame rate up-conversion (FRUC) algorithm from two reconstructed bi-directional frames, and is subsequently incorporated into the reference list as a high-quality reference. Moreover, to alleviate the compression artifacts of the VRF, we develop a convolutional neural network (CNN)-based enhancement model to further improve its quality. To facilitate better utilization of the VRF, a CTU-level coding mode termed direct virtual reference frame (DVRF) is devised, which achieves a better trade-off between compression performance and complexity. The proposed scheme is integrated into the HM-16.6 and JEM-7.1 software platforms, and the simulation results under the random access (RA) configuration demonstrate significant superiority of the proposed method. When adding the VRF to the RPS, more than 6% average BD-rate gain is achieved for HEVC test sequences on HM-16.6, and 0.8% BD-rate gain is observed on JEM-7.1. Regarding the DVRF mode, 3.6% bitrate saving is achieved on HM-16.6 with the computational complexity effectively reduced.
In the latest Joint Video Exploration Team development, the quadtree plus binary tree (QTBT) block partitioning structure has been proposed for future video coding. Compared to the traditional quadtree structure of the High Efficiency Video Coding (HEVC) standard, QTBT provides more flexible patterns for splitting blocks, which results in a dramatically increased number of block partition combinations and high computational complexity. In view of this, a confidence interval based early termination (CIET) scheme is proposed for QTBT to identify the unnecessary partition modes in the sense of rate-distortion (RD) optimization. In particular, an RD model is established to predict the RD cost of each partition pattern without the full encoding process. Subsequently, the mode decision problem is cast into a probabilistic framework to select the final partition based on the confidence interval decision strategy. Experimental results show that the proposed CIET algorithm speeds up the QTBT block partitioning structure by reducing encoding time by 54.7% with only a 1.12% increase in bit rate. Moreover, the proposed scheme performs consistently well for high-resolution sequences, for which video coding efficiency is crucial in real applications.
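One way to read the confidence interval decision strategy is sketched below. It is an illustration under stated assumptions, not the CIET algorithm itself: the prediction error of the RD model is assumed to be Gaussian with a known standard deviation `sigma`, and a partition mode is pruned when its lower confidence bound exceeds the smallest upper bound among all modes. The function name, mode labels, and parameters are hypothetical.

```python
def prune_partition_modes(predicted_costs, sigma, z=1.96):
    """Keep only partition modes that may still be the RD-optimal choice.

    predicted_costs: dict mapping a mode name to its predicted RD cost.
    sigma: assumed standard deviation of the RD model's prediction error.
    z: confidence multiplier (1.96 corresponds to a 95% interval).
    """
    margin = z * sigma
    # Smallest upper bound: no mode whose lower bound exceeds this can win.
    best_upper = min(cost + margin for cost in predicted_costs.values())
    return [mode for mode, cost in predicted_costs.items()
            if cost - margin <= best_upper]


costs = {"no_split": 100.0, "quad": 105.0, "bin_h": 150.0, "bin_v": 98.0}
print(prune_partition_modes(costs, sigma=5.0))
```

Only the surviving modes would then go through the full encoding process; a smaller `z` prunes more aggressively at a higher risk of discarding the best mode.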
Light field (LF) has become an attractive representation of immersive multimedia content because it simultaneously captures both the spatial and angular information of light rays. In this paper, we present an LF image compression framework driven by generative adversarial network (GAN)-based sub-aperture image (SAI) generation and a cascaded hierarchical coding structure. Specifically, we sparsely sample the SAIs in the LF and propose the GAN of LF (LF-GAN) to generate the unsampled SAIs by adversarial learning conditioned on their surrounding contexts. In particular, the LF-GAN learns to interpret both the angular and spatial context of the LF structure and, meanwhile, generates an intermediate hypothesis for the unsampled SAI at a given position. Subsequently, the sampled SAIs and the residues of the generated unsampled SAIs are re-organized as pseudo-sequences and compressed by standard video codecs. Finally, the hierarchical coding structure is adopted for the sampled SAIs to effectively remove the inter-view redundancies. During the training of LF-GAN, the pixel-wise Euclidean loss and the adversarial loss are chosen as the optimization objective, such that sharp textures with less blurring in details can be produced. Extensive experimental results show that the proposed LF-GAN-based LF image compression framework outperforms the state-of-the-art learning-based LF image compression approach with an average BD-rate reduction of 4.9% over multiple LF datasets.
As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as immersive telepresence, navigation for autonomous driving, and gaming. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. To this end, we propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding assuming the Gaussian Markov Random Field model with respect to a spatio-temporal graph underlying the attributes of dynamic point clouds. The optimal predictive transform proves to be the Generalized Graph Fourier Transform in terms of spatio-temporal decorrelation. Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches for the temporal correspondence between adjacent frames of irregular point clouds. Finally, we present a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where we determine the optimal coding mode by rate-distortion optimization with the proposed offline-trained λ-Q model. Experimental results show that we achieve around 17% bit rate reduction on average over competitive dynamic point cloud compression methods.
In this paper, a Rate-GOP based frame-level rate control scheme is proposed for High Efficiency Video Coding (HEVC). The proposed scheme is developed with consideration of the new coding tools adopted into HEVC, including the quad-tree coding structure and the new reference frame selection mechanism, called reference picture set (RPS). The contributions of this paper mainly include the following three aspects. Firstly, an RPS-based hierarchical rate control structure is designed to maintain the high video quality of the key frames. Secondly, an inter-frame dependency based distortion model and bit rate model are proposed, considering the dependency between a coding frame and its reference frame. Thus, the distortion and bit rate of the coding frame can be represented by the distortion and bit rate of its reference frame. Accordingly, the Rate-GOP based distortion model and rate model can be derived from the inter-frame dependency based distortion model and bit rate model. Thirdly, based on these models and a mixed Laplacian distribution of residual information, a new ρ-domain Rate-GOP based rate control is proposed. Experimental results demonstrate that the proposed Rate-GOP based rate control has much better R-D performance. Compared with the two state-of-the-art rate control schemes for HEVC, the BD-PSNR coding gain can be up to 0.87 dB and 0.13 dB on average, respectively, for all testing configurations. In particular, for the random access low complexity testing configuration, the BD-PSNR gain can be up to 1.30 dB and 0.23 dB, respectively.
Motion compensation has been widely employed for removing temporal redundancies in the typical hybrid video coding framework. The popular video compression standards, such as H.264/AVC and HEVC, adopt the block-based partitioning model to describe the motion field due to its high compression efficiency and relatively low computational complexity. However, block-based motion compensation may not align with the actual object motion boundaries, potentially limiting the compression efficiency. In view of this, we propose a three-zone segmentation-based motion compensation scheme to improve the description accuracy of the motion field as well as the coding efficiency. In particular, the segmentation information is implied in the reference frame instead of being explicitly signalled. Based on the segmentation information, three motion compensation zones can be identified: one edge zone, one foreground zone, and one background zone. The foreground zone is motion compensated by the signalled motion vector of the block, and the background zone is motion compensated by the motion information implicitly derived from the local motion field. The edge zone is viewed as an overlapped area, for which a weighted compensation strategy is adopted. The proposed algorithm is implemented in the reference software VTM-1.0 of Versatile Video Coding (VVC), and the simulation results show that the algorithm can achieve 1.14% and 1.06% bitrate savings for the random access and low-delay configurations, respectively.
In the emerging video coding standard, Versatile Video Coding (VVC), a quadtree with nested multi-type tree (MTT) using binary and ternary tree structures was proposed. MTT brings significant coding efficiency but increases the encoding complexity. In this paper, a look-ahead prediction based coding unit size pruning algorithm is proposed to cut down redundant MTT partitions. The proposed scheme aims to identify the unnecessary partition direction in advance and consists of two steps, i.e., SATD-based mode decision (SMD) for possible blocks and refined cost derivation based on rate-distortion optimization. Experimental results show that the proposed method can save 41% of encoder time with only a 0.84% increase in bit rate on average.
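For readers unfamiliar with the SATD measure that the SMD step relies on, the following minimal sketch computes the sum of absolute transformed differences of a 4x4 block with an unnormalized Hadamard transform. It only shows the metric; how the paper uses SATD to rank partition directions is not specified in the abstract. The function names are hypothetical, and practical encoders usually apply an additional scaling factor.

```python
# 4x4 Hadamard matrix (symmetric, entries +1/-1).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def satd4x4(original, prediction):
    """Unnormalized SATD of a 4x4 block: sum of |H * (orig - pred) * H|."""
    diff = [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]
    # H4 is symmetric, so H4 serves as its own transpose here.
    transformed = matmul(matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


orig = [[5, 5, 5, 5] for _ in range(4)]
pred = [[4, 4, 4, 4] for _ in range(4)]
print(satd4x4(orig, pred))  # a constant difference of 1 gives 16
```

A flat residual concentrates all energy in the DC term, which is why the constant-difference block above yields a single nonzero coefficient of 16.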
Hybrid All Zero Soft Quantized Block Detection for HEVC
Cui, Jing; Xiong, Ruiqin; Zhang, Xinfeng ...
IEEE Transactions on Image Processing, October 2018, Volume 27, Issue 10
Journal Article, Peer reviewed
Transform and quantization account for a considerable amount of computation time in the video encoding process. However, a large number of discrete cosine transform coefficients are finally quantized to zero. In essence, blocks with all-zero quantized coefficients do not transmit any information, but still occupy substantial computational resources unnecessarily. As such, detecting all-zero blocks (AZBs) before transform and quantization has been recognized as an efficient approach to speeding up the encoding process. Instead of considering hard-decision quantization (HDQ) only, in this paper, we incorporate the properties of soft-decision quantization into AZB detection. In particular, we categorize AZBs into genuine AZBs (G-AZBs) and pseudo AZBs (P-AZBs) to distinguish their origins. For G-AZBs, which are directly generated from HDQ, a sum of absolute transformed differences based approach is adopted for early termination. For the classification of P-AZBs, which are generated in the sense of rate-distortion optimization, rate-distortion models established on the transform coefficients are employed jointly with an adaptive search for the maximum transform coefficient. Experimental results show that our algorithm can achieve up to 24.16% transform and quantization time savings with less than 0.06% RD performance loss. The total encoder time saving is about 5.18% on average, with a maximum of 9.12%. Moreover, the detection accuracy for larger TU sizes, such as 16×16 and 32×32, can reach 95% on average.
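The idea of detecting an all-zero block before the transform can be illustrated with a classic sufficient condition. This sketch is a swapped-in simplification, not the paper's SATD-based or RD-model-based detector: it assumes an orthonormal N×N 2-D DCT, whose basis entries are bounded in magnitude by 2/N, and a hard-decision quantizer of the form level = floor(|c| / qstep + rounding_offset). The function name and the default offset are hypothetical.

```python
def is_all_zero_block(residual, qstep, rounding_offset=1.0 / 3.0):
    """Return True if every DCT coefficient is guaranteed to quantize to zero.

    residual: N x N list of prediction residual samples.
    qstep: quantization step size (assumed uniform over the block).
    rounding_offset: assumed rounding offset of the hard-decision quantizer.
    """
    n = len(residual)
    sad = sum(abs(v) for row in residual for v in row)
    # Each orthonormal 2-D DCT basis entry has magnitude at most 2/N,
    # so no coefficient can exceed (2/N) * SAD in magnitude.
    coeff_bound = 2.0 * sad / n
    # A coefficient quantizes to zero when |c| < (1 - offset) * qstep.
    dead_zone = (1.0 - rounding_offset) * qstep
    return coeff_bound < dead_zone


small = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
print(is_all_zero_block(small, qstep=10.0))
```

The test is conservative: a False result means only that the bound cannot rule out a nonzero coefficient, so the block still goes through the regular transform and quantization.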
Rate distortion optimized quantization (RDOQ) is an efficient encoder optimization method that plays an important role in improving the rate-distortion (RD) performance of High Efficiency Video Coding (HEVC) codecs. However, the superior performance of RDOQ is achieved at the expense of high computational complexity in a two-stage RD minimization that softly optimizes the quantized coefficients: the determination of the optimal quantized level among the available candidates for each transformed coefficient, and the determination of the best quantized coefficients for transform units with the minimum total cost. To reduce the computational cost of the RDOQ algorithm in HEVC, we propose a low-complexity RDOQ scheme that models the statistics of the transform coefficients with a hybrid Laplace distribution. In this manner, specifically designed block-level rate and distortion models are established based on the coefficient distribution. Therefore, the optimal quantization levels can be directly determined by optimizing the RD performance of the whole block, while the complicated RD cost calculations can be avoided. Extensive experimental results show that, with about 0.3%-0.4% RD performance degradation, the proposed low-complexity RDOQ algorithm reduces quantization time by around 70%, with up to 17% total encoding time reduction, compared with the original RDOQ implementation in HEVC on average.