To ensure the fidelity of virtual views, the rate-distortion optimization (RDO) criterion of the 3D extension of High Efficiency Video Coding (3D-HEVC) is carefully designed: the synthesized view distortion (SVD) is introduced to derive the rate-distortion (RD) cost. Obtaining accurate SVDs requires a rendering operation, which demands fairly high computational complexity. To address this problem, a fast RDO method for depth maps is proposed, which checks the RD cost during its calculation. Specifically, for a given coding mode, the RD cost is composed of several cumulative items. If the accumulated RD cost equals or exceeds the minimum RD cost of the previously coded modes, there is no need to continue the RD cost calculation for that mode. To reduce encoding complexity, existing methods usually aim at reducing the number of tested modes or block partitions. To the best of our knowledge, this is the first time the latent redundant complexity in the RD cost calculation itself has been investigated and removed. Experimental results demonstrate that, compared with the 3D-HEVC reference software, the proposed method saves 28.1% of depth coding time with a small coding gain (0.04% BD-rate saving). An additional test evaluates four typical fast coding methods with and without the proposed method. Extensive results verify that the proposed method can be seamlessly combined with state-of-the-art methods.
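The early-termination idea in this abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's 3D-HEVC implementation; the names (`rd_cost_with_early_exit`, `pick_best_mode`, `cost_items`) are ours. The sketch relies on one property stated implicitly in the abstract: the cumulative RD-cost items (distortion and lambda-weighted rate terms) are non-negative, so a partial sum that already reaches the best cost can never fall back below it.

```python
# Illustrative sketch of early termination in RD-cost accumulation
# (assumed names; not the 3D-HEVC reference implementation).
def rd_cost_with_early_exit(cost_items, best_cost_so_far):
    """Accumulate one mode's non-negative RD-cost items; stop as soon
    as the running sum reaches the best cost found so far.
    Returns (cost, terminated_early)."""
    total = 0.0
    for item in cost_items:
        total += item
        if total >= best_cost_so_far:
            return total, True   # this mode cannot beat the current best
    return total, False

def pick_best_mode(modes_items):
    """Mode decision loop: evaluate each candidate mode's cumulative
    RD cost with early exit, keeping the cheapest mode."""
    best_cost, best_idx = float("inf"), -1
    for idx, items in enumerate(modes_items):
        cost, stopped = rd_cost_with_early_exit(items, best_cost)
        if not stopped and cost < best_cost:
            best_cost, best_idx = cost, idx
    return best_idx, best_cost
```

Because the items are non-negative, skipping the remaining items after the threshold is crossed changes nothing about which mode wins; this is where the abstract's "latent redundant complexity" is removed.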
Thanks to the rapid development of naked-eye 3D and wireless communication technology, 3D video applications on mobile devices have attracted a lot of attention. Nevertheless, the time-varying characteristics of the wireless channel are very challenging for conventional source-channel-coding-based transmission strategies. Also, the high complexity of such transmission schemes is undesirable for low-power mobile terminals. An advanced transmission scheme named Softcast was proposed to achieve efficient transmission performance for 2D image/video. Unfortunately, it cannot be directly applied to wireless 3D video transmission with high efficiency. This paper proposes a more efficient soft transmission scheme for 3D video with graceful quality adaptation over a wide range of channel Signal-to-Noise Ratios (SNRs). The proposed method first extends the linear transform to four dimensions with an additional view dimension to eliminate inter-view redundancy, and then metadata optimization and chunk interleaving are designed to further improve transmission performance. Meanwhile, a synthesis-distortion-based chunk discard strategy is developed to improve the overall 3D video quality under limited bandwidth. The experimental results demonstrate that the proposed method significantly improves 3D video transmission performance over the wireless channel for low-power and low-complexity scenarios.
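The chunk-discard strategy mentioned in this abstract can be illustrated with a simple greedy sketch. This is our own toy reconstruction under assumptions the abstract does not spell out (that each chunk has a known bandwidth cost and a known synthesis-distortion penalty if discarded); the paper's actual optimization is not reproduced here.

```python
def select_chunks(chunks, bandwidth_budget):
    """Greedy chunk-discard sketch (illustrative assumption, not the
    paper's exact strategy): each chunk is a tuple
    (name, bandwidth_cost, synthesis_distortion_if_discarded).
    Chunks with the highest distortion-to-cost ratio are kept until
    the bandwidth budget is exhausted; the rest are discarded."""
    ranked = sorted(chunks, key=lambda c: c[2] / c[1], reverse=True)
    kept, used = [], 0
    for name, cost, dist in ranked:
        if used + cost <= bandwidth_budget:
            kept.append(name)
            used += cost
    return sorted(kept)
```

The point of ranking by synthesis distortion rather than chunk energy is that, for 3D video, the chunks that matter most are those whose loss hurts the rendered virtual views, which is the distinction the abstract draws against 2D Softcast.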
This paper addresses high-performance depth coding in 3D video by making good use of the coded texture video counterpart. The relationship between the depth map and its associated texture video in terms of coding mode and motion vector is carefully examined. Our statistical study suggests that the skip-coding mode and its associated motion vectors in the coded texture can be shared for depth coding, saving bit rate at the cost of a small increase in distortion, which subsequently results in a nonsequential coding of the depth map. In this sense, coding/prediction of a block can be performed using the skip-coded blocks below and to the right, which are not available in conventional sequential coding, thus producing so-called omnidirectional blocks predicted in intra-coding by making the best use of (at most) four neighboring blocks. Moreover, in view of the depth-texture structure similarity, a depth-texture cooperative clustering-based prediction method is proposed for cluster-based depth prediction in intra-coding, which exploits the structure similarity between the current coding block and its neighboring pixels. On the other hand, large prediction errors may be present for depth-texture misaligned pixels, which may greatly compromise the coding performance. To deal with these large residuals induced by the depth-texture misalignment, a simple yet effective detection and rectification approach is incorporated into the proposed depth coding scheme. Experimental results show that our proposed depth coding scheme achieves superior rate-distortion performance compared with other relevant coding methods.
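The omnidirectional-block idea above can be sketched minimally. This is an illustrative reconstruction, not the paper's predictor: we assume the simplest possible combination rule (the mean of all available border pixels), whereas the paper's actual prediction is more elaborate.

```python
def omnidirectional_predict(neighbors):
    """Illustrative sketch of omnidirectional intra prediction
    (assumed mean-of-borders rule, not the paper's exact method).
    `neighbors` maps side names among {"above","left","below","right"}
    to lists of adjacent border pixels from already-reconstructed
    blocks. Conventional raster-order intra coding only ever has
    "above"/"left"; the nonsequential scheme may also supply
    "below"/"right" from skip-coded blocks."""
    pixels = [p for side in ("above", "left", "below", "right")
              for p in neighbors.get(side, [])]
    if not pixels:
        return 128  # DC fallback at the 8-bit midpoint when nothing is available
    return sum(pixels) / len(pixels)
```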
This paper gives an end-to-end overview of 3D video and free viewpoint video, which can be regarded as advanced functionalities that expand the capabilities of 2D video. Free viewpoint video can be understood as the functionality to freely navigate within real-world visual scenes, as is known, for instance, from virtual worlds in computer graphics. 3D video shall be understood as the functionality that provides the user with a 3D depth impression of the observed scene, which is also known as stereo video. As functionalities, 3D video and free viewpoint video are therefore not mutually exclusive but can very well be combined in a single system. Research in this area combines computer graphics, computer vision, and visual communications. It spans the whole media processing chain from capture to display, and the design of systems has to take all parts into account, which is outlined in the different sections of this paper, giving an end-to-end view and mapping of this broad area. The conclusion is that the necessary technology, including standard media formats for 3D video and free viewpoint video, is available or will be available in the future, and that there is a clear demand from industry and users for such advanced types of visual media. As a consequence, we are witnessing these days how such technology enters our everyday life.
Deep Multi-Domain Prediction for 3D Video Coding
Lei, Jianjun; Shi, Yanan; Pan, Zhaoqing
IEEE Transactions on Broadcasting, December 2021, Volume 67, Issue 4
Journal Article
Peer reviewed
Three-dimensional (3D) video contains plentiful multi-domain correlations, including spatial, temporal, and inter-view correlations. In this paper, a deep multi-domain prediction method is proposed for 3D video coding. Different from previous methods, our proposed method utilizes not only spatial and temporal correlations but also inter-view correlation to obtain a more accurate prediction, and adopts deep convolutional neural networks to effectively fuse multi-domain references. More specifically, a hierarchical prediction mechanism, which includes a spatial-temporal prediction network and a multi-domain prediction network, is designed to overcome the difficulty of fusing multi-domain reference information. Furthermore, a progressive spatial-temporal prediction network and a multi-scale multi-domain prediction network are designed to obtain the spatial-temporal prediction result and the multi-domain prediction result, respectively. Experimental results show that the proposed method achieves considerable bitrate savings compared with 3D-HEVC.
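The two-stage hierarchy described in this abstract can be pictured with a deliberately simplified sketch. The paper fuses references with convolutional networks; here, fixed linear weights (`w_st`, `w_md`, names ours) stand in purely to show the data flow of the hierarchy, not to approximate the learned fusion.

```python
def hierarchical_predict(spatial, temporal, inter_view,
                         w_st=(0.5, 0.5), w_md=(0.6, 0.4)):
    """Conceptual sketch of the hierarchical prediction mechanism
    (fixed weights are an illustrative stand-in for the CNNs).
    Stage 1 fuses the spatial and temporal references into a
    spatial-temporal prediction; stage 2 fuses that result with the
    inter-view reference into the final multi-domain prediction."""
    st = [w_st[0] * s + w_st[1] * t for s, t in zip(spatial, temporal)]
    return [w_md[0] * p + w_md[1] * v for p, v in zip(st, inter_view)]
```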
With the evolution of electronic devices such as 3D cameras, addressing the challenges of text localization in 3D video (e.g., for indexing) is increasingly drawing the attention of the multimedia and video processing community. Existing methods focus on 2D video, and their performance degrades in the presence of 3D-specific challenges such as shadow areas associated with text and irregularly sized and shaped text. This paper proposes the first approach that successfully addresses the challenges of 3D video in addition to those of 2D. It employs a number of innovations, the first of which is the Generalized Gradient Vector Flow (GGVF) for dominant point detection. The second is the wavefront concept for detecting text candidate points among those dominant points. In addition, an Adaptive B-Spline Polygon Curve Network (ABS-Net) is proposed for accurate text localization in 3D videos by constructing tight-fitting bounding polygons from the text candidate points. Extensive experiments on custom (3D video) and standard datasets (2D video and scene text) show that the proposed method is practical and useful, and overall outperforms existing state-of-the-art methods.
The popularity of 3D video is increasing daily due to the availability of low-cost 3D televisions and high-speed Internet access. However, 3D video content can currently be distributed illegally without any protection. For views generated using a depth-image-based rendering technique, not only can the left and right views be distributed as 3D content, but the center, left, or right view can also be distributed individually as 2D content. As digital video watermarking is a possible way of protecting these views from unauthorized distribution, in this paper we propose a digital watermarking method for depth-image-based rendered 3D video. In this method, the watermark is embedded in both chrominance channels of a YUV representation of the center view using the dual-tree complex wavelet transform. Then, the left and right views are generated from the watermarked center view and the depth map using a depth-image-based rendering technique. Finally, the watermark can be extracted from the center, left, and right views in a blind fashion, without using the original unwatermarked center, left, or right views. The watermark is robust to geometric distortions, such as upscaling, rotation, cropping, and downscaling to an arbitrary resolution, and to the most common video distortions, including lossy compression and additive noise. Due to the approximate shift invariance of the dual-tree complex wavelet transform, the technique is robust against distortions in the left and right views generated using depth-image-based rendering. The proposed method can also survive baseline distance adjustment and both 2D and 3D camcording.
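The blind extraction property claimed in this abstract rests on a standard embed/correlate principle, which the following toy sketch illustrates. This is not the paper's method: the paper embeds in the dual-tree complex wavelet coefficients of the chrominance channels, whereas this sketch adds a key-seeded pattern directly to a sample sequence; all names and the threshold value are our own assumptions.

```python
import random

def embed(samples, key, strength=2.0):
    """Toy spread-spectrum embedding (illustrative only; the paper
    embeds in DT-CWT chrominance coefficients, not raw samples).
    A key-seeded +/-1 pattern, scaled by `strength`, is added."""
    rng = random.Random(key)
    pn = [rng.choice((-1.0, 1.0)) for _ in samples]
    return [s + strength * p for s, p in zip(samples, pn)]

def detect(samples, key, threshold=0.5):
    """Blind detection: regenerate the same key-seeded pattern and
    correlate. The original unwatermarked samples are not needed,
    which is what makes the scheme blind."""
    rng = random.Random(key)
    pn = [rng.choice((-1.0, 1.0)) for _ in samples]
    corr = sum(s * p for s, p in zip(samples, pn)) / len(samples)
    return corr > threshold
```

Because detection is a correlation with a regenerated pattern, moderate distortion of the carrier degrades the correlation gracefully instead of destroying it outright, which is the same reason shift-tolerant transform domains like the DT-CWT help robustness.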
Recently, region-based 3D video coding has been proposed. However, existing view synthesis distortion estimation (VSDE) methods operate at the frame level. To guide the rate-distortion optimization process of region-based 3D video coding schemes, this paper proposes the first pixel-level VSDE (PL-VSDE) method. We first define the pixel-level view synthesis distortion. To estimate it, a backward prediction method is developed that starts from the pixels of interest (POIs) in the virtual view and finds their corresponding pixels in the reference view via a coarse-to-fine approach, denoted the coarse-to-fine backward prediction (CFBP) method. The CFBP fully accounts for the details of 3D warping, the rounding operation, and the warping competition in view synthesis, improving the accuracy of the prediction. In addition, a table-lookup method and a warping property are introduced to speed up the CFBP. After integrating the CFBP into the PL-VSDE, we can estimate the view synthesis distortion at the pixel level. Our method is carried out pixel by pixel, independently, which makes it amenable to parallel processing. The experimental results demonstrate that our proposed method has significant advantages in both accuracy and efficiency compared with state-of-the-art frame-level VSDE methods.
Depth Intra Coding for 3D Video Based on Geometric Primitives
Merkle, Philipp; Muller, Karsten; Marpe, Detlev
IEEE Transactions on Circuits and Systems for Video Technology, March 2016, Volume 26, Issue 3
Journal Article
Peer reviewed
Open access
This paper presents an advanced depth intra-coding approach for 3D video coding based on the High Efficiency Video Coding (HEVC) standard and the multiview video plus depth (MVD) representation. The work is motivated by the fact that depth signals have specific characteristics that differ from those of natural signals, i.e., camera-view video. Our approach replaces conventional intra-picture coding for the depth component, targeting consistent and efficient support of 3D video applications that utilize depth maps, polygon meshes, or both, with high depth coding efficiency in terms of minimal artifacts in rendered views and meshes with a minimal number of triangles for a given bit rate. For this purpose, we introduce intra-picture prediction modes based on geometric primitives along with a residual coding method in the spatial domain, substituting conventional intra-prediction modes and transform coding, respectively. The results show that our solution achieves the same quality of rendered or synthesized views at about the same bit rate as MVD coding with the 3D video extension of HEVC (3D-HEVC) for high-quality depth maps, and at about 8% less overall bit rate than 3D-HEVC without the related depth tools. At the same time, combining 3D video with 3D computer graphics content is substantially simplified, as the geometry-based depth intra signals can be represented as a surface mesh with about 85% fewer triangles, generated directly in the decoding process as an alternative decoder output.
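A geometric-primitive intra predictor of the kind this abstract describes can be sketched in its simplest wedge form. This is an illustrative reconstruction, not Merkle et al.'s exact scheme: we assume the most basic primitive, a straight line splitting a square block into two constant-valued regions, with our own function and parameter names.

```python
def wedgelet_predict(size, x0, y0, x1, y1, dc_a, dc_b):
    """Sketch of a wedge-style geometric intra predictor (illustrative;
    not the paper's exact primitives). A straight line from (x0, y0)
    to (x1, y1) splits the size-by-size block into two regions, each
    filled with a constant depth value. The sign of the 2-D cross
    product tells which side of the line a pixel lies on."""
    block = []
    for y in range(size):
        row = []
        for x in range(size):
            side = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
            row.append(dc_a if side >= 0 else dc_b)
        block.append(row)
    return block
```

The appeal for depth maps is visible even in this toy version: the prediction reproduces a sharp edge exactly, with no ringing, and the two-region geometry maps directly onto a low-triangle-count mesh, which is the dual use the abstract emphasizes.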
We propose a method for converting a single image of a transparent object into a multi-view photo that enables users to observe the object from multiple new angles, without inputting any 3D shape. The complex light paths formed by refraction and reflection make it challenging to compute the lighting effects of transparent objects from a new angle. We construct an encoder–decoder network for normal reconstruction and texture extraction, which enables synthesizing novel views of a transparent object for a set of new views and new environment maps using only one RGB image. By simultaneously considering optical transmission and perspective variation, our network learns the characteristics of optical transmission and the change of perspective as guidance for the conversion from RGB colours to surface normals. A texture extraction subnetwork is proposed to alleviate the contour-loss phenomenon during normal map generation. We test our method on 3D objects within and outside our training data, including real 3D objects in our lab, as well as completely new environment maps that we captured with our phones. The results show that our method performs well on view synthesis of transparent objects in complex scenes using only a single-view image.