Holoscopic imaging, also known as integral, light-field, or plenoptic imaging, is an appealing technology for glasses-free 3D video systems and has recently emerged as a prospective candidate for future image and video applications, such as 3D television. However, to successfully introduce 3D holoscopic video applications into the market, adequate coding tools that can efficiently handle 3D holoscopic video are necessary. In this context, this paper discusses the requirements and challenges of 3D holoscopic video coding and presents an efficient 3D holoscopic coding scheme based on High Efficiency Video Coding (HEVC). The proposed 3D holoscopic codec uses self-similarity (SS) compensated prediction to efficiently exploit the inherent correlation of 3D holoscopic content in Intra- and Inter-coded frames, as well as a novel vector prediction scheme that takes advantage of the peculiar characteristics of the SS prediction data. Extensive experiments show that the proposed solution outperforms HEVC as well as other coding solutions proposed in the literature. Moreover, consistently better performance is also observed for a set of quality metrics proposed in the literature for 3D holoscopic content, as well as for the visual quality of views synthesized from decompressed 3D holoscopic content.
•An efficient 3D holoscopic video coding solution based on HEVC is proposed.
•It relies on self-similarity prediction and a new micro-image-based vector prediction.
•Superior coding efficiency is shown compared to HEVC and other benchmark solutions.
•Consistently better performance is achieved on a set of different objective quality metrics.
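The SS-compensated prediction above can be illustrated with a toy block-matching search: for each block, the encoder looks for the best match (here by sum of absolute differences) inside the already-coded causal area of the same frame, exploiting the repetitive micro-image structure of holoscopic content. This is only a minimal sketch of the idea; the function name, the simplified causal-area rule, and the exhaustive search are illustrative, not the codec's actual implementation.

```python
import numpy as np

def self_similarity_predict(frame, block_top, block_left, size, search):
    """Exhaustive SAD search for the best match to the current block inside
    the causal (already-coded) area of the same frame (hypothetical helper)."""
    target = frame[block_top:block_top + size, block_left:block_left + size].astype(int)
    best_vec, best_sad = None, float("inf")
    for dy in range(-search, 1):
        for dx in range(-search, search + 1):
            top, left = block_top + dy, block_left + dx
            if top < 0 or left < 0 or left + size > frame.shape[1]:
                continue
            # Simplified causal rule: candidate must lie fully above the
            # current block row (a real codec's rule is more permissive).
            if top + size > block_top:
                continue
            cand = frame[top:top + size, left:left + size].astype(int)
            sad = np.abs(cand - target).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad

# Toy 8x8 frame whose rows repeat with period 4, mimicking the repetitive
# micro-image structure of holoscopic content.
base = np.tile(np.arange(8), (4, 1))
frame = np.vstack([base, base])
vector, sad = self_similarity_predict(frame, 4, 0, 4, 4)
```

Because rows 4–7 replicate rows 0–3, the search finds a perfect match one period above, i.e. vector (-4, 0) with zero SAD, which is exactly the kind of redundancy SS prediction exploits.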
•A motion-variation-based quality pooling method for 3D video quality assessment.
•The method relies on edge energy information and motion energy information.
•Temporal information is effective in estimating monocular energy.
The two-stage framework, consisting of local quality measurement followed by global quality pooling, has been widely used in objective image/video quality assessment (IQA/VQA) methods. Most existing IQA/VQA methods are dedicated to modeling local quality degradation, while the pooling strategy, despite its significance, is often neglected, especially in quality assessment methods for asymmetrically distorted 3D video sequences. In this paper, motivated by the fact that perceptual visual information might be affected by visual masking due to motion suppression, a novel pooling strategy is proposed for visual quality assessment of asymmetrically distorted 3D video sequences by considering motion variation-based perceptual visual information (MVPVI). Specifically, to extract valuable visual information from video sequences, the proposed method computes the motion energy information (MEI) and edge energy information (EEI) by quantifying the frame difference map and the gradient map, respectively. Furthermore, to simulate the phenomenon of motion suppression in local motion scenes, where the perceptual information of low-attention regions might be neglected by the human visual system (HVS), the proposed method first employs the coefficient of variation (CV) to determine the type of motion (local or global) in consecutive frames. Then, the perceptual visual information of the two cases is quantified. Finally, to simulate the role of the HVS as an optimal information extractor, the estimated perceptual visual information of each single-view video sequence is used as the weighting factor for the quality pooling of the left- and right-view video sequences. Extensive comparison experiments conducted on the Waterloo-IVC 3D video quality databases demonstrate that the proposed pooling strategy outperforms other relevant pooling strategies, and that it achieves better performance than state-of-the-art 3D-IQA/VQA methods when combined with 2D-IQA/VQA methods.
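The pooling pipeline described in the abstract can be sketched roughly as follows: motion and edge energy maps are derived from frame differences and gradients, the coefficient of variation of the motion-energy map distinguishes local from global motion, and per-view perceptual information weights the left/right quality scores. All function names, the threshold, and the exact energy definitions below are hypothetical simplifications, not the paper's formulation.

```python
import numpy as np

def energy_maps(prev_frame, curr_frame):
    """Motion energy (frame difference) and edge energy (gradient magnitude)."""
    mei = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    gy, gx = np.gradient(curr_frame.astype(float))
    eei = np.sqrt(gx ** 2 + gy ** 2)
    return mei, eei

def motion_is_local(mei, cv_threshold=1.0):
    """Classify motion as local when the coefficient of variation of the
    motion-energy map is high (energy concentrated in a few regions)."""
    mean = mei.mean()
    if mean == 0:
        return False
    return bool(mei.std() / mean > cv_threshold)

def pool_stereo_quality(q_left, q_right, info_left, info_right):
    """Weight each view's quality score by its estimated perceptual information."""
    total = info_left + info_right
    if total == 0:
        return 0.5 * (q_left + q_right)
    return (info_left * q_left + info_right * q_right) / total

# A single bright moving pixel yields a highly concentrated motion-energy
# map, which the CV test classifies as local motion.
prev = np.zeros((8, 8))
curr = np.zeros((8, 8))
curr[0, 0] = 100
mei, eei = energy_maps(prev, curr)
local = motion_is_local(mei)
pooled = pool_stereo_quality(0.8, 0.4, info_left=3.0, info_right=1.0)
```

Weighting by per-view information (rather than averaging) is what lets the pooling handle asymmetric distortion: the view carrying more perceptual information dominates the final score.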
Depth-image-based rendering (DIBR)-oriented view synthesis has been widely employed in current depth-based 3D video systems to synthesize a virtual view from an arbitrary viewpoint. However, holes may appear in the synthesized view due to disocclusion, significantly degrading its quality. Consequently, efforts have been made to develop effective and efficient hole-filling algorithms. Current hole-filling techniques generally extrapolate/interpolate the hole regions from neighboring information, based on the assumption that the texture pattern in the holes is similar to that of the neighboring background. However, in many scenarios, especially those with complex texture, this assumption may not hold. In other words, hole filling can only provide an estimate for a hole, which may not be good enough or may even be erroneous for a wide variety of complex scenes. In this paper, we first examine view interpolation with multiple reference views, demonstrating that the problem of emerging holes in a target virtual view can be greatly alleviated by making good use of other neighboring complementary views in addition to its two (commonly used) nearest primary views. The effects of using multiple views for view extrapolation in reducing holes are also investigated. In view of the ongoing 3D video and free-viewpoint TV standardization, we propose a new view synthesis framework that employs multiple views to synthesize output virtual views. Furthermore, a scheme of selective warping of complementary views is developed, which efficiently locates a small number of useful pixels in the complementary views for hole reduction, avoiding full warping of additional complementary views and thus greatly lowering the warping complexity.
Experimental results show that, for view interpolation, the hole size obtained with two primary reference views may be reduced by up to about 70% with the help of two complementary reference views, while for view extrapolation the hole size obtained with one primary reference view may be reduced by about 27% with the help of one additional complementary reference view. Moreover, it is shown that using one more pair of views in view interpolation and one more view in view extrapolation may reduce the number of hole pixels by a further 10%.
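The hole-reduction effect of complementary views can be illustrated with a toy example: each warped reference view contributes a validity mask, and only pixels covered by no reference remain holes, so adding a complementary view can only shrink the hole set. The masks below are made up for illustration and do not come from the paper.

```python
import numpy as np

def remaining_holes(valid_masks):
    """Pixels covered by no warped reference view remain holes."""
    covered = np.zeros_like(valid_masks[0], dtype=bool)
    for mask in valid_masks:
        covered |= mask
    return ~covered

# Toy 1-D "image": the two primary views leave a disocclusion gap that a
# complementary view partially covers (all masks are hypothetical).
primary_1 = np.array([1, 1, 1, 0, 0, 0, 1, 1], dtype=bool)
primary_2 = np.array([1, 1, 0, 0, 0, 1, 1, 1], dtype=bool)
complementary = np.array([1, 1, 1, 1, 0, 1, 1, 1], dtype=bool)

holes_two_views = int(remaining_holes([primary_1, primary_2]).sum())
holes_three_views = int(remaining_holes([primary_1, primary_2, complementary]).sum())
```

In this toy case the complementary view halves the hole count; the paper's selective-warping scheme achieves the same effect while warping only the few complementary-view pixels that actually land in holes.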
Inter-View Dependency-Based Rate Control for 3D-HEVC
Tan, Songchao; Ma, Siwei; Wang, Shanshe ...
IEEE Transactions on Circuits and Systems for Video Technology, 02/2017, Volume 27, Issue 2
Journal Article, Peer-reviewed
We propose an inter-view dependency-based rate control (RC) algorithm for the 3D extension of High Efficiency Video Coding (HEVC). First, considering the rate-distortion (R-D) dependency between the synthesized views and the input views (including texture videos and depth maps), a synthesized-view distortion model is derived. Second, a novel distortion model for dependent views is proposed by investigating the inter-view dependency between the base and dependent views. Based on these two distortion models, a joint optimal bit allocation strategy (covering the texture/depth level, view level, and frame level) is developed to allocate target bits for both the texture videos and depth maps of the different views. Furthermore, an effective initial quantization parameter decision scheme that considers the characteristics of the input video content is presented. Extensive experimental results show that the proposed scheme achieves higher rate control accuracy and better R-D performance than state-of-the-art RC algorithms.
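The bit allocation step can be sketched generically: a target bit budget is split across the texture/depth components of each view in proportion to weights that, in the actual scheme, would come from the derived distortion models. The weights below are illustrative placeholders, not the paper's model.

```python
def allocate_bits(total_bits, weights):
    """Split a target bit budget in proportion to per-component weights
    (in the actual scheme these would follow from the distortion models)."""
    total_weight = sum(weights.values())
    return {name: total_bits * w / total_weight for name, w in weights.items()}

# Hypothetical weights: texture and the base view typically receive larger
# shares than depth maps and dependent views.
budget = allocate_bits(1000, {
    "base_texture": 0.40, "base_depth": 0.10,
    "dep_texture": 0.35, "dep_depth": 0.15,
})
```

The point of the paper's models is precisely to derive such weights from the synthesized-view and inter-view dependencies rather than fixing them by hand.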
With the rapid development of mixed reality (MR) technology, many compact, lightweight, and powerful devices suitable for remote collaboration, such as MR headsets, hand trackers, and 3D cameras, have become readily available, providing hardware and software support for remote collaboration. Consequently, exploring MR technologies for remote collaboration on physical industrial tasks is becoming increasingly worthwhile. In many complex production scenarios, such as assembly tasks, significant gains can be achieved by having remote experts assist local workers in manipulating objects in local workspaces. However, it can be challenging for a remote expert to carry out effective spatial reference and action demonstration in a local scene. Sharing 3D stereoscopic scenes can provide depth perception and allow remote experts to move and explore a local user's environment freely. Previous studies have demonstrated that gesture-based interaction is natural and intuitive, and that interaction based on virtual replicas can provide clear guidance, especially for industrial physical tasks. In this study, we develop an MR remote collaboration system that shares the stereoscopic scene of the local workspace using real-time 3D video. The system combines gesture cues and virtual replicas in a complementary manner to support the remote expert in naturally and intuitively creating augmented reality (AR) guidance for the local worker in an immersive virtual reality space. A formal user study explored the effects of two different interface modalities in industrial assembly tasks: our novel method combining virtual replicas and gesture cues in the 3D video (VG3DV), and a method similar to the currently popular approach of using gesture cues in the 3D video (G3DV). We found that VG3DV can significantly improve the performance and user experience of MR remote collaboration in industrial assembly tasks.
Finally, some conclusions and future research directions are given.
Quality assessment for synthesized video with texture/depth compression distortion is important for the design, optimization, and evaluation of multi-view video plus depth (MVD)-based 3D video systems. In this paper, both subjective and objective studies of synthesized view assessment are conducted. First, a synthesized video quality database with texture/depth compression distortion is presented, with subjective scores given by 56 subjects. The 140 videos are synthesized from ten MVD sequences with different texture/depth quantization combinations. Second, a full-reference objective video quality assessment (VQA) method is proposed that accounts for the annoying temporal flicker distortion and the change of spatio-temporal activity in the synthesized video. The proposed VQA algorithm performs well on the entire synthesized video quality database, and is particularly prominent on the subsets that exhibit significant temporal flicker distortion induced by depth compression and the view synthesis process.
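One rough way to quantify temporal flicker, in the spirit of the abstract, is to compare the temporal activity (frame differences) of the synthesized sequence against the reference: a large deviation indicates flicker. This toy metric is an assumption-laden illustration, not the paper's actual VQA algorithm.

```python
import numpy as np

def temporal_flicker(ref_frames, syn_frames):
    """Mean absolute deviation between the temporal activity (frame
    differences) of the reference and synthesized sequences; larger
    values indicate more flicker (an illustrative metric only)."""
    score = 0.0
    for t in range(1, len(ref_frames)):
        ref_diff = ref_frames[t].astype(float) - ref_frames[t - 1].astype(float)
        syn_diff = syn_frames[t].astype(float) - syn_frames[t - 1].astype(float)
        score += np.abs(syn_diff - ref_diff).mean()
    return score / (len(ref_frames) - 1)

# A synthesized sequence that pulses while the reference stays still
# registers strong flicker; an identical sequence registers none.
steady = [np.zeros((4, 4)), np.zeros((4, 4)), np.zeros((4, 4))]
pulsing = [np.zeros((4, 4)), np.ones((4, 4)), np.zeros((4, 4))]
no_flicker = temporal_flicker(steady, steady)
with_flicker = temporal_flicker(steady, pulsing)
```

Comparing temporal differences rather than frames themselves is what makes such a measure sensitive to the frame-to-frame instability typical of depth-compression artifacts, rather than to static distortion.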
A three-dimensional (3D) video is a special video representation with an artificial stereoscopic vision effect that increases the depth perception of the viewers. The quality of a 3D video is generally measured by its similarity to the stereoscopic vision obtained with the human visual system (HVS). High-cost and time-consuming subjective tests are used because of the lack of an objective video Quality of Experience (QoE) evaluation method that models the HVS. In this paper, we propose a hybrid 3D-video QoE evaluation method based on spatial resolution associated with depth cues (i.e., motion information, blurriness, retinal-image size, and convergence). The proposed method successfully models the HVS by considering the 3D video parameters that directly affect depth perception, the most important element of stereoscopic vision. Experimental results show that the proposed hybrid method outperforms the widely used existing methods for 3D-video QoE measurement. It is also found that the proposed method has a high correlation with the HVS. Consequently, the results suggest that the proposed hybrid method can be conveniently utilized for 3D-video QoE evaluation, especially in real-time applications.
Construction management is considered a hands-on field of study that requires good spatial and visual cognitive ability. Virtual reality and other innovative immersive technologies have been used to facilitate experiential learning and to improve students' spatial cognitive abilities. Virtual environments have been criticized for their gamified look. Static panorama pictures have previously been used in construction education to bring a better sense of reality and immersion at the same time. However, they cannot provide a continuous experience, and the sense of presence (immersion) is not ideal either. Immersive videos such as 360-degree videos can address this shortfall by providing a continuous experience and a better sense of presence. The use of this technology in the construction education field is very limited. As a result, this study investigated a pilot experiment in which a combination of 360-degree, 180-degree 3D, and flat videos was incorporated as an educational instrument for delivering construction management content. The content was recorded using different configurations and body postures to further investigate the optimal way of utilizing this technology for content delivery. The videos focused on construction means and methods. Students reviewed the content using head-mounted display devices and laptop screens and answered a survey designed to capture their perception and experience of using this technology as an educational tool in the construction management field. The results show a positive perception toward using immersive videos in construction education. Furthermore, the students preferred the head-mounted display as their favorite delivery method. As a result, the prospect of incorporating immersive videos to enhance construction management education is promising.
This paper presents a mixed reality (MR) system that results from the integration of a telepresence system and an application to improve collaborative space exploration. The system combines free-viewpoint video with immersive projection technology to support nonverbal communication (NVC), including eye gaze, interpersonal distance, and facial expression. Importantly, these features can be interpreted together as people move around the simulation, maintaining a natural social distance. The application is a simulation of Mars, within which the collaborators must come to agreement over, for example, where the Rover should land and go. The first contribution is the creation of an MR system supporting contextualization of NVC. Two technological contributions are a prototype technique to subtract a person from a background that may contain physical objects and/or moving images, and a lightweight texturing method for multiview rendering that balances visual and temporal quality. A practical contribution is the demonstration of pragmatic approaches to sharing space between display systems with distinct levels of immersion. A research-tool contribution is a system that allows comparison of conventionally authored and video-based reconstructed avatars within an environment that encourages exploration and social interaction. Aspects of system quality, including the communication of facial expression and end-to-end latency, are reported.