Depth image-based rendering (DIBR) techniques play an important role in free viewpoint videos (FVVs), which have a wide range of applications including immersive entertainment, remote monitoring, education, etc. FVVs are usually synthesized by DIBR techniques in a "blind" environment (without a reference video), so an effective reference-free synthesized video quality assessment (VQA) metric is vital. At present, many image quality assessment (IQA) algorithms for DIBR-synthesized images have been proposed, but little research has addressed the quality assessment of DIBR-synthesized videos. To this end, this paper proposes a novel reference-free VQA method for synthesized videos that operates in the Spatial and Temporal Domains, dubbed STD. The design of the proposed STD metric considers the effects of the two major distortions that DIBR techniques introduce into the visual quality of synthesized videos. First, since the geometric distortion introduced by DIBR technologies increases the high-frequency content of a synthesized frame, its influence on the visual quality of a synthesized video can be effectively evaluated by estimating the high-frequency energy of each synthesized frame in the spatial domain. Second, the temporal inconsistency caused by DIBR techniques produces temporal flicker, which is one of the most annoying artifacts in DIBR-synthesized videos. In the temporal domain, we quantify temporal inconsistency by measuring motion differences between consecutive frames. Specifically, an optical flow method is first used to estimate the motion field between adjacent frames. Then, we calculate the structural similarity of adjacent optical flow fields and use the resulting similarity value to weight the pixel differences of the adjacent flow fields. Experiments show that these two features capture the visual quality of DIBR-synthesized videos well.
Furthermore, since the two features are extracted from the spatial and temporal domains, respectively, we integrate them using a linear weighting strategy to obtain our STD metric, which proves advantageous over its two component features and the competing state-of-the-art I/VQA methods. The source code is available at https://github.com/wgc-vsfm/DIBR-video-quality-assessment.
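The temporal measurement described in the abstract can be sketched as follows. This is a minimal numpy illustration, assuming the per-frame optical flow fields have already been estimated (e.g., with a dense method such as Farneback's), using a simplified single-window SSIM instead of the paper's windowed formulation, and a simple mean pooling; all function names are hypothetical, not the authors' code.

```python
import numpy as np

def flow_ssim(f1, f2, C1=0.01 ** 2, C2=0.03 ** 2):
    """Global (single-window) SSIM between two optical-flow magnitude maps --
    a simplification of the windowed SSIM used for flow-field comparison."""
    mu1, mu2 = f1.mean(), f2.mean()
    var1, var2 = f1.var(), f2.var()
    cov = ((f1 - mu1) * (f2 - mu2)).mean()
    return ((2 * mu1 * mu2 + C1) * (2 * cov + C2)) / \
           ((mu1 ** 2 + mu2 ** 2 + C1) * (var1 + var2 + C2))

def temporal_inconsistency(flows):
    """SSIM-weighted mean absolute difference between consecutive flow
    fields: larger weighted differences indicate temporal flicker."""
    scores = [flow_ssim(fa, fb) * np.abs(fa - fb).mean()
              for fa, fb in zip(flows[:-1], flows[1:])]
    return float(np.mean(scores))
```

For a perfectly stable sequence the consecutive flow fields are identical, so the pixel differences vanish and the inconsistency score is zero; flickering regions raise the score.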
In an increasingly connected world, consumer video experiences have diversified away from traditional broadcast video into new applications with increased use of non-camera-captured content such as computer screen desktop recordings or animations created by computer rendering, collectively referred to as screen content. There has also been increased use of graphics and character content that is rendered and mixed or overlaid together with camera-generated content. The emerging Versatile Video Coding (VVC) standard, in its first version, addresses this market change by the specification of low-level coding tools suitable for screen content. This is in contrast to its predecessor, the High Efficiency Video Coding (HEVC) standard, where highly efficient screen content support is only available in extension profiles of its version 4. This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode. The specification of these coding tools in the first version of VVC enables the VVC reference software implementation (VTM) to achieve average bit-rate savings of about 41% to 61% relative to the HEVC test model (HM) reference software implementation using the Main 10 profile for 4:2:0 screen content test sequences. Compared to the HM using the Screen-Extended Main 10 profile and the same 4:2:0 test sequences, the VTM provides about 19% to 25% bit-rate savings. The same comparison with 4:4:4 test sequences revealed bit-rate savings of about 13% to 27% for <inline-formula> <tex-math notation="LaTeX">Y'C_{B}C_{R} </tex-math></inline-formula> and of about 6% to 14% for <inline-formula> <tex-math notation="LaTeX">R'G'B' </tex-math></inline-formula> screen content.
Relative to the HM without the HEVC version 4 screen content coding extensions, the bit-rate savings for 4:4:4 test sequences are about 33% to 64% for <inline-formula> <tex-math notation="LaTeX">Y'C_{B}C_{R} </tex-math></inline-formula> and 43% to 66% for <inline-formula> <tex-math notation="LaTeX">R'G'B' </tex-math></inline-formula> screen content.
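Of the five tools listed above, BDPCM is simple enough to sketch in a few lines: in horizontal mode, each (quantized) residual sample is predicted from its left neighbour and only the difference is coded, which suits the sharp, repetitive edges of screen content. The sketch below is a lossless illustration of the idea that ignores quantization and the vertical mode; it is not the VVC specification text.

```python
import numpy as np

def bdpcm_horizontal(residual):
    """Horizontal BDPCM: code each sample as the difference to its left
    neighbour; the first column is transmitted unchanged."""
    d = residual.astype(int).copy()
    d[:, 1:] -= residual[:, :-1]
    return d

def bdpcm_horizontal_inverse(d):
    """Reconstruct by accumulating the coded differences left to right."""
    return np.cumsum(d, axis=1)
```

A round trip through the two functions reproduces the original block exactly, which is why the tool can operate on quantized residuals without drift.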
Tile-Based Panoramic Video Quality Assessment
Jiang, Zhiqian; Xu, Yiling; Sun, Jun ...
IEEE Transactions on Broadcasting, June 2022, Volume 68, Issue 2
Journal Article
Peer-reviewed
As part of video transmission, video quality assessment plays an important role in improving the user's visual quality and reducing the transmission bitrate. However, the existing assessment schemes for the tile-based transmission mechanism cannot accurately predict the video quality. In this paper, we propose a tile-based panoramic video quality assessment method to optimize the transmission strategy for tile-based streaming. To account for the distortion of panoramic video projection, the inverse gnomonic projection and bilinear interpolation are used to convert the planar tiles to the sphere. Video Multimethod Assessment Fusion (VMAF) is applied to obtain the tile quality score. Besides, the user's head movement and eye movement are combined to generate a tile weight map, and the final score of the panoramic video is generated through weighted averaging. In addition, considering the impact of the quality fluctuation of the tiles within the field of view, a tile quality loss model is established along multiple dimensions, including the encoding parameter, viewport position, and video content. Experimental results on two video quality assessment datasets show a high correlation between the predicted results and the actual subjective scores. Furthermore, we integrate the method into a panoramic video transmission system. Compared with the tile-based adaptive rate algorithm, the proposed method degrades the video quality by only 3% while reducing bandwidth consumption by 16.99%.
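Two of the steps described above lend themselves to a short sketch: the inverse gnomonic projection that maps planar tile coordinates back to the sphere, and the weighted averaging of per-tile scores. The numpy illustration below makes simplifying assumptions (the tile scores stand in for VMAF values, the attention weight map is given, angles are in radians); the function names are hypothetical.

```python
import numpy as np

def inverse_gnomonic(x, y, lon0=0.0, lat0=0.0):
    """Map planar (gnomonic) coordinates back to spherical (lon, lat),
    for a projection centred at (lon0, lat0); standard inverse formulas."""
    rho = np.hypot(x, y)
    if rho < 1e-12:                      # the tangent point maps to the centre
        return lon0, lat0
    c = np.arctan(rho)
    lat = np.arcsin(np.cos(c) * np.sin(lat0) +
                    y * np.sin(c) * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(x * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c) -
                            y * np.sin(lat0) * np.sin(c))
    return lon, lat

def panoramic_score(tile_scores, tile_weights):
    """Final panoramic quality: average of per-tile quality scores weighted
    by the head/eye-movement attention map."""
    s = np.asarray(tile_scores, dtype=float)
    w = np.asarray(tile_weights, dtype=float)
    return float((s * w).sum() / w.sum())
```

Tiles that fall outside the predicted viewport receive small weights, so their quality contributes little to the final score, which is what allows the streaming system to lower their bitrate.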
•Based on the literature, we elaborate ten video and five vision quality characteristics.
•The 15 characteristics form a quality model for vision videos, shown in two representations.
•A hierarchical decomposition of vision video quality to evaluate given vision videos.
•A mapping of the characteristics to the single process steps to guide video production.
•Six characteristics are significantly related to the overall quality of vision videos.
Establishing a shared software project vision is a key challenge in Requirements Engineering (RE). Several approaches use videos to represent visions. However, these approaches omit how to produce a good video. This missing guidance is one crucial reason why videos are not established in RE. We propose a quality model for videos representing a vision, so-called vision videos. Based on two literature reviews, we elaborate ten quality characteristics of videos and five quality characteristics of visions which together form a quality model for vision videos that includes all 15 quality characteristics. We provide two representations of the quality model: (a) a hierarchical decomposition of vision video quality into the quality characteristics and (b) a mapping of these characteristics to the video production and use process. While the hierarchical decomposition supports the evaluation of vision videos, the mapping provides guidance for video production. In an evaluation with 139 students, we investigated whether the 15 characteristics are related to the overall quality of vision videos perceived by the subjects from a developer's point of view. Six characteristics (video length, focus, prior knowledge, clarity, pleasure, and stability) correlated significantly with the likelihood that the subjects perceived a vision video as good. These relationships substantiate a fundamental relevance of the proposed quality model. Therefore, we conclude that the quality model is a sound basis for future refinements and extensions.
To save manufacturing cost, most color digital video cameras employ a single-sensor technology with a red-green-blue (RGB) color filter array (CFA) to capture real-world scenes. Because only one primary color is measured at each pixel location, the captured videos are usually referred to as mosaic videos. For economical storage and transmission, it is very important to achieve a good tradeoff between quality and bitrate when compressing mosaic videos with different RGB-CFA structures. In this paper, based on a mathematical optimization technique, a novel chroma subsampling strategy is presented for compressing mosaic videos with arbitrary RGB-CFA structures in H.264/AVC and High Efficiency Video Coding (HEVC). For each 2 × 2 YUV block to be subsampled in 4:2:0 format, the proposed strategy determines the proper sampled U and V components by minimizing, prior to compression, the quality distortion between the original colocated mosaic block and the mosaic block converted from the current subsampled YUV block. Through the mathematical optimization formulated in the proposed strategy, the significance of the sampled U and V components for reconstructing the R, G, and B pixels can be taken into consideration simultaneously. The experimental results demonstrate that, at a similar execution time, the proposed chroma subsampling strategy achieves the best quality-bitrate tradeoff for compressing mosaic videos with arbitrary RGB-CFA structures in H.264/AVC and HEVC, compared with the state-of-the-art strategies by Chen et al. and Yang et al. as well as three commonly used ones.
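The per-block selection described above can be sketched as a brute-force search: for each 2 × 2 block, every candidate (u, v) pair is evaluated and the pair with the smallest distortion is kept. The sketch below is a deliberate simplification of the paper's closed-form optimization: candidates are restricted to the block's own chroma samples, distortion is measured directly on the chroma samples rather than on the reconstructed mosaic block, and the function name is hypothetical.

```python
import numpy as np
from itertools import product

def pick_uv_420(U, V):
    """For a 2x2 block to be coded in 4:2:0, pick the single (u, v) pair
    (here: drawn from the block's own samples) minimizing squared error."""
    best_pair, best_err = None, np.inf
    for u, v in product(np.unique(U), np.unique(V)):
        err = float(((U - u) ** 2).sum() + ((V - v) ** 2).sum())
        if err < best_err:
            best_pair, best_err = (float(u), float(v)), err
    return best_pair, best_err
```

The paper's key difference from this naive search is that its cost weighs each chroma sample by its significance for reconstructing the R, G, and B pixels under the given CFA structure, rather than treating all four samples equally.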
Objectives: Breast self-examination (BSE) is very important for the early detection of breast cancer in women, in addition to imaging methods. The easiest way to access information on how to perform this examination is undoubtedly the internet, and the most popular platform is YouTube. However, the most important disadvantage of this massive platform is the risk of spreading false information, since it cannot be audited. This study aimed to evaluate Turkish videos on BSE on YouTube in terms of quality and content. Methods: On January 17, 2022, a search was conducted on YouTube using the keyword "breast self-examination", and the first 210 videos presented on the first five pages were obtained. After applying the study criteria, 156 videos were included in the sample and evaluated by two general surgeons in terms of educational value, content, and upload source. Results: Of the 156 videos, 23 were categorized as useful (14.7%) and 133 as misleading (85.3%). By upload source, the universities/professional organizations/non-profit physicians/physicians group had the highest rate of misleading videos (96.9%), while stand-alone health information websites had the highest rate of useful videos (24%). There was no significant difference between the upload sources in terms of video length, number of views, content score, or quality score. Conclusions: The number of useful Turkish videos on BSE is very low. Our results indicate the need for more educational and useful videos, especially from healthcare professionals who use the YouTube platform.
Summarizing Unconstrained Videos Using Salient Montages
Min Sun; Farhadi, Ali; Taskar, Ben ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, November 1, 2017, Volume 39, Issue 11
Journal Article
Peer-reviewed
Open access
We present a novel method to summarize unconstrained videos using salient montages (i.e., a "melange" of frames in the video, as shown in Fig. 1), by finding "montageable moments" and identifying the salient people and actions to depict in each montage. Our method aims at addressing the increasing need for generating concise visualizations from the large number of videos captured from portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken "in the wild", where the camera moves freely. As such, we demonstrate results on head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. In our experiments, we show that our method can reliably detect and track humans under significant action and camera motion. Moreover, the predicted salient people are more accurate than the results of a state-of-the-art video saliency method. Finally, we demonstrate that a novel "montageability" score can be used to retrieve results with relatively high precision, which allows us to present high-quality montages to users.
Enrollment in courses taught remotely in higher education has been on the rise, with a recent surge in response to a global pandemic. While adapting this form of teaching, instructors familiar with traditional face‐to‐face methods are now met with a new set of challenges, including students not turning on their cameras during synchronous class meetings held via videoconferencing. After transitioning to emergency remote instruction in response to the COVID‐19 pandemic, our introductory biology course shifted all in‐person laboratory sections into synchronous class meetings held via the Zoom videoconferencing program. Out of consideration for students, we established a policy that video camera use during class was optional, but encouraged. However, by the end of the semester, several of our instructors and students reported lower than desired camera use that diminished the educational experience. We surveyed students to better understand why they did not turn on their cameras. We confirmed several predicted reasons including the most frequently reported: being concerned about personal appearance. Other reasons included being concerned about other people and the physical location being seen in the background and having a weak internet connection, all of which our exploratory analyses suggest may disproportionately influence underrepresented minorities. Additionally, some students revealed to us that social norms also play a role in camera use. This information was used to develop strategies to encourage—without requiring—camera use while promoting equity and inclusion. Broadly, these strategies are to not require camera use, explicitly encourage usage while establishing norms, address potential distractions, engage students with active learning, and understand your students' challenges through surveys.
While the demographics and needs of students vary by course and institution, our recommendations will likely be directly helpful to many instructors and also serve as a model for gathering data to develop strategies more tailored for other student populations.
Students were asked why they chose not to turn on their cameras during synchronous class meetings held via Zoom. Their responses influenced a strategy for encouraging them to do so.