Conditional coding has lately emerged as the mainstream approach to learned video compression. However, a recent study shows that it may perform worse than residual coding when an information bottleneck arises. Conditional residual coding was thus proposed, creating a new school of thought to improve on conditional coding. Notably, conditional residual coding relies heavily on the assumption that the residual frame has a lower entropy rate than that of the intra frame. Recognizing that this assumption does not always hold, due to dis-occlusion phenomena or unreliable motion estimates, we propose a masked conditional residual coding scheme. It learns a soft mask to form a hybrid of conditional coding and conditional residual coding in a pixel-adaptive manner. We introduce a Transformer-based conditional autoencoder and investigate several strategies for conditioning a Transformer-based autoencoder for inter-frame coding, a topic that is largely under-explored. Additionally, we propose a channel transform module (CTM) to decorrelate the image latents along the channel dimension, with the aim of letting a simple hyperprior approach the compression performance of a channel-wise autoregressive model. Experimental results confirm the superiority of our masked conditional residual Transformer (termed MaskCRT) over both conditional coding and conditional residual coding. On commonly used datasets, MaskCRT shows comparable BD-rate results to VTM-17.0 under the low-delay P configuration in terms of PSNR-RGB and outperforms VTM-17.0 in terms of MS-SSIM-RGB. It also opens up a new research direction for advancing learned video compression.
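The pixel-adaptive blend of conditional and conditional residual coding described in this abstract can be sketched as follows. This is an illustrative minimal formulation, not the paper's code; the function name and the exact way the mask enters the codec are assumptions.

```python
import numpy as np

def masked_conditional_residual(x, x_c, m):
    """Blend conditional and conditional-residual coding inputs.

    x   : current frame
    x_c : motion-compensated temporal condition (prediction)
    m   : learned soft mask in [0, 1], same shape as x

    Where m ~ 1 the encoder sees the residual x - x_c (residual-style coding);
    where m ~ 0 it sees x itself (pure conditional coding). Intermediate values
    interpolate between the two regimes per pixel.
    """
    return x - m * x_c  # equals (x - x_c) where m = 1, and x where m = 0

# Sanity check of the two extremes of the soft mask:
rng = np.random.default_rng(0)
x, x_c = rng.random((4, 4)), rng.random((4, 4))
assert np.allclose(masked_conditional_residual(x, x_c, np.ones_like(x)), x - x_c)
assert np.allclose(masked_conditional_residual(x, x_c, np.zeros_like(x)), x)
```

In dis-occluded regions, where `x_c` is unreliable, a learned mask near zero lets the codec fall back to conditional coding instead of encoding a high-entropy residual.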
55% new material in the latest edition of this "must-have" for students and practitioners of image and video processing! This Handbook is intended to serve as the basic reference point on image and video processing, in the field, in the research laboratory, and in the classroom. Each chapter has been written by carefully selected, distinguished experts specializing in that topic and carefully reviewed by the Editor, Al Bovik, ensuring that the greatest depth of understanding is communicated to the reader. Coverage includes introductory, intermediate, and advanced topics, so the book serves equally well as a classroom textbook and as a reference resource.
* Provides practicing engineers and students with a highly accessible resource for learning and using image/video processing theory and algorithms
* Includes a new chapter on image processing education, which should prove invaluable for those developing or modifying their curricula
* Covers the various image and video processing standards that exist and are emerging, driving today's explosive industry
* Offers an understanding of what images are, how they are modeled, and gives an introduction to how they are perceived
* Introduces the necessary, practical background to allow engineering students to acquire and process their own digital image or video data
* Culminates with a diverse set of applications chapters, covered in sufficient depth to serve as extensible models to the reader's own potential applications
About the Editor: Al Bovik is the Cullen Trust for Higher Education Endowed Professor at The University of Texas at Austin, where he is the Director of the Laboratory for Image and Video Engineering (LIVE). He has published over 400 technical articles in the general area of image and video processing and holds two U.S. patents. Dr. Bovik was a Distinguished Lecturer of the IEEE Signal Processing Society (2000), received the IEEE Signal Processing Society Meritorious Service Award (1998) and the IEEE Third Millennium Medal (2000), and was twice an Honorable Mention winner of the international Pattern Recognition Society Award. He is a Fellow of the IEEE, was Editor-in-Chief of the IEEE Transactions on Image Processing (1996-2002), has served on and continues to serve on many other professional boards and panels, and was the Founding General Chairman of the IEEE International Conference on Image Processing, held in Austin, Texas, in 1994.
* No other resource for image and video processing contains the same breadth of up-to-date coverage
* Each chapter is written by one or several of the top experts working in that area
* Includes all essential mathematics, techniques, and algorithms for every type of image and video processing used by electrical engineers, computer scientists, internet developers, bioengineers, and scientists in various image-intensive disciplines
The human visual system (HVS) exhibits nonlinear sensitivity to the distortions introduced by lossy image and video coding. This effect is due to the luminance masking, contrast masking, and spatial and temporal frequency masking characteristics of the HVS. This paper proposes a novel perception-based quantization to remove nonvisible information in high dynamic range (HDR) color pixels by exploiting luminance masking so that the performance of the High Efficiency Video Coding (HEVC) standard is improved for HDR content. A profile scaling based on a tone-mapping curve computed for each HDR frame is introduced. The quantization step is then perceptually tuned on a transform unit basis. The proposed method has been integrated into the HEVC reference model for the HEVC range extensions (HM-RExt), and its performance was assessed by measuring the bitrate reduction against the HM-RExt. The results indicate that the proposed method achieves significant bitrate savings, up to 42.2%, with an average of 12.8%, compared with HEVC at the same quality (based on HDR-VDP-2 and subjective evaluations).
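The idea of perceptually tuning the quantization step per transform unit (TU) can be sketched as follows. This is a simplified stand-in for the paper's profile scaling: the tone curve, the slope-based sensitivity estimate, and the function name are assumptions; only the HEVC rule that the quantization step size doubles every 6 QP units is standard.

```python
import math

def perceptual_qp(base_qp, tu_luminance, tone_curve):
    """Illustrative per-TU QP adjustment from a luminance-masking profile.

    tone_curve maps linear HDR luminance to a perceptually uniform scale.
    Its local slope indicates visual sensitivity: a shallow slope (very dark
    or very bright regions) tolerates coarser quantization, so QP is raised.
    """
    eps = 1e-6
    slope = max((tone_curve(tu_luminance + eps) - tone_curve(tu_luminance)) / eps, 1e-3)
    # In HEVC the quantization step size doubles every 6 QP units, so a
    # sensitivity ratio r maps to a QP offset of 6 * log2(1/r).
    offset = 6.0 * math.log2(1.0 / slope)
    return base_qp + offset

# A TU whose tone curve is locally half as steep gets a +6 QP offset:
assert abs(perceptual_qp(32, 100.0, lambda y: 0.5 * y) - 38.0) < 1e-3
```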
A video compression framework based on spatio-temporal resolution adaptation (ViSTRA) is proposed, which dynamically resamples the input video spatially and temporally during encoding, based on a quantisation-resolution decision, and reconstructs the full resolution video at the decoder. Temporal upsampling is performed using frame repetition, whereas a convolutional neural network super-resolution model is employed for spatial resolution upsampling. ViSTRA has been integrated into the high efficiency video coding reference software (HM 16.14). Experimental results verified via an international challenge show significant improvements, with BD-rate gains of 15% based on PSNR and an average MOS difference of 0.5 based on subjective visual quality tests.
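The quantisation-resolution decision can be illustrated with a minimal sketch: at high QPs, coding a downsampled video and restoring full resolution at the decoder (CNN super-resolution spatially, frame repetition temporally) can beat coding at native resolution. The QP thresholds and function name here are hypothetical, not ViSTRA's actual decision rule.

```python
def resolution_decision(qp, spatial_qp_threshold=36, temporal_qp_threshold=41):
    """Illustrative quantisation-resolution decision (thresholds hypothetical).

    Returns (spatial_factor, temporal_factor): the downsampling factors applied
    before encoding. The decoder inverts them via super-resolution (spatial)
    and frame repetition (temporal).
    """
    spatial_factor = 2 if qp >= spatial_qp_threshold else 1   # e.g. 3840x2160 -> 1920x1080
    temporal_factor = 2 if qp >= temporal_qp_threshold else 1 # e.g. 60 fps -> 30 fps
    return spatial_factor, temporal_factor

assert resolution_decision(27) == (1, 1)  # low QP: code at full resolution
assert resolution_decision(38) == (2, 1)  # mid QP: spatial downsampling only
assert resolution_decision(44) == (2, 2)  # high QP: spatial and temporal downsampling
```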
Video compression performance is closely related to the accuracy of inter prediction. It tends to be difficult to obtain accurate inter prediction for the local video regions with inconsistent motion and occlusion. Traditional video coding standards propose various technologies to handle motion inconsistency and occlusion, such as recursive partitions, geometric partitions, and long-term references. However, existing learned video compression schemes focus on obtaining an overall minimized prediction error averaged over all regions while ignoring the motion inconsistency and occlusion in local regions. In this paper, we propose a spatial decomposition and temporal fusion based inter prediction for learned video compression. To handle motion inconsistency, we propose to decompose the video into structure and detail (SDD) components first. Then we perform SDD-based motion estimation and SDD-based temporal context mining for the structure and detail components to generate short-term temporal contexts. To handle occlusion, we propose to propagate long-term temporal contexts by recurrently accumulating the temporal information of each historical reference feature and fuse them with short-term temporal contexts. With the SDD-based motion model and long short-term temporal contexts fusion, our proposed learned video codec can obtain more accurate inter prediction. Comprehensive experimental results demonstrate that our codec outperforms the reference software of H.266/VVC on all common test datasets for both PSNR and MS-SSIM.
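The structure/detail decomposition can be illustrated with a simple low-pass split. The paper learns its decomposition; the box filter below is a hand-crafted stand-in used only to show the shape of the operation, and the function name is an assumption.

```python
import numpy as np

def structure_detail_decompose(frame, k=5):
    """Illustrative structure/detail split (box-filter low-pass stand-in).

    structure: low-frequency component, dominated by large-scale motion;
    detail   : high-frequency residual, where fine or inconsistent motion lives.
    """
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    h, w = frame.shape
    structure = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            structure[i, j] = padded[i:i + k, j:j + k].mean()
    detail = frame - structure
    return structure, detail

frame = np.random.default_rng(1).random((16, 16))
s, d = structure_detail_decompose(frame)
assert np.allclose(s + d, frame)  # the split is lossless by construction
```

Estimating motion separately on `s` and `d` lets coarse, consistent motion be handled by the structure branch while the detail branch accounts for fine-grained deviations.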
Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today's media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies that achieve, for equal video quality, bit rate reductions in the neighborhood of 50% over its predecessor, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding, e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.
Conventional video compression (VC) methods are based on motion compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of nonlinear transform, motion and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on R-D loss averaged over pairs of successive frames. It is well-known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results that are reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, the R-D performance of our end-to-end optimized codec outperforms those of both x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM as well as HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found in https://github.com/makinyilmaz/LHBDC/ .
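The hierarchical bi-directional coding order referred to above can be sketched as follows: the GOP endpoint is coded first, then midpoints recursively, so every B-frame has one past and one future reference. This is a generic illustration of hierarchical-B ordering, not LHBDC's specific reference structure.

```python
def hierarchical_b_order(gop_size):
    """Coding order for one GOP of hierarchical bi-directional prediction
    (illustrative; actual reference structures vary by codec configuration).

    Frame 0 is the key frame closing the previous GOP and is already coded.
    """
    order = [gop_size]  # key frame at the end of this GOP is coded first

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)  # B-frame predicted from frames lo (past) and hi (future)
        split(lo, mid)
        split(mid, hi)

    split(0, gop_size)
    return order

# Display order 1..8 is coded coarse-to-fine:
assert hierarchical_b_order(8) == [8, 4, 2, 1, 3, 6, 5, 7]
```

The recursion makes the temporal-layer structure explicit: frames coded later sit between two nearby, already-reconstructed references, which is what gives bi-directional coding its R-D advantage over sequential (low-delay) prediction.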
NASA's Solar Dynamics Observatory (SDO) mission collects extensive data to monitor the Sun's daily activity. In the realm of space mission design, data compression plays a crucial role in addressing the challenges posed by limited telemetry rates. The primary objective of data compression is to facilitate efficient data management and transmission to work within the constrained bandwidth, thereby ensuring that essential information is captured while optimizing the utilization of available resources. This paper introduces a neural video compression technique that achieves a high compression ratio for the SDO's image data collection. The proposed approach focuses on leveraging both temporal and spatial redundancies in the data, leading to a more efficient compression. In this work, we introduce an architecture based on the Transformer model, which is specifically designed to capture both local and global information from input images in an effective and efficient manner. Additionally, our network is equipped with an entropy model that accurately models the probability distribution of the latent representations and speeds up the entropy decoding step. The entropy model leverages a channel-dependent approach and utilizes checkerboard-shaped local and global spatial contexts. By combining the Transformer-based video compression network with our entropy model, the proposed compression algorithm demonstrates superior performance over traditional video codecs like H.264 and H.265, as confirmed by our experimental results.
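The checkerboard-shaped spatial context mentioned above rests on a simple two-pass split of latent positions, sketched below. This shows only the generic checkerboard-context idea, not this paper's channel-dependent variant; the function name is an assumption.

```python
import numpy as np

def checkerboard_masks(h, w):
    """Two-pass checkerboard split used by checkerboard-context entropy models.

    Anchors are decoded first from the hyperprior alone; non-anchors are then
    decoded using the already-decoded anchors as spatial context. Each pass is
    fully parallel, unlike a raster-scan autoregressive model.
    """
    parity = np.add.outer(np.arange(h), np.arange(w)) % 2
    anchor = parity == 0
    return anchor, ~anchor

a, na = checkerboard_masks(4, 4)
assert a.sum() == na.sum() == 8               # each pass covers half the positions
assert not np.any(a & na) and np.all(a | na)  # passes are disjoint and exhaustive
```

Two parallel passes replace h*w sequential steps, which is why such models speed up entropy decoding.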
To enable high-speed underwater wireless optical communication (UWOC) in tap-water and seawater environments over long distances, a 450-nm blue GaN laser diode (LD) directly modulated by pre-leveled 16-quadrature amplitude modulation (QAM) orthogonal frequency division multiplexing (OFDM) data was employed to achieve a maximal transmission capacity of up to 10 Gbps. The proposed UWOC in tap water provided a maximal allowable communication bit rate increase from 5.2 to 12.4 Gbps as the underwater transmission distance was reduced from 10.2 to 1.7 m, exhibiting a bit-rate/distance decay slope of -0.847 Gbps/m. When conducting the same type of UWOC in seawater, light scattering induced by impurities attenuated the blue laser power, degrading the transmission with a slightly higher decay ratio of 0.941 Gbps/m. The blue-LD-based UWOC enables a 16-QAM OFDM bit rate of up to 7.2 Gbps over more than 6.8 m in seawater.
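The quoted decay slope is just the linear fit through the two reported operating points, which can be verified directly:

```python
def decay_slope(rate_hi, rate_lo, dist_at_hi, dist_at_lo):
    """Bit-rate/distance decay slope in Gbps/m: bit rate lost per metre of
    extra underwater path, from a straight line through two endpoints."""
    return (rate_hi - rate_lo) / (dist_at_hi - dist_at_lo)

# Tap-water figures from the abstract: 12.4 Gbps at 1.7 m, 5.2 Gbps at 10.2 m.
slope = decay_slope(12.4, 5.2, 1.7, 10.2)
assert abs(slope - (-0.847)) < 1e-3  # (12.4 - 5.2) / (1.7 - 10.2) = -0.847...
```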