We conducted a study on exemplar-based colorization for grayscale videos, aiming to enhance visual perception by transforming them into vibrant videos with plausible colors. Despite promising results from existing approaches, effectively transferring colors from a reference image to grayscale video frames while maintaining temporal consistency between frames remains a challenge. To tackle this challenge, we designed a Hierarchical Color Fusion Network (HCFN). For each grayscale video frame, HCFN first uses global and local attention mechanisms to compute pixel-level similarity with the reference image and spatio-temporal correspondence with its neighboring frame, respectively. Based on these relationships, the pixel-level color and the spatio-temporal color of the grayscale video frame are generated. Additionally, the tone-based color for the given frame is obtained from the color distribution of the reference image. Finally, the ambiguity among these three types of colors is resolved through the proposed hierarchical fusion mechanism, which produces the final color of the grayscale video frame. Experimental results on public databases show that our method outperforms state-of-the-art methods in visual quality, realism, and temporal consistency by a large margin. The source code and pre-trained checkpoints for HCFN are publicly available at https://github.com/wangyins/HCFN.
• We propose a unified exemplar-based video colorization network, dubbed HCFN.
• We design a hierarchical gating mechanism to alleviate the ambiguity of colors.
• HCFN achieves superior performance compared to state-of-the-art methods.
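The hierarchical gating idea, fusing three candidate color maps through staged gates, can be illustrated with a minimal sketch. This is not the authors' implementation: the two-stage arrangement, the gate values, and the function name are assumptions made purely for illustration.

```python
import numpy as np

def hierarchical_fuse(pixel_color, st_color, tone_color, g1, g2):
    """Hypothetical two-stage gated fusion of three candidate ab-channel maps.

    Stage 1 blends the pixel-level and spatio-temporal colors with gate g1;
    stage 2 blends that result with the tone-based color using gate g2.
    Color inputs are (H, W, 2) arrays; gates are (H, W, 1) maps in [0, 1].
    """
    local = g1 * pixel_color + (1.0 - g1) * st_color   # resolve local ambiguity
    fused = g2 * local + (1.0 - g2) * tone_color       # fall back to global tone
    return fused

# Toy example: 2x2 frames with constant candidate colors and 0.5 gates.
H = W = 2
pixel = np.full((H, W, 2), 1.0)
st = np.full((H, W, 2), 0.0)
tone = np.full((H, W, 2), -1.0)
g1 = np.full((H, W, 1), 0.5)
g2 = np.full((H, W, 1), 0.5)
out = hierarchical_fuse(pixel, st, tone, g1, g2)
```

In a real network the gates would themselves be predicted per pixel, so the model can trust the reference match where correspondence is strong and the global tone elsewhere.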
Scanner. Poms, Alex; Crichton, Will; Hanrahan, Pat, et al.
ACM Transactions on Graphics, 08/2018, Volume 37, Issue 4
Journal Article · Peer-reviewed · Open access
A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field's ability to explore new applications that leverage big video data. In response, we have created Scanner, a system for productive and efficient video analysis at scale. Scanner organizes video collections as tables in a data store optimized for sampling frames from compressed video, and executes pixel processing computations, expressed as dataflow graphs, on these frames. Scanner schedules video analysis applications expressed using these abstractions onto heterogeneous throughput computing hardware, such as multi-core CPUs, GPUs, and media processing ASICs, for high-throughput pixel processing. We demonstrate the productivity of Scanner by authoring a variety of video processing applications including the synthesis of stereo VR video streams from multi-camera rigs, markerless 3D human pose reconstruction from video, and data-mining big video datasets such as hundreds of feature-length films or over 70,000 hours of TV news. These applications achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, enabling formerly long-running big video data analysis tasks to be carried out in minutes to hours.
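Scanner's table-and-dataflow abstraction can be sketched in miniature. The snippet below is not the real scannerpy API; `stride_sample`, `run_graph`, and the frame representation are hypothetical stand-ins that only illustrate the idea of sampling rows from a frame table and running a per-frame op graph over the sample.

```python
# Illustrative sketch only, NOT the real Scanner (scannerpy) API.
def stride_sample(table, stride):
    """Sample every `stride`-th frame (row) from a frame table."""
    return table[::stride]

def run_graph(frames, ops):
    """Apply a linear dataflow graph (a list of per-frame ops) to each frame."""
    out = []
    for f in frames:
        for op in ops:
            f = op(f)
        out.append(f)
    return out

video = list(range(100))              # stand-in: frame indices as "frames"
sampled = stride_sample(video, 10)    # decode-friendly sparse sampling
result = run_graph(sampled, [lambda f: f * 2, lambda f: f + 1])
```

The point of the real system is that the sampling step exploits compressed-video layout (keyframes, GOP structure) so sparse access avoids decoding every frame, and the op graph is scheduled across CPUs, GPUs, and ASICs rather than run in a Python loop.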
Video Enhancement with Task-Oriented Flow. Xue, Tianfan; Chen, Baian; Wu, Jiajun, et al.
International Journal of Computer Vision, 1/8, Volume 127, Issue 8
Journal Article · Peer-reviewed · Open access
Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation, however, is intractable, and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as on our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
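Flow-based frame registration, which TOFlow learns end to end, ultimately amounts to warping one frame toward another. A minimal nearest-neighbor backward warp is sketched below; real systems use differentiable bilinear sampling and a learned flow field, so treat this only as a toy model of the operation.

```python
import numpy as np

def backward_warp(frame, flow):
    """Warp `frame` (H, W) using a per-pixel flow field (H, W, 2).

    Each output pixel (y, x) samples the input at (y + flow[..., 0],
    x + flow[..., 1]), rounded to the nearest pixel and clamped at borders.
    """
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sy = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    sx = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return frame[sy, sx]

frame = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0            # every pixel looks one column to the right
warped = backward_warp(frame, flow)
```

TOFlow's insight is that the flow feeding this warp need not be "true" motion: it is trained jointly with the downstream task, so the representation is whatever best serves interpolation, denoising, or super-resolution.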
Similar to a fingerprint search system, face recognition technology can assist law enforcement agencies in identifying suspects or finding missing persons. Face recognition technology lets the police detect a suspect's face, compare it with image databases of known criminals, and provide investigators with a match list of the most similar faces. Face recognition is a highly efficient and accurate tool in investigation processes. However, in some sensitive scenarios, covert methods are required for the detection of suspects or missing persons without risking the lives of police officers. With the availability of nano devices such as the Raspberry Pi, law enforcement agencies such as the police can be equipped with a concealed and secure face recognition system. In this paper, a Raspberry Pi and cloud-assisted face recognition framework is proposed. A small portable wireless camera is mounted on a police officer's uniform to capture a video stream, which is passed to the Raspberry Pi for face detection and recognition. The proposed method uses Bag of Words for extraction of oriented FAST and rotated BRIEF (ORB) points from the detected face, followed by a support vector machine for identification of suspects. The Raspberry Pi has limited resources such as storage space, memory, and processing power, and therefore the proposed classifier is stored and trained on the cloud. The proposed method is implemented on a Raspberry Pi 3 Model B in Python 2.7 and is tested on various standard datasets. Experimental results validate the efficiency of the proposed method in accurate detection of faces compared to state-of-the-art face detection and recognition methods, and verify its effectiveness for enhancing law-enforcement services in smart cities.
• Framework for suspect identification to passively enable security in smart cities.
• Computationally inexpensive Viola-Jones algorithm for face detection on Raspberry Pi.
• ORB feature extraction from only face regions for recognition and classification.
• Energy-aware offloading of features to cloud for real-time suspect identification.
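The recognition stage described above, local descriptors quantized into a bag-of-words histogram and then classified, can be sketched with a toy codebook. `bow_histogram` is a hypothetical helper: real ORB descriptors are 256-bit binary vectors extracted with OpenCV and matched by Hamming distance, which is omitted here to keep the sketch self-contained.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a visual-word codebook and return a
    normalized bag-of-words histogram (the feature vector fed to the SVM)."""
    # Squared Euclidean distance from each descriptor to each codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)          # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])   # toy 2-word vocabulary
descs = np.array([[0.1, 0.2], [9.8, 10.1], [10.2, 9.9]])
h = bow_histogram(descs, codebook)
```

In the paper's split, only this compact histogram needs to travel to the cloud-hosted classifier, which is what keeps the on-device (Raspberry Pi) workload and bandwidth small.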
Snapshot compressive imaging (SCI) refers to compressive imaging systems where multiple frames are mapped into a single measurement, with video compressive imaging and hyperspectral compressive imaging as two representative applications. Though exciting results on high-speed videos and hyperspectral images have been demonstrated, poor reconstruction quality precludes SCI from wide application. This paper aims to boost the reconstruction quality of SCI by exploiting the high-dimensional structure in the desired signal. We build a joint model to integrate the nonlocal self-similarity of video/hyperspectral frames and the rank minimization approach with the SCI sensing process. Following this, an alternating minimization algorithm is developed to solve this non-convex problem. We further investigate the special structure of the sampling process in SCI to tackle the computational workload and memory issues in SCI reconstruction. Both simulation and real-data results (captured by four different SCI cameras) demonstrate that our proposed algorithm leads to significant improvements compared with current state-of-the-art algorithms. We hope our results will encourage researchers and engineers to pursue compressive imaging further in real applications.
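The SCI sensing process that reconstruction must invert is compact enough to write in a few lines: B frames are modulated by per-frame masks and collapsed into a single snapshot. The shapes and random masks below are toy values for illustration.

```python
import numpy as np

def sci_forward(frames, masks):
    """Snapshot compressive imaging forward model: each of the B frames is
    modulated by its own (typically binary) mask, and the modulated frames
    are summed into one 2-D measurement."""
    return (frames * masks).sum(axis=0)

rng = np.random.default_rng(0)
B, H, W = 8, 4, 4
frames = rng.random((B, H, W))                       # toy high-speed frames
masks = rng.integers(0, 2, size=(B, H, W)).astype(float)
y = sci_forward(frames, masks)                       # one snapshot encodes 8 frames
```

Recovering `frames` from `y` is heavily underdetermined (one measurement per B unknowns per pixel), which is why the paper's priors, nonlocal self-similarity and rank minimization, are needed to make reconstruction well-posed.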
Specialized image signal processors (ISPs) exploit the structure of image processing pipelines to minimize memory bandwidth using the architectural pattern of line-buffering, where all intermediate data between each stage is stored in small on-chip buffers. This provides high energy efficiency, allowing long pipelines with tera-op/sec. image processing in battery-powered devices, but traditionally requires painstaking manual design in hardware. Based on this pattern, we present Darkroom, a language and compiler for image processing. The semantics of the Darkroom language allow it to compile programs directly into line-buffered pipelines, with all intermediate values in local line-buffer storage, eliminating unnecessary communication with off-chip DRAM. We formulate the problem of optimally scheduling line-buffered pipelines to minimize buffering as an integer linear program. Finally, given an optimally scheduled pipeline, Darkroom synthesizes hardware descriptions for ASIC or FPGA, or fast CPU code. We evaluate Darkroom implementations of a range of applications, including a camera pipeline, low-level feature detection algorithms, and deblurring. For many applications, we demonstrate gigapixel/sec. performance in under 0.5 mm² of ASIC silicon at 250 mW (simulated on a 45 nm foundry process), real-time 1080p/60 video processing using a fraction of the resources of a modern FPGA, and tens of megapixels/sec. of throughput on a quad-core x86 processor.
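The line-buffering pattern itself is easy to sketch in software: a stencil stage keeps only as many rows as its vertical extent, so a whole intermediate frame never resides in memory. The helper below is an illustrative stand-in, not Darkroom output.

```python
from collections import deque

def line_buffered_blur(rows, taps=3):
    """Stream image rows through a small line buffer and emit a vertical
    `taps`-row box average. Only `taps` rows are ever resident at once,
    which is the memory behavior line-buffered hardware pipelines exploit."""
    buf = deque(maxlen=taps)       # the "line buffer": last `taps` rows only
    out = []
    for row in rows:
        buf.append(row)
        if len(buf) == taps:       # enough context to produce an output row
            out.append([sum(col) / taps for col in zip(*buf)])
    return out

image = [[0, 0], [3, 3], [6, 6], [9, 9]]   # 4 rows x 2 columns
blurred = line_buffered_blur(image)
```

Chaining many such stages is what makes the scheduling problem interesting: each stage delays the stream by a few rows, and Darkroom's integer linear program chooses delays that minimize total buffer storage.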
Extending image processing techniques to videos is a non-trivial task; applying processing independently to each video frame often leads to temporal inconsistencies, and explicitly encoding temporal consistency requires algorithmic changes. We describe a more general approach to temporal consistency. We propose a gradient-domain technique that is blind to the particular image processing algorithm. Our technique takes a series of processed frames that suffers from flickering and generates a temporally-consistent video sequence. The core of our solution is to infer the temporal regularity from the original unprocessed video, and use it as a temporal consistency guide to stabilize the processed sequence. We formally characterize the frequency properties of our technique, and demonstrate, in practice, its ability to stabilize a wide range of popular image processing techniques including enhancement and stylization of color and tone, intrinsic images, and depth estimation.
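The core idea, using the unprocessed video's temporal regularity as a guide, can be sketched per pixel in one dimension. The blending weight `lam` and the sequential update below are simplifications invented for illustration; the paper's actual formulation is a gradient-domain optimization.

```python
import numpy as np

def stabilize(processed, original, lam=0.5):
    """Toy per-pixel temporal stabilization: each output frame trades off
    fidelity to the processed frame against reproducing the ORIGINAL video's
    frame-to-frame change, which acts as the temporal-consistency guide."""
    out = [processed[0]]
    for t in range(1, len(processed)):
        guided = out[-1] + (original[t] - original[t - 1])  # follow original motion
        out.append(lam * processed[t] + (1.0 - lam) * guided)
    return np.array(out)

original = np.array([0.0, 0.0, 0.0, 0.0])    # static scene: no real change
processed = np.array([0.0, 1.0, 0.0, 1.0])   # per-frame processing flickers
smooth = stabilize(processed, original)
```

Because the guide comes from the input video rather than from the processing algorithm, the same stabilizer works unchanged across stylization, intrinsic images, depth estimation, and so on, which is what "blind" means here.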
Robust principal component analysis (RPCA) via decomposition into low-rank plus sparse matrices offers a powerful framework for a large variety of applications such as image processing, video processing, and 3-D computer vision. Indeed, these applications typically require detecting sparse outliers in observed imagery data that can otherwise be approximated by a low-rank matrix. Moreover, experiments often show that RPCA with additional spatial and/or temporal constraints outperforms the state-of-the-art algorithms in these applications. Thus, the aim of this paper is to survey the applications of RPCA in computer vision. In the first part of this paper, we review representative image processing applications as follows: 1) low-level imaging such as image recovery and denoising, image composition, image colorization, image alignment and rectification, multifocus imaging, and face recognition; 2) medical imaging such as dynamic magnetic resonance imaging (MRI) for acceleration of data acquisition, background suppression, and learning of interframe motion fields; and 3) imaging for 3-D computer vision with additional depth information, such as in structure from motion (SfM) and 3-D motion recovery. In the second part, we present the applications of RPCA in video processing, which utilize additional spatial and temporal information compared to image processing. Specifically, we investigate video denoising and restoration, hyperspectral video, and background/foreground separation. Finally, we provide perspectives on possible future research directions and algorithmic frameworks that are suitable for these applications.
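The low-rank plus sparse decomposition at the heart of RPCA can be sketched as a naive alternation between singular-value thresholding (for the low-rank part) and soft thresholding (for the sparse part). The thresholds and the plain alternation are illustrative choices; the surveyed methods use principled solvers such as augmented Lagrangian / ADMM formulations.

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: shrink singular values of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Elementwise soft thresholding (promotes sparsity)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_sketch(M, tau_L=1.0, tau_S=0.5, n_iter=50):
    """Naive alternating split M ~ L (low-rank) + S (sparse outliers)."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S, tau_L)   # low-rank update on the outlier-free residual
        S = soft(M - L, tau_S)  # sparse update on what the low-rank part misses
    return L, S

M = np.ones((4, 4))     # rank-1 "background"
M[0, 0] += 5.0          # one sparse outlier ("foreground" pixel)
L, S = rpca_sketch(M)
```

In background/foreground separation, each video frame becomes a column of `M`: the static background is well modeled by the low-rank `L`, while moving objects land in the sparse `S`.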