Belief propagation (BP) is a popular global optimization technique in computer vision. However, its hardware implementation requires huge bandwidth and memory because it iteratively processes messages between neighboring nodes. In this paper, we propose an efficient message reduction algorithm that greatly reduces bandwidth and memory consumption. Compared with the original message passing operation, our method reduces memory and bandwidth while maintaining similar quality. For stereo matching on a VGA input with a disparity range of 64, the proposed algorithm achieves a 93.75% reduction in message memory with only about 0.2-2.2% quality degradation in terms of bad pixels. The proposed algorithm greatly reduces the memory requirement and is suitable for both hardware and software realization.
DSLR cameras can achieve multiple zoom levels via shifting lens distances or swapping lens types. However, these techniques are not possible on smartphone devices due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems crop and digitally upsample images from W, leading to significant detail loss. In this paper, we propose an efficient system for hybrid zoom super-resolution on mobile devices, which captures a synchronous pair of W and T shots and leverages machine learning models to align and transfer details from T to W. We further develop an adaptive blending method that accounts for depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment errors. To minimize the domain gap, we design a dual-phone camera rig to capture real-world inputs and ground-truths for supervised training. Our method generates a 12-megapixel image in 500ms on a mobile platform and compares favorably against state-of-the-art methods under extensive evaluation on real-world scenarios.
This paper introduces methods for automatic annotation of landmark photographs by learning textual tags and visual features of landmarks from appropriately location-tagged photographs on social media. By analyzing the spatial distributions of text tags from Flickr’s geotagged photos, we identify thousands of tags that likely refer to landmarks. Further verification using Wikipedia articles filters out non-landmark tags. Association analysis is used to find containment relationships between landmark tags and other geographic names, thus forming a geographic hierarchy. Photographs relevant to each landmark tag are retrieved from Flickr, and distinctive visual features are extracted from them. The results form an ontology for landmarks, including their names, equivalent names, geographic hierarchy, and visual features. We also propose an efficient indexing method for content-based landmark search. The resulting ontology could be used for tag suggestion and content-relevant re-ranking.
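Association analysis of this kind typically scores a rule a → b by its confidence, which is the fraction of photos tagged a that are also tagged b. The minimal sketch below assumes a hypothetical containment test: high confidence from landmark to place and low confidence in reverse. The thresholds `hi` and `lo` and the toy tag sets are made up for illustration, and the abstract does not state the exact rule the authors use.

```python
from collections import Counter
from itertools import permutations


def tag_confidences(photos):
    """Association-rule confidence conf(a -> b) = count(a and b) / count(a)
    for every ordered pair of tags that co-occur on at least one photo."""
    single = Counter()
    pair = Counter()
    for tags in photos:
        tags = set(tags)
        single.update(tags)
        pair.update(permutations(tags, 2))  # ordered pairs of distinct tags
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def contained_in(photos, inner, outer, hi=0.8, lo=0.5):
    """Hypothetical containment rule: `inner` lies within `outer` if photos tagged
    `inner` are usually also tagged `outer`, but not the other way around."""
    conf = tag_confidences(photos)
    return conf.get((inner, outer), 0.0) >= hi and conf.get((outer, inner), 0.0) <= lo


# Toy tag sets (hypothetical data, not from the paper).
photos = [
    {"goldengatebridge", "sanfrancisco"},
    {"goldengatebridge", "sanfrancisco"},
    {"sanfrancisco"},
    {"sanfrancisco"},
    {"sanfrancisco"},
    {"sanfrancisco", "alcatraz"},
]
print(contained_in(photos, "goldengatebridge", "sanfrancisco"))  # True
print(contained_in(photos, "sanfrancisco", "goldengatebridge"))  # False
```

The asymmetry is what turns co-occurrence into a hierarchy. Every landmark photo carries the city tag, but most city photos do not carry the landmark tag.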
We propose a new device, the programmable aperture camera (PAC), to capture the 4D light field in a camera. PAC can adjust the shape of the aperture in each exposure. This allows us to capture the angular information of the light field, which is lost in regular photography. Although multiple exposures are needed to obtain a light field, the total exposure time remains the same as that of taking a single regular photograph at the same image quality level. Unlike previous techniques that severely reduce the spatial resolution, PAC captures the image at full spatial resolution and allows adjustable angular resolution. Also, its manufacturing cost is much lower than that of previous techniques. We describe the PAC prototype and demonstrate how the captured light field makes digital refocusing possible.
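Once the captured data has been decoded into sub-aperture views (one per aperture position), digital refocusing can be done with the standard shift-and-add technique: shift each view in proportion to its aperture offset, then average. The sketch below assumes integer aperture offsets `(u, v)` and an integer shift `alpha` that selects the focal plane. It is a generic illustration in plain Python, not the PAC paper's own reconstruction pipeline.

```python
def refocus(views, alpha):
    """Shift-and-add refocusing.

    views: dict mapping an integer aperture offset (u, v) to a 2D list image (H x W).
    alpha: integer pixel shift per unit of aperture offset; selects the focal plane.
    Each output pixel averages the views whose shifted sample falls inside the image.
    """
    some = next(iter(views.values()))
    h, w = len(some), len(some[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for (u, v), img in views.items():
        dy, dx = alpha * v, alpha * u
        for y in range(h):
            for x in range(w):
                sy, sx = y + dy, x + dx
                if 0 <= sy < h and 0 <= sx < w:
                    acc[y][x] += img[sy][sx]
                    cnt[y][x] += 1
    return [[acc[y][x] / cnt[y][x] if cnt[y][x] else 0.0 for x in range(w)]
            for y in range(h)]


# Toy light field: a 3x3 grid of aperture positions viewing one bright point
# whose image moves by 1 pixel per unit of aperture offset (hypothetical data).
views = {}
for u in (-1, 0, 1):
    for v in (-1, 0, 1):
        img = [[0.0] * 7 for _ in range(7)]
        img[3 + v][3 + u] = 1.0
        views[(u, v)] = img

print(refocus(views, 1)[3][3])  # 1.0: in focus, all nine views align
print(refocus(views, 0)[3][3])  # about 0.111 (1/9): out of focus, energy is spread
```

Choosing `alpha` equal to the point's per-offset disparity brings it into focus. Other values blur it across neighboring pixels.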
Belief propagation is a popular global optimization technique for many computer vision problems. However, it requires extensive computation due to the iterative message passing operations. In this paper, we present a new process element (PE) for efficient message construction. The efficiency is gained by exploiting the unique characteristics of the generalized Potts model (truncated linear model) of the smoothness term in the Markov random field. For stereo estimation with L disparity values, the algorithm reduces the computation from O(L^2) to O(L) while retaining high throughput and low latency. Compared with the direct message construction PE, our method achieves an 87.14% computation saving and a 94.38% PE area reduction.
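The classic O(L) construction for a truncated linear smoothness cost min(c|d - d'|, t) is the two-pass lower-envelope method popularized by Felzenszwalb and Huttenlocher. It computes the message with a forward pass, a backward pass, and a final clamp at min(h) + t. The sketch below assumes `h[d']` already holds the data cost plus the incoming messages from all neighbors except the recipient. It shows this generic technique, not the PE's hardware datapath, and includes an O(L^2) brute force for comparison.

```python
def message_truncated_linear(h, c, t):
    """O(L) computation of m[d] = min over d' of (h[d'] + min(c * |d - d'|, t))."""
    m = list(h)
    # Forward pass: propagate the linear cost from smaller disparities.
    for d in range(1, len(m)):
        m[d] = min(m[d], m[d - 1] + c)
    # Backward pass: propagate the linear cost from larger disparities.
    for d in range(len(m) - 2, -1, -1):
        m[d] = min(m[d], m[d + 1] + c)
    # Truncation: no label can cost more than the global minimum plus t.
    cap = min(h) + t
    return [min(v, cap) for v in m]


def message_brute_force(h, c, t):
    """Direct O(L^2) reference: try every source label d' for every target label d."""
    n = len(h)
    return [min(h[dp] + min(c * abs(d - dp), t) for dp in range(n)) for d in range(n)]


print(message_truncated_linear([0, 10, 10, 10, 10, 10], 1, 3))  # [0, 1, 2, 3, 3, 3]
```

The savings come from reusing m[d - 1] (or m[d + 1]) instead of rescanning all L source labels for each target label.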
Video blogs and selfies are popular social media formats, which are often captured by wide-angle cameras to show human subjects and an expanded background. Unfortunately, due to perspective projection, faces near corners and edges exhibit apparent distortions that stretch and squish the facial features, resulting in poor video quality. In this work, we present a video warping algorithm to correct these distortions. Our key idea is to apply stereographic projection locally on the facial regions. We formulate a mesh warp problem using spatial-temporal energy minimization and minimize background deformation using a line-preservation term to maintain the straight edges in the background. To address temporal coherence, we impose temporal smoothness constraints on the warping meshes and facial trajectories through latent variables. For performance evaluation, we develop a wide-angle video dataset with a wide range of focal lengths. The user study shows that 83.9% of users prefer our algorithm over other alternatives based on perspective projection.
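Stereographic projection suits faces because it is conformal. Under perspective projection, a point at field angle θ lands at radius f tan θ and is stretched radially relative to tangentially by a factor of 1/cos θ. Stereographic projection maps the same point to radius 2f tan(θ/2), with equal radial and tangential scale. The helper names below are illustrative. The sketch only shows the radial remapping for an assumed focal length in pixels, not the paper's mesh-warp optimization.

```python
import math


def perspective_to_stereographic(r_p, f):
    """Map a radius r_p (pixels from the principal point) in a perspective image
    to the radius the same ray would have under stereographic projection."""
    theta = math.atan2(r_p, f)  # field angle of the ray, since r_p = f * tan(theta)
    return 2.0 * f * math.tan(theta / 2.0)


def perspective_anisotropy(r_p, f):
    """Radial / tangential magnification of perspective projection at radius r_p.
    Radial: f / cos^2(theta); tangential: f / cos(theta); ratio: 1 / cos(theta).
    (For stereographic projection, this ratio is always 1.)"""
    theta = math.atan2(r_p, f)
    return 1.0 / math.cos(theta)


f = 1000.0  # assumed focal length in pixels
print(f"{perspective_to_stereographic(1000.0, f):.1f}")  # 828.4: a 45-degree ray moves inward
print(f"{perspective_anisotropy(1000.0, f):.3f}")        # 1.414: perspective stretch at 45 degrees
```

Near the image center the two projections agree. The radial stretch grows toward the corners, which is why the correction is applied locally to faces there.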
We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning. The network fuses optical flow with real/virtual camera pose histories into a joint motion representation. Next, an LSTM block infers the new virtual camera pose, which is used to generate a warping grid that stabilizes the frame. A novel relative motion representation and a multi-stage training process are presented to optimize our model without any supervision. To the best of our knowledge, this is the first DNN solution that adopts both sensor data and image content for stabilization. We validate the proposed framework through ablation studies and demonstrate, via quantitative evaluations and a user study, that the proposed method outperforms state-of-the-art alternatives.
We propose a new architecture for stereo matching using belief propagation. The architecture combines our fast, fully parallel processing element (PE) and memory-efficient tile-based BP (TBP) algorithm. At the architectural level, we develop several novel techniques, including a three-stage pipeline, a message forwarding scheme, and a boundary message reuse scheme, which greatly reduce the required bandwidth and power consumption without sacrificing performance. Simulation shows that the architecture can generate HDTV 720p results at 30 fps when operating at 227 MHz. The high-quality depth maps enable real-time depth-image-based rendering and many other important applications in the 3D TV industry.
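As a rough sanity check of the throughput claim, the clock budget per output pixel is the clock rate divided by the pixel rate. This assumes that HDTV 720p means 1280 × 720 pixels. The abstract does not say how these cycles are split across BP iterations or disparity levels, and `cycles_per_pixel` is an illustrative helper name.

```python
def cycles_per_pixel(width, height, fps, clock_hz):
    """Clock cycles available per output pixel to sustain real-time operation."""
    return clock_hz / (width * height * fps)


budget = cycles_per_pixel(1280, 720, 30, 227e6)
print(f"{budget:.2f} cycles per pixel")  # 8.21 cycles per pixel
```

With only about eight cycles per pixel, a sequential O(L^2) message update over many disparities could not keep up. This is consistent with the emphasis on a fully parallel PE and a pipelined design.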
Hardware-efficient belief propagation. Chia-Kai Liang; Chao-Chung Cheng; Yen-Chieh Lai ... 2009 IEEE Conference on Computer Vision and Pattern Recognition, June 2009. Conference Proceeding.
Belief propagation (BP) is an effective algorithm for solving energy minimization problems in computer vision. However, it requires enormous memory, bandwidth, and computation because messages are iteratively passed between nodes in the Markov random field (MRF). In this paper, we propose two methods to address this problem. The first method is a message passing scheme called tile-based belief propagation. The key idea of this method is that a message can be well approximated from other faraway ones. We split the MRF into many tiles and perform BP within each one. To preserve global optimality, we store the outgoing boundary messages of a tile and use them when performing BP in the neighboring tiles. Tile-based BP requires only 1-5% of the memory and 0.2-1% of the bandwidth of ordinary BP. The second method is an O(L) message construction algorithm for the robust functions commonly used to describe the smoothness terms in the energy function. We find that many variables in constructing a message are repetitive; thus, these variables can be calculated once and reused many times. The proposed algorithms are suitable for parallel implementation. We design a low-power VLSI circuit for disparity estimation that can construct 440M messages per second and generate high-quality disparity maps in near real time. We also implement the proposed algorithms on a GPU, which calculates messages 4 times faster than the sequential O(L) method.
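The memory saving from storing only boundary messages can be illustrated with a simplified model. This model is an assumption for illustration, not the paper's own accounting. In it, ordinary BP keeps four L-value messages per pixel. Tile-based BP keeps one tile's working buffer plus both directions of every message that crosses a tile boundary. Under this model the ratio is roughly 1/B + B^2/(W·H) for tile width B, independent of L. The function names, the VGA resolution, and the tile size of 32 are illustrative choices.

```python
def bp_message_memory(width, height, levels):
    """Message values stored by ordinary BP on a 4-connected grid (4 messages per pixel)."""
    return 4 * width * height * levels


def tile_bp_message_memory(width, height, levels, tile):
    """Simplified estimate for tile-based BP: a 4-messages-per-pixel working buffer
    for one tile, plus both directions of every message crossing a tile boundary."""
    if width % tile or height % tile:
        raise ValueError("this sketch assumes the tile size divides the image size")
    in_tile = 4 * tile * tile * levels
    # Grid edges cut by vertical tile boundaries plus those cut by horizontal ones.
    crossing_edges = height * (width // tile - 1) + width * (height // tile - 1)
    boundary = 2 * crossing_edges * levels
    return in_tile + boundary


full = bp_message_memory(640, 480, 64)
tiled = tile_bp_message_memory(640, 480, 64, 32)
print(f"{tiled / full:.1%}")  # 3.3%
```

The 1/B term dominates: doubling the tile width roughly halves the stored boundary messages, until the per-tile working buffer (the B^2 term) starts to matter.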
Motion blur of fast-moving subjects is a longstanding problem in photography and is very common on mobile phones due to their limited light-collection efficiency, particularly in low-light conditions. While we have witnessed great progress in image deblurring in recent years, most methods require significant computational power and have limitations in processing high-resolution photos with severe local motion. To this end, we develop a novel face deblurring system for mobile phones based on dual-camera fusion. The system detects subject motion to dynamically enable a reference camera, e.g., the ultrawide-angle camera commonly available on recent premium phones, and captures an auxiliary photo with a faster shutter setting. While the main shot is low-noise but blurry, the reference shot is sharp but noisy. We train ML models to align and fuse these two shots and output a clear photo without motion blur. Our algorithm runs efficiently on the Google Pixel 6, adding 463 ms of overhead per shot. Our experiments demonstrate the advantage and robustness of our system against alternative single-image, multi-frame, face-specific, and video deblurring algorithms, as well as commercial products. To the best of our knowledge, our work is the first mobile solution for face motion deblurring that works reliably and robustly over thousands of images in diverse motion and lighting conditions.