When learning functions on manifolds, we can improve performance by regularizing with respect to the intrinsic manifold geometry rather than the ambient space. However, when regularizing tensor learning, calculating the derivatives along this intrinsic geometry is not possible, and so existing approaches are limited to regularizing in Euclidean space. Our new method for intrinsically regularizing and learning tensors on Riemannian manifolds introduces a surrogate object that encapsulates the geometric characteristics of the tensor. Regularizing this surrogate instead allows us to learn non-symmetric and high-order tensors. We apply our approach to the relative attributes problem and demonstrate that explicitly regularizing high-order relationships between pairs of data points improves performance.
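As a rough illustration of the intrinsic-versus-ambient distinction (the notation below is assumed for exposition, not taken from the paper), manifold-regularized learning of a function f on a manifold M penalizes derivatives taken along the manifold rather than in the embedding space:

\min_{f}\; \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) \;+\; \lambda \int_{\mathcal{M}} \big\| \nabla_{\mathcal{M}} f \big\|^{2} \, \mathrm{d}V

For tensor-valued f, the intrinsic derivative \nabla_{\mathcal{M}} f is exactly the quantity that cannot be computed directly, which is what motivates regularizing a surrogate object in its place.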
Visual formats have advanced beyond single-view images and videos: 3D movies are commonplace, researchers have developed multi-view navigation systems, and VR is helping to push light field cameras to the mass market. However, editing tools for these media are still nascent, and even simple filtering operations like color correction or stylization are problematic: naively applying image filters per frame or per view rarely produces satisfying results due to temporal and spatial inconsistencies. Our method preserves and stabilizes filter effects while being agnostic to the inner workings of the filter. It captures filter effects in the gradient domain, then uses input frame gradients as a reference to impose temporal and spatial consistency. Our least-squares formulation adds minimal overhead compared to naive data processing. Further, when filter cost is high, we introduce a filter transfer strategy that reduces the number of per-frame filtering computations by an order of magnitude, with only a small reduction in visual quality. We demonstrate our algorithm on several camera array formats including stereo videos, light fields, and wide baselines.
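A minimal sketch of a gradient-domain least-squares solve of the kind the abstract describes, assuming a single-channel frame and an identity motion model; the function names and the weight lam are hypothetical, and where the paper derives its gradient reference from the input frames, the filtered frame's own gradients stand in here for brevity:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def grad_ops(h, w):
    # Forward-difference operators: Dx differentiates along rows, Dy along columns.
    d = lambda n: sp.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    return sp.kron(sp.eye(h), d(w), format="csr"), sp.kron(d(h), sp.eye(w), format="csr")

def stabilize(filtered, temporal_ref, lam=0.2):
    # Solve min_u ||grad(u) - grad(filtered)||^2 + lam * ||u - temporal_ref||^2,
    # keeping the filter's gradients while pulling u toward the previous output.
    h, w = filtered.shape
    Dx, Dy = grad_ops(h, w)
    f, r = filtered.ravel(), temporal_ref.ravel()
    A = Dx.T @ Dx + Dy.T @ Dy + lam * sp.eye(h * w)
    b = Dx.T @ (Dx @ f) + Dy.T @ (Dy @ f) + lam * r
    return spla.spsolve(A.tocsc(), b).reshape(h, w)

Because the system matrix A is sparse, symmetric, and positive definite, the solve stays cheap relative to re-running an expensive filter, which is the overhead claim the abstract makes.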
We present a method to synthesize plausible video sequences of humans according to user-defined body motions and viewpoints. We first capture a small database of multi-view video sequences of an actor performing various basic motions. This database needs to be captured only once and serves as the input to our synthesis algorithm. We then apply a marker-less, model-based performance capture approach to the entire database to obtain the pose and geometry of the actor in each database frame. To create novel video sequences of the actor from the database, a user animates a 3D human skeleton with novel motion and viewpoints. Our technique then synthesizes a realistic video sequence of the actor performing the specified motion based only on the initial database. The first key component of our approach is a new, efficient retrieval strategy to find appropriate spatio-temporally coherent database frames from which to synthesize target video frames. The second key component is a warping-based texture synthesis approach that uses the retrieved most-similar database frames to synthesize spatio-temporally coherent target video frames. This enables us, for instance, to easily create video sequences of actors performing dangerous stunts without them being placed in harm's way. We show through a variety of result videos and a user study that we can synthesize realistic videos of people, even if the target motions and camera views are different from the database content.
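As a toy sketch of spatio-temporally coherent retrieval (a greedy stand-in with hypothetical names; the paper's strategy is more sophisticated), one can penalize both pose distance and deviation from the previously retrieved frame's successor:

import numpy as np

def retrieve_frames(db_poses, target_poses, alpha=0.5):
    # db_poses: (N, D) database pose descriptors; target_poses: (T, D).
    picks, prev = [], None
    for q in target_poses:
        pose_cost = np.linalg.norm(db_poses - q, axis=1)
        # Temporal coherence: prefer the frame right after the previous pick.
        coher = 0.0 if prev is None else np.abs(np.arange(len(db_poses)) - (prev + 1))
        prev = int(np.argmin(pose_cost + alpha * coher))
        picks.append(prev)
    return picks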
Many 4D light field processing applications rely on superpixel segmentations, for which occlusion-aware view consistency is important. Yet, existing methods often enforce consistency by propagating clusters from a central view only, which can lead to inconsistent superpixels for non-central views. Our proposed approach combines an occlusion-aware angular segmentation in horizontal and vertical EPI spaces with an occlusion-aware clustering and propagation step across all views. Qualitative video demonstrations show that this helps to remove flickering and inconsistent boundary shapes compared to the state-of-the-art approach, and quantitative metrics reflect these findings with improved boundary accuracy and view consistency scores.
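A minimal sketch of occlusion-aware label propagation between views, assuming a per-pixel disparity map and an angular offset (du, dv); this z-buffer-style test (nearer disparity wins) is an illustrative simplification, not the paper's full method:

import numpy as np

def propagate_labels(labels, disparity, du, dv):
    # Shift superpixel labels from a source view to a target view offset by
    # (du, dv) in the angular domain; larger disparity (nearer) wins on conflict.
    h, w = labels.shape
    out = -np.ones((h, w), dtype=labels.dtype)
    zbuf = np.full((h, w), -np.inf)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.round(xs + du * disparity).astype(int)
    yt = np.round(ys + dv * disparity).astype(int)
    ok = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    for y, x, ty, tx in zip(ys[ok], xs[ok], yt[ok], xt[ok]):
        if disparity[y, x] > zbuf[ty, tx]:  # occlusion test: keep the closest surface
            zbuf[ty, tx] = disparity[y, x]
            out[ty, tx] = labels[y, x]
    return out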
It is now possible to capture the 3D motion of the human body on consumer hardware and to puppet skeleton-based virtual characters in real time. However, many characters do not have humanoid skeletons. Characters such as spiders and caterpillars do not have boned skeletons at all, and these characters have very different shapes and motions. In general, character control under arbitrary shape and motion transformations is unsolved: how might these motions be mapped? We control characters with a method that avoids the rigging-and-skinning pipeline: source and target characters do not need skeletons or rigs. We use interactively defined sparse pose correspondences to learn a mapping between arbitrary 3D point source sequences and mesh target sequences, and then puppet the target character in real time. We demonstrate the versatility of our method through results on diverse virtual characters with different input motion controllers. Our method provides a fast, flexible, and intuitive interface for arbitrary motion mapping, offering new ways to control characters for real-time animation.
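One simple way to realize such a mapping from sparse pose correspondences is a radial-basis-function interpolant (illustrative only, not necessarily the paper's model; sigma and the helper names are assumed):

import numpy as np

def fit_pose_map(src_keys, tgt_keys, sigma=1.0):
    # src_keys: (K, Ds) source pose descriptors at the user-marked correspondences;
    # tgt_keys: (K, Dt) matching target poses (e.g., stacked mesh vertex offsets).
    d2 = ((src_keys[:, None, :] - src_keys[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    W = np.linalg.solve(K + 1e-6 * np.eye(len(src_keys)), tgt_keys)

    def apply(src_pose):
        k = np.exp(-((src_keys - src_pose) ** 2).sum(-1) / (2 * sigma ** 2))
        return k @ W  # interpolated target pose for a new source pose

    return apply

Because evaluating the interpolant is a single kernel product, a mapping of this kind can run per frame, which is consistent with the real-time puppeteering the abstract describes.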
Photographs usually show a scene from a single perspective. However, as commonly seen in art, scenes and objects can be visualized from multiple perspectives. Making such images manually is time-consuming and tedious, so we propose a novel system for designing multi-perspective images and videos. First, the images in the input sequence are aligned using structure from motion, which enables us to track feature points across the sequence. Second, the user chooses portal polygons in a target image into which different perspectives are to be embedded. The corresponding image regions from the other images are then copied into these portals. Thanks to feature tracking and automatic warping, this approach is considerably faster than current tools. We explore a wide range of artistic applications using our system with image and video data, such as looking around corners, looking up and down staircases, recursive multi-perspective imaging, cubism, and panoramas.
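A minimal sketch of the portal-compositing step, assuming four tracked corner correspondences between the source region and the portal polygon (the helper name and quad-based setup are hypothetical; the system's warps are more general):

import cv2
import numpy as np

def composite_portal(target_img, source_img, src_quad, portal_quad):
    # src_quad, portal_quad: 4x2 arrays of corresponding corner points.
    H, _ = cv2.findHomography(np.float32(src_quad), np.float32(portal_quad))
    h, w = target_img.shape[:2]
    warped = cv2.warpPerspective(source_img, H, (w, h))
    # Paste the warped source region only inside the user-chosen portal polygon.
    mask = np.zeros((h, w), np.uint8)
    cv2.fillConvexPoly(mask, np.int32(portal_quad), 255)
    out = target_img.copy()
    out[mask > 0] = warped[mask > 0]
    return out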
Predictor Combination at Test Time. Kwang In Kim, James Tompkin, Christian Richardt. In 2017 IEEE International Conference on Computer Vision (ICCV), October 2017.
We present an algorithm for test-time combination of a set of reference predictors with unknown parametric forms. Existing multi-task and transfer learning algorithms focus on training-time transfer and combination, where the parametric forms of predictors are known and shared. However, when the parametric form of a predictor is unknown, e.g., for a human predictor or a predictor in a precompiled library, existing algorithms are not applicable. Instead, we empirically evaluate predictors on sampled data points to measure distances between different predictors. This embeds the set of reference predictors into a Riemannian manifold, upon which we perform manifold denoising to obtain the refined predictor. This allows our approach to make no assumptions about the underlying predictor forms. Our test-time combination algorithm equals or outperforms existing multi-task and transfer learning algorithms on challenging real-world datasets, without introducing specific model assumptions.
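A minimal sketch of the embed-then-denoise idea, assuming each predictor is represented by its outputs on a shared sample set and using a single graph-smoothing step as a stand-in for the paper's manifold denoising (sigma, lam, and the function name are assumptions):

import numpy as np

def denoise_predictor(outputs, target_idx, sigma=1.0, lam=0.5):
    # outputs: (P, N) matrix; row p holds predictor p evaluated on N sampled points.
    d2 = ((outputs[:, None, :] - outputs[None, :, :]) ** 2).mean(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))  # affinities from empirical predictor distances
    np.fill_diagonal(W, 0.0)
    w = W[target_idx] / W[target_idx].sum()
    # One smoothing step: pull the target's outputs toward its neighbors' weighted average.
    return (1 - lam) * outputs[target_idx] + lam * (w @ outputs)

Note that only predictor evaluations enter the computation, which is what lets the approach stay agnostic to each predictor's parametric form.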