Recent advances in machine learning have led to increased interest in solving visual computing problems using methods that employ coordinate-based neural networks. These methods, which we call neural fields, parameterize physical properties of scenes or objects across space and time. They have seen widespread success in problems such as 3D shape and image synthesis, animation of human bodies, 3D reconstruction, and pose estimation. Rapid progress has led to numerous papers, but a consolidation of the discovered knowledge has not yet emerged. We provide context, mathematical grounding, and a review of over 250 papers in the literature on neural fields. In Part I, we focus on neural field techniques by identifying common components of neural field methods, including different conditioning, representation, forward map, architecture, and manipulation methods. In Part II, we focus on applications of neural fields to different problems in visual computing, and beyond (e.g., robotics, audio). Our review shows the breadth of topics already covered in visual computing, both historically and in current incarnations, and highlights the improved quality, flexibility, and capability brought by neural field methods. Finally, we present a companion website that acts as a living database that can be continually updated by the community.
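At its core, a neural field is a coordinate-based network queried at points in space (and optionally time). The sketch below is a minimal illustration of that forward map with an untrained toy MLP and a Fourier-style positional encoding; the NeuralField class, its sizes, and the encoding choice are illustrative assumptions, not any specific surveyed method.

import numpy as np

def positional_encoding(x, num_freqs=6):
    # Lift raw coordinates to sin/cos features at increasing frequencies so a
    # small MLP can represent high-frequency detail.
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    feats = [x]
    for f in freqs:
        feats.append(np.sin(f * x))
        feats.append(np.cos(f * x))
    return np.concatenate(feats, axis=-1)

class NeuralField:
    # Tiny two-layer MLP F_theta: coordinate -> field value (e.g., density,
    # color, or signed distance). Weights are random, i.e., untrained.
    def __init__(self, in_dim, hidden=64, out_dim=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, coords):
        h = np.maximum(0.0, positional_encoding(coords) @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

# Query the field at a batch of 3D points; 3 coords * (1 + 2*6) encodings = 39.
points = np.random.rand(4, 3)
field = NeuralField(in_dim=39)
print(field(points).shape)  # (4, 1)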
Extending image processing techniques to videos is a non-trivial task; applying processing independently to each video frame often leads to temporal inconsistencies, and explicitly encoding temporal consistency requires algorithmic changes. We describe a more general approach to temporal consistency. We propose a gradient-domain technique that is blind to the particular image processing algorithm. Our technique takes a series of processed frames that suffers from flickering and generates a temporally-consistent video sequence. The core of our solution is to infer the temporal regularity from the original unprocessed video, and use it as a temporal consistency guide to stabilize the processed sequence. We formally characterize the frequency properties of our technique, and demonstrate, in practice, its ability to stabilize a wide range of popular image processing techniques including enhancement and stylization of color and tone, intrinsic images, and depth estimation.
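As a rough sketch of the guiding idea, each output frame can be blended between the processed frame and a temporal prediction formed from the previous output plus the change observed in the original video. The code below is a simplified value-space version under that assumption (the actual technique works in the gradient domain); the function name and the blending weight lam are hypothetical.

import numpy as np

def stabilize(processed, original, lam=4.0):
    # For each frame, blend the (possibly flickering) processed frame with a
    # temporal prediction: the previous output shifted by the change observed
    # in the unprocessed video, which acts as the consistency guide.
    out = [processed[0].astype(np.float64)]
    for t in range(1, len(processed)):
        prediction = out[-1] + (original[t].astype(np.float64) - original[t - 1])
        # Closed-form least-squares blend of data fidelity and temporal fidelity.
        out.append((processed[t] + lam * prediction) / (1.0 + lam))
    return out

# Toy usage: random frames stand in for a video, with additive flicker.
orig = [np.random.rand(4, 4) for _ in range(5)]
proc = [f + 0.2 * np.random.rand(4, 4) for f in orig]
stable = stabilize(proc, orig)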
Efficient motion intent communication is necessary for safe and collaborative work environments with co-located humans and robots. Humans efficiently communicate their motion intent to other humans through gestures, gaze, and other non-verbal cues, and can replan their motions in response. However, robots often have difficulty using these cues. Many existing methods for robot motion intent communication rely on 2D displays, which require the human to continually pause their work to check a visualization. We propose a mixed-reality head-mounted display (HMD) visualization of the intended robot motion over the wearer's real-world view of the robot and its environment. In addition, our interface allows users to adjust the intended goal pose of the end effector using hand gestures. We describe its implementation, which connects a ROS-enabled robot to the HoloLens using ROS Reality, uses MoveIt for motion planning, and uses Unity to render the visualization. To evaluate the effectiveness of this system against a 2D display visualization and against no visualization, we asked 32 participants to label various arm trajectories as either colliding or non-colliding with blocks arranged on a table. We found a 15% increase in accuracy with a 38% decrease in the time it took to complete the task compared with the next best system. These results demonstrate that a mixed-reality HMD allows a human to determine where the robot is going to move more quickly and accurately than existing baselines.
Interactive intrinsic video editing
Bonneel, Nicolas; Sunkavalli, Kalyan; Tompkin, James; ...
ACM Transactions on Graphics, 33(6), 11/2014
Journal article, peer reviewed, open access
Separating a photograph into its reflectance and illumination intrinsic images is a fundamentally ambiguous problem, and state-of-the-art algorithms combine sophisticated reflectance and illumination priors with user annotations to create plausible results. However, these algorithms cannot be easily extended to videos for two reasons: first, naïvely applying algorithms designed for single images to videos produces results that are temporally incoherent; second, effectively specifying user annotations for a video requires interactive feedback, and current approaches are orders of magnitude too slow to support this. We introduce a fast and temporally consistent algorithm to decompose video sequences into their reflectance and illumination components. Our algorithm uses a hybrid ℓ₂-ℓₚ formulation that separates image gradients into smooth illumination and sparse reflectance gradients using look-up tables. We use a multi-scale parallelized solver to reconstruct the reflectance and illumination from these gradients while enforcing spatial and temporal reflectance constraints and user annotations. We demonstrate that our algorithm automatically produces reasonable results that users can interactively refine, at rates two orders of magnitude faster than existing tools, to produce high-quality decompositions for challenging real-world video sequences. We also show how these decompositions can be used for a number of video editing applications including recoloring, retexturing, illumination editing, and lighting-aware compositing.
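A toy 1D version of the gradient separation idea can be written as follows: large, sparse steps are assigned to reflectance, the smooth remainder to illumination, and each layer is recovered by integrating its gradients. This is only an illustrative stand-in; the simple magnitude threshold here replaces the paper's ℓ₂-ℓₚ formulation, look-up tables, and multi-scale solver.

import numpy as np

def separate_gradients_1d(log_signal, threshold=0.1):
    # Split gradients: large, sparse steps go to reflectance; the smooth
    # remainder goes to illumination. Reconstruct each layer by integrating
    # its gradients (the 1D analogue of a Poisson reconstruction).
    g = np.diff(log_signal)
    refl_g = np.where(np.abs(g) > threshold, g, 0.0)
    illum_g = g - refl_g
    reflectance = np.concatenate([[0.0], np.cumsum(refl_g)])
    illumination = np.concatenate([[log_signal[0]], log_signal[0] + np.cumsum(illum_g)])
    return reflectance, illumination

# Piecewise-constant reflectance under a smooth illumination ramp (log domain).
x = np.linspace(0.0, 1.0, 100)
log_image = np.where(x > 0.5, 1.0, 0.0) + 0.3 * x
R, S = separate_gradients_1d(log_image)
assert np.allclose(R + S, log_image)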
Metallophones such as glockenspiels produce sounds in response to contact. Building these instruments is a complicated process, limiting their shapes to well-understood designs such as bars. We automatically optimize the shape of arbitrary 2D and 3D objects through deformation and perforation to produce sounds when struck that match user-supplied frequency and amplitude spectra. This optimization requires navigating a complex energy landscape, for which we develop Latin Complement Sampling to both speed up finding minima and provide probabilistic bounds on landscape exploration. Our method produces instruments which perform similarly to those that have been professionally manufactured, while also expanding the scope of shape and sound that can be realized, e.g., single-object chords. Furthermore, we can optimize sound spectra to create overtones and to dampen specific frequencies. Thus our technique allows even novices to design metallophones with unique sound and appearance.
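To give a flavor of the optimization target, the sketch below scores a candidate shape's simulated modal spectrum against a user-supplied target spectrum; a shape optimizer would minimize this score. The matching rule and weighting are assumptions for illustration, and Latin Complement Sampling itself is not implemented here.

import numpy as np

def spectrum_loss(modal_freqs, modal_amps, target_freqs, target_amps):
    # Match each target partial to its nearest simulated mode (in log-frequency)
    # and accumulate frequency and amplitude error. A simplified objective;
    # modal_freqs/amps would come from modal analysis of the candidate shape.
    loss = 0.0
    for f_t, a_t in zip(target_freqs, target_amps):
        i = int(np.argmin(np.abs(np.log(modal_freqs) - np.log(f_t))))
        loss += np.log(modal_freqs[i] / f_t) ** 2 + (modal_amps[i] - a_t) ** 2
    return loss

# Hypothetical candidate with three simulated modes versus a two-partial target.
print(spectrum_loss(np.array([440.0, 1320.0, 2640.0]), np.array([1.0, 0.4, 0.2]),
                    np.array([440.0, 880.0]), np.array([1.0, 0.5])))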
Lighting is crucial for portrait photography, yet the complex interactions between the skin and incident light are expensive to model computationally in graphics and difficult to reconstruct analytically via computer vision. Alternatively, to allow fast and controllable reflectance and lighting editing, we developed a physically based decomposition through deep learned priors from path-traced portrait images. Previous approaches that used simplified material models or low-frequency or low-dynamic-range lighting struggled to model specular reflections or to relight directly without intermediate decomposition. In contrast, we estimate the surface normal, skin albedo and roughness, and high-frequency HDRI maps, and propose an architecture to estimate both diffuse and specular reflectance components. In our experiments, we show that this approach can represent the true appearance function more effectively than simpler baseline methods, leading to better generalization and higher-quality editing.
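For intuition about how such estimated components recombine into an image, the sketch below shades a pixel grid with a Lambertian diffuse term plus a Blinn-Phong specular lobe driven by normal, albedo, and roughness maps. This single-light, analytic model is an assumption for illustration; the paper's decomposition targets path-traced appearance under high-frequency HDRI lighting.

import numpy as np

def shade(albedo, normal, roughness, light_dir, view_dir, light_rgb):
    # Lambertian diffuse plus Blinn-Phong specular for a single directional
    # light; roughness is mapped to a (broad-to-tight) specular exponent.
    n = normal / np.linalg.norm(normal, axis=-1, keepdims=True)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)
    ndl = np.clip(np.sum(n * l, axis=-1, keepdims=True), 0.0, 1.0)
    ndh = np.clip(np.sum(n * h, axis=-1, keepdims=True), 0.0, 1.0)
    shininess = 2.0 / np.clip(roughness, 1e-3, 1.0) ** 2
    return albedo * ndl * light_rgb + (ndh ** shininess) * light_rgb

# Toy 2x2 "portrait": flat normals, mid-gray albedo, medium roughness.
H = W = 2
image = shade(albedo=np.full((H, W, 3), 0.6),
              normal=np.tile([0.0, 0.0, 1.0], (H, W, 1)),
              roughness=np.full((H, W, 1), 0.4),
              light_dir=np.array([0.3, 0.3, 1.0]),
              view_dir=np.array([0.0, 0.0, 1.0]),
              light_rgb=np.array([1.0, 1.0, 1.0]))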
Evaluating 'Graphical Perception' with CNNs
Haehn, Daniel; Tompkin, James; Pfister, Hanspeter
IEEE Transactions on Visualization and Computer Graphics, 25(1), 01/2019
Journal article, peer reviewed, open access
Convolutional neural networks can successfully perform many computer vision tasks on images. For visualization, how do CNNs perform when applied to graphical perception tasks? We investigate this question by reproducing Cleveland and McGill's seminal 1984 experiments, which measured human perception efficiency of different visual encodings and defined elementary perceptual tasks for visualization. We measure the graphical perceptual capabilities of four network architectures on five different visualization tasks and compare to existing and new human performance baselines. While under limited circumstances CNNs are able to meet or outperform human task performance, we find that CNNs are not currently a good model for human graphical perception. We present the results of these experiments to foster the understanding of how CNNs succeed and fail when applied to data visualizations.
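As an example of how such an experiment can be set up, the snippet below rasterizes one Cleveland-McGill-style stimulus (two bars whose length ratio must be estimated) paired with its ground-truth label; a CNN regressor trained on many such images can then be scored against the human baselines. The stimulus layout and sizes are illustrative assumptions.

import numpy as np

def bar_chart_stimulus(size=100, seed=None):
    # Rasterize two bars and return (image, ratio), where ratio is the shorter
    # bar's length relative to the longer one -- the quantity a participant or
    # a CNN regressor must estimate from the pixels alone.
    rng = np.random.default_rng(seed)
    h1, h2 = rng.integers(10, 90, size=2)
    img = np.zeros((size, size), dtype=np.float32)
    img[size - h1:, 20:35] = 1.0
    img[size - h2:, 60:75] = 1.0
    return img, min(h1, h2) / max(h1, h2)

image, label = bar_chart_stimulus(seed=3)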
Automatic cell image segmentation methods in connectomics produce merge and split errors, which require correction through proofreading. Previous research has identified the visual search for these errors as the bottleneck in interactive proofreading. To aid error correction, we develop two classifiers that automatically recommend candidate merges and splits to the user. These classifiers use a convolutional neural network (CNN) that has been trained with errors in automatic segmentations against expert-labeled ground truth. Our classifiers detect potentially-erroneous regions by considering a large context region around a segmentation boundary. Corrections can then be performed by a user with yes/no decisions, which reduces variation of information 7.5× faster than previous proofreading methods. We also present a fully-automatic mode that uses a probability threshold to make merge/split decisions. Extensive experiments using the automatic approach and comparing performance of novice and expert users demonstrate that our method performs favorably against state-of-the-art proofreading methods on different connectomics datasets.
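The recommendation loop itself can be sketched simply: rank candidate corrections by the classifier's error probability, then either threshold them (automatic mode) or hand each one to the user as a yes/no question. The data layout and function names below are hypothetical, and a mean-intensity scoring function stands in for the trained CNN.

import numpy as np

def proofread(candidates, classifier, threshold=0.95, ask_user=None):
    # Rank candidates by the classifier's error probability, then decide each:
    # automatically via the threshold, or interactively via ask_user(c, p).
    decisions = []
    for c in sorted(candidates, key=lambda c: classifier(c["patch"]), reverse=True):
        p = classifier(c["patch"])
        accept = (p >= threshold) if ask_user is None else ask_user(c, p)
        decisions.append((c["id"], accept))
    return decisions

# Random patches and a mean-intensity "classifier" stand in for real data and CNN.
rng = np.random.default_rng(0)
cands = [{"id": i, "patch": rng.random((64, 64))} for i in range(5)]
print(proofread(cands, classifier=lambda patch: float(patch.mean())))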
Depth estimation tries to obtain 3D scene geometry from low-dimensional data like 2D images. This is a vital operation in computer vision and any general solution must preserve all depth information of potential relevance to support higher-level tasks. For scenes with well-defined depth, this work shows that multi-view edges can encode all relevant information—that multi-view edges are complete. For this, we follow Elder’s complementary work on the completeness of 2D edges for image reconstruction. We deploy an image-space geometric representation: an encoding of multi-view scene edges as constraints and a diffusion reconstruction method for inverting this code into depth maps. Due to inaccurate constraints, diffusion-based methods have previously underperformed against deep learning methods; however, we will reassess the value of diffusion-based methods and show their competitiveness without requiring training data. To begin, we work with structured light fields and epipolar plane images (EPIs). EPIs present high-gradient edges in the angular domain: with correct processing, EPIs provide depth constraints with accurate occlusion boundaries and view consistency. Then, we present a differentiable representation form that allows the constraints and the diffusion reconstruction to be optimized in an unsupervised way via a multi-view reconstruction loss. This is based around point splatting via radiative transport, and extends to unstructured multi-view images. We evaluate our reconstructions for accuracy, occlusion handling, view consistency, and sparsity to show that they retain the geometric information required for higher-level tasks.
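A minimal stand-in for the diffusion reconstruction step is homogeneous (Laplace) diffusion, which fills unconstrained pixels by neighbor averaging while pinning the sparse edge constraints. The sketch below assumes a toy constraint layout and ignores occlusion handling and the multi-view loss described above.

import numpy as np

def diffuse_depth(constraints, mask, iterations=500):
    # Homogeneous diffusion: repeatedly average each pixel with its four
    # neighbors while pinning constrained pixels to their edge-derived depths.
    depth = constraints.astype(np.float64).copy()
    for _ in range(iterations):
        avg = 0.25 * (np.roll(depth, 1, 0) + np.roll(depth, -1, 0) +
                      np.roll(depth, 1, 1) + np.roll(depth, -1, 1))
        depth = np.where(mask, constraints, avg)
    return depth

# Sparse constraints: a near edge and a far edge; diffusion fills in between.
h = w = 32
constraints = np.zeros((h, w))
mask = np.zeros((h, w), dtype=bool)
constraints[:, 5], mask[:, 5] = 1.0, True
constraints[:, 26], mask[:, 26] = 4.0, True
dense = diffuse_depth(constraints, mask)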
Connectomics has recently begun to image brain tissue at nanometer resolution, which produces petabytes of data. This data must be aligned, labeled, proofread, and formed into graphs, and each step of this process requires visualization for human verification. As such, we present the BUTTERFLY middleware, a scalable platform that can handle massive data for interactive visualization in connectomics. Our platform outputs image and geometry data suitable for hardware-accelerated rendering, and abstracts low-level data wrangling to enable faster development of new visualizations. We demonstrate scalability and extendability with a series of open source Web-based applications for every step of the typical connectomics workflow: data management and storage, informative queries, 2D and 3D visualizations, interactive editing, and graph-based analysis. We report design choices for all developed applications and describe typical scenarios of isolated and combined use in everyday connectomics research. In addition, we measure and optimize rendering throughput—from storage to display—in quantitative experiments. Finally, we share insights, experiences, and recommendations for creating an open source data management and interactive visualization platform for connectomics.