The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How can such training data be created efficiently? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.
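Synthetic generation helps because the ground truth follows from how the data was built, so no annotation is needed. The sketch below shows this with the simplest possible generator, a random texture shifted by a known integer offset. It is an illustration only, not one of the paper's data generators. The function name and parameters are hypothetical.

```python
import numpy as np

def make_translation_pair(height=64, width=64, dx=3, dy=-2, seed=0):
    """Hypothetical generator: an image pair related by a known integer shift.

    Returns (img1, img2, flow) where flow[y, x] = (u, v) is the displacement
    of pixel (y, x) of img1 into img2. np.roll wraps around at the borders,
    so the constant flow is exact in the periodic sense.
    """
    rng = np.random.default_rng(seed)
    img1 = rng.random((height, width)).astype(np.float32)
    # img2[(y + dy) % H, (x + dx) % W] == img1[y, x]
    img2 = np.roll(img1, shift=(dy, dx), axis=(0, 1))
    flow = np.zeros((height, width, 2), dtype=np.float32)
    flow[..., 0] = dx  # horizontal component u
    flow[..., 1] = dy  # vertical component v
    return img1, img2, flow
```

Realistic generators add rendered 3D objects, non-rigid motion and occlusions. The principle stays the same: the flow label comes from the generation process rather than from a human annotator.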
Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results that were submitted in recent years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, which extends the MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes, but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.
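The abstract does not list the evaluation metrics. As an illustration of standardized MOT evaluation, here is the CLEAR MOT accuracy (MOTA), a widely used tracking score. It combines missed targets (FN), false positives (FP) and identity switches (IDSW) relative to the number of ground-truth objects (GT). The sketch assumes that hypotheses have already been matched to ground truth in every frame; that matching step is omitted.

```python
from typing import Sequence

def mota(fn: Sequence[int], fp: Sequence[int], idsw: Sequence[int], gt: Sequence[int]) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with per-frame counts summed over the sequence."""
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(fn) + sum(fp) + sum(idsw)
    # Not bounded below by 0: many false positives can drive MOTA negative
    return 1.0 - errors / total_gt
```

For example, three frames with 10 ground-truth pedestrians each and 5 errors in total give MOTA = 1 - 5/30 ≈ 0.83.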
Since their introduction as a means of front propagation and their first application to edge-based segmentation in the early 1990s, level set methods have become increasingly popular as a general framework for image segmentation. In this paper, we present a survey of a specific class of region-based level set segmentation methods and clarify how they can all be derived from a common statistical framework. Region-based segmentation schemes aim at partitioning the image domain by progressively fitting statistical models to the intensity, color, texture or motion in each of a set of regions. In contrast to edge-based schemes such as the classical Snakes, region-based methods tend to be less sensitive to noise. For typical images, the respective cost functionals tend to have fewer local minima, which makes them particularly well-suited for local optimization methods such as the level set method. We detail a general statistical formulation for level set segmentation. Subsequently, we clarify how the integration of various low-level criteria leads to a set of cost functionals. We point out relations between the different segmentation schemes. In experiments, we demonstrate how the level set function is driven to partition the image plane into domains of coherent color, texture, dynamic texture or motion. Moreover, the Bayesian formulation allows one to introduce prior shape knowledge into the level set method. We briefly review a number of advances in this domain.
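The simplest member of this class fits one constant intensity to each of two regions. The sketch below shows only the data-driven part of such a Chan-Vese-type flow. It assumes a two-phase piecewise-constant model, leaves out the length/curvature term and the smoothed Heaviside, and uses a hypothetical function name. Each pixel pushes the level set function toward the region whose mean intensity fits it better.

```python
import numpy as np

def region_competition(image, phi, n_iter=50, dt=0.5):
    """Evolve phi so that {phi > 0} and {phi <= 0} fit two constant intensities."""
    phi = phi.astype(float).copy()
    for _ in range(n_iter):
        inside = phi > 0
        # Mean intensity of each region (guarding against an empty region)
        c1 = image[inside].mean() if inside.any() else 0.0
        c2 = image[~inside].mean() if (~inside).any() else 0.0
        # Positive where the pixel fits c1 better than c2, so phi grows there
        force = (image - c2) ** 2 - (image - c1) ** 2
        phi += dt * force
    return phi
```

Starting from a small box inside a bright square, the zero level set grows until {phi > 0} coincides with the square. Richer statistical models (color, texture, motion) change only how the per-pixel force is computed.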
This paper deals with the problem of reconstructing a depth map from a sequence of differently focused images, also known as depth from focus (DFF) or shape from focus. We propose to state the DFF problem as a variational problem, including a smooth but nonconvex data fidelity term and a convex nonsmooth regularization, which makes the method robust to noise and leads to more realistic depth maps. In addition, we propose to solve the nonconvex minimization problem with a linearized alternating directions method of multipliers, which minimizes the energy very efficiently. A numerical comparison to classical methods on simulated as well as on real data is presented.
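For contrast with the variational approach, a classical DFF baseline measures local sharpness in every image of the focal stack. It then assigns each pixel the focus setting with the sharpest response. The sketch below uses a modified-Laplacian focus measure and a box window. These choices and the function names are assumptions for illustration, not the paper's data term.

```python
import numpy as np

def modified_laplacian(img):
    """Focus measure |d2I/dx2| + |d2I/dy2| with edge padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    lap_x = np.abs(2 * p[1:-1, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:])
    lap_y = np.abs(2 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1])
    return lap_x + lap_y

def depth_from_focus_argmax(stack, focus_depths, window=3):
    """Per pixel, pick the focus setting with the largest box-summed focus measure."""
    measures = []
    k = window // 2
    for img in stack:
        fm = modified_laplacian(img)
        p = np.pad(fm, k, mode="edge")
        # Sum the focus measure over a window x window neighbourhood
        agg = sum(p[dy:dy + fm.shape[0], dx:dx + fm.shape[1]]
                  for dy in range(window) for dx in range(window))
        measures.append(agg)
    best = np.argmax(np.stack(measures), axis=0)
    return np.asarray(focus_depths)[best]
```

The per-pixel argmax is noisy in textureless regions. This is the weakness that a regularized variational formulation is meant to address.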
Numerous scientific fields rely on elaborate but partly suboptimal data processing pipelines. An example is diffusion magnetic resonance imaging (diffusion MRI), a non-invasive microstructure assessment method with a prominent application in neuroimaging. Advanced diffusion models providing accurate microstructural characterization have so far required long acquisition times and have thus been inapplicable to children and adults who are uncooperative, uncomfortable, or unwell. We show that the long scan time requirements are mainly due to disadvantages of classical data processing. We demonstrate how deep learning, a group of algorithms based on recent advances in the field of artificial neural networks, can be applied to reduce diffusion MRI data processing to a single optimized step. This modification allows obtaining scalar measures from advanced models with a twelve-fold reduction in scan time, and detecting abnormalities without using diffusion models. We set a new state of the art by estimating diffusion kurtosis measures from only 12 data points and neurite orientation dispersion and density measures from only 8 data points. This enables unprecedentedly fast and robust protocols that facilitate clinical routine, and demonstrates how classical data processing can be streamlined by means of deep learning.
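Classical processing fits a signal model voxel by voxel before any scalar measure is read off. One example is the diffusion kurtosis representation along a single gradient direction, ln S(b) = ln S0 - bD + (bD)²K/6. The sketch below shows that classical per-direction fit. It is the kind of step the paper replaces with a learned mapping, not the paper's network. The function name is hypothetical. The b-values are assumed to be in ms/µm², which keeps the numbers near 1.

```python
import numpy as np

def fit_dki_direction(bvals, signals):
    """Fit ln S(b) = ln S0 - b*D + (b*D)**2 * K / 6 along one gradient direction.

    Returns (S0, D, K). Needs at least three distinct b-values.
    """
    bvals = np.asarray(bvals, dtype=float)
    log_s = np.log(np.asarray(signals, dtype=float))
    # Quadratic in b: ln S = a0 + a1*b + a2*b**2, so D = -a1 and K = 6*a2 / D**2
    a2, a1, a0 = np.polyfit(bvals, log_s, 2)
    D = -a1
    K = 6.0 * a2 / D**2
    return float(np.exp(a0)), float(D), float(K)
```

A full kurtosis tensor needs such information across many directions and b-values. This is where the long acquisition times come from.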
We systematically study the local single-valuedness of the Bregman proximal mapping and the local smoothness of the Bregman–Moreau envelope of a nonconvex function under relative prox-regularity, an extension of the prox-regularity originally introduced by Poliquin and Rockafellar. As Bregman distances are asymmetric in general, in accordance with Bauschke et al., it is natural to consider two variants of the Bregman proximal mapping, which, depending on the order of the arguments, are called the left and right Bregman proximal mappings. We consider the left Bregman proximal mapping first. Then, via a translation result, we obtain analogous (and partially sharp) results for the right Bregman proximal mapping. The class of relatively prox-regular functions significantly extends the recently considered class of relatively hypoconvex functions. In particular, relative prox-regularity allows for functions with a possibly nonconvex domain. Moreover, as a main source of examples and analogously to the classical setting, we introduce relatively amenable functions, i.e., convexly composite functions for which the inner nonlinear mapping is component-wise smooth adaptable, a recently introduced extension of Lipschitz differentiability. By way of example, we apply our theory to locally interpret joint alternating Bregman minimization with proximal regularization as a Bregman proximal gradient algorithm applied to a smooth adaptable function.
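A small closed-form case shows what a Bregman proximal mapping computes. The sketch assumes the Boltzmann-Shannon entropy kernel h(x) = Σ x log x - x on the positive orthant and a linear function f(x) = ⟨c, x⟩. It puts the variable in the first argument of D_h. Whether that variant is the paper's "left" mapping should be checked against its definitions. Setting the gradient of ⟨c, x⟩ + D_h(x, y)/λ to zero gives x = y·exp(-λc). All function names are hypothetical.

```python
import numpy as np

def entropy_bregman_distance(x, y):
    """D_h(x, y) for h(x) = sum(x log x - x) on the positive orthant."""
    return float(np.sum(x * np.log(x / y) - x + y))

def bregman_prox_linear(c, y, lam):
    """argmin_{x > 0} <c, x> + D_h(x, y) / lam (variable in the first slot)."""
    # Optimality: c + (log x - log y) / lam = 0  =>  x = y * exp(-lam * c)
    return y * np.exp(-lam * c)

def bregman_moreau_envelope_linear(c, y, lam):
    """Optimal value of the same problem, i.e. the Bregman-Moreau envelope at y."""
    x = bregman_prox_linear(c, y, lam)
    return float(c @ x) + entropy_bregman_distance(x, y) / lam
```

Because the kernel is strictly convex, the mapping is single-valued here. The paper's results concern when single-valuedness and smoothness of the envelope persist locally for nonconvex f.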
Building upon recent developments in optical flow and stereo matching estimation, we propose a variational framework for the estimation of stereoscopic scene flow, i.e., the motion of points in the three-dimensional world from stereo image sequences. The proposed algorithm takes into account image pairs from two consecutive times and computes both depth and a 3D motion vector associated with each point in the image. In contrast to previous works, we partially decouple the depth estimation from the motion estimation, which has many practical advantages. The variational formulation is quite flexible and can handle either sparse or dense disparity maps. The proposed method is very efficient: with the depth map computed on an FPGA and the scene flow computed on the GPU, the algorithm runs at 20 frames per second on QVGA images (320×240 pixels). Furthermore, we present solutions to two important problems in scene flow estimation: violations of intensity consistency between input images, and uncertainty measures for the scene flow result.
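Once a pixel has a disparity at time t, an image displacement (u, v), and a disparity at t+1, its 3D motion follows from triangulation. The sketch assumes a rectified pinhole stereo rig with focal length f (pixels), baseline b and principal point (cx, cy). The parameterization by (u, v, d1) and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def backproject(x, y, disparity, f, baseline, cx, cy):
    """Pinhole back-projection for a rectified stereo pair: Z = f * b / d."""
    Z = f * baseline / disparity
    X = (x - cx) * Z / f
    Y = (y - cy) * Z / f
    return np.array([X, Y, Z])

def scene_flow_vector(x, y, d0, u, v, d1, f, baseline, cx, cy):
    """3D motion of the point seen at (x, y) with disparity d0 at time t,
    which moves to (x + u, y + v) with disparity d1 at time t + 1."""
    p0 = backproject(x, y, d0, f, baseline, cx, cy)
    p1 = backproject(x + u, y + v, d1, f, baseline, cx, cy)
    return p1 - p0
```

For instance, with f = 500, b = 0.2 and the principal point at the centre of a QVGA image, a point on the optical axis whose disparity doubles from 10 to 20 moves from Z = 10 to Z = 5 (in the units of b).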
We present a survey and a comparison of a variety of algorithms that have been proposed over the years to minimize multi-label optimization problems based on the Potts model. Discrete approaches based on Markov Random Fields as well as continuous optimization approaches based on partial differential equations can be applied to the task. In contrast to the case of binary labeling, the multi-label problem is known to be NP-hard, and thus one can only expect near-optimal solutions. In this paper, we carry out a theoretical comparison and an experimental analysis of existing approaches with respect to accuracy, optimality and runtime, aimed at bringing out the advantages and shortcomings of the respective algorithms. A systematic quantitative comparison is carried out on the Graz interactive image segmentation benchmark. This paper thereby generalizes a previous experimental comparison (Klodt et al. 2008) from the binary to the multi-label case.
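On a pixel grid, the Potts energy sums a per-pixel label cost and a constant penalty for every pair of neighbouring pixels with different labels. The sketch computes this energy on a 4-connected grid. It also includes iterated conditional modes (ICM), a simple local minimizer chosen only to illustrate what "minimizing" means here. ICM is not claimed to be among the surveyed methods. The array layout unary[h, w, k] is an assumption.

```python
import numpy as np

def potts_energy(labels, unary, lam):
    """E(u) = sum_i unary[i, u_i] + lam * #{4-neighbour pairs with u_i != u_j}."""
    rows, cols = np.indices(labels.shape)
    data = unary[rows, cols, labels].sum()
    cuts = (np.count_nonzero(labels[1:, :] != labels[:-1, :])
            + np.count_nonzero(labels[:, 1:] != labels[:, :-1]))
    return float(data + lam * cuts)

def icm(unary, lam, n_sweeps=10):
    """Iterated conditional modes: greedily relabel one pixel at a time."""
    labels = np.argmin(unary, axis=2)
    h, w, k = unary.shape
    for _ in range(n_sweeps):
        for i in range(h):
            for j in range(w):
                cost = unary[i, j].astype(float).copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        # Potts penalty for every label that disagrees with this neighbour
                        cost += lam * (np.arange(k) != labels[ni, nj])
                labels[i, j] = np.argmin(cost)
    return labels
```

ICM never increases the energy but stops at the first local minimum. This illustrates why the survey compares how close each method gets to optimality.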
We conduct a thorough study of photometric stereo under nearby point light source illumination, from modeling to numerical solution, through calibration. In the classical formulation of photometric stereo, the luminous fluxes are assumed to be directional, which is very difficult to achieve in practice. Instead, we use light-emitting diodes to illuminate the scene to be reconstructed. Such point light sources are very convenient to use, yet they yield a more complex photometric stereo model, which is arduous to solve. We first derive this model in a physically sound manner and show how to calibrate its parameters. Then, we discuss two state-of-the-art numerical solutions. The first one alternately estimates the albedo and the normals, and then integrates the normals into a depth map. It is shown empirically to be independent of the initialization, but convergence of this sequential approach is not established. The second one directly recovers the depth by formulating photometric stereo as a system of nonlinear partial differential equations (PDEs), which are linearized using image ratios. Although the sequential approach is avoided, initialization matters a lot and convergence is not established either. Therefore, we introduce a provably convergent alternating reweighted least-squares scheme for solving the original system of nonlinear PDEs. Finally, we extend this study to the case of RGB images.
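With a nearby point source, the lighting vector depends on the surface point: it points toward the source and decays with the inverse square of the distance. The sketch shows the albedo-and-normal step of an alternating scheme for one surface point x whose 3D position is assumed known, under the Lambertian model I_i = ρ n·s_i(x). It assumes isotropic sources with intensities φ_i, ignores shadows, and leaves out the LED anisotropy factor that a calibrated model would include. The function names are hypothetical.

```python
import numpy as np

def near_light_vectors(point, light_positions, light_intensities):
    """Lighting vectors s_i = phi_i * (p_i - x) / ||p_i - x||**3 for isotropic point sources."""
    d = light_positions - point                      # (m, 3) vectors from x to each light
    dist = np.linalg.norm(d, axis=1, keepdims=True)
    return light_intensities[:, None] * d / dist**3

def albedo_normal_from_intensities(intensities, lighting):
    """Least-squares solve of I = S m with m = rho * n (no shadows)."""
    m, *_ = np.linalg.lstsq(lighting, intensities, rcond=None)
    rho = np.linalg.norm(m)
    return rho, m / rho
```

Because s_i depends on x, the normals cannot be estimated once and for all, as they can in the directional case. They must be re-estimated whenever the depth changes. This coupling is what makes the near-light problem harder.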
We present the first method to handle curvature regularity in region-based image segmentation and inpainting that is independent of initialization.
To this end, we start from a new formulation of length-based optimization schemes, based on surface continuation constraints, and discuss the connections to existing schemes. The formulation is based on a cell complex and considers basic regions and boundary elements. The corresponding optimization problem is cast as an integer linear program.
We then show how the method can be extended to include curvature regularity, again cast as an integer linear program. Here, we consider pairs of boundary elements to reflect curvature. Moreover, a constraint set is derived to ensure that the boundary variables indeed reflect the boundary of the regions described by the region variables.
We show that by solving the linear programming relaxation one gets reasonably close to the global optimum, and that curvature regularity is indeed much better suited than standard length regularity in the presence of long, thin objects.
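A polygon makes the contrast between length and curvature regularity concrete. The sketch uses the sum of squared turning angles at the vertices as a simple angle-based curvature proxy, loosely mirroring costs defined on pairs of boundary elements. The paper's actual pair costs may be defined differently, and the function name is hypothetical.

```python
import math

def polygon_length_and_turning(vertices):
    """Return (perimeter, sum of squared turning angles) of a closed polygon."""
    n = len(vertices)
    length = 0.0
    turning = 0.0
    for i in range(n):
        x0, y0 = vertices[i - 1]
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        length += math.hypot(x2 - x1, y2 - y1)
        # Turning angle at vertex i, wrapped to [-pi, pi)
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        theta = (a_out - a_in + math.pi) % (2 * math.pi) - math.pi
        turning += theta ** 2
    return length, turning
```

A 100×2 rectangle has perimeter 204 versus 40 for a 10×10 square, yet both have the same turning cost of π². A length penalty therefore pushes hardest against elongated objects, while this angle-based penalty does not grow with elongation.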