We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of
ℓ
2
norms of the blocks of coefficients ...associated with each covariate across different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates concurrently a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of
joint subspace selection
, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results exploring joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time.
We propose a multimodal, decomposable model for articulated human pose estimation in monocular images. A typical approach to this problem is to use a linear structured model, which struggles to ...capture the wide range of appearance present in realistic, unconstrained images. In this paper, we instead propose a model of human pose that explicitly captures a variety of pose modes. Unlike other multimodal models, our approach includes both global and local pose cues and uses a convex objective and joint training for mode selection and pose estimation. We also employ a cascaded mode selection step which controls the trade-off between speed and accuracy, yielding a 5x speedup in inference and learning. Our model outperforms state-of-the-art approaches across the accuracy-speed trade-off curve for several pose datasets. This includes our newly-collected dataset of people in movies, FLIC, which contains an order of magnitude more labeled data for training and testing than existing datasets.
We present a method for detecting and parsing buildings from unorganized 3D point clouds into a compact, hierarchical representation that is useful for high-level tasks. The input is a set of range ...measurements that cover large-scale urban environment. The desired output is a set of parse trees, such that each tree represents a semantic decomposition of a building - the nodes are roof surfaces as well as volumetric parts inferred from the observable surfaces. We model the above problem using a simple and generic grammar and use an efficient dependency parsing algorithm to generate the desired semantic description. We show how to learn the parameters of this simple grammar in order to produce correct parses of complex structures. We are able to apply our model on large point clouds and parse an entire city.
Object detection via boundary structure segmentation Toshev, Alexander; Taskar, Ben; Daniilidis, Kostas
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
06/2010
Conference Proceeding
Open access
We address the problem of object detection and segmentation using holistic properties of object shape. Global shape representations are highly susceptible to clutter inevitably present in realistic ...images, and can be robustly recognized only using a precise segmentation of the object. To this end, we propose a figure/ground segmentation method for extraction of image regions that resemble the global properties of a model boundary structure and are perceptually salient. Our shape representation, called the chordiogram, is based on geometric relationships of object boundary edges, while the perceptual saliency cues we use favor coherent regions distinct from the background. We formulate the segmentation problem as an integer quadratic program and use a semidefinite programming relaxation to solve it. Obtained solutions provide the segmentation of an object as well as a detection score used for object recognition. Our single-step approach improves over state of the art methods on several object detection and segmentation benchmarks.
We address the problem of object detection and segmentation using global holistic properties of object shape. Global shape representations are highly susceptible to clutter inevitably present in ...realistic images, and thus can be applied robustly only using a precise segmentation of the object. To this end, we propose a figure/ground segmentation method for extraction of image regions that resemble the global properties of a model boundary structure and are perceptually salient. Our shape representation, called the chordiogram, is based on geometric relationships of object boundary edges, while the perceptual saliency cues we use favor coherent regions distinct from the background. We formulate the segmentation problem as an integer quadratic program and use a semidefinite programming relaxation to solve it. The obtained solutions provide a segmentation of the object as well as a detection score used for object recognition. Our single-step approach achieves state-of-the-art performance on several object detection and segmentation benchmarks.
Summarizing Unconstrained Videos Using Salient Montages Min Sun; Farhadi, Ali; Taskar, Ben ...
IEEE transactions on pattern analysis and machine intelligence,
2017-Nov.-1, 2017-11-01, 2017-11-1, 20171101, Volume:
39, Issue:
11
Journal Article
Peer reviewed
Open access
We present a novel method to summarize unconstrained videos using salient montages (i.e., a "melange" of frames in the video as shown in Fig. 1), by finding "montageable moments" and identifying the ...salient people and actions to depict in each montage. Our method aims at addressing the increasing need for generating concise visualizations from the large number of videos being captured from portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken "in the wild" where the camera moves freely. As such, we demonstrate results on head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. In our experiments, we show that our method can reliably detect and track humans under significant action and camera motion. Moreover, the predicted salient people are more accurate than results from state-of-the-art video salieny method 1 . Finally, we demonstrate that a novel "montageability" score can be used to retrieve results with relatively high precision which allows us to present high quality montages to users.
We consider the problem of reconstructing a symmetric matrix from its principal minors, which has several applications in information theory and statistical modeling. We develop a theory of symmetric ...matrices with equal corresponding principal minors based on a simple equivalent property due to Oeding (2011) 10. We then use this theory to provide a method for choosing a canonical representative from the class of all symmetric matrices with specified principal minors. Finally, we provide an efficient algorithm for computing this canonical representative given its principal minors as input.
Dynamic Structured Model Selection Weiss, David; Sapp, Benjamin; Taskar, Ben
2013 IEEE International Conference on Computer Vision,
12/2013
Conference Proceeding, Journal Article
In many cases, the predictive power of structured models for for complex vision tasks is limited by a trade-off between the expressiveness and the computational tractability of the model. However, ...choosing this trade-off statically a priori is sub optimal, as images and videos in different settings vary tremendously in complexity. On the other hand, choosing the trade-off dynamically requires knowledge about the accuracy of different structured models on any given example. In this work, we propose a novel two-tier architecture that provides dynamic speed/accuracy trade-offs through a simple type of introspection. Our approach, which we call dynamic structured model selection (DMS), leverages typically intractable features in structured learning problems in order to automatically determine' which of several models should be used at test-time in order to maximize accuracy under a fixed budgetary constraint. We demonstrate DMS on two sequential modeling vision tasks, and we establish a new state-of-the-art in human pose estimation in video with an implementation that is roughly 23× faster than the previous standard implementation.
This paper presents a novel dimensionality reduction method for classification in medical imaging. The goal is to transform very high-dimensional input (typically, millions of voxels) to a ...low-dimensional representation (small number of constructed features) that preserves discriminative signal and is clinically interpretable. We formulate the task as a constrained optimization problem that combines generative and discriminative objectives and show how to extend it to the semi-supervised learning (SSL) setting. We propose a novel large-scale algorithm to solve the resulting optimization problem. In the fully supervised case, we demonstrate accuracy rates that are better than or comparable to state-of-the-art algorithms on several datasets while producing a representation of the group difference that is consistent with prior clinical reports. Effectiveness of the proposed algorithm for SSL is evaluated with both benchmark and medical imaging datasets. In the benchmark datasets, the results are better than or comparable to the state-of-the-art methods for SSL. For evaluation of the SSL setting in medical datasets, we use images of subjects with mild cognitive impairment (MCI), which is believed to be a precursor to Alzheimer's disease (AD), as unlabeled data. AD subjects and normal control (NC) subjects are used as labeled data, and we try to predict conversion from MCI to AD on follow-up. The semi-supervised extension of this method not only improves the generalization accuracy for the labeled data (AD/NC) slightly but is also able to predict subjects which are likely to converge to AD.