Paired Regions for Shadow Detection and Removal
Ruiqi Guo; Qieyun Dai; Hoiem, Derek
IEEE Transactions on Pattern Analysis and Machine Intelligence, 12/2013, Volume 35, Issue 12
Journal Article
Peer reviewed
In this paper, we address the problem of shadow detection and removal from single images of natural scenes. Unlike traditional methods that explore pixel or edge information, we employ a region-based approach. In addition to considering individual regions separately, we predict relative illumination conditions between segmented regions from their appearances and perform pairwise classification based on such information. Classification results are used to build a graph of segments, and graph-cut is used to solve the labeling of shadow and nonshadow regions. Detection results are later refined by image matting, and the shadow-free image is recovered by relighting each pixel based on our lighting model. We evaluate our method on the shadow detection dataset of Zhu et al. In addition, we created a new dataset with shadow-free ground truth images, which provides a quantitative basis for evaluating shadow removal. We study the effectiveness of features for both unary and pairwise classification.
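As a hedged illustration of the graph-cut labeling step the abstract describes (not the authors' released code), the sketch below builds a source/sink graph over segmented regions with hypothetical per-region unary costs and pairwise same-illumination affinities, then labels regions via a min-cut using networkx:

```python
import networkx as nx

def label_shadows(shadow_cost, lit_cost, affinity):
    """Label regions as shadow/lit with a single s-t min-cut (illustrative).

    shadow_cost[i], lit_cost[i]: unary costs from a hypothetical
        per-region classifier.
    affinity[(i, j)]: strength of the pairwise prediction that regions
        i and j share the same illumination condition.
    """
    g = nx.DiGraph()
    n = len(shadow_cost)
    for i in range(n):
        # (S, i) is cut when i ends on the sink side (labeled shadow),
        # so it carries the shadow cost; (i, T) is cut when i stays on
        # the source side (labeled lit) and carries the lit cost.
        g.add_edge('S', i, capacity=shadow_cost[i])
        g.add_edge(i, 'T', capacity=lit_cost[i])
    for (i, j), w in affinity.items():
        # Penalize separating regions predicted to share illumination.
        g.add_edge(i, j, capacity=w)
        g.add_edge(j, i, capacity=w)
    _, (source_side, _) = nx.minimum_cut(g, 'S', 'T')
    return ['lit' if i in source_side else 'shadow' for i in range(n)]
```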
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space where they are compared for relevance with an inner product. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the recently released VQA [1] dataset, which features free-form human-annotated questions and answers.
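A minimal sketch of the shared-space relevance scoring described above, assuming a pooled question embedding and per-region visual features; the array names and projection matrices here are illustrative, not from the paper:

```python
import numpy as np

def region_relevance(q, V, W_text, W_img):
    """Score image regions against a textual query by inner product.

    q:      (d_t,)   pooled question embedding
    V:      (n, d_v) visual features, one row per image region
    W_text: (k, d_t) learned projection of text into the shared space
    W_img:  (k, d_v) learned projection of regions into the shared space
    """
    q_shared = W_text @ q            # (k,)
    v_shared = V @ W_img.T           # (n, k)
    scores = v_shared @ q_shared     # inner products, one per region
    # Softmax turns scores into attention weights over regions; high
    # weights mark where the model "looks" when answering, e.g., a
    # localized patch for a "what color" question.
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()
```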
Learning without Forgetting
Li, Zhizhong; Hoiem, Derek
IEEE Transactions on Pattern Analysis and Machine Intelligence, 12/2018, Volume 40, Issue 12
Journal Article
Peer reviewed
Open access
When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises when we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new-task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses the original task data, which we assume to be unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new-task performance.
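A rough sketch of the training objective this implies: distillation toward the old-task outputs recorded before training (the only trace of the old tasks) plus standard supervision on the new task. The function names and hyperparameters are illustrative:

```python
import torch.nn.functional as F

def lwf_loss(new_logits, new_labels, old_logits, old_logits_recorded,
             T=2.0, lambda_old=1.0):
    """Learning-without-Forgetting-style objective (illustrative sketch).

    new_logits:          current outputs of the new-task head
    new_labels:          ground-truth labels for the new-task data
    old_logits:          current old-task head outputs on the same batch
    old_logits_recorded: old-task outputs recorded *before* training on
                         the new task; no old-task data is needed
    """
    # Standard supervised loss on the new task.
    loss_new = F.cross_entropy(new_logits, new_labels)
    # Knowledge distillation: keep old-task responses close to those the
    # original network produced (temperature-softened distributions).
    p_old = F.log_softmax(old_logits / T, dim=1)
    q_old = F.softmax(old_logits_recorded / T, dim=1)
    loss_old = F.kl_div(p_old, q_old, reduction='batchmean') * (T * T)
    return loss_new + lambda_old * loss_old
```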
In this paper, we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure, can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D-based object detector. This detector is competitive with an image-based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly uses constraints imposed by the spatial layout (the locations of walls and floor in the image) to refine the 3D object estimates. We use an existing approach to compute spatial layout [1], and use constraints such as objects being supported by the floor and not sticking through walls. The resulting detector (a) has significantly improved accuracy when compared to state-of-the-art 2D detectors, and (b) gives a 3D interpretation of the location of the object, derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.
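To make the layout constraints concrete, here is a small hedged sketch (using shapely, with hypothetical geometry) that rejects 3D box hypotheses whose ground footprints leave the estimated floor region, in the spirit of the "supported by the floor" and "cannot stick through walls" constraints; the tolerance value is an assumption:

```python
from shapely.geometry import Polygon

def passes_layout_constraints(box_footprint, floor_polygon, eps=0.05):
    """Check a candidate object's footprint against the room layout.

    box_footprint: list of (x, z) ground-plane corners of the detected box
    floor_polygon: Polygon of the estimated floor region; the walls form
                   its boundary, so a footprint inside the floor also
                   stays out of the walls.
    """
    footprint = Polygon(box_footprint)
    # Allow a small tolerance so objects flush against a wall still pass.
    return floor_polygon.buffer(eps).contains(footprint)
```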
This paper presents a fully automatic method for creating a 3D model from a single photograph. The model is made up of several texture-mapped planar billboards and has the complexity of a typical children's pop-up book illustration. Our main insight is that instead of attempting to recover precise geometry, we statistically model geometric classes defined by their orientations in the scene. Our algorithm labels regions of the input image into coarse categories: "ground", "sky", and "vertical". These labels are then used to "cut and fold" the image into a pop-up model using a set of simple assumptions. Because of the inherent ambiguity of the problem and the statistical nature of the approach, the algorithm is not expected to work on every image. However, it performs surprisingly well for a wide range of scenes taken from a typical person's photo album.
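A hedged sketch of the "cut and fold" geometry: pixels labeled "ground" are back-projected onto a ground plane, and vertical regions are then folded up from their ground contact. The camera convention and fixed camera height below are assumptions for illustration, not values from the paper:

```python
import numpy as np

def backproject_ground(pixel, K, camera_height=1.6):
    """Intersect a ground pixel's viewing ray with the ground plane.

    Uses the common computer-vision frame: camera at the origin with the
    y axis pointing down, so the ground is the plane y = camera_height.
    """
    d = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    assert d[1] > 0, "pixel is above the horizon; it cannot be ground"
    t = camera_height / d[1]   # ray parameter where the ray hits the ground
    return t * d               # 3D point on the ground plane

# Vertical regions are then "folded up": each vertical pixel column is
# placed on a plane rising from its ground-contact point.
```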
In this paper, we consider the problem of recovering the spatial layout of indoor scenes from monocular images. The presence of clutter is a major problem for existing single-view 3D reconstruction algorithms, most of which rely on finding the ground-wall boundary. In most rooms, this boundary is partially or entirely occluded. We gain robustness to clutter by modeling the global room space with a parametric 3D "box" and by iteratively localizing clutter and refitting the box. To fit the box, we introduce a structured learning algorithm that chooses the set of parameters to minimize error, based on global perspective cues. On a dataset of 308 images, we demonstrate the ability of our algorithm to recover spatial layout in cluttered rooms and show several examples of estimated free space.
Current fiducial marker detection algorithms rely on marker IDs for false positive rejection. Time is wasted on potential detections that will eventually be rejected as false positives. We introduce ChromaTag, a fiducial marker and detection algorithm designed to use opponent colors to limit and quickly reject initial false detections, and grayscale for precise localization. Through experiments, we show that ChromaTag is significantly faster than current fiducial markers while achieving similar or better detection accuracy. We also show how tag size and viewing direction affect detection accuracy. Our contribution is significant because fiducial markers are often used in real-time applications (e.g., marker-assisted robot navigation) where heavy computation is required by other parts of the system.
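A hedged sketch of the opponent-color idea: convert RGB to simple red-green and blue-yellow opponent channels, where a strong red-green swing is cheap to test and rare in background clutter. The exact channels and thresholds ChromaTag uses may differ; the values below are illustrative:

```python
import numpy as np

def opponent_channels(rgb):
    """Convert an (H, W, 3) float RGB image to opponent color channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                      # red-green opponent channel
    by = 0.5 * (r + g) - b          # blue-yellow opponent channel
    intensity = (r + g + b) / 3.0   # grayscale, used for localization
    return rg, by, intensity

def quick_reject(rg_patch, min_swing=0.3):
    """Cheap early test: a red/green tag region produces a large
    red-green swing, so most background patches can be discarded here
    before any expensive decoding runs (threshold is an assumption)."""
    return rg_patch.max() - rg_patch.min() < min_swing
```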
Recent approaches for predicting layouts from 360° panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step that fits a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g., SegNet or ResNet), the type of elements predicted (e.g., corners, wall/floor boundaries, or semantic segmentation), or the method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset (Chang et al.: Matterport3D: learning from RGB-D data in indoor environments, arXiv:1709.06158, 2017), and introduce two depth-based evaluation metrics.
We propose a category-independent method to produce a bag of regions and rank them, such that top-ranked regions are likely to be good segmentations of different objects. Our key objectives are completeness and diversity: every object should have at least one good proposed region, and a diverse set should be top-ranked. Our approach is to generate a set of segmentations by performing graph cuts based on a seed region and a learned affinity function. Then, the regions are ranked using structured learning based on various cues. Our experiments on the Berkeley Segmentation Data Set and Pascal VOC 2011 demonstrate our ability to find most objects within a small bag of proposed regions.
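As an illustration of ranking for completeness and diversity, the sketch below is a generic greedy scheme (not necessarily the paper's structured-learning ranker): it orders proposals by a learned score while discounting candidates that overlap regions already ranked, so a diverse set rises to the top:

```python
def rank_diverse(regions, scores, overlap, penalty=0.7):
    """Greedy diversity-aware ranking of proposed regions (illustrative).

    regions: list of region ids
    scores:  dict id -> learned quality score (e.g., from region cues)
    overlap: dict (i, j) -> intersection-over-union of two regions
    penalty: weight of the diversity discount (an assumed value)
    """
    ranked, remaining = [], set(regions)
    while remaining:
        # Discount each candidate by its worst overlap with ranked ones.
        def adjusted(i):
            o = max((overlap.get((i, j), overlap.get((j, i), 0.0))
                     for j in ranked), default=0.0)
            return scores[i] - penalty * o
        best = max(remaining, key=adjusted)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```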
In this paper, we consider the problem of recovering the free space of an indoor scene from a single image. We show that exploiting the box-like geometric structure of furniture and the constraints provided by the scene allows us to recover the extent of major furniture objects in 3D. Our "boxy" detector localizes box-shaped objects oriented parallel to the scene across different scales and object types, and thus blocks out the occupied space in the scene. To localize the objects more accurately in 3D, we introduce a set of specially designed features that capture the floor contact points of the objects. Image-based metrics are not very indicative of performance in 3D. We make the first attempt to evaluate single-view occupancy estimates for 3D errors and propose several task-driven performance measures toward it. On our dataset of 592 indoor images marked with full 3D geometry of the scene, we show that: (a) our detector works well using image-based metrics; (b) our refinement method produces significant improvements in localization in 3D; and (c) if one evaluates using 3D metrics, our method offers major improvements over other single-view scene geometry estimation methods.