3D-FUTURE: 3D Furniture Shape with TextURE
Fu, Huan; Jia, Rongfei; Gao, Lin; et al.
International Journal of Computer Vision, 12/2021, Volume 129, Issue 12
Journal Article
Peer reviewed
Open access
The 3D CAD shapes in current 3D benchmarks are mostly collected from online model repositories. Thus, they typically have insufficient geometric details and less informative textures, making them less attractive for comprehensive and subtle research in areas such as high-quality 3D mesh and texture recovery. This paper presents 3D Furniture shape with TextURE (3D-FUTURE): a richly annotated and large-scale repository of 3D furniture shapes in the household scenario. At the time of this technical report, 3D-FUTURE contains 9992 modern 3D furniture shapes with high-resolution textures and detailed attributes. To support the study of 3D modeling from images, we couple the CAD models with 20,240 scene images. The room scenes are designed by professional designers or generated by an industrial scene-creation system. Given the well-organized 3D-FUTURE and its characteristics, we provide a package of baseline experiments, such as joint 2D instance segmentation and 3D object pose estimation, image-based 3D shape retrieval, 3D object reconstruction from a single image, texture recovery for 3D shapes, and furniture composition, to facilitate related future research on our database.
Nonnegative matrix factorization (NMF) has been greatly popularized by its parts-based interpretation and the effective multiplicative updating rule for searching local solutions. In this paper, we study the problem of how to obtain an attractive local solution for NMF, one that not only fits the given training data well but also generalizes well to unseen test data. Based on the geometric interpretation of NMF, we introduce two large-cone penalties for NMF and propose large-cone NMF (LCNMF) algorithms. Compared with NMF, LCNMF obtains bases spanning a larger simplicial cone, and therefore has three advantages: (1) the empirical reconstruction error of LCNMF is usually smaller; (2) the generalization ability of the proposed algorithm is much more powerful; and (3) the obtained bases of LCNMF have a low-overlapping property, which enables the bases to be sparse and makes the proposed algorithms very robust. Experiments on synthetic and real-world data sets confirm the effectiveness of LCNMF.
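For context, below is a minimal sketch of the standard multiplicative updating rule (Lee and Seung) that LCNMF builds on; the large-cone penalty terms themselves are the paper's contribution and are not reproduced here. Function and variable names are illustrative.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iters=200, eps=1e-10, seed=0):
    """Standard multiplicative updates for V ~ W @ H under the
    Frobenius objective (the baseline LCNMF adds penalties to)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iters):
        # Each update keeps entries nonnegative and never increases the error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the updates are purely multiplicative, nonnegativity of W and H is preserved automatically, which is what gives NMF its parts-based interpretation.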
We propose a coarse-fine network (CFN) that exploits multi-level supervision for keypoint localization. Recently, methods based on convolutional neural networks (CNNs) have achieved great success owing to the powerful hierarchical features in CNNs. These methods typically use confidence maps generated from ground-truth keypoint locations as supervisory signals. However, while some keypoints can be easily located with high accuracy, many of them are hard to localize due to appearance ambiguity. Thus, using strict supervision often fails to detect keypoints that are difficult to locate accurately. To address this problem, we develop a keypoint localization network composed of several coarse detector branches, each built on top of a feature layer in a CNN, and a fine detector branch built on top of multiple feature layers. We supervise each branch with a specified label map that encodes a certain level of supervision strictness. All the branches are combined in a principled way to produce the final accurate keypoint locations. We demonstrate the efficacy, efficiency, and generality of our method on several benchmarks for multiple tasks, including bird part localization and human body pose estimation. In particular, our method achieves 72.2% AP on the 2016 COCO Keypoints Challenge dataset, an 18% improvement over the winning entry.
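The multi-level supervision idea can be illustrated by generating ground-truth confidence maps of varying strictness, one per branch. The sketch below assumes Gaussian confidence maps and a hypothetical sigma schedule; the paper's exact label maps may differ.

```python
import numpy as np

def confidence_map(h, w, kx, ky, sigma):
    """Gaussian confidence map centered at keypoint (kx, ky).
    A larger sigma yields a more tolerant (less strict) target."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2.0 * sigma ** 2))

# Hypothetical strictness schedule: coarse branches are supervised with
# broad maps, the fine branch with the sharpest (strictest) map.
sigmas = [16.0, 8.0, 4.0, 2.0]
targets = [confidence_map(64, 64, kx=32, ky=40, sigma=s) for s in sigmas]
```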
Scale-invariant keypoint detection is a fundamental problem in low-level vision. To accelerate keypoint detectors (e.g., DoG, Harris-Laplace, Hessian-Laplace) that are developed in Gaussian scale-space, various fast detectors (e.g., SURF, CenSurE, and BRISK) have been developed by approximating Gaussian filters with simple box filters. However, there is no principled way to design the shape and scale of the box filters. Additionally, the involved integral-image technique makes it difficult to identify the continuous kernels that correspond to the discrete ones used in these detectors, so there is no guarantee that good properties such as causality in the original Gaussian scale-space are inherited. To address these issues, in this paper we propose a unified B-spline framework for scale-invariant keypoint detection. Owing to an approximate relationship to Gaussian kernels, the B-spline framework provides a mathematical interpretation of existing fast detectors based on integral images. In addition, drawing on B-spline theory, we illustrate the problem in repeated integration, which is the generalized version of the integral-image technique. Finally, following the dominant measures for keypoint detection and automatic scale selection, we develop the B-spline determinant of Hessian (B-DoH) and the B-spline Laplacian-of-Gaussian (B-LoG) as two instantiations within the unified B-spline framework. For efficient computation, we propose to use repeated running-sums to convolve images with B-spline kernels of fixed orders, which avoids the problem of integral images by introducing an extra interpolation kernel. Our B-spline detectors can be designed in a principled way, without heuristic choices of kernel shape and scale, and naturally extend the popular SURF and CenSurE detectors with more complex kernels. Extensive experiments on the benchmark dataset demonstrate that the proposed detectors outperform existing ones in terms of repeatability and efficiency.
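The repeated running-sums idea can be sketched in one dimension: a single running (cumulative) sum plus a difference implements a box filter, and repeating the box filter n+1 times approximates a B-spline kernel of degree n, since a degree-n B-spline is the (n+1)-fold convolution of a box. A minimal sketch, with the radius and degree as illustrative parameters (not the paper's exact scheme):

```python
import numpy as np

def box_filter_1d(x, r):
    """Convolve x with a normalized length-(2r+1) box using one
    running sum and a difference (no integral image needed)."""
    c = np.cumsum(np.pad(x, (r + 1, r), mode='edge'))
    return (c[2 * r + 1:] - c[:-(2 * r + 1)]) / (2 * r + 1)

def bspline_smooth_1d(x, r, degree=3):
    """Repeated running-sums: applying the box (degree+1) times
    approximates smoothing with a B-spline kernel of that degree."""
    for _ in range(degree + 1):
        x = box_filter_1d(x, r)
    return x
```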
Object detection has been widely studied in the computer vision community and has many real-world applications, despite variations in scale, pose, lighting, and background. Most classical object detection methods rely heavily on category-based training to handle intra-class variations. In contrast to classical methods that use a rigid category-based representation, exemplar-based methods try to model variations among positives by learning from specific positive samples. However, existing exemplar-based methods either fail to use any training information or suffer a significant performance drop when few exemplars are available. In this paper, we design a novel local metric learning approach to handle the exemplar-based object detection task well. Our contributions are twofold: (1) a novel local metric learning algorithm called exemplar metric learning (EML) is designed, and (2) an exemplar-based object detection algorithm based on EML is implemented. We evaluate our method on two generic object detection data sets: UIUC-Car and UMass FDDB. Experiments show that, compared with other exemplar-based methods, our approach can effectively enhance object detection performance when few exemplars are available.
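While the EML objective itself is specified in the paper, the kind of scoring that a learned per-exemplar metric enables can be sketched as a Mahalanobis-style distance; the function below is a hypothetical illustration, not the paper's implementation.

```python
import numpy as np

def exemplar_score(x, exemplar, M):
    """Score a candidate-window feature x against one exemplar under a
    learned positive semidefinite metric M; smaller distance (higher
    score) means the window better matches this exemplar."""
    d = x - exemplar
    return -float(d @ M @ d)

# Usage: score a window against every exemplar and keep the best match.
# best = max(exemplar_score(x, e, M_e) for e, M_e in exemplars)
```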
We study a first-price auction with two bidders where one bidder is characterized by a constant relative risk aversion (CRRA) utility function (i.e., a concave power function) while the other has a general concave utility function. We establish the existence and uniqueness of the optimal strategic markups and analyze the effects of one bidder's risk aversion level on his own and his opponent's optimal strategic markups, on the allocative efficiency of the auction, and on the seller's expected revenue.
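For concreteness, the CRRA utility in question, together with a standard textbook benchmark (the symmetric-equilibrium bid under i.i.d. uniform values, which is not the paper's asymmetric two-bidder setting), can be written as:

```latex
% CRRA (power) utility with relative risk aversion 1 - \rho:
u(x) = x^{\rho}, \qquad 0 < \rho \le 1.
% Symmetric benchmark: with n CRRA bidders and i.i.d. U[0,1] values,
% the symmetric equilibrium bid is
b(v) = \frac{(n-1)\, v}{\, n - 1 + \rho \,},
\qquad \text{e.g. } n = 2:\ b(v) = \frac{v}{1+\rho}.
```

Stronger risk aversion (smaller rho) pushes the benchmark bid toward the value v, i.e., it shrinks the bidder's markup, which is the kind of comparative static the paper extends to the asymmetric case.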
Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both the ventral and dorsal pathways. In this paper, a new local image descriptor, termed distinctive efficient robust features (DERF), is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells in the primate retina. DERF features an exponential scale distribution, an exponential grid structure, and a circularly symmetric difference-of-Gaussians (DoG) function used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. The DoG is naturally a wavelet, and the structure of the grid-point array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight-frame DoG, which leads to much better performance. Extensive experiments on the image matching task on the multiview stereo correspondence data set demonstrate that DERF outperforms state-of-the-art methods for both hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
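The circularly symmetric DoG convolution kernel at the core of DERF is standard and easy to reproduce. The sketch below uses a common scale ratio k = 1.6 as an assumption; the paper's exponential scale distribution and grid structure are not reproduced here.

```python
import numpy as np

def dog_kernel(size, sigma, k=1.6):
    """Circularly symmetric difference-of-Gaussians kernel:
    a narrow 2D Gaussian minus a wider one (scale ratio k)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    g1 = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    g2 = np.exp(-r2 / (2 * (k * sigma) ** 2)) / (2 * np.pi * (k * sigma) ** 2)
    return g1 - g2
```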
Accurate segmentation is an essential task when working with medical images. Recently, deep convolutional neural networks have achieved state-of-the-art performance on many segmentation benchmarks. Regardless of the network architecture, deep learning-based segmentation methods view segmentation as a supervised task that requires a relatively large number of annotated images. Acquiring a large number of annotated medical images is time consuming, and high-quality segmentation masks (i.e., strong labels) crafted by human experts are expensive. In this paper, we propose a method that achieves competitive accuracy from "weakly annotated" images, where the weak annotation is a 3D bounding box denoting an object of interest. Our method, called "3D-BoxSup," employs a positive-unlabeled learning framework to learn segmentation masks from 3D bounding boxes. Specifically, we treat the pixels outside the bounding box as positively labeled data and the pixels inside the bounding box as unlabeled data. Our method can suppress the negative effects of pixels residing between the true segmentation mask and the 3D bounding box and produce accurate segmentation masks. We applied our method to segmenting brain tumors. Experimental results on the BraTS 2017 dataset (Menze et al., 2015; Bakas et al., 2017a,b,c) demonstrate the effectiveness of our method.
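The positive-unlabeled view can be sketched with a non-negative PU risk in the style of the nnPU estimator (Kiryo et al.), which may differ from the exact estimator used in 3D-BoxSup: outside-box voxels serve as labeled positives for the background class, inside-box voxels are unlabeled, and `pi` is an assumed class prior of background voxels inside the box.

```python
import numpy as np

def boxsup_pu_loss(prob_bg, box_mask, pi=0.3, eps=1e-7):
    """Illustrative non-negative PU risk for box-supervised segmentation.
    prob_bg : predicted background probability per voxel (flattened)
    box_mask: 1 inside the 3D bounding box (unlabeled), 0 outside (background)
    pi      : assumed prior of background voxels among the unlabeled ones"""
    pos = box_mask == 0                       # outside box: surely background
    unl = box_mask == 1                       # inside box: unlabeled mixture
    r_pos = -np.log(prob_bg[pos] + eps).mean()          # positive-class risk
    r_unl = -np.log(1 - prob_bg[unl] + eps).mean()      # unlabeled "negative" risk
    r_neg_pos = -np.log(1 - prob_bg[pos] + eps).mean()  # positives scored as negative
    r_neg = max(0.0, r_unl - pi * r_neg_pos)  # non-negative correction (nnPU)
    return pi * r_pos + r_neg
```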
Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by exploring image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. Moreover, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirably low-resolution feature maps. To obtain high-resolution depth maps, skip connections or multi-layer deconvolution networks are required, which complicates network training and consumes much more computation. To eliminate, or at least largely reduce, these problems, we introduce a spacing-increasing discretization (SID) strategy that discretizes depth and recasts depth network learning as an ordinal regression problem. By training the network with an ordinal regression loss, our method achieves both higher accuracy and faster convergence. Furthermore, we adopt a multi-scale network structure that avoids unnecessary spatial pooling and captures multi-scale information in parallel. The proposed deep ordinal regression network (DORN) achieves state-of-the-art results on three challenging benchmarks, i.e., KITTI [16], Make3D [49], and NYU Depth v2 [41], and outperforms existing methods by a large margin.
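The SID strategy has a simple closed form: the depth range [alpha, beta] is split into K bins whose widths grow exponentially with depth, so larger (more uncertain) depths are quantized more coarsely. A minimal sketch; the example depth range below is an assumption:

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    """Spacing-increasing discretization: K+1 thresholds
    t_i = exp(log(alpha) + i * log(beta / alpha) / K)
    giving uniform bins in log-depth, hence widths that grow with depth."""
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / K)

# e.g. a KITTI-like depth range of [1, 80] meters with K = 80 ordinal bins
print(sid_thresholds(1.0, 80.0, 80)[:5])
```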
Few-shot font generation (FFG), which aims to generate a new font from a few examples, is gaining increasing attention due to the significant reduction in labor cost it offers. A typical FFG pipeline treats characters in a standard font library as content glyphs and transfers them to a new target font by extracting style information from reference glyphs. Most existing solutions explicitly disentangle the content and style of reference glyphs, either globally or component-wise. However, the style of a glyph mainly lies in local details; i.e., the styles of radicals, components, and strokes together depict the style of a glyph. Therefore, even a single character can contain different styles distributed over spatial locations. In this paper, we propose a new font generation approach that learns (1) fine-grained local styles from the references and (2) the spatial correspondence between the content and reference glyphs, so that each spatial location in the content glyph can be assigned the right fine-grained style. To this end, we adopt cross-attention, with the representations of the content glyphs as queries and the representations of the reference glyphs as keys and values. Instead of explicit global or component-wise disentanglement, the cross-attention mechanism attends to the right local styles in the reference glyphs and aggregates the reference styles into a fine-grained style representation for the given content glyphs. Experiments show that the proposed method outperforms state-of-the-art FFG methods. In particular, user studies also demonstrate that the style consistency of our approach significantly surpasses that of previous methods.
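The cross-attention step can be sketched directly with a standard attention module, using content-glyph tokens as queries and reference-glyph tokens as keys and values. The feature dimension, head count, and token counts below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

dim, heads = 256, 8  # assumed sizes for illustration
attn = nn.MultiheadAttention(dim, heads, batch_first=True)

content = torch.randn(4, 16 * 16, dim)        # B, content spatial tokens, C
reference = torch.randn(4, 3 * 16 * 16, dim)  # B, tokens of 3 reference glyphs, C

# Each content location queries the references and aggregates the most
# relevant local style features into its own fine-grained style vector.
styled, weights = attn(query=content, key=reference, value=reference)
```

The design choice matters: because attention weights are computed per content location, local styles (stroke endings, radical shapes) flow to exactly the positions that need them, rather than being averaged into one global style code.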