In this paper, we propose a highly accurate inpainting algorithm that reconstructs an image from a fraction of its pixels. Our algorithm is inspired by recent progress in non‐local image processing techniques that follow the idea of ‘grouping and collaborative filtering.’ In our framework, we first match and group similar patches in the input image, then convert the problem of estimating missing values for each stack of matched patches into a low‐rank matrix completion problem, and finally obtain the result by synthesizing all the restored patches. The two key points of our algorithm are accurate patch matching and solving the low‐rank matrix completion problem. For the first, we propose a robust patch matching approach; for the second, we employ the alternating direction method of multipliers (ADMM). Experiments show that our algorithm outperforms existing inpainting techniques. In addition, our algorithm can easily be extended to practical applications including rendering acceleration, photo restoration and object removal.
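The completion step can be illustrated with a small sketch. It is not the ADMM solver the abstract mentions: as a simpler stand-in, it assumes a stack of matched patches is exactly rank one (each row one vectorized patch from a group, a hypothetical layout) and fills the missing pixels by alternating least squares over the observed entries only. The function name `complete_rank1` and the boolean-mask convention are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def complete_rank1(matrix: Matrix, mask: Mask, iters: int = 200) -> Matrix:
    """Fill entries where mask is False, assuming matrix ~= u v^T (rank one).

    Hypothetical stand-in for an ADMM-based low-rank completion:
    alternating least squares fitted to the observed entries only.
    """
    rows, cols = len(matrix), len(matrix[0])
    u = [0.0] * rows
    v = [1.0] * cols  # arbitrary nonzero starting point
    for _ in range(iters):
        # Fix v; solve each u[i] from the observed entries of row i.
        for i in range(rows):
            obs = [j for j in range(cols) if mask[i][j]]
            den = sum(v[j] ** 2 for j in obs)
            u[i] = sum(matrix[i][j] * v[j] for j in obs) / den if den else 0.0
        # Fix u; solve each v[j] from the observed entries of column j.
        for j in range(cols):
            obs = [i for i in range(rows) if mask[i][j]]
            den = sum(u[i] ** 2 for i in obs)
            v[j] = sum(matrix[i][j] * u[i] for i in obs) / den if den else 0.0
    # Keep observed pixels; predict missing ones from the rank-one model.
    return [
        [matrix[i][j] if mask[i][j] else u[i] * v[j] for j in range(cols)]
        for i in range(rows)
    ]


# Three "patches" that are scaled copies of each other; one pixel is missing.
stack = [[0.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [3.0, 6.0, 9.0, 12.0]]
known = [[False, True, True, True], [True] * 4, [True] * 4]
filled = complete_rank1(stack, known)  # filled[0][0] converges to 1.0
```

A real patch stack is only approximately low rank, which is why a nuclear-norm style formulation solved with ADMM is more robust than fixing the rank to one as done here.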
In this paper, we investigate compressed sensing principles to devise an in‐situ data reduction framework for visualization of volumetric datasets. We exploit the universality of the compressed sensing framework and show that the proposed method offers a refinable data reduction approach for volumetric datasets. Accurate reconstruction is obtained from partial Fourier measurements of the original data that are sensed without any prior knowledge of specific feature domains for the data. Our experiments demonstrate the superiority of surfacelets for efficient representation of volumetric data. Moreover, we establish that the accuracy of reconstruction can further improve once a more effective basis for a sparser representation of the data becomes available.
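To make the measurement model concrete, here is a hedged sketch of recovering a sparse 1-D signal from a subset of its Fourier coefficients. It substitutes orthogonal matching pursuit (OMP) for whatever solver the authors use and assumes sparsity in the identity basis rather than in surfacelets; the frequency subset and all function names are illustrative.

```python
import cmath
from typing import Iterable, List

# Assumption: signal is sparse in the identity basis; OMP is a swapped-in solver.


def partial_dft_rows(n: int, freqs: Iterable[int]) -> List[List[complex]]:
    """Rows of the unitary n-point DFT matrix for the sampled frequencies only."""
    scale = 1 / n ** 0.5
    return [[scale * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)] for k in freqs]


def measure(rows, x):
    """Partial Fourier measurements y = A x."""
    return [sum(a * xt for a, xt in zip(row, x)) for row in rows]


def solve(a, b):
    """Solve a small complex linear system by Gaussian elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][k] * x[k] for k in range(r + 1, m))) / aug[r][r]
    return x


def omp(rows, y, sparsity):
    """Greedily pick the column most correlated with the residual,
    then refit all picked coefficients by least squares."""
    n = len(rows[0])
    cols = [[row[t] for row in rows] for t in range(n)]
    support, coeffs, residual = [], [], list(y)
    for _ in range(sparsity):
        corr = [abs(sum(c.conjugate() * r for c, r in zip(cols[t], residual))) for t in range(n)]
        support.append(max((t for t in range(n) if t not in support), key=lambda t: corr[t]))
        # Normal equations A_S^H A_S c = A_S^H y on the current support.
        gram = [[sum(a.conjugate() * b for a, b in zip(cols[i], cols[j])) for j in support] for i in support]
        rhs = [sum(a.conjugate() * b for a, b in zip(cols[i], y)) for i in support]
        coeffs = solve(gram, rhs)
        fit = [sum(c * cols[t][k] for c, t in zip(coeffs, support)) for k in range(len(y))]
        residual = [yk - fk for yk, fk in zip(y, fit)]
    x = [0j] * n
    for c, t in zip(coeffs, support):
        x[t] = c
    return x


n = 16
rows = partial_dft_rows(n, [0, 1, 2, 3, 5, 8, 11, 13])  # 8 of 16 frequencies
x = [0.0] * n
x[5] = 2.0
x_hat = omp(rows, measure(rows, x), sparsity=1)
```

The "refinable" property in the abstract corresponds, in this toy setting, to adding more frequencies to `freqs`: more measurements allow larger `sparsity` to be recovered.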
Many useful algorithms for processing images and geometry fall under the general framework of high‐dimensional Gaussian filtering. This family of algorithms includes bilateral filtering and non‐local means. We propose a new way to perform such filters using the permutohedral lattice, which tessellates high‐dimensional space with uniform simplices. Our algorithm is the first implementation of a high‐dimensional Gaussian filter that is both linear in input size and polynomial in dimensionality. Furthermore, it is parameter‐free, apart from the filter size, and achieves consistently high accuracy relative to ground truth (> 45 dB). We use this to demonstrate a number of interactive‐rate applications of filters in up to eight dimensions.
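The framing of bilateral filtering as Gaussian filtering in a higher-dimensional space can be seen in the naive reference implementation below. It is a sketch for a 1-D signal: each sample is lifted to the feature point (position / sigma_s, value / sigma_r) and blurred with a Gaussian in that 2-D space. It costs O(n²) and is the kind of brute-force evaluation the permutohedral lattice is designed to avoid, not the lattice algorithm itself; parameter names are illustrative.

```python
import math
from typing import List

# Naive reference only; this is NOT the permutohedral-lattice algorithm.


def bilateral_1d(signal: List[float], sigma_s: float, sigma_r: float) -> List[float]:
    """Brute-force bilateral filter of a 1-D signal.

    Sample i maps to the feature point (i / sigma_s, signal[i] / sigma_r);
    the output is a normalized Gaussian-weighted average over all samples,
    weighted by squared distance between feature points.
    """
    out = []
    for i, vi in enumerate(signal):
        num = den = 0.0
        for j, vj in enumerate(signal):
            d2 = ((i - j) / sigma_s) ** 2 + ((vi - vj) / sigma_r) ** 2
            w = math.exp(-0.5 * d2)
            num += w * vj
            den += w
        out.append(num / den)
    return out


step = [0.0] * 5 + [10.0] * 5
sharp = bilateral_1d(step, sigma_s=2.0, sigma_r=1.0)       # edge preserved
blurred = bilateral_1d(step, sigma_s=2.0, sigma_r=1000.0)  # close to a plain Gaussian blur
```

Adding more feature dimensions (for example three color channels plus two image coordinates) turns this into the 5-D filtering the abstract describes, with the same quadratic cost.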
Image matting aims at extracting foreground elements from an image by means of color and opacity (alpha) estimation. While a lot of progress has been made in recent years on improving the accuracy of matting techniques, one common problem has persisted: the low speed of matte computation. We present the first real‐time matting technique for natural images and videos. Our technique is based on the observation that, for small neighborhoods, pixels tend to share similar attributes. Therefore, independently treating each pixel in the unknown regions of a trimap results in a lot of redundant work. We show how this computation can be significantly and safely reduced by means of a careful selection of pairs of background and foreground samples. Our technique achieves speedups of up to two orders of magnitude compared to previous techniques, while producing high‐quality alpha mattes. The quality of our results has been verified through an independent benchmark. The speed of our technique enables, for the first time, real‐time alpha matting of videos, and has the potential to enable a new class of exciting applications.
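The sample-pair idea rests on the compositing equation C = αF + (1 - α)B. A common way to score a candidate (foreground, background) pair, sketched below under that assumption, is to project the observed color onto the line from B to F to get α, then measure how far the observed color lies from the resulting composite (chromatic distortion). The function names and the exhaustive pair search are illustrative; the paper's technique gets its speed from selecting and sharing a small set of pairs, which this sketch does not attempt.

```python
import math
from typing import Optional, Sequence, Tuple

# Assumes the compositing equation C = alpha * F + (1 - alpha) * B.
Color = Tuple[float, float, float]


def estimate_alpha(c: Color, f: Color, b: Color) -> float:
    """Project observed color c onto the segment from background b to foreground f."""
    fb = [fi - bi for fi, bi in zip(f, b)]
    cb = [ci - bi for ci, bi in zip(c, b)]
    denom = sum(x * x for x in fb)
    if denom == 0.0:
        raise ValueError("foreground and background samples coincide")
    alpha = sum(x * y for x, y in zip(cb, fb)) / denom
    return min(1.0, max(0.0, alpha))


def chromatic_distortion(c: Color, f: Color, b: Color, alpha: float) -> float:
    """Distance between c and the composite alpha * f + (1 - alpha) * b."""
    return math.sqrt(sum((ci - (alpha * fi + (1 - alpha) * bi)) ** 2 for ci, fi, bi in zip(c, f, b)))


def best_pair(c: Color, fg: Sequence[Color], bg: Sequence[Color]) -> Optional[Tuple[float, float, Color, Color]]:
    """Exhaustively pick the (f, b) pair whose composite best explains c.
    Returns (distortion, alpha, f, b), or None if no usable pair exists."""
    best = None
    for f in fg:
        for b in bg:
            if f == b:
                continue  # degenerate pair: alpha is undefined
            alpha = estimate_alpha(c, f, b)
            cost = chromatic_distortion(c, f, b, alpha)
            if best is None or cost < best[0]:
                best = (cost, alpha, f, b)
    return best


pixel = (0.3, 0.0, 0.7)  # 30% red foreground over a blue background
result = best_pair(pixel, [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)])
```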
Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for this task: (1) lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images. To synthesize diverse outputs, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and attribute vectors sampled from the attribute space to synthesize diverse outputs at test time. To handle unpaired training data, we introduce a cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative evaluations, we measure realism with a user study and Fréchet inception distance, and measure diversity with the perceptual distance metric, Jensen–Shannon divergence, and the number of statistically different bins.
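The cross-cycle consistency loss can be illustrated with a toy sketch. Here an "image" is just a hypothetical (content, attribute) pair of numbers, and a single shared content encoder, attribute encoder and generator stand in for the per-domain networks. The sketch only shows the two-stage swap: translate once by exchanging attribute codes, translate again by exchanging them back, and penalize the L1 distance to the inputs.

```python
from typing import Callable, Tuple

Image = Tuple[float, float]  # hypothetical toy image: (content, attribute)
Encoder = Callable[[Image], float]
Generator = Callable[[float, float], Image]


def l1(a: Image, b: Image) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cross_cycle_loss(x: Image, y: Image, enc_c: Encoder, enc_a: Encoder, gen: Generator) -> float:
    # First translation: keep each image's content, swap attribute codes.
    u = gen(enc_c(x), enc_a(y))  # x's content rendered with y's attribute
    v = gen(enc_c(y), enc_a(x))  # y's content rendered with x's attribute
    # Second translation: swap attribute codes back to reconstruct the inputs.
    x_hat = gen(enc_c(u), enc_a(v))
    y_hat = gen(enc_c(v), enc_a(u))
    return l1(x, x_hat) + l1(y, y_hat)


ideal = cross_cycle_loss((1.0, 2.0), (3.0, 4.0), lambda im: im[0], lambda im: im[1], lambda c, a: (c, a))
# A content encoder that leaks attribute information is penalized (loss about 0.4).
leaky = cross_cycle_loss((1.0, 2.0), (3.0, 4.0), lambda im: 0.9 * im[0] + 0.1 * im[1], lambda im: im[1], lambda c, a: (c, a))
```

The loss is zero only when content and attribute codes are cleanly separated, which is the pressure toward disentanglement that makes unpaired training possible.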
This paper presents a novel interactive approach for adding depth information to hand‐drawn cartoon images and animations. In comparison to previous depth assignment techniques, our solution requires minimal user effort and enables creation of consistent pop‐ups in a matter of seconds. Inspired by perceptual studies, we formulate a custom‐tailored optimization framework that tries to mimic the way a human reconstructs depth information from a single image. Its key advantage is that it completely avoids inputs requiring knowledge of absolute depth and instead uses a set of sparse depth (in)equalities that are much easier to specify. Since these constraints lead to a solution based on quadratic programming that is time‐consuming to evaluate, we propose a simple approximate algorithm yielding similar results with much lower computational overhead. We demonstrate its usefulness in the context of a cartoon animation production pipeline including applications such as enhancement, registration, composition, 3D modelling and stereoscopic display.
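The abstract does not spell out the optimization, so the following is only a hedged illustration of how sparse (in)equalities constrain depth: equalities merge regions (union-find), and "a is in front of b" inequalities form a directed graph whose longest-path layering gives integer depths (0 = nearest). It replaces the paper's quadratic program and its approximation with plain graph layering; the function name and constraint format are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical: graph layering instead of the paper's quadratic program.
Pair = Tuple[int, int]


def assign_depth_layers(n_regions: int, equal: Sequence[Pair], in_front: Sequence[Pair]) -> List[int]:
    """Integer depth layers (0 = nearest) satisfying sparse constraints.

    equal:    (a, b) means regions a and b share a depth.
    in_front: (a, b) means region a is closer to the viewer than region b.
    """
    parent = list(range(n_regions))

    def find(i: int) -> int:
        # Union-find with path halving merges regions tied by equalities.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in equal:
        parent[find(a)] = find(b)

    reps = {find(i) for i in range(n_regions)}
    succ: Dict[int, Set[int]] = defaultdict(set)
    indeg: Dict[int, int] = {r: 0 for r in reps}
    for a, b in in_front:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"regions {a} and {b} are both equal and ordered")
        if rb not in succ[ra]:
            succ[ra].add(rb)
            indeg[rb] += 1

    # Kahn's topological sort; each region lands one layer behind the
    # farthest region known to be in front of it (longest path).
    depth = {r: 0 for r in reps}
    queue = deque(r for r in reps if indeg[r] == 0)
    visited = 0
    while queue:
        r = queue.popleft()
        visited += 1
        for s in succ[r]:
            depth[s] = max(depth[s], depth[r] + 1)
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if visited != len(reps):
        raise ValueError("in-front constraints contain a cycle")
    return [depth[find(i)] for i in range(n_regions)]


# Regions 0 and 1 are equal; 2 occludes 0; 1 occludes 3.
layers = assign_depth_layers(4, [(0, 1)], [(2, 0), (1, 3)])  # [1, 1, 0, 2]
```

Note that only relative order is specified, never absolute depth, which mirrors the abstract's point that such constraints are easier for a user to provide.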
The Open Images Dataset V4 Kuznetsova, Alina; Rom, Hassan; Alldrin, Neil ...
International journal of computer vision, 07/2020, Volume 128, Issue 7
Journal Article
Peer reviewed
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows sharing and adapting the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias. Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, we provide 15× more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). We annotated visual relationships between them, which support visual relationship detection, an emerging task that requires structured reasoning. We provide in-depth, comprehensive statistics about the dataset, validate the quality of the annotations, study how the performance of several modern models evolves with increasing amounts of training data, and demonstrate two applications made possible by having unified annotations of multiple types coexisting in the same images. We hope that the scale, quality, and variety of Open Images V4 will foster further research and innovation even beyond the areas of image classification, object detection, and visual relationship detection.
Underwater images suffer from color distortion and low contrast, because light is attenuated while it propagates through water. Under water, attenuation varies with wavelength, unlike in terrestrial images, where attenuation is assumed to be spectrally uniform. The attenuation depends both on the water body and the 3D structure of the scene, making color restoration difficult. Unlike existing single underwater image enhancement techniques, our method takes into account multiple spectral profiles of different water types. By estimating just two additional global parameters, the attenuation ratios of the blue-red and blue-green color channels, the problem is reduced to single image dehazing, where all color channels have the same attenuation coefficients. Since the water type is unknown, we evaluate different parameters out of an existing library of water types. Each type leads to a different restored image, and the best result is automatically chosen based on color distribution. We also contribute a dataset of 57 images taken in different locations. To obtain ground truth, we placed multiple color charts in the scenes and calculated their 3D structure using stereo imaging. This dataset enables a rigorous quantitative evaluation of restoration algorithms on natural images for the first time.
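A hedged sketch of why two global ratios suffice, assuming the standard per-channel image formation model I_c = t_c J_c + (1 - t_c) A_c with transmission t_c = exp(-β_c z). Because all channels share the scene distance z, t_R = t_B^(β_R/β_B) and t_G = t_B^(β_G/β_B), so once the blue transmission t_B is known for a pixel, the two ratios give the other channels' transmissions. Here t_B, the veiling light A and the attenuation coefficients are assumed known; estimating them and picking the water type is the hard part the paper addresses. The ratio convention (red or green over blue) and the coefficient values are assumptions of this sketch.

```python
import math
from typing import Tuple

# Assumed model: I_c = t_c * J_c + (1 - t_c) * A_c, with t_c = exp(-beta_c * z).
RGB = Tuple[float, float, float]


def simulate(scene: RGB, veiling: RGB, depth: float, betas: RGB) -> RGB:
    """Forward model applied per channel (R, G, B)."""
    out = []
    for j, a, beta in zip(scene, veiling, betas):
        t = math.exp(-beta * depth)
        out.append(t * j + (1 - t) * a)
    return tuple(out)


def restore(observed: RGB, veiling: RGB, t_blue: float, ratio_r: float, ratio_g: float) -> RGB:
    """Invert the model given t_B and the assumed ratios beta_R/beta_B, beta_G/beta_B."""
    ts = (t_blue ** ratio_r, t_blue ** ratio_g, t_blue)
    return tuple((i - (1 - t) * a) / t for i, a, t in zip(observed, veiling, ts))


betas = (0.6, 0.2, 0.1)  # illustrative values only: red attenuates fastest
scene, veiling, z = (0.8, 0.5, 0.3), (0.1, 0.4, 0.6), 3.0
observed = simulate(scene, veiling, z, betas)
recovered = restore(observed, veiling, math.exp(-betas[2] * z), betas[0] / betas[2], betas[1] / betas[2])
```

Trying each entry of a water-type library amounts to calling `restore` with different `(ratio_r, ratio_g)` pairs and keeping the most plausible output.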
Learning to Prompt for Vision-Language Models Zhou, Kaiyang; Yang, Jingkang; Loy, Chen Change ...
International journal of computer vision, 09/2022, Volume 130, Issue 9
Journal Article
Peer reviewed
Open access
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from traditional representation learning, which is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time on word tuning, since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt’s context words with learnable vectors while all pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts by a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
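How CoOp turns learnable context vectors into classification weights can be sketched as a forward pass. The sketch assumes unified context (the same vectors [V]_1, ..., [V]_M precede every class-name embedding), replaces CLIP's frozen transformer text encoder with a hypothetical mean-pooling function, and uses an assumed temperature of 0.01. During training, only `context` would receive gradient updates while everything else stays fixed.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_text_encoder(tokens: List[Vector]) -> Vector:
    """Hypothetical stand-in for CLIP's frozen text encoder: mean-pool token embeddings."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def coop_probs(context: List[Vector], class_tokens: List[Vector], image_feature: Vector,
               temperature: float = 0.01) -> List[float]:
    """Class probabilities p(y = i | x) from prompts [V]_1 ... [V]_M [CLASS_i].

    The same learnable context vectors are prepended to every class token
    (unified context); cosine similarity with the image feature, divided by
    the temperature, gives the logits.
    """
    f = normalize(image_feature)
    logits = []
    for cls in class_tokens:
        w = normalize(toy_text_encoder(context + [cls]))  # synthesized classifier weight
        logits.append(sum(a * b for a, b in zip(w, f)) / temperature)
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


ctx = [[0.1, 0.1], [0.0, 0.2]]  # M = 2 toy context vectors (the learnable part)
probs = coop_probs(ctx, [[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1])
```

Class-specific context would instead keep a separate `context` list per class; the rest of the forward pass is unchanged.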
A Survey of Urban Reconstruction Musialski, P.; Wonka, P.; Aliaga, D. G. ...
Computer graphics forum, September 2013, Volume 32, Issue 6
Journal Article
Peer reviewed
Open access
This paper provides a comprehensive overview of urban reconstruction. While there exists a considerable body of literature, this topic is still under active research. The work reviewed in this survey stems from the following three research communities: computer graphics, computer vision, and photogrammetry and remote sensing. Our goal is to provide a survey that will help researchers better position their own work in the context of existing solutions, and help newcomers and practitioners in computer graphics quickly gain an overview of this vast field. Further, we would like to encourage even more interdisciplinary work among these research communities, since the reconstruction problem itself is far from solved.