Single image dehazing is a challenging problem that aims to recover clear images from hazy ones. The performance of existing image dehazing methods is limited by hand-designed features and priors. In this paper, we propose a multi-scale deep neural network for single image dehazing by learning the mapping between hazy images and their transmission maps. The proposed algorithm consists of a coarse-scale net, which predicts a holistic transmission map based on the entire image, and a fine-scale net, which refines dehazed results locally. To train the multi-scale deep network, we synthesize a dataset comprising hazy images and corresponding transmission maps based on the NYU Depth dataset. In addition, we propose a holistic edge-guided network to refine edges of the estimated transmission map. Extensive experiments demonstrate that the proposed algorithm performs favorably against state-of-the-art methods on both synthetic and real-world images in terms of quality and speed.
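The transmission map matters because, under the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 − t(x)), the clear image J can be recovered once the transmission t and the airlight A are known. A minimal sketch of that recovery step (the function name, toy values, and the lower bound t_min are illustrative, not from the paper):

```python
import numpy as np

def dehaze(hazy, transmission, airlight, t_min=0.1):
    """Invert the scattering model I = J*t + A*(1 - t) to get J.

    `transmission` is a per-pixel map (e.g. estimated by a network);
    `airlight` is the global atmospheric light. Clipping t away from
    zero avoids amplifying noise in dense-haze regions.
    """
    t = np.clip(transmission, t_min, 1.0)[..., np.newaxis]  # broadcast over RGB
    J = (hazy - airlight) / t + airlight
    return np.clip(J, 0.0, 1.0)

# toy example: uniform haze with t = 0.5 and airlight A = 0.8
hazy = np.full((4, 4, 3), 0.65)
clear = dehaze(hazy, np.full((4, 4), 0.5), 0.8)
```

With these toy values every recovered pixel is 0.5, since (0.65 − 0.8)/0.5 + 0.8 = 0.5.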
In this work we present an end-to-end system for text spotting—localising and recognising text in natural scene images—and for text-based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage to improve precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at once, departing from the character-classifier-based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human-labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system that allows thousands of hours of news footage to be instantly searchable via a text query.
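Recall of a proposal set is conventionally measured by whether each ground-truth word box is covered by some proposal at a given IoU threshold; taking the union of complementary proposal generators can only raise this number. A small illustrative sketch (the box format and the 0.5 threshold are common conventions, not necessarily the paper's exact protocol):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def recall(gt_boxes, proposals, thr=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    hits = sum(any(iou(g, p) >= thr for p in proposals) for g in gt_boxes)
    return hits / len(gt_boxes)
```

Merging the outputs of several generators before calling `recall` is monotone: adding proposals never lowers the score, which is why the pipeline diversifies proposal generation first and filters for precision afterwards.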
Portraiture is a major art form in both photography and painting. In most instances, artists seek to make the subject stand out from its surroundings, for instance by making it brighter or sharper. In the digital world, similar effects can be achieved by processing a portrait image with photographic or painterly filters that adapt to the semantics of the image. While many successful user-guided methods exist to delineate the subject, fully automatic techniques are scarce and yield unsatisfactory results. Our paper first addresses this problem by introducing a new automatic segmentation algorithm dedicated to portraits. We then build upon this result and describe several portrait filters that exploit our automatic segmentation algorithm to generate high-quality portraits.
•The S–kNN algorithm identifies an optimal k value for each test sample.
•Our approach takes the local structures of samples into account.
•This paper proposes a novel optimization method to solve the designed objective function.
This paper studies an example-driven k-parameter computation that identifies different k values for different test samples in kNN prediction applications, such as classification, regression and missing data imputation. This is carried out by reconstructing a sparse coefficient matrix between test samples and training data. In the reconstruction process, an ℓ1-norm regularization is employed to generate an element-wise sparse coefficient matrix, and an LPP (Locality Preserving Projection) regularization is adopted to preserve the local structures of the data. Further, with the learnt k value, the kNN approach is applied to classification, regression and missing data imputation. We experimentally evaluate the proposed approach on 20 real datasets and show that our algorithm outperforms previous kNN algorithms on data mining tasks such as classification, regression and missing-value imputation.
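The idea of deriving a sample-specific k from a sparse reconstruction can be sketched as follows. This simplified version solves a plain ℓ1-regularized least-squares problem with ISTA (proximal gradient with soft-thresholding) and omits the LPP locality term, so all names and parameter values are illustrative rather than the paper's actual objective:

```python
import numpy as np

def sparse_weights(D, x, lam=0.1, steps=500):
    """Solve min_w 0.5*||x - D w||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    w = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ w - x)              # gradient of the quadratic term
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

def per_sample_k(X_train, x_test, lam=0.1):
    """Sample-specific k = number of training points whose reconstruction
    coefficient for this test sample is non-zero."""
    w = sparse_weights(X_train.T, x_test, lam)  # dictionary columns = training samples
    return max(int(np.count_nonzero(np.abs(w) > 1e-6)), 1)
```

The sparsity level, and hence k, is controlled by `lam`: a larger penalty zeroes out more coefficients and yields a smaller neighbourhood for that test sample.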
We present data products from the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS). CFHTLenS is based on the Wide component of the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS). It encompasses 154 deg² of deep, optical, high-quality, sub-arcsecond imaging data in the five optical filters u*g′r′i′z′. The scientific aims of the CFHTLenS team are weak gravitational lensing studies supported by photometric redshift estimates for the galaxies. This paper presents our data processing of the complete CFHTLenS data set. We were able to obtain a data set with very good image quality and high-quality astrometric and photometric calibration. Our external astrometric accuracy is between 60 and 70 mas with respect to Sloan Digital Sky Survey (SDSS) data, and the internal alignment in all filters is around 30 mas. Our average photometric calibration shows a dispersion of the order of 0.01-0.03 mag for g′r′i′z′ and about 0.04 mag for u* with respect to SDSS sources down to i_SDSS ≤ 21. We demonstrate in accompanying papers that our data meet the necessary requirements to fully exploit the survey for weak gravitational lensing analyses in connection with photometric redshift studies. In the spirit of the CFHTLS, all our data products are released to the astronomical community via the Canadian Astronomy Data Centre at http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/community/CFHTLens/query.html. We give a description and how-to manuals of the public products, which include image pixel data, source catalogues with photometric redshift estimates, and all relevant quantities to perform weak lensing studies.
•Two metaheuristic algorithms (WOA and MFO) are used.
•These algorithms are applied to multilevel thresholding image segmentation.
•MFO and WOA outperform the compared algorithms.
•MFO is better than WOA for a higher number of thresholds.
Determining the optimal thresholding for image segmentation has received increasing attention in recent years owing to its many applications. There are several methods used to find optimal thresholding values, such as Otsu- and Kapur-based methods. These methods are suitable for the bi-level thresholding case and can be easily extended to the multilevel case; however, determining the optimal thresholds in the multilevel case is time-consuming. To avoid this problem, this paper examines the ability of two nature-inspired algorithms, the Whale Optimization Algorithm (WOA) and Moth-Flame Optimization (MFO), to determine the optimal multilevel thresholding for image segmentation. The MFO algorithm is inspired by the natural behavior of moths, which have a special navigation style at night as they fly using the moonlight, whereas the WOA algorithm emulates the natural cooperative behaviors of whales. The candidate solutions in the adapted algorithms were created using the image histogram, and then updated based on the characteristics of each algorithm. The solutions are assessed using Otsu's fitness function during the optimization process. The performance of the proposed algorithms has been evaluated on several benchmark images and compared with five different swarm algorithms. The results have been analyzed based on the best fitness values, PSNR and SSIM measures, as well as time complexity and the ANOVA test. The experimental results showed that the proposed methods outperformed the other swarm algorithms; in addition, MFO showed better results than WOA and provided a good balance between exploration and exploitation in all images at both small and high threshold numbers.
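The fitness the swarm algorithms maximize is Otsu's between-class variance, computed from the image histogram for a candidate threshold vector. A minimal sketch (the histogram layout and threshold encoding are assumptions about how candidates are represented):

```python
import numpy as np

def otsu_fitness(hist, thresholds):
    """Otsu between-class variance for a vector of thresholds.

    The grey range is split into segments at the thresholds; the fitness
    sums each segment's weight times its squared deviation from the
    global mean. WOA/MFO search for the thresholds maximizing this.
    """
    p = hist / hist.sum()                   # normalize to a probability mass
    levels = np.arange(len(hist))
    bounds = [0] + sorted(thresholds) + [len(hist)]
    mu_total = (p * levels).sum()           # global mean grey level
    var = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()                  # class probability
        if w > 0:
            mu = (p[lo:hi] * levels[lo:hi]).sum() / w
            var += w * (mu - mu_total) ** 2
    return var
```

A threshold that cleanly separates the histogram's modes scores higher than one that cuts through a mode, which is exactly the signal the metaheuristics exploit.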
Selective Search for Object Recognition. Uijlings, J. R. R.; van de Sande, K. E. A.; Gevers, T., et al.
International Journal of Computer Vision, 09/2013, Volume 104, Issue 2.
Journal article; peer-reviewed; open access.
This paper addresses the problem of generating possible object locations for use in object recognition. We introduce selective search, which combines the strength of both an exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations. Instead of a single technique to generate possible object locations, we diversify our search and use a variety of complementary image partitionings to deal with as many image conditions as possible. Our selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. The reduced number of locations compared to an exhaustive search enables the use of stronger machine learning techniques and stronger appearance models for object recognition. In this paper we show that our selective search enables the use of the powerful Bag-of-Words model for recognition. The selective search software is made publicly available (http://disi.unitn.it/~uijlings/SelectiveSearch.html).
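The hierarchical grouping at the heart of selective search can be illustrated with a toy version: start from initial regions, repeatedly merge the most similar pair, and keep every region ever formed as a proposal. Here histogram intersection stands in for the paper's combined colour/texture/size/fill similarity and regions are reduced to bare histograms; this is a simplification for illustration, not the released software's logic:

```python
import numpy as np

def similarity(h1, h2):
    """Histogram intersection between two normalized region histograms."""
    return np.minimum(h1, h2).sum()

def hierarchical_grouping(hists):
    """Greedy agglomerative grouping: merge the most similar pair until one
    region remains. Every region ever formed counts as a proposal, so
    n initial regions yield exactly 2n - 1 proposals."""
    regions = [h / h.sum() for h in hists]
    n_proposals = len(regions)
    while len(regions) > 1:
        i, j = max(((a, b) for a in range(len(regions))
                    for b in range(a + 1, len(regions))),
                   key=lambda ab: similarity(regions[ab[0]], regions[ab[1]]))
        merged = (regions[i] + regions[j]) / 2.0   # crude stand-in for region union
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        n_proposals += 1
    return n_proposals
```

Keeping every intermediate region is what gives the method proposals at all scales, while the greedy merge keeps the total count small compared to exhaustive sliding windows.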
Image representations, from SIFT and bag of visual words to convolutional neural networks (CNNs), are a crucial component of almost all computer vision systems. However, our understanding of them remains limited. In this paper we study several landmark representations, both shallow and deep, through a number of complementary visualization techniques. These visualizations are based on the concept of the "natural pre-image", namely a natural-looking image whose representation has some notable property. We study in particular three such visualizations: inversion, in which the aim is to reconstruct an image from its representation; activation maximization, in which we search for patterns that maximally stimulate a representation component; and caricaturization, in which the visual patterns that a representation detects in an image are exaggerated. We pose these as a regularized energy-minimization framework and demonstrate its generality and effectiveness. In particular, we show that this method can invert representations such as HOG more accurately than recent alternatives while also being applicable to CNNs. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.
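Activation maximization, as posed above, amounts to gradient ascent on the input against a regularized energy. A toy sketch in which a plain L2 penalty stands in for the paper's naturalness regularizers, and a linear "neuron" with constant gradient replaces a real representation component (all names and constants are illustrative):

```python
import numpy as np

def activation_maximization(grad_fn, x0, steps=300, lr=0.1, lam=0.01):
    """Ascend the gradient of a representation component while an L2
    regularizer pulls the input toward zero, i.e. iterate
    x <- x + lr * (d phi/dx - lam * x)."""
    x = x0.copy()
    for _ in range(steps):
        x = x + lr * (grad_fn(x) - lam * x)
    return x

# toy linear 'neuron' phi(x) = w . x, whose gradient is w everywhere
w = np.array([1.0, -2.0, 0.5])
x_star = activation_maximization(lambda x: w, np.zeros(3))
```

For this linear component the iterate stays proportional to w, so the "preferred stimulus" points along the neuron's weight vector; with a real CNN, `grad_fn` would be obtained by backpropagation to the input.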
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ∼0.25M images, ∼0.76M questions, and ∼10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
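The automatic evaluation of open-ended answers on the VQA benchmark uses a consensus metric over the human answers collected per question: an answer counts as fully correct if at least three annotators gave it. A minimal sketch:

```python
def vqa_accuracy(predicted, human_answers):
    """VQA consensus accuracy: Acc(ans) = min(#humans who said ans / 3, 1).

    A prediction matching 3 or more of the collected human answers scores
    1.0; fewer matches earn partial credit."""
    matches = sum(a == predicted for a in human_answers)
    return min(matches / 3.0, 1.0)
```

This is what makes the open-ended setting automatically scorable: no string similarity or human judging is needed, only exact-match counting against the pool of collected answers (in practice after simple answer normalization such as lowercasing).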