In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder–decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for a finer understanding of video content using the question form of “fill-in-the-blank”, and collect our Video Context QA dataset, consisting of 109,895 video clips with a total duration of more than 1000 hours, drawn from the existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from the annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines.
•The S–kNN algorithm identifies an optimal k value for each test sample.
•Our approach takes the local structures of samples into account.
•This paper proposes a novel optimization method to solve the designed objective function.
This paper studies example-driven k-parameter computation that identifies a different k value for each test sample in kNN prediction applications such as classification, regression and missing data imputation. This is carried out by reconstructing a sparse coefficient matrix between test samples and training data. In the reconstruction process, an ℓ1-norm regularization is employed to generate an element-wise sparse coefficient matrix, and an LPP (Locality Preserving Projection) regularization is adopted to preserve the local structures of the data and achieve efficiency. With the learnt k value, the kNN approach is then applied to classification, regression and missing data imputation. We experimentally evaluate the proposed approach on 20 real datasets and show that our algorithm clearly outperforms previous kNN algorithms on data mining tasks such as classification, regression and missing value imputation.
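As an illustration of the core idea (not the authors' S–kNN implementation, which also includes the LPP term and a dedicated optimizer), a per-sample k can be derived from an ℓ1-sparse reconstruction of the test sample over the training set: the number of surviving coefficients becomes that sample's k. The sketch below uses scikit-learn's generic Lasso solver as a stand-in for the paper's optimization method; all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsClassifier

def adaptive_k_predict(X_train, y_train, x_test, alpha=0.1):
    """Pick a per-sample k from an l1-sparse reconstruction, then run kNN."""
    # Reconstruct the test sample as a non-negative sparse combination of
    # training samples; the l1 penalty zeroes out most coefficients.
    lasso = Lasso(alpha=alpha, positive=True, max_iter=10000)
    lasso.fit(X_train.T, x_test)  # columns of X_train.T are training samples
    # The number of non-zero coefficients is this sample's k (kept in range).
    k = int(np.clip(np.count_nonzero(lasso.coef_), 1, len(X_train)))
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    return knn.predict(x_test.reshape(1, -1))[0], k
```

A test sample lying inside a tight cluster is typically reconstructed from a few nearby points, so it receives a small k, while a sample in a sparse region draws on more training points and receives a larger k.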
Crack detection is a crucial task in periodic pavement surveys. This study establishes and compares the performance of two intelligent approaches for automatic recognition of pavement cracks. The first model relies on the Sobel and Canny edge detection algorithms. Since implementing the two edge detectors requires setting threshold values, Differential Flower Pollination, a metaheuristic, is employed to fine-tune the model parameters. The second model is constructed by implementing a Convolutional Neural Network (CNN), a deep learning algorithm. The CNN has the advantage of performing feature extraction and prediction of the crack/non-crack condition in an integrated and fully automated manner. Experimental results show that the CNN-based model achieves a good prediction performance, with a Classification Accuracy Rate (CAR) of 92.08%. This performance is significantly better than that of the method based on the edge detection algorithms (CAR = 79.99%). Accordingly, the proposed CNN-based crack detection model is a promising alternative to support transportation agencies in the task of periodic pavement inspection.
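The first model's pipeline, Sobel edge magnitudes thresholded at a tuned value, can be sketched as follows. This is a minimal illustration, not the study's code: Differential Flower Pollination is replaced here by a simple best-of-candidates search that maximizes F1 agreement with a labeled crack mask, and all function names are hypothetical.

```python
import numpy as np

def sobel_magnitude(img):
    # 3x3 Sobel gradients via explicit convolution (borders left at zero).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    mag = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            mag[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return mag

def tune_threshold(img, truth, candidates):
    # Stand-in for the metaheuristic: pick the threshold whose edge map
    # agrees best (F1 score) with a labeled crack mask.
    def f1(pred):
        tp = np.logical_and(pred, truth).sum()
        p = tp / max(pred.sum(), 1)
        r = tp / max(truth.sum(), 1)
        return 2 * p * r / max(p + r, 1e-9)
    return max(candidates, key=lambda t: f1(sobel_magnitude(img) > t))
```

In the actual study the metaheuristic searches the threshold space of both the Sobel and Canny detectors rather than evaluating a fixed candidate list.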
•Two approaches for automatic recognition of pavement cracks are constructed.
•The first approach relies on the Sobel and Canny edge detection algorithms.
•The second approach employs a deep neural network.
•The edge detection based model attains an accuracy rate of 79.99%.
•The deep neural network achieves a superior accuracy rate of 92.08%.
Localization-based super-resolution techniques open the door to unprecedented analysis of molecular organization. This task often involves complex image processing adapted to the specific topology and quality of the image to be analyzed. Here we present a segmentation framework based on Voronoï tessellation constructed from the coordinates of localized molecules, implemented in freely available and open-source SR-Tesseler software. This method allows precise, robust and automatic quantification of protein organization at different scales, from the cellular level down to clusters of a few fluorescent markers. We validated our method on simulated data and on various biological experimental data of proteins labeled with genetically encoded fluorescent proteins or organic fluorophores. In addition to providing insight into complex protein organization, this polygon-based method should serve as a reference for the development of new types of quantifications, as well as for the optimization of existing ones.
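The central quantity in a Voronoï-based segmentation is the local density of each localization, taken as the inverse of the area of its Voronoï cell; dense clusters then stand out as regions of small cells. A minimal sketch of that computation (not the SR-Tesseler implementation) using SciPy's tessellation and the shoelace formula:

```python
import numpy as np
from scipy.spatial import Voronoi

def voronoi_densities(points):
    """Local density of each localization = 1 / area of its Voronoi cell."""
    vor = Voronoi(points)
    dens = np.full(len(points), np.nan)
    for i, region_idx in enumerate(vor.point_region):
        verts = vor.regions[region_idx]
        if len(verts) == 0 or -1 in verts:
            continue  # open cell on the convex hull: area is undefined
        poly = vor.vertices[verts]
        x, y = poly[:, 0], poly[:, 1]
        # Shoelace formula for the polygon area.
        area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        if area > 0:
            dens[i] = 1.0 / area
    return dens
```

Thresholding these densities (e.g. keeping points above a multiple of the mean density) yields a multiscale segmentation of clusters, which is the principle behind the software's automatic quantification.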
•Digital image colorimetry on a smartphone is a powerful, fast and low-cost analysis method.
•The target analyte is detected through color changes in a digital image of the sample.
•The principle, color spaces, components and applications of digital image colorimetry on smartphones are summarized.
•Digital image colorimetry on smartphones will improve with the rapid development of smartphone cameras and apps.
Digital image colorimetry (DIC) on a smartphone is regarded as a powerful, fast and low-cost analysis method that measures a target analyte through the color changes of a digital image obtained by the built-in camera. We summarize the basic procedure of DIC; the color spaces (RGB, CMYK, HSB/HSL, CIE XYZ, L*a*b*, and YUV); the principal architecture (tools for capturing images, lighting conditions, and color quantification and DIC apps); and the current status of smartphone DIC in the analysis of metals/heavy metals, herbicides, pesticides, antibiotics, biological and medical indicators, natural compounds, and bacteria/viruses. The advantages and disadvantages of DIC are also discussed. At present, smartphone DIC must be further refined with controlled geometry and standard lighting sources to become a robust and reliable analytical procedure. It should continue to improve in the near future, owing to the rapid development of smartphone camera technology and the continuous optimization of related software.
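The basic DIC procedure described above, averaging a color channel over a region of interest and mapping it to concentration through a calibration curve, can be sketched in a few lines. This is a generic illustration of the workflow, not any specific app's algorithm; the linear fit and channel choice are assumptions.

```python
import numpy as np

def roi_mean_rgb(img, y0, y1, x0, x1):
    # Average R, G, B intensities inside a region of interest of an HxWx3 image.
    return img[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)

def calibrate(intensities, concentrations):
    # Linear calibration curve: concentration ~= a * intensity + b,
    # fitted from standard samples of known concentration.
    a, b = np.polyfit(intensities, concentrations, 1)
    return a, b
```

In practice a single channel (here the green channel) or a derived color-space coordinate is chosen for whichever responds most strongly to the analyte, and the review's point about controlled geometry and lighting is precisely that this calibration only transfers between devices if imaging conditions are standardized.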
This paper introduces a video representation based on dense trajectories and motion boundary descriptors. Trajectories capture the local motion information of the video. A dense representation guarantees a good coverage of foreground motion as well as of the surrounding context. A state-of-the-art optical flow algorithm enables a robust and efficient extraction of dense trajectories. As descriptors we extract features aligned with the trajectories to characterize shape (point coordinates), appearance (histograms of oriented gradients) and motion (histograms of optical flow). Additionally, we introduce a descriptor based on motion boundary histograms (MBH), which rely on differential optical flow. The MBH descriptor is shown to consistently outperform other state-of-the-art descriptors, in particular on real-world videos that contain a significant amount of camera motion. We evaluate our video representation in the context of action classification on nine datasets, namely KTH, YouTube, Hollywood2, UCF sports, IXMAS, UIUC, Olympic Sports, UCF50 and HMDB51. On all datasets our approach outperforms current state-of-the-art results.
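The MBH idea, histogramming the spatial gradients of each optical-flow component rather than the flow itself, can be sketched compactly. This is a simplified global-histogram illustration (the paper computes such histograms per spatio-temporal cell along each trajectory); the function name is hypothetical.

```python
import numpy as np

def mbh_descriptor(flow, nbins=8):
    """Motion boundary histogram: orientation histograms of the spatial
    gradients of each optical-flow component (flow is H x W x 2)."""
    hists = []
    for c in range(2):
        gy, gx = np.gradient(flow[..., c])        # differential optical flow
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        bins = np.minimum((ang / (2 * np.pi) * nbins).astype(int), nbins - 1)
        h = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)
        total = h.sum()
        hists.append(h / total if total > 0 else h)
    return np.concatenate(hists)
```

Because differentiation cancels any constant flow field, a uniform translation (the signature of camera motion) contributes nothing to the descriptor, which is why MBH is robust on videos with significant camera motion.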
Existing computational models for salient object detection primarily rely on hand-crafted features, which are only able to capture low-level contrast information. In this paper, we learn the hierarchical contrast features by formulating salient object detection as a binary labeling problem using deep learning techniques. A novel superpixelwise convolutional neural network approach, called SuperCNN, is proposed to learn the internal representations of saliency in an efficient manner. In contrast to the classical convolutional networks, SuperCNN has four main properties. First, the proposed method is able to learn the hierarchical contrast features, as it is fed by two meaningful superpixel sequences, which is much more effective for detecting salient regions than feeding in raw image pixels. Second, as SuperCNN recovers the contextual information among superpixels, it enables large context to be involved in the analysis efficiently. Third, benefiting from the superpixelwise mechanism, the required number of predictions for a densely labeled map is greatly reduced. Fourth, saliency can be detected independently of region size by utilizing a multiscale network structure. Experiments show that SuperCNN can robustly detect salient objects and outperforms the state-of-the-art methods on three benchmark datasets.
Zero-shot learning for visual recognition, e.g., object and action recognition, has recently attracted a lot of attention. However, it still remains challenging to bridge the semantic gap between visual features and their underlying semantics and to transfer knowledge to semantic categories unseen during learning. Unlike most of the existing zero-shot visual recognition methods, we propose a stagewise bidirectional latent embedding framework of two subsequent learning stages for zero-shot visual recognition. In the bottom–up stage, a latent embedding space is first created by exploring the topological and labeling information underlying the training data of known classes via a proper supervised subspace learning algorithm, and the latent embeddings of the training data are used to form landmarks that guide the embedding of semantics underlying unseen classes into this learned latent space. In the top–down stage, semantic representations of unseen-class labels in a given label vocabulary are then embedded into the same latent space, preserving the semantic relatedness between all classes via our proposed semi-supervised Sammon mapping under the guidance of the landmarks. Thus, the resultant latent embedding space allows for predicting the label of a test instance with a simple nearest-neighbor rule. To evaluate the effectiveness of the proposed framework, we have conducted extensive experiments on four benchmark datasets in object and action recognition, i.e., AwA, CUB-200-2011, UCF101 and HMDB51. The experimental results under comparative studies demonstrate that our proposed approach yields state-of-the-art performance under inductive and transductive settings.
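The final prediction step, the "simple nearest-neighbor rule" in the shared latent space, is the one piece of the framework that can be shown in a few lines. The sketch below assumes both the test instances and the class prototypes have already been embedded into that space (the actual embedding stages are the substance of the paper and are not reproduced here); names are illustrative.

```python
import numpy as np

def nn_predict(test_embs, class_embs, class_labels):
    # Nearest-neighbor rule in the shared latent space: each test
    # instance receives the label of the closest class prototype.
    dists = np.linalg.norm(test_embs[:, None, :] - class_embs[None, :, :], axis=2)
    return [class_labels[i] for i in dists.argmin(axis=1)]
```

Since unseen-class prototypes are embedded into the same space as the training data, this single rule handles both seen and unseen categories without any per-class classifier.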
•An adaptive underwater image restoration system that takes advantage of DCP and CycleGAN.
•A Multi-Scale Structural Similarity Index Measure loss to improve image restoration performance.
•An end-to-end system for underwater image restoration.
Underwater image restoration, a keystone of underwater vision research, remains challenging. The key point of underwater image restoration is how to remove the turbidity and color distortion caused by the underwater environment. In this paper, we propose an underwater image restoration method that transfers an underwater-style image into a recovered style using a Multi-Scale Cycle Generative Adversarial Network (MCycleGAN) system. We include a Structural Similarity Index Measure loss (SSIM loss), which provides more flexibility in modeling structural detail and improves image restoration performance. We use the dark channel prior (DCP) algorithm to obtain the transmission map of an image and design an adaptive SSIM loss to improve underwater image quality. This information is fed into the network for multi-scale computation on the images, combining the DCP algorithm with Cycle-Consistent Adversarial Networks (CycleGAN). Quantitative and qualitative comparisons with existing state-of-the-art approaches show that our method performs well on the underwater image dataset.
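The SSIM-based loss term at the heart of the method can be illustrated with a simplified global (single-window) SSIM; the actual system uses a multi-scale, transmission-map-weighted variant inside the CycleGAN objective, which is not reproduced here.

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Global (single-window) SSIM between two images scaled to [0, 1].
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_loss(x, y):
    # Loss term to add to the adversarial objective: 0 for identical images,
    # growing as luminance, contrast or structure diverge.
    return 1.0 - ssim(x, y)
```

Because SSIM compares local luminance, contrast and structure rather than per-pixel differences, penalizing 1 − SSIM pushes the generator to preserve structural detail that an L1 or L2 reconstruction loss tends to blur.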
Image matching has a history of more than 50 years, with the first experiments performed with analogue procedures for cartographic and mapping purposes. The recent integration of computer vision algorithms and photogrammetric methods is leading to interesting procedures which have increasingly automated the entire image‐based 3D modelling process. Image matching is one of the key steps in 3D modelling and mapping. This paper presents a critical review and analysis of four dense image‐matching algorithms, available as open‐source and commercial software, for the generation of dense point clouds. The eight datasets employed include scenes recorded from terrestrial and aerial blocks, acquired with convergent and normal (parallel axes) images, and with different scales. Geometric analyses are reported in which the point clouds produced with each of the different algorithms are compared with one another and also to ground‐truth data.