The task of reconstructing 3D scenes from visual data is a longstanding problem in computer vision. Common reconstruction approaches rely on multiple volumetric primitives to describe complex objects. Superquadrics, a class of volumetric primitives, have shown great promise due to their ability to describe various shapes with only a few parameters. Recent research has shown that deep learning methods can accurately reconstruct random superquadrics from both 3D point cloud data and simple depth images. In this paper, we extend these reconstruction methods to intensity and color images. Specifically, we use a dedicated convolutional neural network (CNN) to reconstruct a single superquadric from a given input image. We analyze the results qualitatively and quantitatively by visualizing the reconstructed superquadrics and by examining the error and accuracy distributions of the predictions. We show that a CNN built around a simple ResNet backbone can accurately reconstruct superquadrics from images containing a single object, but only if one of the spatial parameters is fixed or can be determined from other image characteristics, e.g., shadows. Furthermore, we experiment with images of increasing complexity, for example by adding textures, and observe that the results degrade only slightly. In addition, we show that our model outperforms the current state of the art on the studied task. Our final result is a highly accurate superquadric reconstruction model, which can also reconstruct superquadrics from real images of simple objects without additional training.
A rare and valuable Palaeolithic wooden point, presumably belonging to a hunting weapon, was found in the Ljubljanica River in Slovenia in 2008. In order to prevent complete decay, the waterlogged wooden artefact had to undergo conservation treatment, which usually involves some expected deformations of structure and shape. To investigate these changes, a series of surface-based 3D models of the artefact were created before, during and after the conservation process. Unfortunately, the surface-based 3D models were not sufficient to understand the internal processes inside the wooden artefact (cracks, cavities, fractures). Since some of the surface-based 3D models were taken with a microtomographic scanner, we decided to create a volumetric 3D model from the available 2D tomographic images. In order to have complete control and greater flexibility in creating the volumetric 3D model than is the case with commercial software, we decided to implement our own algorithm. In fact, two algorithms were implemented, one for the construction of surface-based 3D models and one for the construction of volumetric 3D models, using (1) unsegmented 2D CT images and (2) segmented 2D CT images. The results compared favourably with those of commercial software, and new information was obtained about the actual state and causes of the deformation of the artefact. Such models could be a valuable aid in the selection of appropriate conservation and restoration methods and techniques in cultural heritage research.
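The abstract does not give the details of the volumetric reconstruction, but its core step, stacking 2D CT slices into a 3D voxel grid and segmenting the material, can be sketched as follows (a minimal illustration in NumPy; the function name, toy data and threshold are assumptions, not the authors' implementation):

```python
import numpy as np

def volume_from_slices(slices, threshold):
    """Stack 2D CT slices into a 3D voxel volume and segment by thresholding.

    slices    -- iterable of 2D arrays (one per CT slice), all the same shape
    threshold -- intensity above which a voxel counts as material (e.g. wood)
    Returns a boolean 3D array: True where material is present.
    """
    volume = np.stack(list(slices), axis=0)   # shape: (n_slices, height, width)
    return volume > threshold

# Toy example: three 4x4 "slices", each with a bright 2x2 core.
slices = [np.zeros((4, 4)) for _ in range(3)]
for s in slices:
    s[1:3, 1:3] = 200.0

mask = volume_from_slices(slices, threshold=100.0)
print(mask.shape, int(mask.sum()))  # (3, 4, 4) 12
```

This corresponds to the unsegmented-input variant, where segmentation is reduced to a global threshold; the segmented-input variant would take the binary masks directly in place of the raw slices.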
Reconstruction of 3D space from visual data has always been a significant challenge in the field of computer vision. A popular approach to this problem can be found in bottom-up reconstruction techniques, which try to model complex 3D scenes through a constellation of volumetric primitives. Such techniques are inspired by the current understanding of the human visual system and are, therefore, strongly related to the way humans process visual information, as suggested by recent visual neuroscience literature. While advances have been made in recent years in the area of 3D reconstruction, the problem remains challenging due to the many possible ways of representing 3D data, the ambiguity of determining shape and general position in 3D space, and the difficulty of training efficient models for the prediction of volumetric primitives. In this article, we address these challenges and present a novel solution for recovering volumetric primitives from depth images. Specifically, we focus on the recovery of superquadrics, a special type of parametric model able to describe a wide array of 3D shapes using only a few parameters. We present a new learning objective that relies on the superquadric (inside-outside) function and develop two learning strategies for training convolutional neural networks (CNNs) capable of predicting superquadric parameters. The first uses explicit supervision and penalizes the difference between the predicted and reference superquadric parameters. The second strategy uses implicit supervision and penalizes differences between the input depth images and depth images rendered from the predicted parameters. CNN predictors for superquadric parameters are trained with both strategies and evaluated on a large dataset of synthetic and real-world depth images.
Experimental results show that both strategies compare favourably with the existing state of the art and result in high-quality 3D reconstructions of the modelled scenes in much shorter processing time.
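The inside-outside function referred to above has a standard closed form: for a superquadric with sizes a1, a2, a3 and shape exponents e1, e2, F < 1 inside, F = 1 on the surface and F > 1 outside. The sketch below evaluates it for a batch of points (plain NumPy; parameter naming is ours, not taken from the article):

```python
import numpy as np

def inside_outside(points, a1, a2, a3, e1, e2):
    """Superquadric inside-outside function F(x, y, z).

    F < 1 for points inside the superquadric, F = 1 on the surface,
    F > 1 outside. a1..a3 are axis sizes, e1 and e2 shape exponents.
    """
    x, y, z = np.abs(points).T   # abs() keeps fractional powers real-valued
    f = ((x / a1) ** (2 / e2) + (y / a2) ** (2 / e2)) ** (e2 / e1) \
        + (z / a3) ** (2 / e1)
    return f

# Unit sphere (a1=a2=a3=1, e1=e2=1): origin inside, (1,0,0) on the
# surface, (2,0,0) outside.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(inside_outside(pts, 1, 1, 1, 1, 1))  # [0. 1. 4.]
```

A loss built on this function can, for instance, drive F towards 1 for points sampled on the target surface, which is one way such an objective avoids comparing raw parameter vectors directly.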
In this paper we present the development of an interactive, content-aware and cost-effective digital signage system. Using a monocular camera installed within the frame of a digital signage display, we employ real-time computer vision algorithms to extract temporal, spatial and demographic features of the observers, which are further used for observer-specific broadcasting of digital signage content. The number of observers is obtained with the Viola-Jones face detection algorithm, whilst facial images are registered using multi-view Active Appearance Models. The distance of the observers from the system is estimated from the interpupillary distance of registered faces. Demographic features, including gender and age group, are determined using SVM classifiers to achieve individual observer-specific selection and adaptation of the digital signage broadcasting content. The developed system was evaluated at the laboratory study level and in a field study performed for audience measurement research. Comparison of our monocular localization module with the Kinect stereo system reveals a comparable level of accuracy. The facial characterization module is evaluated on the FERET database, achieving 95% accuracy for gender classification and 92% for age-group classification. Finally, the field study demonstrates the applicability of the developed system in real-life environments.
•An alternative way of acquiring input parameters for trajectory calculations is proposed.
•Machine learning predicts aircraft performance from recorded similar flights in the past.
•Users are often left with default parameters, which are not always the best fit.
•A cheap alternative until planned downlinked trajectories are in place.
Accurate prediction of aircraft position is becoming increasingly important for the future of air traffic. Currently, the lack of information about flights prevents us from fulfilling future demands for the needed accuracy in 4D trajectory prediction. Until the necessary information is available from aircraft, and until new, more accurate methods are implemented and used, we propose an alternative method for predicting aircraft performance using machine learning on historical data about past flights collected in a multidimensional database. In this way, existing applications can be improved by providing them with better inputs for their trajectory calculations. Our method uses flight plan data to predict performance values tailored individually to each flight. The results show that, based on recorded past aircraft performances and related flight data, we can effectively predict performance for future flights from how similar flights behaved in the past.
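The "predict from how similar flights behaved" idea can be illustrated with a nearest-neighbour regressor over flight plan features. The features, numbers and the choice of k-NN below are purely illustrative assumptions, not the method or data of the paper:

```python
import numpy as np

def knn_predict(query, features, targets, k=3):
    """Predict a performance value for a planned flight as the average target
    of the k most similar past flights (Euclidean distance in feature space)."""
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return targets[nearest].mean()

# Hypothetical history: [take-off mass (t), cruise level / 100] -> climb rate (ft/min)
features = np.array([[60.0, 3.2], [62.0, 3.4], [75.0, 3.6], [78.0, 3.8]])
targets = np.array([2400.0, 2350.0, 1900.0, 1850.0])

# A new 61 t flight planned at FL330 resembles the first two light flights:
print(knn_predict(np.array([61.0, 3.3]), features, targets, k=2))  # 2375.0
```

In practice the features would be normalized so that mass and flight level contribute comparably to the distance, and the prediction would be computed per aircraft type.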
In the doctoral thesis we developed an interactive and user-adaptive information interface based on computer vision and machine learning methods. Using a camera-enhanced digital signage display, we employed real-time computer vision algorithms to extract temporal, spatial, and demographic features of the observers, which are further used for observer-specific broadcasting of digital signage content. The algorithms were chosen and modified to optimize the balance between accuracy and time complexity, subject to the design aim of running in real time on conventional hardware. In particular, we used the Mixture of Gaussians method for background segmentation, the Viola-Jones algorithm for face detection, Active Appearance Models for face alignment and the POSIT algorithm for head pose estimation. The developed interface is used as the key research tool to explore three currently open problems in the field of human-computer interaction: dynamic anamorphosis, a quantitative audience measurement study of digital signage in a real-world environment, and modeling of the purchase decision process. In the first study, we developed a new interactive computer vision based method which adapts image projection to the changing position of the observer, so that wherever the observer moves, they see the same undeformed image. We call this capacity dynamic anamorphosis. We formalized the anamorphic transformation and proposed a real-time algorithm for tracking the 3D position of the observer's eyes and re-computing the anamorphic deformation. As an interesting application, we show that dynamic anamorphosis could be used to improve eye contact in videoconferencing. In the second study, we used the developed interface to perform a quantitative audience measurement field study, which evaluates user attention. Temporal metrics of a person's dwell time, display in-view time and attention time are extracted using real-time image analysis.
The system also determines demographic metrics of gender and age group based on images of faces. The digital signage display was deployed in a real-world environment of a clothing boutique, where demographic and viewership data of 1294 store customers were recorded, manually verified and analysed. The analysis shows that 35% of customers specifically looked at the display, with an average attention time of 0.7 s. Interestingly, the attention time was substantially higher for men (1.2 s) than for women (0.4 s). In the third study, the interface is applied in an interdisciplinary analysis of the purchase decision process, in which data collected with the developed interface are combined with machine learning to model and analyze the decisions and roles in a purchasing process. Finally, more generally, the developed system presents a contribution to the field of human-computer interaction and opens further possibilities for scientific use and applications, such as the open problem of display blindness, the development of new interactive methods for broadcasting relevant content, and quantitative analysis of user behavior.
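The anamorphic transformation described above is, at its core, a planar projective mapping: the screen content is pre-warped with the inverse of the homography induced by the observer's eye position, so the observer sees an undistorted image. A compact sketch of fitting such a homography from four point correspondences via the direct linear transform (the correspondences below are invented for illustration; this is not the thesis's algorithm, only the underlying geometry):

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform: the 3x3 homography mapping src[i] -> dst[i]
    for four (x, y) point correspondences, via the SVD nullspace."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def warp_point(h, p):
    """Apply homography h to a 2D point (homogeneous divide)."""
    q = h @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Display corners and where an off-axis observer sees them (a mild keystone);
# pre-warping the image with the inverse homography cancels the distortion.
screen = [(0, 0), (1, 0), (1, 1), (0, 1)]
seen = [(0.0, 0.1), (1.0, 0.0), (1.0, 1.0), (0.0, 0.9)]
h = homography(screen, seen)
print(np.allclose(warp_point(h, (0, 0)), (0.0, 0.1)))  # True
```

In the dynamic case the eye position, and hence the four target corners, are re-estimated every frame by the tracking algorithm, and the homography is recomputed accordingly.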
We present a quantitative study of digital signage audience measurement using computer vision. We developed a camera-enhanced digital signage display that acquires audience measurement metrics with computer vision algorithms. Temporal metrics of a person's dwell time, display in-view time and attention time are extracted. The system also determines demographic metrics of gender and age group. The digital signage display was deployed in a real-world environment of a clothing boutique, where demographic and viewership data of 1294 store customers were recorded, manually verified and analysed. The analysis shows that 35% of customers specifically looked at the display, with an average attention time of 0.7 s. Interestingly, the attention time was substantially higher for men (1.2 s) than for women (0.4 s). Age group comparison reveals that children (1-14 years) are the most responsive to the digital signage. Finally, the analysis shows that the average attention time is significantly higher when displaying dynamic content (0.9 s) than static content (0.6 s).
Global corporations are characterized by a large number of employees and geographically dispersed offices. Moreover, competitiveness in the global market requires them to invest in their human resources to remain a step ahead of the competition. Implementing large-scale classical education in such environments is challenging and costly. Mobile e-learning (m-learning) allows users to tailor their professional training and education to their needs and time constraints. However, in self-paced education, it is very hard to maintain user retention and engagement. To achieve the latter, we have designed and developed an m-learning platform for corporate environments based on the triggering principle of persuasive technology, which aims to encourage users to use the platform regularly. We have evaluated the application in the wild in the corporate environments of differently sized companies with 300 users. Users were subjected to three different conditions: no triggering, simple regular triggering, and adaptive triggering. The results show that adaptive triggering in m-learning increases user engagement as well as course completion rates more than simple regular triggering and no triggering.
•Most videoconferencing systems are hindered by a lack of proper eye contact.
•A simple, yet efficient method for improving the perceived eye contact between conversing parties is proposed.
•The method is based on the properties of the human visual system.
When people talk to each other, eye contact is very important for trustful and efficient communication. Videoconferencing systems were invented to enable such communication over large distances, nowadays mostly over the Internet and on personal computers. Despite the low cost of such solutions, broader acceptance and use of these communication means has not yet happened. One of the most important reasons for this is that it is almost impossible to establish eye contact between distant parties on the most common hardware configurations of such videoconferencing systems, in which the camera for face capture is usually mounted above the computer monitor on which the correspondent's face is observed. Different hardware and software solutions to this problem of missing eye contact have been proposed over the years. In this article we propose a simple solution that can improve the subjective feeling of eye contact, based on how people perceive 3D scenes displayed on slanted surfaces, and offer some experiments in support of this hypothesis.