Recognizing human actions in 3-D video sequences is an important open problem at the heart of many research domains, including surveillance, natural interfaces and rehabilitation. However, designing action recognition models that are both accurate and efficient is challenging due to the variability of human pose, clothing and appearance. In this paper, we propose a new framework to extract a compact representation of a human action captured through a depth sensor, enabling accurate action recognition. The proposed solution builds on fitting a human skeleton model to the acquired data, so as to represent the 3-D coordinates of the joints and their change over time as a trajectory in a suitable action space. Thanks to such a 3-D joint-based framework, the proposed solution captures both the shape and the dynamics of the human body simultaneously. The action recognition problem is then formulated as computing the similarity between the shapes of trajectories in a Riemannian manifold. Classification using k-nearest neighbors is finally performed on this manifold, taking advantage of Riemannian geometry in the open-curve shape space. Experiments are carried out on four representative benchmarks to demonstrate the potential of the proposed solution in terms of accuracy/latency for low-latency action recognition. Comparative results with state-of-the-art methods are reported.
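The k-nearest-neighbor step above can be sketched as follows. This is a minimal stand-in, not the paper's method: the paper compares trajectories with the elastic geodesic distance on the open-curve shape space, while here a plain Euclidean distance between normalized trajectories is used only to keep the example self-contained; the trajectory shapes and labels are made up for illustration.

```python
import numpy as np

def trajectory_distance(t1, t2):
    """Toy stand-in for the geodesic shape distance between two joint
    trajectories, each of shape (frames, dims). Trajectories are centered
    and scale-normalized, then compared with a Euclidean distance."""
    a = (t1 - t1.mean(axis=0)) / (np.linalg.norm(t1 - t1.mean(axis=0)) + 1e-8)
    b = (t2 - t2.mean(axis=0)) / (np.linalg.norm(t2 - t2.mean(axis=0)) + 1e-8)
    return np.linalg.norm(a - b)

def knn_classify(query, gallery, labels, k=3):
    """Classify a query trajectory by majority vote over its k nearest
    gallery trajectories."""
    d = np.array([trajectory_distance(query, g) for g in gallery])
    nearest = np.argsort(d)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Swapping `trajectory_distance` for a true elastic shape metric leaves the classification loop unchanged, which is the appeal of the k-NN formulation on the manifold.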
Automatic facial expression recognition is essential for many potential applications. Thus, having a clear overview of the existing datasets that have been investigated within the framework of facial expression recognition is of paramount importance in designing and evaluating effective solutions, notably for neural network-based training. In this survey, we review more than eighty facial expression datasets, taking into account both macro- and micro-expressions. The study focuses mostly on spontaneous and in-the-wild datasets, given the common trend in the research of considering contexts where expressions are shown spontaneously and in a real context. We also provide instances of potential applications of the investigated datasets, while putting into evidence their pros and cons. The proposed survey can help researchers better understand the characteristics of the existing datasets, thus facilitating the choice of the data that best suits the particular context of their application.
With the fast increase in the volume of mobile multimedia data, how to apply powerful deep learning methods to process data with real-time response has become a major issue. Meanwhile, edge computing architectures help improve response time and user experience by bringing flexible computation and storage capabilities closer to the user. Considering both technologies for successful AI-based applications, we propose an edge-computing-driven, end-to-end framework to perform image enhancement and object detection under low-light conditions. The framework consists of a cloud-based enhancement stage and an edge-based detection stage. In the first stage, we establish connections between edge devices and cloud servers to input re-scaled illumination components of low-light images, where enhancement subnetworks are dynamically coupled in parallel to compute enhanced illumination components based on the low-light context. During the edge-based detection stage, edge devices can accurately and rapidly detect objects based on the cloud-computed informative feature maps. Experimental results show that the proposed method significantly improves detection performance in low-light conditions while running with low latency on edge devices.
In this paper, we present a novel and original framework, which we dub mesh local binary pattern (mesh-LBP), for computing local binary-like patterns on a triangular-mesh manifold. This framework can be adapted to all the LBP variants employed in 2D image analysis and thus allows extending the related techniques to mesh surfaces. After describing the foundations, the construction and the main features of the mesh-LBP, we derive its possible variants and show how they can extend most of the 2D-LBP variants to the mesh manifold. In the experiments, we give evidence of the presence of the uniformity aspect in the mesh-LBP, similar to the one observed in the 2D-LBP. We also report repeatability experiments that confirm, in particular, the rotation invariance of mesh-LBP descriptors. Furthermore, we analyze the potential of the mesh-LBP for 3D texture classification of triangular-mesh surfaces collected from public datasets. Comparison with state-of-the-art surface descriptors, as well as with 2D-LBP counterparts applied to depth images, also evidences the effectiveness of the proposed framework. Finally, we illustrate the robustness of the mesh-LBP with respect to the class of mesh irregularities typical of 3D surface-digitizer scans.
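The core LBP operation carried over to meshes can be sketched as below. This is a simplified illustration, not the paper's exact construction: it assumes an ordered ring of neighboring facets around a center facet has already been extracted, and that each facet carries a scalar value (e.g., mean curvature); the bit-packing and the uniformity test mirror classic 2D-LBP.

```python
import numpy as np

def mesh_lbp(center_value, ring_values):
    """Basic LBP-like code for one facet: threshold the scalar values of
    the facets on an ordered ring around the center facet against the
    center value, then pack the resulting bits into an integer, exactly as
    2D-LBP packs the 8 thresholded neighbors of a pixel."""
    bits = [1 if v >= center_value else 0 for v in ring_values]
    return sum(b << i for i, b in enumerate(bits))

def is_uniform(code, n_bits):
    """A pattern is 'uniform' if its circular bit string contains at most
    two 0/1 transitions -- the property the paper also observes on meshes."""
    bits = [(code >> i) & 1 for i in range(n_bits)]
    transitions = sum(bits[i] != bits[(i + 1) % n_bits] for i in range(n_bits))
    return transitions <= 2
```

On a real mesh, the hard part the framework addresses is constructing the ordered ring itself, since mesh facets lack the regular grid neighborhood of pixels.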
Extending the concept of texture to the geometry of a mesh manifold surface is an emerging topic in computer vision. This concept does not refer to gluing images onto the surface, but rather to the presence of relief patterns that locally change the surface geometry, showing regular and repetitive structures. The representation and analysis of such relief patterns have several potential applications. In this paper, we propose an original and comprehensive framework to address this novel task, which redefines a large variety of local binary patterns on the mesh manifold domain. We also propose an efficient mesh re-sampling technique that enables uniform surface tessellation. We assess the different descriptive variants derived with this framework in terms of uniformity, repeatability and discriminative power. Afterward, we conduct extensive experimentation on different datasets, showcasing the competitiveness of our framework in classification and retrieval tasks, in terms of both accuracy and computational complexity, with respect to state-of-the-art methods.
Human pose estimation is an important computer vision problem, whose goal is to estimate the human body configuration through its joints. Currently, methods that employ deep learning techniques excel in the task of 2D human pose estimation. However, the use of 3D poses can bring more accurate and robust results. Since 3D pose labels can only be acquired in restricted scenarios, fully convolutional methods tend to perform poorly on this task. One strategy to solve this problem is to use 2D pose estimators and estimate 3D poses in two steps from the 2D pose inputs. Due to database acquisition constraints, the performance improvement of this strategy can only be observed in controlled environments; therefore, domain adaptation techniques can be used to increase the generalization capability of the system by inserting information from synthetic domains. In this work, we propose a novel method called the Domain Unified approach, aimed at solving pose misalignment problems in a cross-dataset scenario through a combination of three modules on top of the pose estimator: a pose converter, an uncertainty estimator, and a domain classifier. Our method led to a 44.1 mm (29.24%) error reduction when training with the SURREAL synthetic dataset and evaluating with Human3.6M, over a no-adaptation scenario, achieving state-of-the-art performance.
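The two-step 2D-to-3D strategy can be illustrated with a bare-bones second-stage lifter. This is a hypothetical stand-in, assuming flattened 2D joint inputs and 3D joint targets: the paper's modules are learned networks, whereas here a regularized linear least-squares map plays the role of the lifter just to make the pipeline concrete.

```python
import numpy as np

def fit_linear_lifter(poses_2d, poses_3d, reg=1e-3):
    """Least-squares linear map from flattened 2D poses (N, J*2) to
    flattened 3D poses (N, J*3), with a bias term and ridge regularization."""
    X = np.hstack([poses_2d, np.ones((len(poses_2d), 1))])  # append bias column
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ poses_3d)
    return W

def lift(poses_2d, W):
    """Apply the learned lifter to new 2D pose inputs."""
    X = np.hstack([poses_2d, np.ones((len(poses_2d), 1))])
    return X @ W
```

In the actual two-step pipeline, the lifter is a deep network and the 2D inputs come from an off-the-shelf 2D pose estimator; the domain-adaptation modules then correct the mismatch between synthetic and real 2D input distributions.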
Facial Action Units (AUs) correspond to the deformation/contraction of individual facial muscles or their combinations. As such, each AU affects just a small portion of the face, with deformations that are asymmetric in many cases. Generating and analyzing AUs in 3D is particularly relevant for the potential applications it can enable. In this paper, we propose a solution for 3D AU detection and synthesis that builds on a newly defined 3D Morphable Model (3DMM) of the face. Differently from most 3DMMs in the literature, which mainly model global variations of the face and show limitations in adapting to local and asymmetric deformations, the proposed solution is specifically devised to cope with such difficult morphings. During a training phase, deformation coefficients are learned that enable the 3DMM to deform to 3D target scans showing the neutral and expressive face of the same individual, thus decoupling expression from identity deformations. Such deformation coefficients are then used, on the one hand, to train an AU classifier; on the other hand, they can be applied to a 3D neutral scan to generate AU deformations in a subject-independent manner. The proposed approach for AU detection is validated on the Bosphorus dataset, reporting competitive results with respect to the state of the art, even in a challenging cross-dataset setting. We further show that the learned coefficients are general enough to synthesize realistic 3D face instances with AU activations.
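The synthesis side rests on the generic linear-3DMM operation of adding a weighted combination of learned deformation components to a neutral shape. A minimal sketch, assuming vertex-wise deformation components and leaving aside the paper's specific local/asymmetric component learning:

```python
import numpy as np

def apply_au_deformation(neutral_vertices, components, coefficients):
    """Deform a neutral 3D face by a linear combination of deformation
    components: S = S_neutral + sum_i w_i * C_i.
    neutral_vertices: (V, 3); components: (K, V, 3); coefficients: (K,)."""
    deformation = np.tensordot(coefficients, components, axes=1)  # (V, 3)
    return neutral_vertices + deformation
```

Because the coefficients are learned per expression-to-neutral pair, the same coefficient vector can both feed an AU classifier and re-deform a different subject's neutral scan, which is what makes the synthesis subject-independent.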
Methods to recognize humans' facial expressions have been proposed mainly focusing on 2D still images and videos. In this paper, the problem of person-independent facial expression recognition is addressed using the 3D geometry information extracted from the 3D shape of the face. To this end, a completely automatic approach is proposed that relies on identifying a set of facial keypoints, computing SIFT feature descriptors of depth images of the face around sample points defined starting from the facial keypoints, and selecting the subset of features with maximum relevance. By training a Support Vector Machine (SVM) for each facial expression to be recognized, and combining them to form a multi-class classifier, an average recognition rate of 78.43% on the BU-3DFE database has been obtained. Comparison with competitor approaches using a common experimental setting on the BU-3DFE database shows that our solution is capable of obtaining state-of-the-art results. The same 3D face representation framework and testing database have also been used to perform 3D facial expression retrieval (i.e., retrieving 3D scans with the same facial expression as shown by a target subject), with results proving the viability of the proposed solution.
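The one-vs-rest fusion of per-expression binary classifiers can be sketched as follows. This is an illustrative stand-in: in the paper each binary classifier is an SVM trained on selected SIFT features of facial depth images, whereas here plain linear scorers with made-up weights play that role, and only the argmax fusion is the point.

```python
import numpy as np

def ovr_decision(features, weights, biases):
    """Signed margins of each per-expression binary classifier, in the
    spirit of an SVM decision function.
    features: (N, D); weights: (C, D); biases: (C,)."""
    return features @ weights.T + biases

def classify_expression(features, weights, biases):
    """Multi-class prediction: the expression whose one-vs-rest classifier
    returns the largest margin wins."""
    return np.argmax(ovr_decision(features, weights, biases), axis=1)
```

The same margins can also rank gallery scans by expression similarity, which is how the retrieval experiment reuses the representation.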
Accurate and timely flood forecasting, facilitated by remote sensing technology, is crucial to mitigating the damage and loss of life caused by floods. However, despite years of research, accurate flood prediction still faces numerous challenges, including complex spatiotemporal features and varied flood patterns influenced by multiple variables. Moreover, long-term flood forecasting is always difficult due to the constantly changing conditions of the surrounding environment. In this study, we propose a Heterogeneous Dynamic Temporal Graph Convolution Network (HD-TGCN) for flood forecasting. Specifically, we design a Dynamic Temporal Graph Convolution Module (D-TGCM) that generates a dynamic adjacency matrix by incorporating a multi-head self-attention mechanism, enabling our model to capture the dynamic spatiotemporal features of flood data through temporal graph convolution operations on the dynamic matrix. Furthermore, to reflect the impact of multiple meteorological and hydrological features on the heterogeneity of flood data, we propose a novel approach that utilizes multiple parallel D-TGCMs to process heterogeneous graph data and implements a fusion mechanism to capture the varied flood patterns influenced by multiple variables. Experiments conducted on a real dataset of Wuyuan County, Jiangxi Province, demonstrate that HD-TGCN outperforms state-of-the-art flood prediction models in MAE, NSE, and RMSE, with improvements of 80.32%, 0.15%, and 73.99%, respectively, providing a more accurate flood forecasting method that will play a critical role in future flood disaster prevention and control.
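The attention-derived dynamic adjacency matrix at the heart of the D-TGCM can be sketched as below. This is a minimal numpy illustration, assuming per-node feature vectors and randomly initialized projection weights in place of the module's learned parameters; only the softmax(QKᵀ/√d) construction, averaged over heads, is taken from the standard multi-head self-attention recipe.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_adjacency(node_features, Wq, Wk, n_heads):
    """Dynamic adjacency matrix from node features via multi-head
    self-attention, averaged over heads:
        A = mean_h softmax(Q_h K_h^T / sqrt(d))
    node_features: (N, D); Wq, Wk: (n_heads, D, d). Rows of A sum to 1,
    so A acts as a data-dependent, normalized graph for graph convolution."""
    heads = []
    for h in range(n_heads):
        Q = node_features @ Wq[h]
        K = node_features @ Wk[h]
        d = Q.shape[-1]
        heads.append(softmax(Q @ K.T / np.sqrt(d), axis=-1))
    return np.mean(heads, axis=0)  # (N, N)
```

Because A is recomputed from the current node features at every time step, the graph convolution that consumes it sees a different neighborhood weighting as flood conditions evolve, which is what "dynamic" refers to.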
Deep learning for 3D vision. Guo, Yulan; Wang, Hanyun; Clark, Ronald. IET Computer Vision, Volume 16, Issue 7, October 2022. Journal Article.