•A novel multi-sensor framework is proposed for Sign Language Recognition (SLR).•The framework recognizes dynamic sign words performed by hearing-impaired persons.•A recognition combination framework using Coupled HMM (CHMM) is proposed for SLR.•Results show higher accuracy of CHMM over uni-modal and other fusion approaches.
Recent development of low-cost depth sensors such as the Leap Motion controller and the Microsoft Kinect has opened up new opportunities for Human-Computer Interaction (HCI). In this paper, we propose a novel multi-sensor fusion framework for Sign Language Recognition (SLR) using a Coupled Hidden Markov Model (CHMM). The CHMM models interaction between modalities in the state space rather than at the observation level, unlike the classical HMM, which fails to capture inter-modal dependencies. The framework has been used to recognize dynamic isolated sign gestures performed by hearing-impaired persons. The dataset has also been tested using existing data fusion approaches. The best recognition accuracy, as high as 90.80%, has been achieved with the CHMM. Our CHMM-based approach thus shows improved recognition performance over popular existing data fusion techniques.
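The state-space coupling described above can be illustrated with a minimal forward-algorithm sketch for a two-chain CHMM (a generic NumPy illustration, not the paper's implementation; all names are our own). Each chain's next state is conditioned on the previous states of *both* chains, which is exactly the inter-modal dependency a pair of independent HMMs cannot capture:

```python
import numpy as np

def chmm_log_likelihood(obs1, obs2, pi1, pi2, A1, A2, B1, B2):
    """Scaled forward algorithm for a two-chain coupled HMM.

    Unlike a standard HMM, each chain's transition depends on the
    previous states of both chains:
        A1[i, j, u] = P(x_t = u | x_{t-1} = i, y_{t-1} = j)
        A2[i, j, v] = P(y_t = v | x_{t-1} = i, y_{t-1} = j)
    B1, B2 are discrete emission matrices, pi1, pi2 initial distributions.
    """
    T = len(obs1)
    # joint forward variable over state pairs (i, j), rescaled each step
    alpha = np.outer(pi1 * B1[:, obs1[0]], pi2 * B2[:, obs2[0]])
    c = alpha.sum()
    alpha /= c
    log_lik = np.log(c)
    for t in range(1, T):
        # propagate the joint distribution through the coupled transitions
        pred = np.einsum('ij,iju,ijv->uv', alpha, A1, A2)
        alpha = pred * np.outer(B1[:, obs1[t]], B2[:, obs2[t]])
        c = alpha.sum()
        alpha /= c
        log_lik += np.log(c)
    return log_lik
```

A sequence classifier built this way would train one such model per sign word and pick the word whose CHMM yields the highest log-likelihood for the two observation streams.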
Sign language facilitates communication between hearing-impaired people and the rest of society. A number of sign language recognition (SLR) systems have been developed by researchers, but they are limited to isolated sign gestures only. In this paper, we propose a modified long short-term memory (LSTM) model for continuous SLR, i.e., for recognizing sequences of connected gestures. It is based on splitting continuous signs into sub-units and modeling them with neural networks, so that different combinations of sub-units need not be considered during training. The proposed system has been tested on 942 signed sentences of Indian Sign Language (ISL), composed of 35 different sign words. Average accuracies of 72.3% and 89.5% have been recorded on signed sentences and isolated sign words, respectively.
We study the problem of classifying actions of human subjects using depth movies generated by the Kinect or other depth sensors. Representing the human body as a dynamic skeleton, we study the evolution of skeleton shapes as trajectories on Kendall's shape manifold. Action data is typically corrupted by large variability in execution rates within and across subjects, which causes major problems for statistical analyses. To address this issue, we adapt the recently developed framework of Su et al. [1, 2] to this problem domain. Here, variable execution rates correspond to re-parameterizations of trajectories, and a parameterization-invariant metric is used for aligning, comparing, averaging, and modeling trajectories. The metric is based on a combination of transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm, which allows computational efficiency. We develop a comprehensive suite of computational tools for this application domain: smoothing and denoising skeleton trajectories using median filtering, up- and down-sampling actions in the time domain, simultaneous temporal registration of multiple actions, and extraction of invertible Euclidean representations of actions. Owing to their invertibility, these Euclidean representations admit both discriminative and generative models for statistical analysis. For instance, they can be used in SVM-based classification of the original actions, as demonstrated here on the MSR Action-3D, MSR Daily Activity, and 3D Action Pairs datasets. Using only skeletal information, we achieve state-of-the-art classification results on these datasets.
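Two of the pre-processing tools mentioned above, median-filter denoising and temporal up-/down-sampling of skeleton trajectories, can be sketched in plain NumPy (the manifold-valued TSRVF machinery itself is beyond this sketch; function names are illustrative):

```python
import numpy as np

def median_filter_trajectory(X, window=5):
    """Sliding-window median filter along time for a (T, d) trajectory.
    Removes spiky skeleton-tracking noise while preserving sharp motion."""
    T = X.shape[0]
    half = window // 2
    out = np.empty_like(X, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[t] = np.median(X[lo:hi], axis=0)
    return out

def resample_trajectory(X, T_new):
    """Up- or down-sample a (T, d) trajectory to T_new frames by
    linear interpolation on a uniform time grid."""
    T, d = X.shape
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, T_new)
    return np.stack([np.interp(dst, src, X[:, k]) for k in range(d)], axis=1)
```

Resampling all actions to a common length is the usual first step before the temporal registration that removes the remaining (nonlinear) rate variability.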
In recent decades, great progress has been made in the contactless detection of infant and child respiratory motion. However, systematic camera errors and disturbing environmental influences can have a strong impact on the signal quality of depth cameras. The prevalence of respiratory diseases in children, and the central role that respiratory activity plays in the treatment of these conditions, necessitate a robust and non-intrusive way of quantifying nocturnal respiratory activity in children. The aim of our study was to assess the robustness of various depth cameras (Microsoft Kinect V2, PMD CamBoard pico flexx, PMD CamBoard pico monstar, ORBBEC Astra Pro, Intel RealSense D435) with regard to ambient factors typically found in clinical and home settings. We investigated the influence of viewing angle, reflectivity, and IR light on signal quality. By correlating depth images with respiratory-belt signals, our analyses show that three cameras achieved adequate performance and might be considered for contactless assessment of infant or child respiration. Several cameras failed under ambient conditions likely to be encountered in real-life settings; a careful camera selection with regard to environmental influences is therefore pivotal.
Computing and modifying in real time the trajectory of an industrial robot involved in a Human-Robot Collaboration (HRC) scenario is a challenging problem, mainly because of two conflicting requirements: ensuring the human worker's safety and completing the task assigned to the robot. This paper proposes a novel trajectory generation algorithm conceived to maximize productivity while treating safety requirements as actual constraints. First, safety constraints are formulated for a manipulator and a set of arbitrarily shaped convex obstacles. Then, a sensor fusion algorithm merges the measurements acquired from different depth sensors and outputs a noise-free estimate of the kinematic configuration of a human worker moving inside the robotic cell. This estimate is then used to predict the space that the human worker can occupy within the robot's stopping time, expressed as a set of convex swept volumes. By treating these swept volumes as obstacles, the robot controller can modify the pre-programmed trajectory to enforce the safety constraints (thus avoiding collisions with the human worker) while preventing task interruption. The proposed trajectory generation algorithm is validated through several experiments performed on an ABB IRB140 industrial robot.
Automatic Sign Language Recognition (SLR) systems are usually designed to recognize hand and finger gestures. However, facial expressions, which play an important role in conveying emotional states during sign language communication, have not yet been analyzed to their fullest potential in SLR systems. An SLR system is incomplete without the signer's facial expressions corresponding to the sign gesture. In this paper, we present a novel multimodal framework for SLR that incorporates facial expression alongside sign gesture using two different sensors, namely Leap Motion and Kinect. Sign gestures are recorded using the Leap Motion while a Kinect simultaneously captures the facial data of the signer. We have collected a dataset of 51 dynamic sign-word gestures. Recognition is performed using a Hidden Markov Model (HMM). We then apply the Independent Bayesian Classifier Combination (IBCC) approach to combine the decisions of the different modalities and improve recognition performance. Our analysis shows promising results, with recognition rates of 96.05% and 94.27% for single- and double-hand gestures, respectively. The proposed multimodal framework achieves gains of 1.84% and 2.60% over the uni-modal framework on single- and double-hand gestures, respectively.
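Full IBCC learns per-classifier confusion matrices; as a simplified stand-in, the decision-level fusion of per-modality posteriors can be sketched as a naive-Bayes product combination (an assumed, simplified scheme, not the paper's exact formulation; names are illustrative):

```python
import numpy as np

def fuse_posteriors(posteriors, prior=None):
    """Naive-Bayes-style decision fusion of per-modality class posteriors.

    `posteriors` is a list of length-K probability vectors, one per
    modality (e.g. hand gesture from Leap Motion, facial expression
    from Kinect). Assuming conditionally independent modalities, the
    fused posterior is proportional to the product of the per-modality
    posteriors divided by the prior (so the prior is counted only once).
    """
    posteriors = [np.asarray(p, dtype=float) for p in posteriors]
    K = posteriors[0].shape[0]
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, float)
    # work in log space for numerical stability
    log_fused = sum(np.log(p + 1e-12) for p in posteriors)
    log_fused -= (len(posteriors) - 1) * np.log(prior + 1e-12)
    fused = np.exp(log_fused - log_fused.max())
    return fused / fused.sum()
```

With this rule, a modality that is confident in a class pulls the fused decision toward it, while a near-uniform (uncertain) modality leaves the decision mostly to the other one.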
Although camera and sensor noise is often disregarded, assumed negligible, or dealt with only in the context of denoising, in this paper we show that significant information about the captured scene and the objects within it can actually be deduced from camera noise. Specifically, we deal with depth cameras and their noise patterns. We show that from sensor noise alone, an object's depth and location in the scene can be deduced. Sensor noise can indicate the source camera type and, within a camera type, the specific device used to acquire the images. Furthermore, we show that the noise distribution on surfaces provides information about the light direction within the scene and makes it possible to distinguish between real and masked faces. Finally, we show that the size of depth shadows (missing depth data) is a function of the object's distance from the background, its distance from the camera, and the object's size, and can hence be used to authenticate an object's location in the scene. This paper provides tools and insights into what can be learned from depth camera sensor noise.
Automatic identification of human interaction in video sequences is a challenging task, especially in dynamic environments with cluttered backgrounds. Advances in computer vision sensor technologies provide powerful capabilities for human interaction recognition (HIR) during routine daily life. In this paper, we propose a novel feature extraction method that incorporates robust entropy optimization and an efficient Maximum Entropy Markov Model (MEMM) for HIR via multiple vision sensors. The main objectives of the proposed methodology are: (1) to propose a hybrid of four novel features, i.e., spatio-temporal features, energy-based features, shape-based angular and geometric features, and a motion-orthogonal histogram of oriented gradients (MO-HOG); (2) to encode the hybrid feature descriptors using a codebook, a Gaussian mixture model (GMM), and Fisher encoding; (3) to optimize the encoded features using a cross-entropy optimization function; (4) to apply an MEMM classification algorithm that examines empirical expectations and maximum entropy, measuring pattern variances to achieve superior HIR accuracy. Our system is tested on three well-known datasets: the SBU Kinect Interaction, UoL 3D Social Activity, and UT-Interaction datasets. Through extensive experimentation, the proposed feature extraction algorithm, along with cross-entropy optimization, has achieved average accuracy rates of 91.25% on SBU, 90.4% on UoL, and 87.4% on UT-Interaction. The proposed HIR system is applicable to a wide variety of man-machine interfaces, such as public-place surveillance, future medical applications, virtual reality, fitness exercises, and 3D interactive gaming.
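Step (2) above, Fisher encoding under a GMM, can be sketched for the first-order (mean-gradient) terms. This is the standard Fisher-vector formulation with diagonal covariances, shown in NumPy as a generic illustration; the paper's exact encoding pipeline may differ:

```python
import numpy as np

def fisher_vector_means(X, weights, means, variances):
    """First-order (mean-gradient) Fisher-vector encoding of local
    descriptors X of shape (N, D) under a K-component diagonal GMM.
    Returns a fixed-length K*D vector regardless of N."""
    N, D = X.shape
    # soft assignments gamma (N, K) under the GMM, via log-space posterior
    log_p = -0.5 * (((X[:, None, :] - means[None]) ** 2) / variances[None]).sum(-1)
    log_p += np.log(weights)[None] - 0.5 * np.log(2 * np.pi * variances).sum(-1)[None]
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # normalized gradient of the log-likelihood w.r.t. the GMM means
    diff = (X[:, None, :] - means[None]) / np.sqrt(variances)[None]
    G = (gamma[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(weights)[:, None])
    return G.ravel()
```

The fixed output length is what makes the encoding usable as input to a downstream classifier regardless of how many local descriptors a clip produces.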
The integration of mobile robotic platforms into human gait analysis offers the potential to develop multiple medical applications and enable new discoveries. The aim of this paper is to present a first design and validation of a ROS-based mobile robotic platform for human gait analysis.
During the design stage, model identification and configuration of the control law were performed. The design of the control law required the integration of a lead compensator and a Filtered Smith Predictor (FSP). During the validation procedure, the accuracy of the system in retrieving kinematic gait data and the main descriptors of gait disorders was calculated with respect to the ground truth of a Vicon system. For this purpose, one hundred gait recordings were processed thanks to the collaboration of twenty participants, each walking along a one-way straight-line path.
Results showed high correlation and low error rates, mainly in joint excursions in the sagittal and transverse planes.
This gait analysis system demonstrated several advantages over current approaches. The use of a mobile robotic platform allowed gait analysis over long tracking ranges and without space limitations. Furthermore, the design of a suitable control law allowed smooth tracking of the person, which led to optimal results when assessing joint excursions.
This system represents a cost-effective and non-invasive alternative that could be used for human gait analysis applications.
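The lead-compensator part of the control law described above can be sketched in discrete time via the Tustin (bilinear) transform; the gain K, zero z, pole p, and sample time Ts are illustrative parameters, and the Filtered Smith Predictor is omitted from this sketch:

```python
def lead_compensator(K, z, p, Ts):
    """Discretize the lead compensator C(s) = K (s + z) / (s + p), z < p,
    with the Tustin transform s -> (2/Ts)(1 - q^-1)/(1 + q^-1), and
    return a stateful step function u = step(e) on the tracking error e."""
    w = 2.0 / Ts
    b0 = K * (w + z) / (w + p)   # feedforward coefficients
    b1 = K * (z - w) / (w + p)
    a1 = (p - w) / (w + p)       # feedback coefficient
    state = {"e": 0.0, "u": 0.0}

    def step(e):
        # first-order difference equation of the discretized compensator
        u = -a1 * state["u"] + b0 * e + b1 * state["e"]
        state["e"], state["u"] = e, u
        return u

    return step
```

The compensator's DC gain is K·z/p, while its initial response to an error step is larger; this transient phase lead is what allows a quick yet smooth reaction when following a walking person.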
•Non-invasive and cost-effective system for gait analysis over long tracking ranges.•Classical control and model-based control techniques for person following.•Validation of the system for kinematic gait analysis.•Mobile robotic platform for human gait analysis.•Depth-sensor-based mobile robotic platform.
Monocular depth estimation is a basic and critical task in computer vision, with wide applications in domains such as robot navigation and autonomous driving. A prevailing approach is to leverage hybrid depth datasets obtained from various depth sensors to predict affine-invariant depth under supervised learning. However, the varying depth ranges in hybrid datasets can make network training unstable. While some affine-invariant loss functions have been introduced, existing methods may lead to sub-optimal geometric structure, such as blurred boundaries and details. To tackle this issue, our approach centers on reinforcing the local structural perception of images. Specifically, we propose a novel pixel-level supervised loss, the windowed correlation regression loss, which computes the windowed Pearson correlation coefficient to constrain the similarity of the data distribution within a local region. Additionally, we introduce a new coarse-to-fine multi-scale normal loss in conjunction with the former to further improve geometric accuracy. Our experimental results on six zero-shot datasets demonstrate that our method outperforms state-of-the-art methods; in terms of local geometric precision, it achieves sharper edges and more consistent local grayscale.
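The windowed correlation regression loss can be illustrated with a NumPy sketch over non-overlapping windows (an assumed form based on the description above; an actual training loss would be implemented in an autograd framework, and window size and layout are our own choices):

```python
import numpy as np

def windowed_pearson_loss(pred, target, window=8, eps=1e-8):
    """Windowed correlation regression loss (assumed form): split the
    depth maps into non-overlapping windows and penalize
    1 - Pearson correlation between predicted and target depth per window."""
    H, W = pred.shape
    losses = []
    for i in range(0, H - window + 1, window):
        for j in range(0, W - window + 1, window):
            a = pred[i:i + window, j:j + window].ravel()
            b = target[i:i + window, j:j + window].ravel()
            a = a - a.mean()
            b = b - b.mean()
            # Pearson correlation of the two windows, eps-stabilized
            r = (a @ b) / (np.sqrt((a @ a) * (b @ b)) + eps)
            losses.append(1.0 - r)
    return float(np.mean(losses))
```

Because Pearson correlation is invariant to positive scaling and shifting within each window, this loss fits the affine-invariant depth setting while still constraining local structure.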