The services offered by robots are expected to increase in the near future because of worker shortages and the low birthrate and aging population. People-detection and human-following functions are indispensable for service robots. Sensors such as depth sensors, laser rangefinders (LRFs), and RGB cameras are often used for this purpose. However, because the LRF is an infrared sensor, it is difficult to recognize a specific person with it. Color information from a camera is easily affected by room brightness, and it can be difficult even to detect the same person from the front and from the back. Therefore, we propose a human-following robot that uses torso information obtained from a depth sensor. We used head height and torso area as feature quantities to distinguish among persons of approximately the same height. In addition, the proposed features made it possible to follow a person in a dark area even after losing sight of the person, regardless of whether the robot was in front of or behind the person.
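To make the feature design concrete, the sketch below computes the two quantities this abstract names, head height and torso area, from a segmented depth image. It is a minimal illustration, not the authors' implementation; the camera intrinsics (FX, FY, CX, CY), the sensor mounting height, and the 50-80% torso band are assumptions.

```python
import numpy as np

# Hypothetical pinhole intrinsics and sensor mounting height; the paper
# does not give these values, so substitute calibrated ones.
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5
SENSOR_HEIGHT_M = 1.0

def person_features(depth_m, mask):
    """Compute the two re-identification features named in the abstract:
    head height above the floor and torso surface area.

    depth_m : (H, W) float array of depth values in meters.
    mask    : (H, W) bool array marking pixels belonging to the person.
    """
    vs, us = np.nonzero(mask)
    z = depth_m[vs, us]

    # Back-project each pixel row to a camera-frame height, then to a
    # height above the floor (camera assumed level at SENSOR_HEIGHT_M).
    y_cam = (vs - CY) * z / FY          # downward axis in camera frame
    height = SENSOR_HEIGHT_M - y_cam

    head_height = height.max()

    # Torso band, assumed here as 50-80% of the person's height; the
    # paper's exact segmentation of the torso may differ.
    torso = (height > 0.5 * head_height) & (height < 0.8 * head_height)

    # Real-world footprint of one pixel at depth z is (z/FX) * (z/FY).
    torso_area = float(np.sum((z[torso] / FX) * (z[torso] / FY)))

    return float(head_height), torso_area
```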
We present a novel approach to estimate the distance between a generic point in Cartesian space and objects detected with a depth sensor. This information is crucial in many robotic applications, e.g., for collision avoidance, contact point identification, and augmented reality. The key idea is to perform all distance evaluations directly in the depth space. This allows distance estimation that also considers the frustum generated by the pixel on the depth image, which takes into account both the pixel size and the occluded points. Different techniques to aggregate distance data coming from multiple object points are proposed. We compare the depth space approach with the commonly used Cartesian space and configuration space approaches, showing that the presented method provides better results and faster execution times. An application to human-robot collision avoidance using a KUKA LWR IV robot and a Microsoft Kinect sensor illustrates the effectiveness of the approach.
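The occlusion-aware part of the depth space evaluation can be sketched as follows. This is a simplified reading of the idea, not the authors' exact formulation: it treats each obstacle pixel as potentially extending behind its measured depth and ignores the pixel-size component of the frustum; the intrinsics F, CX, CY are placeholders.

```python
import numpy as np

# Hypothetical intrinsics for a Kinect-like sensor; use calibrated values.
F, CX, CY = 525.0, 319.5, 239.5

def depth_space_distance(p_cart, u_o, v_o, d_o):
    """Distance from a Cartesian point of interest to one obstacle pixel,
    evaluated in depth space. Key occlusion idea: a pixel hides everything
    behind it, so an obstacle measured in front of the point may extend
    back to the point's own depth.

    p_cart        : (x, y, z) point in camera coordinates, meters.
    u_o, v_o, d_o : obstacle pixel coordinates and its measured depth.
    """
    x_p, y_p, d_p = p_cart

    # Closest admissible depth along the obstacle ray: the measured depth
    # if the obstacle lies behind the point, otherwise the point's depth
    # (the occluded region behind the pixel may reach that far).
    d_eval = max(d_o, d_p)

    # Back-project the obstacle pixel at that depth.
    x_o = (u_o - CX) * d_eval / F
    y_o = (v_o - CY) * d_eval / F

    return float(np.linalg.norm([x_o - x_p, y_o - y_p, d_eval - d_p]))
```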
A trajectory-based writing system writes a linguistic character or word in free space by moving a finger, marker, or handheld device. It is widely applicable where traditional pen-up and pen-down writing systems are troublesome. Due to its simple writing style, it has a great advantage over gesture-based systems. However, it is a challenging task because of non-uniform characters and differing writing styles. In this research, we developed an air-writing recognition system using three-dimensional (3D) trajectories collected by a depth camera that tracks the fingertip. For better feature selection, nearest-neighbor and root point translation were used to normalize the trajectory. We employed a long short-term memory (LSTM) network and a convolutional neural network (CNN) as recognizers. The model was tested and verified on a self-collected dataset. To evaluate the robustness of our model, we also employed the 6D motion gesture (6DMG) alphanumeric character dataset and achieved 99.32% accuracy, the highest to date. Hence, the proposed model is verified to be invariant across digits and characters. Moreover, we publish a dataset containing 21,000 digits, which addresses the lack of datasets in current research.
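A minimal sketch of the trajectory normalization stage is given below, assuming root point translation means shifting the trajectory so it starts at the origin, followed by scale normalization and fixed-length resampling for the LSTM/CNN input; the 64-sample length is an assumption.

```python
import numpy as np

def normalize_trajectory(points, n_samples=64):
    """Normalize a 3D fingertip trajectory for recognition.

    points : (N, 3) array of fingertip positions over time.
    """
    pts = np.asarray(points, dtype=float)

    # Root point translation: shift so the first point is the origin,
    # removing dependence on where in space the user started writing.
    pts = pts - pts[0]

    # Scale to a unit bounding box so character size does not matter.
    span = np.ptp(pts, axis=0).max()
    if span > 0:
        pts /= span

    # Resample to a fixed length by arc-length interpolation so the
    # recognizer always sees n_samples time steps.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n_samples)
    return np.stack([np.interp(t, s, pts[:, k]) for k in range(3)], axis=1)
```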
Automated recognition of human activities or actions has great significance, as it supports wide-ranging applications, including surveillance, robotics, and personal health monitoring. Over the past few years, many computer vision-based methods have been developed for recognizing human actions from RGB and depth camera videos. These methods include space-time trajectories, motion encoding, key pose extraction, space-time occupancy patterns, depth motion maps, and skeleton joints. However, these camera-based approaches are affected by background clutter and illumination changes and are applicable to a limited field of view only. Wearable inertial sensors provide a viable solution to these challenges but are subject to several limitations, such as location and orientation sensitivity. Due to the complementary nature of the data obtained from camera and inertial sensors, the use of multiple sensing modalities for accurate recognition of human actions is gradually increasing. This paper presents a viable multimodal feature-level fusion approach for robust human action recognition, which utilizes data from multiple sensors, including an RGB camera, a depth sensor, and wearable inertial sensors. We extracted computationally efficient features from the RGB-D video camera and the inertial body sensors: densely extracted histogram of oriented gradients (HOG) features from the RGB/depth videos and statistical signal attributes from the wearable sensor data. The proposed human action recognition (HAR) framework is tested on UTD-MHAD, a publicly available multimodal human action dataset consisting of 27 different human actions. K-nearest neighbor and support vector machine classifiers are used for training and testing the proposed fusion model for HAR. The experimental results indicate that the proposed scheme achieves better recognition results than the state of the art. The feature-level fusion of RGB and inertial sensors provides the overall best performance for the proposed system, with an accuracy of 97.6%.
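The fusion step described here can be sketched as plain feature concatenation ahead of a standard classifier, as below. The mean/std/min/max attributes and the RBF kernel are assumptions; the HOG vectors are taken as already computed.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def inertial_features(window):
    """Statistical signal attributes per axis of a (T, C) inertial window;
    mean/std/min/max is an assumed subset of the paper's feature set."""
    return np.concatenate([window.mean(0), window.std(0),
                           window.min(0), window.max(0)])

def fused_vector(hog_rgb, hog_depth, imu_window):
    """Feature-level fusion: concatenate the per-modality descriptors
    into a single vector before classification."""
    return np.concatenate([hog_rgb, hog_depth, inertial_features(imu_window)])

# One of the two classifiers named in the abstract (an RBF SVM); the other
# would be sklearn.neighbors.KNeighborsClassifier.
classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# classifier.fit(X_train, y_train), with rows of X built by fused_vector().
```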
In this paper, we propose an articulated and generalized Gaussian kernel correlation (GKC)-based framework for human pose estimation. We first derive a unified GKC representation that generalizes the previous sum of Gaussians (SoG)-based methods for the similarity measure between a template and an observation, both of which are represented by various SoG variants. Then, we develop an articulated GKC (AGKC) by integrating a kinematic skeleton in a multivariate SoG template that supports subject-specific shape modeling and articulated pose estimation for both the full body and the hands. We further propose a sequential (body/hand) pose tracking algorithm by incorporating three regularization terms in the AGKC function, including visibility, intersection penalty, and pose continuity. Our tracking algorithm is simple yet effective and computationally efficient. We evaluate our algorithm on two benchmark depth data sets. The experimental results are promising and competitive when compared with the state-of-the-art algorithms.
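For isotropic 3-D Gaussians, the kernel correlation between two SoG models has a closed form, sketched below. The identity itself is standard; the paper's generalized and articulated variants add structure (kinematic skeleton, visibility, penalties) beyond this sketch, and its normalization conventions may differ.

```python
import numpy as np

def sog_kernel_correlation(mu_a, sig_a, w_a, mu_b, sig_b, w_b):
    """Similarity between two sums of Gaussians (SoG): the weighted sum,
    over all template/observation pairs, of the integral of the product
    of two isotropic 3-D Gaussians, which has the closed form
    (2*pi*(s_i^2 + s_j^2))^(-3/2) * exp(-||mu_i - mu_j||^2
                                        / (2*(s_i^2 + s_j^2))).

    mu_* : (N, 3) component means; sig_* : (N,) std devs; w_* : (N,) weights.
    """
    d2 = ((mu_a[:, None, :] - mu_b[None, :, :]) ** 2).sum(-1)   # (Na, Nb)
    s2 = sig_a[:, None] ** 2 + sig_b[None, :] ** 2
    kc = np.exp(-d2 / (2.0 * s2)) / (2.0 * np.pi * s2) ** 1.5
    return float((w_a[:, None] * w_b[None, :] * kc).sum())
```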
This paper presents a novel approach for depth video enhancement. Given a high-resolution color video and its corresponding low-quality depth video, we improve the quality of the depth video by increasing its resolution and suppressing noise. To that end, a weighted mode filtering method is proposed based on a joint histogram. When the histogram is generated, a weight based on the color similarity between the reference and neighboring pixels on the color image is computed and then used for counting each bin of the joint histogram of the depth map. The final solution is determined by seeking the global mode of the histogram. We show that the proposed method provides the optimal solution with respect to L1-norm minimization. For a temporally consistent estimate on depth video, we extend this method to temporally neighboring frames. Simple optical flow estimation and a patch similarity measure are used for obtaining the high-quality depth video in an efficient manner. Experimental results show that the proposed method has outstanding performance and is very efficient compared with existing methods. We also show that the temporally consistent enhancement of depth video addresses the flickering problem and improves the accuracy of the depth video.
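A single-pixel, single-frame sketch of the weighted mode filter follows, assuming the depth map is quantized to the histogram's bin range. It keeps only the color-similarity weight; the paper's full method also includes spatial weighting, histogram smoothing, and the optical-flow-based temporal extension.

```python
import numpy as np

def weighted_mode_filter(depth, color, u, v, radius=5, n_bins=256,
                         sigma_c=10.0):
    """Weighted mode filtering at pixel (u, v): accumulate a histogram of
    the neighbors' depth values, each weighted by its color similarity to
    the center pixel, and return the bin with the largest mass (the mode).

    depth : (H, W) array quantized to integer values in [0, n_bins).
    color : (H, W, 3) color image registered to the depth map.
    """
    h, w = depth.shape
    hist = np.zeros(n_bins)
    c0 = color[v, u].astype(float)
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            y, x = v + dv, u + du
            if 0 <= y < h and 0 <= x < w:
                # Color-similarity weight between center and neighbor.
                wgt = np.exp(-np.sum((color[y, x].astype(float) - c0) ** 2)
                             / (2.0 * sigma_c ** 2))
                hist[int(depth[y, x])] += wgt
    return int(np.argmax(hist))   # global mode of the weighted histogram
```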
• Accuracy of stair GRFs using a depth sensor-driven musculoskeletal model was assessed.
• Study subjects were ACL patients following ACL reconstruction surgery.
• The estimation of GRFs was highly dependent on the evaluated force component.
• This method has potential as a cost-effective tool in the clinical setting.
Although stair ambulation should be included in the rehabilitation of the long-term effects of ACL injury on knee function, the assessment of kinetic parameters during stair gait can currently be established only with costly and cumbersome force platforms via conventional inverse dynamic analysis. Therefore, there is a need to develop a practical laboratory setup as an assessment tool for the lower-extremity stair gait abnormalities that arise from an ACL deficiency.
Can the use of a single depth sensor-driven full-body musculoskeletal gait model be considered an accurate assessment tool of the ground reaction forces (GRFs) during stair climbing for patients following ACL reconstruction (ACLR) surgery?
A total of 15 patients who underwent ACLR participated in this study. GRF data during stair climbing were collected using a custom-built 3-step staircase with two embedded force platforms. A single commercially available, cost-effective depth sensor was used to obtain participants' depth map information, from which the full-body skeleton information was extracted. The AnyBody™ GaitFullBody model was utilized to estimate GRFs by means of 25 artificial muscle-like actuators placed under each foot. Mean differences between the measured and estimated GRFs were compared using paired-samples t-tests. The ensemble curves of the GRFs were compared between the two approaches during the stance phase of the gait cycle.
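For reference, the statistical comparison described here reduces to a paired-samples t-test per force component; a sketch with hypothetical numbers (not the study's data) is shown below.

```python
import numpy as np
from scipy import stats

# Hypothetical peak vertical GRF values (in body weight) for 15 subjects;
# the real study values are in the paper, not reproduced here.
rng = np.random.default_rng(0)
measured = 1.10 + 0.05 * rng.standard_normal(15)        # force platform
estimated = measured + 0.02 * rng.standard_normal(15)   # depth-sensor model

# Paired-samples t-test on the mean difference between the two methods.
t_stat, p_value = stats.ttest_rel(measured, estimated)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```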
The findings of this study showed that estimating the GRFs produced during staircase gait using a depth sensor-driven musculoskeletal model can yield acceptable results compared to the traditional inverse dynamics modelling approach, making it a viable alternative tool in clinical settings for individuals who have undergone ACLR.
The introduced approach of full-body musculoskeletal modelling driven by a single depth sensor has the potential to be a cost-effective stair gait analysis tool for patients with ACL injury.