Raw optical motion capture data often includes errors such as occluded markers, mislabeled markers, and high-frequency noise or jitter. Typically these errors must be fixed by hand, an extremely time-consuming and tedious task. Due to this, there is a large demand for tools or techniques which can alleviate this burden. In this research we present a tool that sidesteps this problem and produces joint transforms directly from raw marker data (a task commonly called "solving") in a way that is extremely robust to errors in the input data, using the machine learning technique of denoising. Starting with a set of marker configurations and a large database of skeletal motion data such as the CMU motion capture database [CMU 2013b], we synthetically reconstruct marker locations using linear blend skinning and apply a unique noise function for corrupting this marker data, randomly removing and shifting markers to dynamically produce billions of examples of poses with errors similar to those found in real motion capture data. We then train a deep denoising feed-forward neural network to learn a mapping from this corrupted marker data to the corresponding transforms of the joints. Once trained, our neural network can be used as a replacement for the solving part of the motion capture pipeline and, as it is very robust to errors, it completely removes the need for any manual clean-up of data. Our system is accurate enough to be used in production, generally achieving precision to within a few millimeters, while additionally being extremely fast to compute with low memory requirements.
Commercial motion-capture systems produce excellent in-studio reconstructions, but offer no comparable solution for acquisition in everyday environments. We present a system for acquiring motions almost anywhere. This wearable system gathers ultrasonic time-of-flight and inertial measurements with a set of inexpensive miniature sensors worn on the garment. After recording, the information is combined using an Extended Kalman Filter to reconstruct joint configurations of a body. Experimental results show that even motions that are traditionally difficult to acquire are recorded with ease within their natural settings. Although our prototype does not reliably recover the global transformation, we show that the resulting motions are visually similar to the original ones, and that the combined acoustic and inertial system reduces the drift commonly observed in purely inertial systems. Our final results suggest that this system could become a versatile input device for a variety of augmented-reality applications.
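The sensor-fusion step above uses an Extended Kalman Filter. A generic single predict/update cycle looks like the sketch below; the paper's actual state (joint configurations) and acoustic/inertial measurement models are more involved, so the process and measurement functions are passed in as assumptions:

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One predict/update cycle of an Extended Kalman Filter.
    x: state estimate, P: state covariance, z: measurement,
    f/h: process and measurement functions, F/H: their Jacobians
    evaluated at the current estimate, Q/R: noise covariances.
    Illustrative sketch only."""
    # Predict: propagate state and covariance through the dynamics.
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement.
    y = z - h(x_pred)                    # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In the system described, the ultrasonic time-of-flight readings and the inertial measurements would enter through the measurement model `h`, with the filter blending both to suppress the drift of a purely inertial solution.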
Improving work conditions in industry is a major challenge that can be addressed with new emerging technologies such as collaborative robots. Machine learning techniques can improve the performance of those robots by endowing them with a degree of awareness of the human state and ergonomics condition. The availability of appropriate datasets to learn models and test prediction and control algorithms, however, remains an issue. This article presents a dataset of human motions in industry-like activities, fully labeled according to the ergonomics assessment worksheet EAWS, widely used in industries such as car manufacturing. Thirteen participants performed several series of activities, such as screwing and manipulating loads under different conditions, resulting in more than 5 hours of data. The dataset contains the participants’ whole-body kinematics recorded both with wearable inertial sensors and marker-based optical motion capture, finger pressure force, video recordings, and annotations by three independent annotators of the performed action and the adopted posture following the EAWS postural grid. Sensor data are available in different formats to facilitate their reuse. The dataset is intended for use by researchers developing algorithms for classifying, predicting, or evaluating human motion in industrial settings, as well as researchers developing collaborative robotics solutions that aim at improving the workers’ ergonomics. The annotation of the whole dataset following an ergonomics standard makes it valuable for ergonomics-related applications, but we expect its use to be broader in the robotics, machine learning, and human movement communities.
Recently, sparse-inertial human pose estimation (SI-HPE) with only a few IMUs has shown great potential in various fields. The most advanced work in this area achieved reasonably good results using only six IMUs. However, there are still two major issues that remain to be addressed. First, existing methods typically treat SI-HPE as a temporal sequential learning problem and often ignore the important spatial prior of skeletal topology. Second, their training data contain far more synthetic than real samples, and the data distributions of synthetic and real data differ substantially, which makes it difficult for the model to generalize to more diverse real data. To address these issues, we propose “Graph-based Adversarial Inertial Poser (GAIP)”, which tracks body movements using sparse data from six IMUs. To make full use of the spatial prior, we design a multi-stage pose regressor with graph convolution to explicitly learn the skeletal topology. A joint position loss is also introduced to implicitly mine spatial information. To enhance the generalization ability, we propose supervising the pose regression with an adversarial loss from a discriminator, bringing the ability of adversarial networks to learn implicit constraints into full play. Additionally, we construct a real dataset that includes hip support movements and a synthetic dataset containing various motion categories to enrich the diversity of inertial data for SI-HPE. Extensive experiments demonstrate that GAIP produces results with more precise limb movement amplitudes and relative joint positions, accompanied by smaller joint angle and position errors compared to state-of-the-art counterparts. The datasets and code are publicly available at https://cslinzhang.github.io/GAIP/ .
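The graph convolution over the skeletal topology mentioned above can be illustrated with a single spectral-style layer. The symmetric normalization and ReLU are common choices; GAIP's exact layer design is not specified here, so this is a generic sketch:

```python
import numpy as np

def skeletal_graph_conv(X, A, W):
    """One graph-convolution layer over a skeleton graph.
    X: (J, C_in) per-joint features, A: (J, J) adjacency matrix
    with self-loops (bones connect parent and child joints),
    W: (C_in, C_out) learnable weights. Generic formulation with
    symmetric normalization; GAIP's exact layer may differ."""
    D = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(D))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt   # normalized adjacency
    return np.maximum(A_hat @ X @ W, 0.0) # linear map + ReLU
```

Stacking such layers lets per-joint IMU features propagate along bone connections, which is how the skeletal topology acts as an explicit spatial prior in the pose regressor.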
We present a real‐time system for character control that relies on the classification of locomotive actions in skeletal motion capture data. Our method is both progress dependent and style invariant. Two deep neural networks are used to correlate body shape and implicit dynamics to locomotive types and their respective progress. In comparison to related work, our approach does not require a setup step and enables the user to act in a natural, unconstrained manner. Also, our method displays better performance than the related work in scenarios where the actor performs sharp changes in direction and highly stylized motions while maintaining at least as good performance in other scenarios. Our motivation is to enable character control of non‐bipedal characters in virtual production and live immersive experiences, where mannerisms in the actor's performance may be an issue for previous methods.
We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub "physionical", is aware of physical and environmental constraints. It combines in a fully-differentiable way several key innovations, i.e., 1) a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions, 2) an explicit rigid body dynamics model and 3) a novel optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The inputs to our system are 2D joint keypoints, which are canonicalised in a novel way so as to reduce the dependency on intrinsic camera parameters---both at train and test time. This enables more accurate global translation estimation without generalisability loss. Our model can be finetuned only with 2D annotations when the 3D annotations are not available. It produces smooth and physically-principled 3D motions at an interactive frame rate in a wide variety of challenging scenes, including newly recorded ones. Its advantages are especially noticeable on in-the-wild sequences that significantly differ from common 3D pose estimation benchmarks such as Human3.6M and MPI-INF-3DHP. Qualitative results are provided in the supplementary video.
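The proportional-derivative controller at the heart of the approach computes joint torques from the gap between current and target joint states. A minimal per-joint sketch is below; in the paper the gains are predicted by a neural network, whereas here they are plain arrays for illustration:

```python
import numpy as np

def pd_torque(q, dq, q_target, kp, kd):
    """Proportional-derivative control torque per joint:
    tau = kp * (q_target - q) - kd * dq.
    q, dq: current joint angles and velocities; q_target: desired
    angles; kp, kd: gains (network-predicted in the paper, fixed
    arrays in this sketch)."""
    return kp * (q_target - q) - kd * dq
```

The resulting torques drive the explicit rigid body dynamics model, so the output motion is physically principled rather than a direct per-frame pose regression.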
Objectives: Pronator drift is one of many clinical signs that would benefit from detailed study, but this requires accurate measurement of movement in three dimensions. The Vicon system is currently considered to be the gold standard for measurement of limb kinetics in a movement analysis lab but it cannot be used at the bedside for many reasons. This study aimed to investigate a portable camera-based motion capture system (MCS) as a clinically-useful alternative. Methods: The MCS used two commercially-available cameras arranged so as to permit stereoscopic calculation of depth (i.e. distance from the cameras), and therefore a 3-D representation of movement at the shoulder, elbow and wrist. Data were obtained simultaneously from both movement capture and Vicon systems while three normal subjects simulated four scenarios of the pronator drift test in each limb. Outputs from Vicon and MCS were analysed using Matlab to determine root mean square error (RMSE) in XYZ coordinates. A priori, an acceptable difference was considered to be an average RMSE of < 10 mm. Results: Collectively, the studies generated 53,424 sets of data. The average RMSE in the XYZ axes was 14.9 mm (range 5.0-20.3 mm). Inaccuracy was greatest at the wrist during trials involving larger degrees of pronation. Conclusion: The motion capture system was able to generate a 3-D trajectory of limb motion but further refinement is needed before it can be used for the purposes of clinical measurement.
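The headline metric of the study, an average RMSE over the XYZ coordinates of matched trajectories, can be computed as in the sketch below. The study's Matlab procedure is not reproduced here, so the per-axis-then-average convention is an assumption:

```python
import numpy as np

def mean_rmse_xyz(pred, ref):
    """Average root-mean-square error between two (T, 3) marker
    trajectories: RMSE is computed per axis (X, Y, Z) over all T
    samples, then averaged across the three axes. The exact
    averaging convention used in the study may differ."""
    rmse_per_axis = np.sqrt(((pred - ref) ** 2).mean(axis=0))
    return rmse_per_axis.mean()
```

With trajectories expressed in millimetres, comparing this value against the a priori threshold of 10 mm reproduces the study's acceptance criterion.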
Automatic synthesis of realistic gestures promises to transform the fields of animation, avatars and communicative agents. In off‐line applications, novel tools can alter the role of an animator to that of a director, who provides only high‐level input for the desired animation; a learned network then translates these instructions into an appropriate sequence of body poses. In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters. In this paper we address some of the core issues towards these ends. By adapting a deep learning‐based motion synthesis method called MoGlow, we propose a new generative model for generating state‐of‐the‐art realistic speech‐driven gesticulation. Owing to the probabilistic nature of the approach, our model can produce a battery of different, yet plausible, gestures given the same input speech signal. Just like humans, this gives a rich natural variation of motion. We additionally demonstrate the ability to exert directorial control over the output style, such as gesture level, speed, symmetry and spatial extent. Such control can be leveraged to convey a desired character personality or mood. We achieve all this without any manual annotation of the data. User studies evaluating upper‐body gesticulation confirm that the generated motions are natural and well match the input speech. Our method scores above all prior systems and baselines on these measures, and comes close to the ratings of the original recorded motions. We furthermore find that we can accurately control gesticulation styles without unnecessarily compromising perceived naturalness. Finally, we also demonstrate an application of the same method to full‐body gesticulation, including the synthesis of stepping motion and stance.