Micro-Doppler signatures obtained from Doppler radar are widely used for human activity classification. However, when the angle between the direction of motion and the radar antenna broadside exceeds 60°, the micro-Doppler signatures generated by the radial motion of the human body diminish significantly, degrading the performance of the classification algorithm. To classify different human activities accurately irrespective of trajectory, we propose a new algorithm based on dual micro-motion signatures, namely micro-Doppler and interferometric micro-motion signatures, using an interferometric radar. First, the motion of different parts of the human body is simulated using motion capture (MOCAP) data, which is then used to generate the radar echo signal. Second, time-varying Doppler and interferometric spectrograms, obtained from time-frequency analysis of a single Doppler receiver and of the interferometric output, respectively, are fed to a deep convolutional neural network (DCNN) for feature extraction and training/testing. The performance of the proposed algorithm is analyzed and compared with a classifier based on micro-Doppler signatures alone. Results show that the dual micro-motion DCNN classifier using an interferometric radar classifies different human activities with 98% accuracy, even in trajectories where Doppler signatures diminish considerably and provide insufficient information for classification. The proposed algorithm is also verified on a real radar test dataset of different human walking patterns, achieving a classification accuracy of approximately 90%.
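The pipeline above feeds time-frequency spectrograms to a DCNN. As a minimal, illustrative sketch of the spectrogram step only (not the authors' code; the windowed-FFT implementation and the synthetic test signal below are assumptions for demonstration), a short-time Fourier transform can be computed as:

```python
import numpy as np

def stft_spectrogram(x, fs, win_len=128, hop=32):
    """Magnitude spectrogram via a windowed short-time FFT.

    Returns (times, freqs, S), where S[f, t] is the magnitude of
    frequency bin f in the frame centred at times[t].
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    S = np.abs(np.fft.rfft(frames, axis=1)).T          # shape (freq, time)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return times, freqs, S

# Synthetic stand-in for a radar echo: a steady 50 Hz "torso" line plus
# a weaker sinusoidally phase-modulated "limb" component.
fs = 1000
t = np.arange(0, 2.0, 1 / fs)
echo = np.cos(2 * np.pi * 50 * t) \
     + 0.5 * np.cos(2 * np.pi * 50 * t + 20 * np.sin(2 * np.pi * t))
times, freqs, S = stft_spectrogram(echo, fs)
```

A real radar receiver produces complex I/Q samples, for which a full (two-sided) FFT would replace `rfft` so that positive and negative Doppler shifts are distinguished.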
We investigate how adversarial learning may be used for various animation tasks related to human motion synthesis. We propose a learning framework that we adapt to build models for various needs: a random synthesis generator that produces realistic motion capture trajectories; conditional variants that allow controlling the synthesis by providing high-level features the animation should match; and a style transfer model that transforms an existing animation into the style of another. Our work builds on the adversarial learning strategy proposed in the machine learning field in 2014 for learning accurate generative models on complex data, which has been shown to provide impressive results, mainly on image data. We report both objective and subjective evaluation results on motion capture data of movements performed under emotion, the Emilya Dataset. Our results show the potential of our proposals for building models for a variety of motion synthesis tasks.
Industrial back support exoskeletons are a promising solution to alleviate lumbar musculoskeletal strain. Due to the complexity of spinal loading, evaluation of EMG data alone has been considered insufficient to assess their support effects, and complementary kinematic and dynamic data are required. However, the acquisition of marker-based kinematics is challenging with exoskeletons, as anatomical reference points, particularly on the pelvis, are occluded by exoskeleton structures. The aim of this study was therefore to develop and validate a method to reliably reconstruct the occluded pelvic markers. The movement data of six subjects, for whom pelvic markers could be placed while wearing an exoskeleton, were used to test the reconstructions and compare them to anatomical landmarks during lifting, holding and walking. Two separate approaches were used for the reconstruction. One used a reference coordinate system based only on exoskeleton markers (EXO), as has been suggested in the literature, while our proposed method adds a technical marker in the lumbar region (LUMB) to compensate for any shifting between exoskeleton and pelvis. Reconstruction with EXO yielded on average an absolute linear deviation of 54 mm ± 16 mm (mean ± 1 SD) compared to anatomical markers. The additional marker in LUMB reduced mean deviations to 14 mm ± 7 mm (mean ± 1 SD). Both methods were compared to reference values from the literature for expected variances due to marker placement and soft tissue artifacts. For LUMB, 99% of reconstructions were within the defined threshold of 24 mm ± 9 mm, while for EXO, 91% were outside it.
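The core geometric idea behind such reconstructions is to store the occluded marker's position in a local coordinate frame built from visible cluster markers during a calibration pose, then map it back once the anatomical marker is occluded. A minimal sketch of that rigid-cluster technique (all names and marker positions are hypothetical, and this assumes the cluster moves rigidly with the pelvis, which is exactly the assumption the study's lumbar marker is meant to improve):

```python
import numpy as np

def local_frame(p0, p1, p2):
    """Right-handed orthonormal frame from three non-collinear markers."""
    x = p1 - p0
    x = x / np.linalg.norm(x)
    z = np.cross(x, p2 - p0)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return np.column_stack([x, y, z])   # rotation matrix; origin is p0

def calibrate(cluster, target):
    """Express `target` in the cluster's local frame (calibration pose)."""
    return local_frame(*cluster).T @ (target - cluster[0])

def reconstruct(cluster, local_coords):
    """Map the stored local coordinates back to the lab frame."""
    return cluster[0] + local_frame(*cluster) @ local_coords

# Demo: a reference cluster (e.g., a lumbar technical marker plus two
# exoskeleton markers) and an occluded pelvic marker, then a rigid move.
cluster = [np.array([0.0, 0.0, 1.0]),
           np.array([0.1, 0.0, 1.0]),
           np.array([0.0, 0.1, 1.05])]
pelvis = np.array([0.02, -0.08, 0.95])
local = calibrate(cluster, pelvis)

th = 0.6
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th),  np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
shift = np.array([0.4, -0.2, 0.1])
moved = [Rz @ p + shift for p in cluster]
rec = reconstruct(moved, local)
expected = Rz @ pelvis + shift
```

Under a perfectly rigid transform the reconstruction is exact; the deviations reported in the study arise from non-rigid shifting between exoskeleton and pelvis and from soft tissue artifacts.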
Human Motion Generation: A Survey
Zhu, Wentao; Ma, Xiaoxuan; Ro, Dongwoo; et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, issue 4, 04/2024.
Journal Article · Peer reviewed · Open access
Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey can provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.
Full-body avatar presence is important for immersive social and environmental interactions in digital reality. However, current devices provide only three six-degrees-of-freedom (DOF) poses, from the headset and two controllers (i.e., three-point trackers). Because the problem is highly under-constrained, inferring full-body pose from these inputs is challenging, especially when supporting the full range of body proportions and use cases represented by the general population. In this paper, we propose a deep learning framework, DivaTrack, which outperforms existing methods when applied to diverse body sizes and activities. We augment the sparse three-point inputs with linear accelerations from inertial measurement units (IMUs) to improve foot contact prediction. We then condition the otherwise ambiguous lower-body pose on the predictions of foot contact and upper-body pose in a two-stage model. We further stabilize the inferred full-body pose across a wide range of configurations by learning to blend predictions computed in two reference frames, each designed for different types of motions. We demonstrate the effectiveness of our design on a large dataset that captures 22 subjects performing locomotion challenging for three-point tracking, including lunges, hula-hooping, and sitting. As shown in a live demo using the Meta VR headset and Xsens IMUs, our method runs in real time while accurately tracking a user's motion across a diverse set of movements.
Markerless motion capture has the potential to perform movement analysis with reduced data collection and processing time compared to marker-based methods. This technology is now starting to be applied for clinical and rehabilitation applications and therefore it is crucial that users of these systems understand both their potential and limitations. This literature review aims to provide a comprehensive overview of the current state of markerless motion capture for both single camera and multi-camera systems. Additionally, this review explores how practical applications of markerless technology are being used in clinical and rehabilitation settings, and examines the future challenges and directions markerless research must explore to facilitate full integration of this technology within clinical biomechanics.
A scoping review is needed to examine this emerging broad body of literature and determine where gaps in knowledge exist; this is key to developing motion capture methods that are cost effective and practically relevant to clinicians, coaches and researchers around the world. Literature searches were performed to examine studies that report the accuracy of markerless motion capture methods, explore current practical applications of markerless motion capture in clinical biomechanics and identify gaps in our knowledge relevant to future developments in this area.
Markerless methods increase motion capture data versatility, enabling datasets to be re-analyzed using updated pose estimation algorithms, and may even provide clinicians with the capability to collect data while patients are wearing normal clothing. While markerless temporospatial measures generally appear to be equivalent to those from marker-based motion capture, joint center locations and joint angles are not yet sufficiently accurate for clinical applications. Pose estimation algorithms are approaching error rates similar to those of marker-based motion capture; however, without comparison to a gold standard, such as bi-planar videoradiography, the true accuracy of markerless systems remains unknown.
Current open-source pose estimation algorithms were never designed for biomechanical applications; as a result, the datasets on which they have been trained are inconsistently and inaccurately labelled. Improving the labelling of open-source training data, as well as assessing markerless accuracy against gold standard methods, will be vital next steps in the development of this technology.
Human movement analysis is a key area of research in robotics, biomechanics, and data science. It encompasses tracking, posture estimation, and movement synthesis. While numerous methodologies have evolved over time, a systematic and quantitative evaluation of these approaches using verifiable ground truth data of three-dimensional human movement is still required to define the current state of the art. This paper presents seven datasets recorded using inertial-based motion capture. The datasets contain professional gestures carried out by industrial operators and skilled craftsmen, performed in real conditions in situ. The datasets were created with the intention of being used for research in human motion modeling, analysis, and generation. The protocols for data collection are described in detail, and a preliminary analysis of the collected data is provided as a benchmark. The Gesture Operational Model, a hybrid stochastic-biomechanical approach based on kinematic descriptors, is utilized to model the dynamics of the experts' movements and create mathematical representations of their motion trajectories for analyzing and quantifying their body dexterity. The models enabled the accurate generation of professional human poses and an intuitive description of how body joints cooperate and change over time during the performance of the task.
In two-player competitive sports, such as boxing and fencing, athletes often demonstrate efficient and tactical movements during a competition. In this paper, we develop a learning framework that generates control policies for physically simulated athletes who have many degrees of freedom. Our framework uses a two-step approach with deep reinforcement learning, learning basic skills and then learning bout-level strategies, which is inspired by the way people learn competitive sports. We develop a policy model based on an encoder-decoder structure that incorporates an autoregressive latent variable and a mixture-of-experts decoder. To show the effectiveness of our framework, we implemented two competitive sports, boxing and fencing, and demonstrate that control policies learned by our framework can generate both tactical and natural-looking behaviors. We also evaluate the control policies through comparisons to other learning configurations and through ablation studies.
Inertial measurement units (IMUs) are increasingly utilized as motion capture devices in human movement studies. Given their high portability, IMUs can be deployed in any environment, importantly, those outside the laboratory. However, a significant challenge limits the adoption of this technology, namely, estimating the orientation of the IMUs in a common world frame, which is essential for estimating the rotations across skeletal joints. Common (probabilistic) methods for estimating IMU orientation rely on the ability to update the current orientation estimate using data provided by the IMU. The objective of this work is to present a novel error-state Kalman filter that yields highly accurate estimates of IMU orientation that are robust to poor measurement updates caused by fluctuations in the local magnetic field and/or highly dynamic movements. The method is validated against ground truth data provided by the highly accurate orientation measurements of a coordinate measurement machine. As an example, the method yields IMU-estimated orientations that remain within 3.7 degrees (RMS error) over relatively long trials (25 cumulative minutes), even in the presence of large fluctuations in the local magnetic field. For comparison, ignoring the magnetic interference increases the RMS error to 12.8 degrees, more than a threefold increase.
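The paper's error-state Kalman filter is considerably more involved than can be shown here. As a hedged stand-in that illustrates the same predict/correct structure (gyro propagation of a quaternion, followed by a gravity-based correction), here is a minimal Mahony-style complementary filter; all names are illustrative, and this simplification ignores the magnetometer and dynamic acceleration, precisely the disturbances the paper's filter is designed to reject:

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(q, v):
    """Rotate vector v by quaternion q (body-to-world convention)."""
    return quat_mul(quat_mul(q, np.concatenate([[0.0], v])), quat_conj(q))[1:]

def filter_step(q, gyro, accel, dt, kp=2.0):
    """Predict with the gyro, correct toward the measured gravity direction."""
    a = accel / np.linalg.norm(accel)
    g_pred = rotate(quat_conj(q), np.array([0.0, 0.0, 1.0]))  # gravity in body frame
    err = np.cross(a, g_pred)            # tilt error (Mahony-style)
    omega = gyro + kp * err              # corrected body-frame rate
    dq = np.concatenate([[1.0], 0.5 * omega * dt])
    q = quat_mul(q, dq)
    return q / np.linalg.norm(q)

# Demo: the estimate starts tilted ~11 degrees about x; with a stationary
# IMU (zero gyro, gravity-only accel) the correction pulls it back upright.
q = np.array([np.cos(0.1), np.sin(0.1), 0.0, 0.0])
for _ in range(400):
    q = filter_step(q, np.zeros(3), np.array([0.0, 0.0, 9.81]), dt=0.01)
```

An error-state formulation additionally tracks the covariance of a small orientation error and gyro bias, letting the correction gain adapt to measurement quality rather than being a fixed `kp`.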
Production-level workflows for producing convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components, but the corresponding latent representations cannot provide artists with explicit controls as in conventional tools. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. For data collection, we construct a hybrid multiview-photometric capture stage, coupled with ultra-fast video cameras, to obtain raw 3D facial assets. We then model the facial expression, geometry and physically-based textures using separate VAEs, imposing a global MLP-based expression mapping across the latent spaces of the respective networks to preserve characteristics across the respective attributes. We also model the delta information as wrinkle maps for the physically-based textures, achieving high-quality 4K dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and cross-identity facial motion retargeting. In addition, our multi-VAE-based neural asset, along with the fast adaptation schemes, can also be deployed to handle in-the-wild videos. Moreover, we motivate the utility of our explicit facial disentangling strategy by providing various promising physically-based editing results with high realism. Comprehensive experiments show that our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.