Human trajectory prediction is challenging and critical in various applications (e.g., autonomous vehicles and social robots). Because pedestrian movement is continuous and anticipatory, pedestrians moving in crowded spaces consider both spatial and temporal interactions to avoid future collisions. However, most existing methods ignore the temporal correlations of interactions with other pedestrians in a scene. In this work, we propose a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture, to predict future trajectories of pedestrians. Besides the spatial interactions captured by the graph attention mechanism at each time-step, we adopt an extra LSTM to encode the temporal correlations of interactions. Through comparisons with state-of-the-art methods, our model achieves superior performance on two publicly available crowd datasets (ETH and UCY) and produces more "socially" plausible trajectories for pedestrians.
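As a rough illustration of the per-time-step graph attention the STGAT abstract describes, the following is a minimal GAT-style sketch in numpy. The function name, the fully connected pedestrian graph, and the LeakyReLU slope are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def graph_attention(h, W, a):
    """One graph-attention step over N pedestrian hidden states (GAT-style).

    h: (N, F) hidden states, W: (F, F') shared projection,
    a: (2*F',) attention vector. Returns (N, F') interaction-aware states.
    """
    z = h @ W                                   # project every state: (N, F')
    N = z.shape[0]
    # pairwise attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else 0.2 * s   # LeakyReLU, slope 0.2
    # row-wise softmax turns logits into attention weights over neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z                            # attention-weighted aggregation
```

In the full model, such a step would run at every observed time-step, and the resulting interaction embeddings would feed the extra LSTM that encodes their temporal correlation.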
Simulating temporal three-dimensional (3D) deformations of clothing worn by the human body is a core technology in computer graphics that plays a vital role in various fields, such as computer games, animation, and movies. Physics-based simulation and data-driven methods are the two mainstream technologies used to generate clothing deformation. However, when clothing animations must be generated quickly for different body shapes and motions, existing methods cannot balance efficiency and effectiveness. In this paper, we present a learning-based method that, given human body shape and motion, automatically synthesizes temporal 3D deformations of clothing in real time. A temporal framework based on the transformer is designed to capture the correlation between clothing deformation and the shape features of the moving human body, as well as the frame-level dependency. A novel feature fusion strategy fuses the shape and motion features. We also perform post-processing on penetrations between the clothing and the human body, generating collision-free cloth deformation sequences. To evaluate the method, we build a human motion dataset based on the large-scale public human body dataset AMASS and further develop a clothing deformation dataset. We qualitatively and quantitatively demonstrate that our approach outperforms existing methods in terms of temporal clothing deformation with variable shape and motion, while producing realistic deformation at interactive rates.
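The frame-level dependency that the transformer framework captures rests on scaled dot-product self-attention over the frame sequence. Below is a minimal single-head sketch; the function name and the shapes of the fused per-frame features are assumptions, not the paper's architecture.

```python
import numpy as np

def temporal_attention(frames, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a frame sequence (transformer core).

    frames: (T, D) per-frame fused shape/motion features; Wq, Wk, Wv: (D, D)
    query/key/value projections. Returns (T, D) features in which every frame
    attends to all others, modelling frame-level dependency.
    """
    Q, K, V = frames @ Wq, frames @ Wk, frames @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])      # (T, T) frame-to-frame affinities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax over frames
    return w @ V                                # attention-weighted frame mixture
```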
Predicting the trajectories of pedestrians is an important and difficult task for many applications, such as robot navigation and autonomous driving. Most existing methods assume that an accurate prediction of pedestrian intention can improve prediction quality. These works tend to predict a fixed destination coordinate as the agent's intention and predict the future trajectory accordingly. However, while moving, a pedestrian's intention may be a definite location or a general direction and area, and it may change dynamically with the surroundings. Thus, regarding the agent's intention as a fixed 2-D coordinate is insufficient to improve future trajectory prediction. To address this problem, we propose the Dynamic Target Driven Network for pedestrian trajectory prediction (DTDNet), which employs a multi-precision pedestrian intention analysis module to capture these dynamics. To ensure that the extracted feature contains comprehensive intention information, we design three sub-tasks: predicting a coarse-precision endpoint coordinate, predicting a fine-precision endpoint coordinate, and scoring scene sub-regions. In addition, we propose an original multi-precision trajectory data extraction method that achieves a multi-resolution representation of future intention and makes it easier to extract local scene information. We compare our model with previous methods on two publicly available datasets (ETH-UCY and the Stanford Drone Dataset). The experimental results show that DTDNet achieves better trajectory prediction performance and learns a better pedestrian intention feature representation.
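The three sub-tasks can be pictured as three heads on a shared intention feature. The sketch below is a hypothetical linear-head version (the head names, the residual coarse-to-fine refinement, and the number of sub-regions are illustrative assumptions, not DTDNet's actual layers).

```python
import numpy as np

def intention_heads(feat, Wc, Wf, Ws):
    """Three hypothetical sub-task heads on a shared intention feature.

    feat: (D,) intention feature; Wc: (D, 2) coarse endpoint head;
    Wf: (D, 2) fine endpoint head; Ws: (D, K) scoring head for K sub-regions.
    """
    coarse = feat @ Wc                       # coarse-precision endpoint (x, y)
    fine = coarse + feat @ Wf                # fine head refines the coarse guess
    logits = feat @ Ws
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                     # softmax scores over scene sub-regions
    return coarse, fine, probs
```

Training all three heads jointly forces the shared feature to carry both precise endpoint information and the coarser direction-and-area signal the abstract argues for.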
Predicting plausible and collision-free trajectories is critical in various applications, such as robotic navigation and autonomous driving. This is a challenging task due to two major factors. First, it is difficult for deep neural networks to understand how pedestrians move to avoid collisions and how they react to each other. Second, given observed trajectories, there are multiple possible and plausible trajectories that pedestrians may follow. Although an increasing number of previous works have focused on modeling social interactions and multimodality, the trajectories generated by these methods still lead to many collisions. In this work, we propose CoL-GAN, a new attention-based generative adversarial network that uses a convolutional neural network as the discriminator and is able to generate trajectories with fewer collisions. Through experimental comparisons with prior works on publicly available datasets, we demonstrate that CoL-GAN achieves state-of-the-art performance in terms of accuracy and collision avoidance.
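To make the CNN-as-discriminator idea concrete: a discriminator can slide 1-D convolutions over the time axis of a trajectory and pool the responses into a real-vs-generated score. This toy sketch is purely illustrative (the kernel shapes, pooling, and output head are assumptions, not CoL-GAN's discriminator).

```python
import numpy as np

def cnn_discriminator_score(traj, kernels, w_out):
    """Toy 1-D CNN discriminator over a trajectory (hypothetical sketch).

    traj: (T, 2) x/y coordinates; kernels: list of (k, 2) kernels slid over
    time; w_out: one output weight per kernel. Returns a scalar score in (0, 1)
    interpreted as the probability that the trajectory is real.
    """
    feats = []
    for K in kernels:
        k = K.shape[0]
        # valid 1-D convolution along the time axis, then ReLU + max-pooling
        resp = [np.sum(traj[t:t + k] * K) for t in range(traj.shape[0] - k + 1)]
        feats.append(max(0.0, max(resp)))
    logit = float(np.dot(w_out, feats))
    return 1.0 / (1.0 + np.exp(-logit))         # sigmoid probability
```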
We propose a method for simulating cloth with meshes dynamically refined according to visual saliency. It is commonly believed that the regions of an image being viewed should have more detail than others. For a given scene, a low-resolution cloth mesh is first simulated and rendered into images in the preview stage. Pixel saliency values of these images are predicted by a pre-trained saliency prediction model. These pixel saliencies are then translated into vertex saliencies on the corresponding meshes. Vertex saliency, together with camera positions and a number of geometric surface features, guides the dynamic remeshing for simulation in the production stage. To build the saliency prediction model, images extracted from various videos of clothing scenes were used as training data. Participants were asked to watch these videos while their eye motion was tracked. A saliency map is generated from the eye motion data for each extracted video frame. Image feature vectors and map labels are fed to a Support Vector Machine to train the saliency prediction model. Our method greatly reduces the number of vertices and faces in the clothing model, yielding a speed-up of more than 3× for scenes with a single dressed character and more than 5× for multi-character scenes. The proposed technique can work together with view dependency for offline simulation.
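The pixel-to-vertex saliency translation can be sketched as projecting each mesh vertex into the rendered image and reading off the predicted pixel saliency. The `project` callable and the zero default for off-screen vertices are assumptions for illustration; the paper's exact transfer (e.g., its handling of occlusion) may differ.

```python
import numpy as np

def vertex_saliency(vertices, saliency_map, project):
    """Translate per-pixel saliency to per-vertex saliency (hypothetical sketch).

    vertices: (V, 3) mesh vertices; saliency_map: (H, W) predicted pixel
    saliency; project: callable mapping a 3-D point to an integer pixel
    (row, col), or None if the vertex is off-screen or occluded.
    """
    H, W = saliency_map.shape
    out = np.zeros(len(vertices))
    for i, v in enumerate(vertices):
        px = project(v)
        if px is not None:
            r, c = px
            if 0 <= r < H and 0 <= c < W:
                out[i] = saliency_map[r, c]  # unprojected vertices keep saliency 0
    return out
```

The resulting per-vertex values can then drive remeshing: refine where vertex saliency is high, coarsen where it is low.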
Image-based rendering (IBR) attempts to synthesize novel views from a set of observed images. Some IBR approaches (such as light fields) have yielded impressive high-quality results on small-scale scenes with dense photo capture. However, available wide-baseline IBR methods are still limited by the low geometric accuracy and completeness of multi-view stereo (MVS) reconstruction on low-textured and non-Lambertian surfaces. These issues become more significant in large-scale outdoor scenes due to challenging scene content, e.g., buildings, trees, and sky. To address these problems, we present a novel IBR algorithm with two key components. First, we propose a depth refinement method that combines MVS depth maps with monocular depth maps predicted via deep learning. A lookup-table remap is proposed to convert the scale of the monocular depths to be consistent with that of the MVS depths. The rescaled monocular depth is then used as a constraint in a minimum spanning tree (MST)-based nonlocal filter to refine the per-view MVS depth. Second, we present an efficient shape-preserving warping algorithm that uses superpixels to generate the warped images and blend expected novel views of scenes. The proposed method has been evaluated on public MVS and view synthesis datasets, as well as on newly captured large-scale outdoor datasets. The experimental results demonstrate that, compared with state-of-the-art methods, the proposed method obtains more complete and reliable depth maps for challenging large-scale outdoor scenes, thereby producing more promising novel view synthesis.
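One simple way to realize a lookup-table remap of this kind is to bin the monocular depths at pixels where the MVS depth is reliable, store the median MVS depth per bin, and interpolate the table over the whole monocular map. The binning scheme and median statistic here are assumptions; the paper's table construction may differ.

```python
import numpy as np

def remap_mono_depth(mono, mvs_valid_mono, mvs_valid_depth, n_bins=64):
    """Hypothetical lookup-table remap of monocular depth to the MVS scale.

    mono: (H, W) monocular depth map; mvs_valid_mono / mvs_valid_depth: 1-D
    arrays of monocular and MVS depths at pixels with reliable MVS depth.
    Builds a per-bin median lookup table, then applies it by interpolation.
    """
    edges = np.linspace(mvs_valid_mono.min(), mvs_valid_mono.max(), n_bins + 1)
    idx = np.digitize(mvs_valid_mono, edges) - 1
    centers, table = [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.any():                        # skip bins with no reliable samples
            centers.append(0.5 * (edges[b] + edges[b + 1]))
            table.append(np.median(mvs_valid_depth[sel]))
    # interpolate the table to remap every monocular depth value
    return np.interp(mono, centers, table)
```

The remapped map can then serve as the constraint in the MST-based nonlocal filtering step.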
This paper explores the evacuation behavior of crowds during terrorist attacks. We extend a floor field model to simulate dual-role crowds in a three-dimensional (3D) space. In this model, pedestrians can bypass obstacles and move to target positions while avoiding attackers, and an attacker can bypass obstacles to pursue target pedestrians. In addition, pedestrians and the attacker each have their own field-of-view models. Obstacles obstruct the evacuation routes of pedestrians, which can cause pedestrians to fail to escape the attacker in time and to be injured or killed. At the same time, obstacles also play a protective role, because they obstruct the attacker's pursuit route, and high obstacles in particular block the attacker's view. We conducted 300 experiments to study the effects of obstacles on the evacuation of pedestrians under threat of attack. The following conclusions are drawn: (1) a single obstacle in front of an exit is more conducive to evacuation than no obstacle; (2) higher-density obstacles better protect pedestrians from being chased by attackers; and (3) the direction of the aisle formed between obstacles should be consistent with the direction of the exit so that pedestrians can evacuate more efficiently, reducing the death toll. We discovered two interesting phenomena, namely circle and dispersion, which help to explain why fewer deaths occur in the presence of high-density obstacles.
•An extended floor field model was used to simulate a dual-role crowd under terrorist attack.
•An attacker's field-of-view model in 3D space was added.
•The greater the density of obstacles, the fewer deaths occurred.
•Two interesting phenomena for pedestrians avoiding attackers emerged: circle and dispersion.
•The aisle formed between obstacles should be in the same direction as the exit to maximize pedestrian protection.
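At the core of a floor field model is a static field giving each free cell its distance to the nearest exit; pedestrians step toward decreasing field values while obstacles carry no value. A minimal breadth-first-search sketch on a 2-D grid follows (the grid encoding and 4-neighbourhood are illustrative assumptions; the paper's 3-D, dual-role model is considerably richer).

```python
from collections import deque

def static_floor_field(grid, exits):
    """BFS distance-to-exit field, the static layer of a floor field model.

    grid: 2-D list with 0 = free cell, 1 = obstacle; exits: list of (r, c)
    exit cells. Returns a field where each reachable free cell holds its
    shortest hop count to an exit; obstacles and unreachable cells stay None.
    """
    H, W = len(grid), len(grid[0])
    field = [[None] * W for _ in range(H)]
    q = deque()
    for r, c in exits:
        field[r][c] = 0
        q.append((r, c))
    while q:                                  # breadth-first expansion from exits
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and grid[nr][nc] == 0 and field[nr][nc] is None:
                field[nr][nc] = field[r][c] + 1
                q.append((nr, nc))
    return field
```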
Virtualized traffic, built from various simulation models and real-world traffic data, is a promising approach to reconstructing detailed traffic flows. A variety of applications can benefit from virtual traffic, including, but not limited to, video games, virtual reality, traffic engineering, and autonomous driving. In this survey, we provide a comprehensive review of state-of-the-art techniques for traffic simulation and animation. We start with a discussion of three classes of traffic simulation models applied at different levels of detail. Then, we introduce various data-driven animation techniques, including existing data collection methods and the validation and evaluation of simulated traffic flows. Next, we discuss how traffic simulations can benefit the training and testing of autonomous vehicles. Finally, we discuss the current state of traffic simulation and animation and suggest future research directions.
Large display systems have been successfully applied in virtual reality domains because they provide a full sense of immersion through a large visual space and high display resolution. However, only a few users can interact with these systems using pen-like or marker-based devices, and the user experience and application modes are constrained in many areas. In this paper, we propose a novel application framework called "Groupnect", which gives users a unique group-interaction experience in a large display system. Using optical tracking and 3D gesture recognition technologies, our approach can automatically recognize gesture-based control signals from 12 users simultaneously, and the backend system can trigger corresponding actions in real time. We conduct a user study and compare the results with a standard interaction mode. The results demonstrate that our approach greatly increases recorded objective activities and subjective effort, and that Groupnect promotes the physical and mental participation of users. This indicates great potential for designing novel applications in entertainment, education, and training.
Trajectory prediction is a crucial and challenging task in many domains (e.g., autonomous driving and robot navigation). First, high-quality trajectory prediction methods need to capture human–human and human–scene interactions effectively to avoid collisions with moving agents and static obstacles. Moreover, such approaches must be efficient and lightweight to reduce computing costs and resource consumption. To address these challenges, we propose a model with a Spatial–Temporal module and a heatmap module based on gated linear units. In the Spatial–Temporal module, an adaptive Graph Convolutional Network is proposed to capture human–human interactions; it combines physical features with graph convolutional networks to infer the agents' implicit relationships. For the human–scene interaction, we encode the sequential local heatmap around each agent in the heatmap module. The model includes two gated linear units that capture the correlations of the agent's motion and the dynamically changing trend of the surrounding scene, respectively. Compared with previous methods, our method is more lightweight and efficient, with a smaller parameter size and a shorter inference time. Meanwhile, our model achieves better experimental results on two publicly available datasets (ETH and UCY) and predicts more socially reasonable trajectories.
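For reference, the gated linear unit the abstract builds on computes an elementwise product between a linear transform and a sigmoid gate, which keeps the module cheap and parallel. A minimal numpy sketch (the parameter names are assumptions; the paper's exact gating layout is not shown here):

```python
import numpy as np

def glu(x, W, V, b, c):
    """Gated linear unit: GLU(x) = (xW + b) * sigmoid(xV + c).

    x: (N, D) inputs; W, V: (D, D') linear and gate weights; b, c: (D',) biases.
    The sigmoid gate in (0, 1) controls how much of each linear feature passes.
    """
    gate = 1.0 / (1.0 + np.exp(-(x @ V + c)))   # elementwise sigmoid gate
    return (x @ W + b) * gate
```

Its lightweight, convolution-free form is consistent with the abstract's emphasis on small parameter size and short inference time.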