Object tracking from LiDAR point clouds, which are typically incomplete, sparse, and unstructured, plays a crucial role in urban navigation. Some existing methods rely on a learned similarity network for locating the target, which substantially limits advances in tracking accuracy. In this study, we leveraged a powerful target discriminator and an accurate state estimator to robustly track target objects in challenging point cloud scenarios. Considering the complexity of state estimation, we extended the traditional Lucas and Kanade (LK) algorithm to 3D point cloud tracking. Specifically, we propose a state estimation subnetwork that learns the incremental warp for updating the coarse target state. Moreover, to obtain the coarse state, we present a simple yet efficient discrimination subnetwork that projects 3D shapes into a more discriminative latent space by integrating the global feature into each point-wise feature. Experiments on the KITTI and PandaSet datasets show that, compared with state-of-the-art methods, our proposed method achieves significant improvements, in particular up to 13.68% on KITTI.
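The LK extension mentioned above builds on a classic idea: iteratively solve a linearised least-squares problem for an incremental warp update. The following is a minimal 1-D toy illustration of that Gauss-Newton loop (our own example for intuition, not the paper's 3-D subnetwork; the Gaussian profiles are made-up test data):

```python
import numpy as np

def lk_shift(template, signal, iters=50):
    """Estimate the translation p that aligns signal(x + p) with template(x)
    via Gauss-Newton iterations, the core update of the classic LK method."""
    x = np.arange(len(template), dtype=float)
    p = 0.0
    for _ in range(iters):
        warped = np.interp(x + p, x, signal)    # resample signal at x + p
        grad = np.gradient(warped)              # Jacobian d(warped)/dp
        err = template - warped
        dp = (grad @ err) / (grad @ grad + 1e-12)  # 1-parameter normal equation
        p += dp
        if abs(dp) < 1e-8:                      # incremental warp has converged
            break
    return p

x = np.arange(400.0)
template = np.exp(-((x - 200) / 20.0) ** 2)     # reference profile
signal = np.exp(-((x - 205) / 20.0) ** 2)       # same profile shifted by 5 samples
p = lk_shift(template, signal)                  # converges to roughly 5
```

The 3D point cloud version replaces the 1-D resampling with a rigid warp of points and the scalar parameter with a pose increment, but the incremental-update structure is the same.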
We propose two-dimensional pose estimation from a single range image of the human body, using sparse regression with a componentwise clustering feature point representation (CCFPR) model. CCFPR includes primary feature points and secondary feature points. The primary feature points consist of the torso center and five extremal points of the human body, and further serve to classify all body pixels as points of six body components. The secondary feature points are the cluster centers of each of the five components other than the torso, obtained with K-means clustering. The human pose is obtained by learning a sparse projection matrix that maps CCFPR to the skeleton points of the human body, based on the assumption that each skeleton point can be represented by a combination of a few feature points of the associated body components. Experimental results on both virtual and real data show that, under the sparse regression model with a suitably selected cluster number, CCFPR outperforms the random decision forest approach and the predictions of the Kinect sensor v2.
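The pipeline above can be sketched in two steps: per-component K-means centers as features, then a sparse linear map to skeleton coordinates. Below is a minimal scikit-learn sketch with synthetic stand-in data; the primary feature points and the pixel-classification stage are omitted, and all shapes and hyperparameters are illustrative assumptions, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import MultiTaskLasso

def ccfpr_features(component_pixels, k=3):
    """Cluster each non-torso component's pixels and concatenate the
    cluster centers into one feature vector (secondary feature points only)."""
    feats = []
    for pts in component_pixels:                 # pts: (n_i, 2) pixel coords
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pts)
        centers = km.cluster_centers_
        centers = centers[np.argsort(centers[:, 0])]  # stable ordering
        feats.append(centers.ravel())
    return np.concatenate(feats)

# Hypothetical training data: 40 poses, 5 components of 50 pixels each;
# the 8-dim target stands in for four 2-D skeleton points.
rng = np.random.default_rng(0)
X = np.stack([
    ccfpr_features([rng.normal(loc=c, scale=1.0, size=(50, 2)) for c in range(5)])
    for _ in range(40)
])
Y = X @ rng.normal(size=(X.shape[1], 8)) + 0.01 * rng.normal(size=(40, 8))

# Sparse projection matrix: each skeleton coordinate is expressed by
# a combination of only a few feature-point coordinates.
model = MultiTaskLasso(alpha=0.05).fit(X, Y)
skeleton = model.predict(X[:1])                  # predicted skeleton for one pose
```

The multi-task lasso enforces row-wise sparsity of the projection matrix, which matches the assumption that each skeleton point depends on only a few feature points.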
Abstract
Traffic prediction is an important part of intelligent transportation systems. Recently, graph convolution networks (GCNs) have been introduced for traffic flow forecasting and achieve good performance owing to their ability to represent the structure of the traffic road network as a graph. Moreover, dynamic GCNs have been put forward to model the temporal properties of traffic flow. Although great progress has been made, most GCN-based traffic flow forecasting methods use a single graph for convolution, which is not enough to reveal the inherent properties of the traffic graph, since it is influenced by many factors such as weather, season, and traffic accidents. In this paper, a novel graph transformer based dynamic multiple graph convolution network (GTDMGCN) is conceived for traffic flow forecasting. Instead of a single graph, multiple graphs are constructed to model the complex traffic network by the proposed graph transformer network. Additionally, a temporal gated convolution is proposed to capture the temporal properties of traffic flow. The proposed GTDMGCN model is evaluated on four real traffic datasets (PEMS03, PEMS04, PEMS07, PEMS08), achieving average improvements of 9.78%, 7.80%, and 5.96% under the MAE, RMSE, and MAPE metrics, respectively, compared with current results.
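Temporal gated convolutions of this kind commonly combine a filtering branch and a gating branch. A minimal NumPy sketch of one standard formulation (a tanh filter modulated element-wise by a sigmoid gate over a single node's time series; the kernel weights are illustrative, not the paper's learned parameters):

```python
import numpy as np

def temporal_gated_conv(x, w_filter, w_gate):
    """Gated 1-D convolution over the time axis: the tanh branch carries
    the signal, the sigmoid branch decides how much of it passes."""
    def slide(seq, w):                           # sliding dot product, 'valid' mode
        k = len(w)
        return np.array([seq[t:t + k] @ w for t in range(len(seq) - k + 1)])
    filt = np.tanh(slide(x, w_filter))           # filtering branch in (-1, 1)
    gate = 1.0 / (1.0 + np.exp(-slide(x, w_gate)))  # gating branch in (0, 1)
    return filt * gate                           # element-wise gating

x = np.sin(np.linspace(0, 3, 24))                # toy traffic series for one node
h = temporal_gated_conv(x, np.array([0.5, 0.3, 0.2]), np.array([1.0, -1.0, 0.5]))
```

In the full model the same gate is applied per node and per channel with learned kernels; the sketch keeps only the gating mechanism itself.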
One major drawback of 3D printing technologies is low printing efficiency; in general, it takes 8-12 hours to print a normal 3D model. 3D printing technologies manufacture objects layer by layer, where each layer is composed of one or more closed 2D polygons (referred to as slices in this work). As a result, the number of slices directly affects the printing time. In addition, most 3D printing technologies, such as Fused Deposition Modeling (FDM), need extra supporting structures to support overhang regions during printing, which greatly increases manufacturing time. We observe that both the slice count and the amount of supporting structure are affected by the printing direction. In this work, a novel printing direction optimization algorithm is proposed based on the number of slices and the overhang area. In the proposed method, a genetic algorithm is used to obtain the optimal printing direction, where the fitness function is designed as a weighted sum of the slice number and the overhang area. Experimental results show that the proposed algorithm is able to find a printing direction that reduces both the number of slices and the overhang area.
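The fitness function described above can be sketched directly from a triangle mesh: slice count is approximated by the model's extent along the build direction divided by the layer height, and overhang area by summing faces whose normal points steeply downward. The genetic algorithm below is a deliberately tiny truncation-selection stand-in, not the paper's optimizer; it assumes consistently oriented outward face normals and illustrative weights:

```python
import numpy as np

def fitness(direction, verts, faces, layer_h=0.2, w_slice=1.0, w_over=1.0):
    """Weighted cost of printing along `direction`: approximate slice count
    plus the area of faces tilted more than 45 degrees below horizontal."""
    d = direction / np.linalg.norm(direction)
    heights = verts @ d
    n_slices = (heights.max() - heights.min()) / layer_h
    tri = verts[faces]                                   # (F, 3, 3) triangles
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    area = 0.5 * np.linalg.norm(n, axis=1)
    cosang = (n @ d) / (2 * area + 1e-12)                # normal vs. build direction
    overhang = area[cosang < -np.cos(np.pi / 4)].sum()   # steep downward faces
    return w_slice * n_slices + w_over * overhang

def ga_optimize(verts, faces, pop=30, gens=40, seed=0):
    """Tiny GA over directions: keep the best half, mutate it with
    Gaussian noise, repeat, and return the best unit direction found."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(pop, 3))
    for _ in range(gens):
        f = np.array([fitness(p, verts, faces) for p in P])
        elite = P[np.argsort(f)[:pop // 2]]              # truncation selection
        children = elite + 0.2 * rng.normal(size=elite.shape)
        P = np.vstack([elite, children])
    f = np.array([fitness(p, verts, faces) for p in P])
    best = P[np.argmin(f)]
    return best / np.linalg.norm(best)
```

A real implementation would also weight overhang faces by how much support material they actually require, but the weighted two-term structure of the fitness is the same.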
Due to the huge cost of manual annotation, labelled data may not be sufficient to train a dynamic facial expression recogniser with good performance. To address this, the authors propose a multi-modal pre-training method with a pseudo-label guidance mechanism to make full use of unlabelled video data for learning informative representations of facial expressions. First, the authors build a pre-training dataset of videos with aligned vision and audio modalities. Second, the vision and audio feature encoders are trained through an instance discrimination strategy and a cross-modal alignment strategy on the pre-training data. Third, the vision feature encoder is extended as a dynamic expression recogniser and is fine-tuned on the labelled training data. Fourth, the fine-tuned expression recogniser is adopted to predict pseudo-labels for the pre-training data, and a new pre-training phase is then started with the guidance of pseudo-labels to alleviate the long-tail distribution problem and the instance-class conflict. Fifth, since the representations learnt with the guidance of pseudo-labels are more informative, a new fine-tuning phase is added to further boost generalisation performance on the dynamic expression recognition task. Experimental results on the Dynamic Facial Expression in the Wild dataset demonstrate the superiority of the proposed method.
To reduce the dependency on expensive manual annotations for the dynamic expression recognition (DER) task, the authors propose a multi‐modal pre‐training method with a pseudo‐label guidance mechanism to make full use of unlabelled video data for learning informative representations of facial expressions, and achieve state‐of‐the‐art performance on the DER task.
Visual Question Answering (VQA) aims to correctly answer a text question by understanding the image content. Attention-based VQA models mine the implicit relationships between objects according to feature similarity, which neglects explicit relationships between objects such as relative position. Most visual scene graph based VQA models exploit the relative positions or visual relationships between objects to construct the visual scene graph, but they suffer from the semantic insufficiency of visual edge relations. Besides, the scene graph of the text modality is often ignored in these works. In this article, a novel Dual Scene Graph Enhancement Module (DSGEM) is proposed that exploits relevant external knowledge to simultaneously construct two interpretable scene graph structures for the image and text modalities, which makes the reasoning process more logical and precise. Specifically, the authors build the visual and textual scene graphs with the help of commonsense knowledge and syntactic structure, respectively, which explicitly endows each edge relation with specific semantics. Then, two scene graph enhancement modules are proposed to propagate the involved external and structural knowledge and explicitly guide the feature interaction between objects (nodes). Finally, the authors embed these two scene graph enhancement modules into existing VQA models to introduce explicit relation reasoning ability. Experimental results on both the VQA V2 and OK-VQA datasets show that the proposed DSGEM is effective and compatible with various VQA architectures.
The recently emerged compressive sensing (CS) theory provides a whole new avenue for data gathering in wireless sensor networks, with the benefits of universal sampling and decentralized encoding. However, existing CS-based data gathering approaches assume the sensed data has a known, constant sparsity, ignoring that the sparsity of natural signals varies in the temporal and spatial domains. In this paper, we present an adaptive data gathering scheme based on compressive sensing for wireless sensor networks. By introducing an autoregressive (AR) model into the reconstruction of the sensed data, the local correlation in the sensed data is exploited and local adaptive sparsity is achieved. The quality of the data recovered at the sink is evaluated through successive reconstructions, which expose the relation between reconstruction error and the number of measurements; the number of measurements is then adjusted according to the variation of the sensed data. Furthermore, a novel mechanism for detecting and identifying abnormal readings, based on combinational sparsity reconstruction, is proposed: internal errors and external events are distinguished by their specific features. We perform extensive testing of our scheme on real data sets, and the experimental results validate its efficiency and efficacy. An SNR gain of up to about 8 dB can be achieved over the conventional CS-based method, with a moderate increase in complexity.
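The adaptive idea of evaluating successive reconstructions can be sketched as follows: grow the measurement count until two consecutive reconstructions agree, using that agreement as a proxy for low reconstruction error. This toy version uses a plain lasso decoder on a synthetic sparse signal; the AR-model prior and the abnormal-reading detection from the scheme above are omitted, and all thresholds are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def reconstruct(y, Phi, alpha=1e-3):
    """Sparse recovery of x from measurements y = Phi @ x via an l1 solve."""
    return Lasso(alpha=alpha, max_iter=10000, fit_intercept=False).fit(Phi, y).coef_

def adaptive_measurements(x_true, m0=20, step=10, tol=0.05, seed=0):
    """Increase the number of measurements until two successive
    reconstructions agree to within `tol` relative difference."""
    rng = np.random.default_rng(seed)
    n, m = len(x_true), m0
    prev = None
    while m < n:
        Phi = rng.normal(size=(m, n)) / np.sqrt(m)       # random sensing matrix
        xhat = reconstruct(Phi @ x_true, Phi)
        if prev is not None and \
           np.linalg.norm(xhat - prev) < tol * np.linalg.norm(xhat):
            return m, xhat                               # reconstructions agree
        prev, m = xhat, m + step
    return m, prev

x_true = np.zeros(100)
x_true[[5, 27, 51, 63, 88]] = [1.0, -1.2, 0.8, 1.5, -0.9]  # 5-sparse signal
m, xhat = adaptive_measurements(x_true)
```

In the actual scheme the sink drives this loop and the sensors only deliver additional random projections on demand, which is what keeps the encoding decentralized.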
Many application scenarios involve sequential data, but most existing clustering methods do not exploit the order information embedded in sequential data. In this paper, we study the subspace clustering problem for sequential data and propose a new clustering method, ordered sparse clustering with block-diagonal prior (BD-OSC). Instead of the sparse normalizer used in existing sparse subspace clustering methods, a quadratic normalizer for the data's sparse representation is adopted to model the correlation among the sparse coefficients. Additionally, a block-diagonal prior on the spectral clustering affinity matrix is integrated into the model to improve clustering accuracy. To solve the proposed BD-OSC model, a complex optimization problem with a quadratic normalizer and a block-diagonal prior constraint, an efficient algorithm is proposed. We test the proposed method on several types of data, including a synthetic subspace dataset, a human face database, video scene clips, motion tracks, and dynamic 3D facial expression sequences. The experiments show that the proposed method outperforms state-of-the-art subspace clustering methods.
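To see what a quadratic normalizer over ordered coefficients buys, consider a simplified self-expressive model: minimize ||X - XC||_F^2 + lam * ||CR||_F^2, where column j of CR is C[:, j+1] - C[:, j], so codes of consecutive frames are pushed to be similar. The normal equations form a Sylvester system solvable in closed form. This is our own simplified stand-in for the BD-OSC objective (the block-diagonal prior is omitted, and the diagonal is zeroed after the solve rather than constrained):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def ordered_codes(X, lam=1.0, eps=1e-3):
    """Closed-form codes for min ||X - XC||^2 + lam ||CR||^2, via the
    Sylvester equation (X'X + eps I) C + C (lam R R') = X'X."""
    n = X.shape[1]
    R = np.diff(np.eye(n), axis=1)              # column-difference operator
    A = X.T @ X + eps * np.eye(n)               # eps keeps A positive definite
    C = solve_sylvester(A, lam * (R @ R.T), X.T @ X)
    np.fill_diagonal(C, 0.0)                    # drop trivial self-representation
    W = np.abs(C) + np.abs(C.T)                 # affinity for spectral clustering
    return C, W

# Toy sequential data: 15 frames from one 2-D subspace, then 15 from another.
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(6, 2)) @ rng.normal(size=(2, 15)) for _ in range(2)])
X /= np.linalg.norm(X, axis=0)
C, W = ordered_codes(X)
```

Because the smoothness penalty spreads weight to temporal neighbours, which mostly lie in the same subspace, the affinity mass concentrates inside the two blocks.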
Buses, as the most commonly used form of public transport, play a significant role in cities. Predicting bus traffic flow not only helps build an efficient and safe transportation network but can also improve the current situation of road traffic congestion, which is very important for urban development. However, bus traffic flow has complex spatial and temporal correlations, as well as scenario patterns specific to buses compared with other modes of transportation, which is one of the biggest challenges in building models to predict it. In this study, we explore bus traffic flow and its specific scenario patterns, and then build improved spatio-temporal residual networks to predict bus traffic flow, using fully connected neural networks to capture the bus scenario patterns and improved residual networks to capture the spatio-temporal correlations. Experiments on Beijing transportation smart card data demonstrate that our method achieves better results than four baseline methods.
Crowd Motion Editing Based on Mesh Deformation
Zhang, Yong; Zhang, Xinyu; Zhang, Tao ...
International Journal of Digital Multimedia Broadcasting, 12/2020, Volume: 2020
Journal Article
Peer-reviewed
Open Access
Computer simulation is a significant technology for creating large crowd scenes in the film industry. However, the current process of producing crowd animation requires extensive manual operation, which is time-consuming and inconvenient. To solve this problem, this paper presents an editing method based on mesh deformation that can rapidly and intuitively edit crowd movement trajectories in both time and space. The method can directly generate and adjust crowd movement as well as avoid collisions between the crowd and obstacles. For collisions within the crowd that arise from path modification, a time-based solution is put forward that avoids them by retaining the relative positions of individuals. Moreover, an experiment based on a real venue was performed, and the results indicate that the proposed method not only simplifies the editing operations but also improves the efficiency of crowd motion editing.
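Deformation-based trajectory editing of this kind generally preserves each point's differential coordinate (its offset from the average of its neighbours) while softly pinning user-selected handle points, solved as one linear least-squares problem. The following is our own minimal 2-D sketch of that general idea, not the paper's mesh-based formulation; the handles, weights, and path are illustrative:

```python
import numpy as np

def edit_trajectory(P, handles, targets, w=10.0):
    """Edit an (n, 2) polyline: keep differential coordinates L @ P while
    softly constraining handle points to user targets (weight w)."""
    n = len(P)
    L = np.eye(n)
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            L[i, j] -= 1.0 / len(nbrs)          # subtract neighbour average
    delta = L @ P                               # shape-describing coordinates
    rows, rhs = [L], [delta]
    for h, t in zip(handles, targets):          # soft positional constraints
        r = np.zeros((1, n))
        r[0, h] = w
        rows.append(r)
        rhs.append(w * np.asarray(t, dtype=float)[None])
    A, b = np.vstack(rows), np.vstack(rhs)
    return np.linalg.lstsq(A, b, rcond=None)[0]

P = np.column_stack([np.linspace(0, 10, 21), np.zeros(21)])  # straight path
Q = edit_trajectory(P, handles=[0, 10, 20],
                    targets=[(0, 0), (5, 1.5), (10, 0)])     # drag midpoint up
```

Dragging the midpoint handle produces a smooth bump while the endpoints stay put, which is the intuitive space-time editing behaviour the method aims for; time-shifting individuals to resolve intra-crowd collisions sits on top of such per-trajectory edits.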