In this article, we propose a Poisson multi-Bernoulli (PMB) filter for extended object tracking (EOT) that directly estimates the set of object trajectories using belief propagation (BP). The proposed filter propagates a PMB density on the posterior of sets of trajectories through the filtering recursions over time, where the PMB mixture (PMBM) posterior after the update step is approximated as a PMB. The efficient PMB approximation relies on several important theoretical contributions. First, we present a PMBM conjugate prior on the posterior of sets of trajectories for a generalized measurement model, in which each object generates an independent set of measurements. The PMBM density is a conjugate prior in the sense that both the prediction and the update steps preserve its PMBM form. Second, we present a factor graph representation of the joint posterior of the PMBM set of trajectories and association variables for the Poisson spatial measurement model. Importantly, leveraging the PMBM conjugacy and the factor graph formulation enables an elegant treatment of undetected objects via a Poisson point process and efficient inference on sets of trajectories using BP, where the approximate marginal densities in the PMB approximation can be obtained without enumerating different data association hypotheses. To achieve this, we present a particle-based implementation of the proposed filter, in which smoothed trajectory estimates, if desired, can be obtained via single-object particle smoothing methods. Its performance for EOT with ellipsoidal shapes is evaluated in a simulation study.
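As a rough illustration of the PMBM-to-PMB approximation step described above, the sketch below collapses one track's local hypotheses into a single Bernoulli component, weighting each hypothesis by its marginal association probability and representing each hypothesis with particles. It is a generic track-oriented merge written under our own assumptions, not the authors' algorithm: the function name `merge_bernoulli_hypotheses`, its arguments, and the use of single-state particles (rather than trajectory particles) are hypothetical, and in the filter the marginal probabilities would come from BP.

```python
import numpy as np


def merge_bernoulli_hypotheses(assoc_probs, exist_probs, particle_sets, particle_weights):
    """Collapse one track's local hypotheses into a single Bernoulli (hypothetical helper).

    assoc_probs[h]      : marginal probability of local hypothesis h (entries sum to 1)
    exist_probs[h]      : existence probability r_h under hypothesis h
    particle_sets[h]    : (N_h, d) array of particles for hypothesis h
    particle_weights[h] : (N_h,) normalized particle weights for hypothesis h
    """
    p = np.asarray(assoc_probs, dtype=float)
    r_h = np.asarray(exist_probs, dtype=float)
    # Marginal existence probability: r = sum_h p_h * r_h.
    r = float(np.sum(p * r_h))
    if r <= 0.0:
        raise ValueError("all local hypotheses have zero existence probability")
    # Each hypothesis contributes to the merged density with weight p_h * r_h / r.
    mix = p * r_h / r
    particles = np.vstack(particle_sets)
    weights = np.concatenate(
        [m * np.asarray(w, dtype=float) for m, w in zip(mix, particle_weights)]
    )
    return r, particles, weights / weights.sum()


# Two local hypotheses for one track, each carrying two 1-D particles.
r, particles, weights = merge_bernoulli_hypotheses(
    [0.7, 0.3], [0.9, 0.5],
    [np.array([[0.0], [1.0]]), np.array([[5.0], [6.0]])],
    [np.array([0.5, 0.5]), np.array([0.5, 0.5])],
)
```

In this example the merged existence probability is 0.7 × 0.9 + 0.3 × 0.5 = 0.78, and the particle weights of the first hypothesis are scaled up because it is both more likely and more likely to exist.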
Deep Affinity Network for Multiple Object Tracking
Sun, ShiJie; Akhtar, Naveed; Song, HuanSheng ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 01/2021, Vol. 43, No. 1
Journal article; peer reviewed; open access
Multiple Object Tracking (MOT) plays an important role in solving many fundamental problems in video analysis and computer vision. Most MOT methods employ two steps: object detection and data association. The first step detects objects of interest in every frame of a video, and the second establishes correspondence between the detected objects in different frames to obtain their tracks. Object detection has made tremendous progress in the last few years due to deep learning. However, data association for tracking still relies on hand-crafted constraints such as appearance, motion, spatial proximity, and grouping to compute affinities between objects in different frames. In this paper, we harness the power of deep learning for data association in tracking by jointly modeling object appearances and their affinities between different frames in an end-to-end fashion. The proposed Deep Affinity Network (DAN) learns compact yet comprehensive features of pre-detected objects at several levels of abstraction, and performs exhaustive pairing permutations of those features in any two frames to infer object affinities. DAN also accounts for multiple objects appearing and disappearing between video frames. We exploit the resulting efficient affinity computations to associate objects in the current frame deep into the previous frames for reliable online tracking. Our technique is evaluated on the popular multiple object tracking challenges MOT15, MOT17, and UA-DETRAC. Comprehensive benchmarking under twelve evaluation metrics demonstrates that our approach is among the best-performing techniques on the leaderboards for these challenges. The open-source implementation of our work is available at https://github.com/shijieS/SST.git .
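DAN learns its affinities with a deep network; the toy below only mimics the surrounding bookkeeping. It scores every (previous object, current object) pair exhaustively and appends one extra column so that a previous object can be assigned to "not present in the current frame". Cosine similarity stands in for the learned affinity, and the function name `affinity_with_dummy` and its `dummy_score` parameter are hypothetical; a symmetric extra row would play the same role for newly appearing objects.

```python
import numpy as np


def affinity_with_dummy(feats_prev, feats_curr, dummy_score=0.0):
    """Toy association probabilities between two frames (hypothetical helper).

    feats_prev: (N, D) features of objects in an earlier frame
    feats_curr: (M, D) features of objects in the current frame
    Returns an (N, M + 1) matrix of row-wise probabilities; the last column
    means "this earlier object has no match in the current frame".
    """
    a = feats_prev / np.linalg.norm(feats_prev, axis=1, keepdims=True)
    b = feats_curr / np.linalg.norm(feats_curr, axis=1, keepdims=True)
    # Exhaustive pairing: every earlier object is scored against every current one.
    scores = a @ b.T                                     # (N, M) cosine similarities
    dummy = np.full((scores.shape[0], 1), dummy_score)  # score for "disappeared"
    logits = np.hstack([scores, dummy])
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)      # row-wise softmax
```

With learned features the same structure lets an association step pick, for each earlier object, either a current object or the extra "disappeared" column.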
Hyperparameters are numerical presets whose values are assigned before a learning process begins. Selecting appropriate hyperparameters is often critical for achieving satisfactory performance in many vision problems, such as deep learning-based visual object tracking. However, it is often difficult to determine their optimal values, especially if they are specific to each video input. Most hyperparameter optimization algorithms search a generic range and apply the result blindly to all sequences. In this paper, we propose a novel dynamical hyperparameter optimization method that adaptively optimizes hyperparameters for a given sequence using an action-prediction network built on continuous deep Q-learning. Since the observation space for object tracking is significantly more complex than those in traditional control problems, existing continuous deep Q-learning algorithms cannot be directly applied. To overcome this challenge, we introduce an efficient heuristic strategy to handle the high-dimensional state space, which also accelerates convergence. The proposed algorithm is applied to improve two representative trackers, a Siamese-based one and a correlation-filter-based one, to evaluate its generalizability. Their superior performance on several popular benchmarks is clearly demonstrated. Our source code is available at https://github.com/shenjianbing/dqltracking .
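The abstract does not spell out the form of the Q-network. One widely used continuous deep Q-learning formulation is normalized advantage functions (NAF), where Q(s, a) = V(s) - 0.5 (a - μ(s))ᵀ P(s) (a - μ(s)) with P = L Lᵀ, so the greedy action is simply μ(s). Assuming a structure of that kind, the snippet below evaluates Q for a hyperparameter vector treated as the action. In a real agent, V, μ and the entries of L would be network outputs for the current tracking state; here they are fixed numbers, and `naf_q_value` is a hypothetical name.

```python
import numpy as np


def naf_q_value(action, value, mu, l_entries):
    """NAF-style Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu), with P = L L^T.

    action    : (d,) candidate hyperparameter vector
    value     : scalar V(s)
    mu        : (d,) greedy action mu(s)
    l_entries : (d * (d + 1) / 2,) lower-triangular entries of L (row-major)
    """
    d = mu.shape[0]
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = l_entries
    # Exponentiate the diagonal so P = L L^T is positive definite.
    L[np.diag_indices(d)] = np.exp(np.diag(L))
    P = L @ L.T
    diff = action - mu
    return value - 0.5 * diff @ P @ diff


mu = np.array([0.5, 0.2])  # hypothetical greedy hyperparameters
q_greedy = naf_q_value(mu, 1.0, mu, np.zeros(3))
q_other = naf_q_value(np.array([0.6, 0.2]), 1.0, mu, np.zeros(3))
```

Because the advantage term is a negative-definite quadratic, Q is maximized exactly at μ(s), which is what makes this kind of formulation convenient for continuous actions such as hyperparameter values.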
Detection and Tracking Meet Drones Challenge
Zhu, Pengfei; Wen, Longyin; Du, Dawei ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022-11-01, Vol. 44, No. 11
Journal article; peer reviewed; open access
Drones, or general UAVs, equipped with cameras have been rapidly deployed in a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones is in high demand, bringing computer vision and drones ever closer together. To promote and track the development of object detection and tracking algorithms, we have organized three challenge workshops in conjunction with ECCV 2018, ICCV 2019, and ECCV 2020, attracting more than 100 teams from around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks: (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. In this paper, we first present a thorough review of object detection and tracking datasets and benchmarks, and discuss the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban and suburban areas of 14 different cities across China, from north to south. As the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms for the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, conclude the challenge, and propose future directions. We expect the benchmark to greatly boost research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from https://github.com/VisDrone/VisDrone-Dataset .
With efficient appearance learning models, discriminative correlation filters (DCFs) have proven very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues: the spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower-dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by lasso regularization. To encourage temporal consistency, the filter model is restricted to lie around its historical value and is updated locally to preserve the global structure in the manifold. Finally, a unified optimization framework is proposed to jointly select temporal-consistency-preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmark datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123, and VOT2018. The experimental results demonstrate the superiority of the proposed method over state-of-the-art approaches.
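Structured spatial sparsity across the channels of a multi-channel filter is commonly expressed as an ℓ2,1 (group lasso) penalty, whose proximal operator shrinks the whole channel vector at each spatial location and zeroes it when its norm falls below the threshold. The sketch below shows only that proximal step, of the kind an augmented Lagrangian (ADMM-style) solver would call once per iteration. It is an assumption about how such a constraint can be handled, not the paper's exact update, and `group_soft_threshold` is a hypothetical name.

```python
import numpy as np


def group_soft_threshold(filt, lam):
    """Proximal operator of lam * sum_p ||filt[p, :]||_2 for an (H, W, C) filter.

    Each spatial location p keeps or discards its C channel coefficients
    together, which is what makes the resulting sparsity spatially structured.
    """
    norms = np.linalg.norm(filt, axis=2, keepdims=True)       # (H, W, 1) per-location norm
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return filt * scale                                         # shrink or zero each group


filt = np.zeros((2, 2, 2))
filt[0, 0] = [3.0, 4.0]  # strong location: norm 5, shrunk by a factor of 0.8
filt[1, 1] = [0.3, 0.4]  # weak location: norm 0.5, removed when lam = 1
selected = group_soft_threshold(filt, lam=1.0)
```

Locations whose channel energy is below `lam` are switched off entirely, which is one way to realize the "spatial feature selection" the abstract describes.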
Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location and segmentation, or a rough approximation of it in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as the dominant tracking paradigms and have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCF and Siamese trackers, based on results on nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for the open challenges identified in our analysis.
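The essential operation of the Siamese core formulation is a dense cross-correlation of exemplar (template) features over search-region features, with the peak of the resulting score map indicating the target location. The sketch below assumes both feature maps have already been produced by a shared backbone, which is not modeled here (raw arrays stand in for features), and uses a plain loop for clarity rather than speed; `siamese_score_map` is a hypothetical name.

```python
import numpy as np


def siamese_score_map(exemplar, search):
    """Valid cross-correlation of exemplar features (h, w, C) over search features (H, W, C)."""
    h, w, _ = exemplar.shape
    H, W, _ = search.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product between the template and one search window, over all channels.
            out[i, j] = np.sum(exemplar * search[i:i + h, j:j + w, :])
    return out


exemplar = np.array([[1.0, 2.0], [3.0, 4.0]])[:, :, None]
search = np.zeros((6, 6, 1))
search[3:5, 1:3, 0] = exemplar[:, :, 0]  # target placed at row 3, column 1
scores = siamese_score_map(exemplar, search)
peak = np.unravel_index(np.argmax(scores), scores.shape)
```

The peak lands where the search window matches the template, which in a trained Siamese tracker is then mapped back to image coordinates.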
In this paper, we propose an adaptive region proposal scheme with feature channel regularization to facilitate robust object tracking. We consider tracking as a linear regression problem, and an ensemble of correlation filters is trained online to distinguish the foreground target from the background. Further, we integrate adaptively learned region proposals into an enhanced two-stream tracking framework based on correlation filters. For the tracking stream, we learn two-stage cascade correlation filters on deep convolutional features to ensure competitive tracking performance. For the detection stream, we employ adaptive region proposals, which are effective in recovering target objects from tracking failures caused by heavy occlusion or out-of-view movement. In contrast to traditional tracking-by-detection methods that use random samples or sliding windows, we perform target re-detection over adaptively learned region proposals. Since region proposals naturally take objectness information into account, we show that the proposed adaptive region proposals can handle the challenging scale estimation problem as well. In addition, we observe channel redundancy and noise in the feature representation, especially for convolutional features. Thus, we apply channel regularization to the correlation filter learning. Extensive experimental validations on the OTB, VOT, and UAV-123 datasets demonstrate that the proposed method performs favorably against state-of-the-art tracking algorithms.
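Treating tracking as linear regression with correlation filters has a well-known closed-form solution in the Fourier domain. The sketch below is a single-channel, MOSSE-style ridge regression on a raw patch, included only to make that idea concrete. It does not model the paper's ensemble, two-stage cascade, deep features, region proposals or channel regularization, and `gaussian_label`, `train_filter`, `detect` and the default `sigma`/`lam` values are hypothetical choices.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired regression output: a Gaussian peak at the patch center."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


def train_filter(patch, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain (returns the conjugate filter)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_label(patch.shape))
    return G * np.conj(F) / (F * np.conj(F) + lam)


def detect(filter_conj, patch):
    """Correlate a new patch with the filter and return the response peak (row, col)."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj))
    return np.unravel_index(np.argmax(response), response.shape)


rng = np.random.default_rng(0)
patch = rng.random((32, 32))
h_conj = train_filter(patch)
# Circularly shifting the patch by (+3, -2) moves the response peak by the same amount.
peak_same = detect(h_conj, patch)
peak_shifted = detect(h_conj, np.roll(patch, (3, -2), axis=(0, 1)))
```

The regularizer `lam` keeps the division stable at frequencies where the patch has little energy; the shift of the response peak is what a tracker reads out as target motion.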
Hyperspectral (HS) video is able to capture abundant spectral, spatial, and temporal information about objects, which overcomes the limitations of common red-green-blue (RGB) video in complex scenarios such as similar appearances and background clutter. However, most trackers use hand-crafted features extracted from manually selected bands, instead of deep features, for object representation because of limited HS data and the band gap problem. Each HS image consists of many bands, and it is challenging to fully exploit the interactions among bands while maintaining tracking speed. To this end, this paper proposes a novel end-to-end spectral awareness interaction network with dynamic template (SPIRIT) for HS video object tracking. First, a spectral awareness module is proposed to learn band contributions, taking into account nonlinear and global interactions between HS bands. It can also cooperate with the feature extraction module pre-trained on RGB data to alleviate the band gap and the demand for large amounts of training data. Second, an interaction module is proposed to achieve inter- and intra-band feature interactions, enhancing tracking performance while improving efficiency. Furthermore, the proposed method contains a novel update module that evaluates the tracking confidence of the current state to adapt to object changes and attenuate tracking drift. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods while meeting real-time demands.
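One common way to learn per-band contributions is squeeze-and-excitation style gating: pool each band globally, pass the pooled vector through a small nonlinear network, and rescale the bands by the resulting sigmoid weights. The sketch below illustrates only that generic idea under our own assumptions. It is not SPIRIT's spectral awareness module, the weights are random placeholders rather than trained parameters, and `band_attention` is a hypothetical name.

```python
import numpy as np


def band_attention(cube, w1, w2):
    """Squeeze-and-excitation style re-weighting of hyperspectral bands.

    cube: (B, H, W) hyperspectral frame with B bands
    w1:   (B, R) first-layer weights of a tiny gating MLP (placeholder values)
    w2:   (R, B) second-layer weights of the gating MLP (placeholder values)
    """
    squeezed = cube.mean(axis=(1, 2))              # (B,) global average of each band
    hidden = np.maximum(0.0, squeezed @ w1)        # ReLU nonlinearity mixes all bands
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid: one weight in (0, 1) per band
    return cube * gates[:, None, None], gates


rng = np.random.default_rng(1)
cube = rng.random((16, 8, 8))  # 16 bands of an 8 x 8 patch
weighted, gates = band_attention(
    cube, 0.1 * rng.standard_normal((16, 4)), 0.1 * rng.standard_normal((4, 16))
)
```

Because every gate depends on the pooled values of all bands through the hidden layer, the weighting reflects global, nonlinear interactions between bands; after training, the gates would indicate which bands contribute most.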