• A large-scale RGB-T dataset is contributed for online RGB-T object tracking. The benchmark, with a dozen baseline trackers and 5 evaluation metrics, will be open to the public.
• A novel graph-based ...learning approach is proposed to learn robust RGB-T object feature representations.
• An L1-optimization based sparse learning algorithm is proposed to mitigate the noise in the initial weights.
• Extensive experiments are conducted on the large-scale benchmark dataset, and we provide new insights and potential future research directions for RGB-T object tracking.
RGB-Thermal (RGB-T) object tracking is receiving increasing attention because thermal information strongly complements visible data. However, RGB-T research is limited by the lack of a comprehensive evaluation platform. In this paper, we propose a large-scale video benchmark dataset for RGB-T tracking. It has three major advantages over existing ones: 1) Its size is sufficient for large-scale performance evaluation (total number of frames: 234K, maximum number of frames per sequence: 8K). 2) The alignment between RGB-T sequence pairs is highly accurate, so no pre- or post-processing is needed. 3) The occlusion levels are annotated for occlusion-sensitive performance analysis of different tracking algorithms. Moreover, we propose a novel graph-based approach to learn a robust object representation for RGB-T tracking. In particular, the tracked object is represented by a graph with image patches as nodes. Given initial node weights, this graph, including its structure, node weights and edge weights, is dynamically learned in a unified optimization framework. Extensive experiments on the large-scale dataset demonstrate the effectiveness of the proposed tracker against other state-of-the-art tracking methods. We also provide new insights and potential research directions for the field of RGB-T object tracking.
High-Speed Tracking with Kernelized Correlation Filters
Henriques, Joao F.; Caseiro, Rui; Martins, Pedro ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015-03-01, Volume 37, Issue 3
Journal Article
Peer reviewed
Open access
The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this ...classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies: any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF) that, unlike other kernel algorithms, has exactly the same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call the dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50-video benchmark, despite running at hundreds of frames per second and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.
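The circulant argument in the abstract above can be checked numerically. The following is a minimal sketch of the linear (non-kernelized) case only, not the authors' Algorithm 1. It assumes a 1-D signal, a data matrix whose rows are the cyclic shifts `np.roll(x, i)`, and NumPy's FFT sign convention. The function names, the regularization value and the Gaussian-shaped regression target are illustrative choices that do not come from the abstract.

```python
import numpy as np

def ridge_explicit(x, y, lam):
    """Ridge regression on all cyclic shifts of x, solved as a dense system."""
    n = x.size
    # Row i of the data matrix is x cyclically shifted by i samples (circulant).
    X = np.stack([np.roll(x, i) for i in range(n)])
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def ridge_fourier(x, y, lam):
    """Same solution, computed element-wise in the Fourier domain."""
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    # With the row convention above and NumPy's FFT, the normal equations
    # decouple per frequency: (|x_hat|^2 + lam) * w_hat = x_hat * y_hat.
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)                       # base sample (1-D "patch")
shift = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (shift / 2.0) ** 2)            # peak at zero shift (assumed target)

w_explicit = ridge_explicit(x, y, lam=0.1)       # O(n^3) dense solve
w_fourier = ridge_fourier(x, y, lam=0.1)         # O(n log n) via the FFT
max_abs_diff = float(np.max(np.abs(w_explicit - w_fourier)))
```

The dense solve stores an n-by-n matrix, while the Fourier version only ever handles length-n vectors, which is the storage and computation saving the abstract refers to. Whether a complex conjugate appears on `x_hat` depends on the shift and transform conventions, so the formula here should be read as valid for this sketch's conventions only.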
Object tracking is a computer vision task that aims to locate and continuously follow the movement of an object in video frames, given an initial annotation. Despite its importance, this task can ...prove to be challenging due to factors such as occlusion, deformations, and fast motion. Reinforcement Learning (RL) has been proposed as a viable solution for addressing these challenges by adapting to changes in object appearance and effectively handling occlusions, which can improve system performance.
This study carries out a Systematic Literature Review on the use of Reinforcement Learning in object tracking between 2015 and 2023, collecting and analyzing current trends, metrics, and benchmarks used in the field. The research followed the guidelines proposed by Kitchenham, and 75 studies were accepted based on their score on the quality scale assigned by the authors of this review. The studies were categorized to present the current state of research based on metadata, publication trends, RL approach, RL algorithm, Deep Learning use, object tracking type, and camera control. Additionally, an analysis was performed on the evaluation process for system performance, focusing on benchmarks and metrics for Single Object Tracking, Multiple Object Tracking, and Active Object Tracking. This study addresses a gap by conducting a comprehensive Systematic Literature Review focusing exclusively on Reinforcement Learning for Object Tracking. The review offers researchers an updated, detailed, and objective scientific overview of the field that can be incorporated into future studies.
• The first comprehensive survey on deep-learning-based trackers.
• Review existing deep visual trackers from three different perspectives.
• Large-scale benchmark evaluations of deep visual ...trackers.
• Summarize cutting-edge research works and discuss future directions.
• Provide useful insights and conclusions for deep visual trackers.
Recently, deep learning has achieved great success in visual tracking. The goal of this paper is to review the state-of-the-art tracking methods based on deep learning. First, we introduce the background of deep visual tracking, including the fundamental concepts of visual tracking and related deep learning algorithms. Second, we categorize the existing deep-learning-based trackers into three classes according to network structure, network function and network training. For each category, we explain its analysis from the network perspective and analyze the papers that fall into it. Then, we conduct extensive experiments to compare the representative methods on the popular OTB-100, TC-128 and VOT2015 benchmarks. Based on our observations, we conclude that: (1) Using a convolutional neural network (CNN) model can significantly improve tracking performance. (2) Trackers that use a CNN model to distinguish the tracked object from its surrounding background can achieve more accurate results, while using the CNN model for template matching is usually faster. (3) Trackers with deep features perform much better than those with low-level hand-crafted features. (4) Deep features from different convolutional layers have different characteristics, and an effective combination of them usually results in a more robust tracker. (5) Deep visual trackers using end-to-end networks usually perform better than trackers that merely use feature extraction networks. (6) For visual tracking, the most suitable network training method is to pre-train networks with video information and fine-tune them online with subsequent observations. Finally, we summarize our manuscript, highlight our insights, and point out further trends for deep visual tracking.
Today, a new generation of artificial intelligence has brought several new research domains such as computer vision (CV). Thus, target tracking, the base of CV, has been a hotspot research domain. ...Correlation filter (CF)-based algorithms have been the basis of real-time tracking algorithms because of their high tracking efficiency. However, CF-based algorithms usually fail to track objects in complex environments. Therefore, this article proposes a fuzzy detection strategy to prejudge the tracking result. If the prejudge process determines that the tracking result in the current frame is not good enough, the stored target template is used for subsequent tracking to avoid template pollution. Experimental results on the OTB100 dataset show that the proposed auxiliary detection strategy improves tracking robustness in complex environments while maintaining tracking speed.
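The template-protection idea described above can be sketched as a simple gate on the model update. This is a hypothetical illustration, not the article's fuzzy detection strategy: the abstract does not say how the prejudgment is computed, so `confidence` is an assumed scalar quality score supplied by the caller, and `threshold`, `rate` and the linear-interpolation update are placeholder choices.

```python
def update_template(stored, candidate, confidence, threshold=0.5, rate=0.1):
    """Return the template to use for the next frame.

    stored     -- current trusted template (list of floats)
    candidate  -- template extracted from the current tracking result
    confidence -- assumed quality score of the current result (higher is better)
    """
    if confidence < threshold:
        # Result judged unreliable: keep the stored template unchanged so a
        # drifted or occluded patch cannot pollute the model.
        return list(stored)
    # Result judged reliable: blend the new appearance into the template.
    return [(1.0 - rate) * s + rate * c for s, c in zip(stored, candidate)]

stored = [1.0, 1.0, 1.0]
candidate = [3.0, 3.0, 3.0]
kept = update_template(stored, candidate, confidence=0.2)     # gate closed
blended = update_template(stored, candidate, confidence=0.9)  # gate open
```

Skipping the update costs only one comparison per frame, which is consistent with the abstract's claim that robustness improves without sacrificing speed.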
Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world ...scenarios, a number of large-scale benchmark datasets have been established, on which a considerable number of methods have been developed and have demonstrated significant progress in recent years, predominantly deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from nine key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, exploitation of correlation filter advantages, aerial-view tracking, long-term tracking, and online tracking. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, LaSOT, UAV123, UAVDT, and VisDrone2019. Finally, through critical quantitative and qualitative analyses of these state-of-the-art trackers, their pros and cons under various common scenarios are investigated. The survey may serve as a gentle usage guide for practitioners to weigh when and under what conditions to choose which method(s). It also facilitates a discussion on ongoing issues and sheds light on promising research directions.
Multi-object tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT ...evaluation metric, higher order tracking accuracy (HOTA), which explicitly balances the effects of accurate detection, association and localization in a single unified metric for comparing trackers. HOTA decomposes into a family of sub-metrics that evaluate each of five basic error types separately, enabling clear analysis of tracking performance. We evaluate the effectiveness of HOTA on the MOTChallenge benchmark, and show that it is able to capture important aspects of MOT performance not previously taken into account by established metrics. Furthermore, we show that HOTA scores align better with human visual evaluation of tracking performance.
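To make the balance between detection and association concrete, the sketch below computes a HOTA-style score at a single localization threshold. The abstract does not state the formula. The code follows the commonly cited definition, in which the score at one threshold is the geometric mean of a detection accuracy (DetA) and an association accuracy (AssA). It assumes the matching between ground-truth and predicted detections has already been done at that threshold; the full metric additionally averages over a range of thresholds, which is where localization enters. The input layout and function name are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

def hota_single_threshold(frames):
    """frames: list of dicts with keys
         'gt'      -- ground-truth track ids present in the frame
         'pred'    -- predicted track ids present in the frame
         'matches' -- (gt_id, pred_id) pairs counted as true positives
       Returns (hota, det_a, ass_a) for this one threshold."""
    gt_count, pred_count, pair_count = Counter(), Counter(), Counter()
    for frame in frames:
        gt_count.update(frame["gt"])
        pred_count.update(frame["pred"])
        pair_count.update(frame["matches"])

    tp = sum(pair_count.values())
    fn = sum(gt_count.values()) - tp
    fp = sum(pred_count.values()) - tp
    if tp == 0:
        return 0.0, 0.0, 0.0

    # Association score of a true positive with ids (g, p): the frames where
    # g and p are matched together, over all frames where either one appears.
    assoc_sum = 0.0
    for (g, p), n_pair in pair_count.items():
        a = n_pair / (gt_count[g] + pred_count[p] - n_pair)
        assoc_sum += n_pair * a          # each of the n_pair TPs gets score a

    det_a = tp / (tp + fn + fp)
    ass_a = assoc_sum / tp
    return sqrt(det_a * ass_a), det_a, ass_a

# One object detected in every frame, but the tracker switches id halfway.
frames = [
    {"gt": [1], "pred": [10], "matches": [(1, 10)]},
    {"gt": [1], "pred": [10], "matches": [(1, 10)]},
    {"gt": [1], "pred": [11], "matches": [(1, 11)]},
    {"gt": [1], "pred": [11], "matches": [(1, 11)]},
]
hota, det_a, ass_a = hota_single_threshold(frames)
```

In this example detection is perfect (DetA = 1.0) while the identity switch halves the association score (AssA = 0.5), so the combined score is sqrt(0.5), roughly 0.707. A detection-only metric would rate this tracker as flawless.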