Concealed Object Detection
Fan, Deng-Ping; Ji, Ge-Peng; Cheng, Ming-Ming ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 10/2022, Volume: 44, Issue: 10
Journal Article
Peer reviewed
Open access
We present the first systematic study on concealed object detection (COD), which aims to identify objects that are visually embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, and classification. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms twelve cutting-edge baselines on all datasets tested, making it a robust, general architecture that could serve as a catalyst for future research in COD. Finally, we provide some interesting findings, and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available at our project page: http://mmcheng.net/cod.
Benefiting directly from deep learning methods, object detection has seen a great performance boost in recent years. However, drone-view object detection remains challenging for two main reasons: (1) tiny-scale objects, which are blurrier than ground-view objects, offer less useful information for accurate and robust detection; (2) unevenly distributed objects make detection inefficient, especially in regions occupied by crowded objects. To confront these challenges, we propose an end-to-end global-local self-adaptive network (GLSAN) in this paper. The key components of our GLSAN include a global-local detection network (GLDN), a simple yet efficient self-adaptive region selecting algorithm (SARSA), and a local super-resolution network (LSRN). We integrate a global-local fusion strategy into a progressive scale-varying network to perform more precise detection, where the local fine detector can adaptively refine the target's bounding boxes detected by the global coarse detector by cropping the original images for higher-resolution detection. The SARSA can dynamically crop the crowded regions in the input images; it is unsupervised and can be easily plugged into the networks. Additionally, we train the LSRN to enlarge the cropped images, providing more detailed information for finer-scale feature extraction and helping the detector distinguish foreground from background more easily. The SARSA and LSRN also contribute to data augmentation for network training, which makes the detector more robust. Extensive experiments and comprehensive evaluations on the VisDrone2019-DET benchmark dataset and the UAVDT dataset demonstrate the effectiveness and adaptivity of our method. Towards an industrial application, our network is also applied to a DroneBolts dataset with proven advantages. Our source code is available at https://github.com/dengsutao/glsan.
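The abstract does not describe how SARSA chooses its crops, so the following is only a rough illustration of the general idea of selecting a crowded region from coarse detections. It is not the paper's algorithm: it swaps in a simple grid-density count, and the function name `densest_crop`, the `grid` parameter, and the box format are all hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def densest_crop(boxes: List[Box], img_w: int, img_h: int, grid: int = 4) -> Box:
    """Return the grid cell, as a crop box, that contains the most box centers.

    Stand-in for a crowded-region selector; NOT the SARSA procedure itself.
    """
    cell_w, cell_h = img_w / grid, img_h / grid
    counts = [[0] * grid for _ in range(grid)]
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Clamp so centers on or beyond the image border land in a valid cell.
        col = max(0, min(int(cx // cell_w), grid - 1))
        row = max(0, min(int(cy // cell_h), grid - 1))
        counts[row][col] += 1
    cells = [(r, c) for r in range(grid) for c in range(grid)]
    # On ties, max() keeps the first cell in row-major order.
    row, col = max(cells, key=lambda rc: counts[rc[0]][rc[1]])
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
```

In a global-local pipeline of the kind the abstract outlines, a crop like this would be enlarged and passed to the fine detector, and its detections merged with the coarse ones. A fixed grid is the simplest possible choice and ignores object scale, which a self-adaptive method would presumably account for.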
Detection and Tracking Meet Drones Challenge
Zhu, Pengfei; Wen, Longyin; Du, Dawei ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022-11-01, Volume: 44, Issue: 11
Journal Article
Peer reviewed
Open access
Drones, or general UAVs, equipped with cameras have been rapidly deployed in a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones is in high demand, bringing computer vision and drones ever closer together. To promote and track the development of object detection and tracking algorithms, we have organized three challenge workshops in conjunction with ECCV 2018, ICCV 2019 and ECCV 2020, attracting more than 100 teams from around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. In this paper, we first present a thorough review of object detection and tracking datasets and benchmarks, and discuss the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from North to South. Being the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms for the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, conclude the challenge, and propose future directions. We expect the benchmark to greatly boost research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from https://github.com/VisDrone/VisDrone-Dataset.
Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interwoven fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation imposes a performance ceiling on convolution-based methods. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats the multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. Besides, considering the quadratic complexity w.r.t. the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when it is equipped with the proposed components.
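The quadratic-complexity remark can be made concrete with a little arithmetic. The sketch below only counts entries in an attention score matrix before and after merging each patch of a feature map into a single token. It does not reproduce CAVER's re-embedding: the abstract does not say which of the queries, keys, or values are re-embedded, and the function names and the 32 x 32 example size are assumptions made for illustration.

```python
def tokens_after_patching(height: int, width: int, patch: int = 1) -> int:
    """Tokens left when each patch x patch block of a feature map becomes one token."""
    if height % patch or width % patch:
        raise ValueError("patch must divide both height and width")
    return (height // patch) * (width // patch)


def score_matrix_entries(n_query: int, n_key: int) -> int:
    """Number of query-key pairs in one attention score matrix."""
    return n_query * n_key


if __name__ == "__main__":
    full = tokens_after_patching(32, 32)              # one token per position
    reduced = tokens_after_patching(32, 32, patch=2)  # one token per 2x2 block
    print(score_matrix_entries(full, full))      # all tokens attend to all tokens
    print(score_matrix_entries(full, reduced))   # only the key side is reduced
```

Reducing only the key side by a factor of patch squared shrinks the score matrix by that same factor; reducing both sides shrinks it by the fourth power of the patch size.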
Thermal infrared (TIR) object detection plays a crucial role in diverse around-the-clock applications, such as search and rescue operations and wildlife protection. Achieving rapid and robust detection of small objects from an aerial perspective is particularly significant in these scenarios. However, the task is complicated by two interrelated challenges. For one, small objects only occupy a few pixels and contain limited information. For another, TIR sensors are typically low-resolution (LR) due to inherent challenges associated with the imaging mechanism of the TIR spectrum. In contrast, high-resolution (HR) RGB sensors are readily available due to their cost-effectiveness and widespread application. Recognizing the importance of HR information, especially in the context of small object detection, we propose a cross-modality high-resolution knowledge distillation framework (CMHRD), which leverages knowledge from the HR-RGB modality and provides a novel strategy for TIR small object detection. The proposed framework introduces three key components: a super-resolution generative distillation loss for cross-modal high-resolution representation learning, a cross-modality affinity distillation loss to extract scene-level cross-modality information, and a response distillation loss aimed at mimicking the HR prediction. To facilitate research on small object detection with HR-RGB and LR-TIR data, we have curated and annotated two datasets, namely NOAA-Seal and VTUAV-det-small. Experimental results on NOAA-Seal demonstrate that CMHRD yields significant improvements, achieving a remarkable 6.39 mAP50 increase over a strong baseline without introducing additional computational cost during inference. Experiments on the single-category dataset VTUAV-det-small and the multi-category dataset RTDOD also show consistent improvements brought by CMHRD. The project is available at https://github.com/NNNNerd/CMHRD.
Aerial object detection, i.e., object detection in images captured from an overhead perspective, has been widely applied in urban management, industrial inspection, and other areas. However, the performance of existing aerial object detection algorithms is hindered by variations in object scales and orientations attributed to the aerial perspective. This survey presents a comprehensive review of recent advances in aerial object detection. We start with some basic concepts of aerial object detection and then summarize the five imbalance problems of aerial object detection, including scale imbalance, spatial imbalance, objective imbalance, semantic imbalance, and class imbalance. Moreover, we classify and analyze relevant methods and especially introduce the applications of aerial object detection in practical scenarios. Finally, the performance evaluation is presented on two popular aerial object detection datasets, VisDrone-DET and DOTA, and we discuss several future directions that could facilitate the development of aerial object detection.
High-power alternating magnetic fields can cause lesions and pathological changes in living objects entering the wireless charging area. For metal objects, the eddy-current effect can cause safety risks such as fire and ignition hazards. Hence, metal object detection (MOD) and living object detection (LOD) are necessary functions for a wireless power transfer (WPT) system. A shared method of MOD and LOD based on the quality factor of detection coils for electric vehicle wireless charging is proposed in this paper to realize both functions with the same detection coil array. Firstly, the effect of metal objects and living objects on the detection coil quality factor is analyzed in detail. Secondly, a method of real-time calculation of the quality factor is proposed to detect whether a foreign object exists. Then, a distinction algorithm based on the phase variation trend of the resonant capacitor voltage is designed to classify the type of foreign object. Finally, a 3.3 kW experimental prototype, including a foreign object detection (FOD) system, is established to verify the validity of the method. The advantage of the quality factor as the core indicator for FOD is analyzed in detail. With an iron wafer, an aluminum wafer, and a beef chunk placed on the detection coil, the calculated quality factor of the detection coil decreased by 80.8%, 50.5%, and 69.9%, respectively, compared with the case without foreign objects. Meanwhile, the phase difference of the excitation relative to the resonant capacitor voltage changed by -61.3°, -74.6°, and 49.4°, respectively.
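The two-stage logic described above (detect with the quality factor, then classify with the phase change) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's real-time method: it uses the textbook relation Q = ωL/R for a coil with series resistance, the 20% detection threshold is a hypothetical value, and the sign-based classifier is an assumption that merely agrees with the three samples reported in the abstract.

```python
import math


def coil_q(freq_hz: float, inductance_h: float, resistance_ohm: float) -> float:
    """Textbook quality factor of a coil with series resistance: Q = omega * L / R."""
    return 2 * math.pi * freq_hz * inductance_h / resistance_ohm


def relative_drop(q_ref: float, q_now: float) -> float:
    """Fractional decrease of Q relative to the reference with no foreign object."""
    return (q_ref - q_now) / q_ref


def foreign_object_present(q_ref: float, q_now: float, threshold: float = 0.2) -> bool:
    """Flag a foreign object when Q drops by at least `threshold` (hypothetical value)."""
    return relative_drop(q_ref, q_now) >= threshold


def guess_object_type(phase_change_deg: float) -> str:
    """Assumed rule: negative phase change means metal, positive means living object.

    This matches only the three samples in the abstract (iron and aluminum
    negative, beef positive); the paper's actual distinction algorithm may differ.
    """
    return "metal" if phase_change_deg < 0 else "living"
```

For example, a reference Q of 100 falling to 19.2 gives a relative drop of about 0.808, the same size as the 80.8% decrease reported for the iron wafer, and a phase change of -61.3° would then be labeled "metal" under the assumed rule.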
For the commercialization of stationary wireless electric vehicle (EV) chargers, metal object detection (MOD) on a power supply coil and detection of position (DoP) of EVs are needed. In this paper, dual-purpose nonoverlapping coil sets for both MOD and DoP, which detect a variation of magnetic flux on the power supply coil, are newly proposed; the proposed MOD and DoP methods contribute no power loss. The existence of metal objects on the power supply coil is determined by an induced voltage difference of the nonoverlapping coil sets, whereas the position of the EV is determined by an induced voltage of the nonoverlapping coil sets. A sensing circuit with a variable resistor, which distinguishes it from the conventional overlapping coil for MOD, can make the induced voltage difference zero even when the magnetic flux distribution is distorted by moving the pick-up coil. The proposed nonoverlapping coil sets with the sensing circuit have been demonstrated by simulations and experiments. When metallic coins and aluminum sheets are located on the power supply coil, the induced voltage difference of the coil sets, which is ideally zero without metal objects, increases significantly to 62.8 and 450 mV, respectively, which is more than ten times the value measured without metal objects in the experiments. In addition, when the pick-up coil approaches the power supply coil, the induced voltage of each coil set increases roughly 1.6 times at a 10 cm air gap.