Concealed Object Detection Fan, Deng-Ping; Ji, Ge-Peng; Cheng, Ming-Ming ...
IEEE transactions on pattern analysis and machine intelligence,
10/2022, Volume:
44, Issue:
10
Journal Article
Peer reviewed
We present the first systematic study on concealed object detection (COD), which aims to identify objects that are visually embedded in their background. The high intrinsic similarities between concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, and classification. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms twelve cutting-edge baselines on all datasets tested, making it a robust, general architecture that could serve as a catalyst for future research in COD. Finally, we provide some interesting findings, and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available at our project page: http://mmcheng.net/cod .
Directly benefiting from deep learning methods, object detection has witnessed a great performance boost in recent years. However, drone-view object detection remains challenging for two main reasons: (1) Tiny-scale objects, which are blurrier than their ground-view counterparts, offer less valuable information towards accurate and robust detection; (2) The unevenly distributed objects make detection inefficient, especially for regions occupied by crowded objects. Confronting such challenges, we propose an end-to-end global-local self-adaptive network (GLSAN) in this paper. The key components in our GLSAN include a global-local detection network (GLDN), a simple yet efficient self-adaptive region selecting algorithm (SARSA), and a local super-resolution network (LSRN). We integrate a global-local fusion strategy into a progressive scale-varying network to perform more precise detection, where the local fine detector can adaptively refine the target's bounding boxes detected by the global coarse detector via cropping the original images for higher-resolution detection. The SARSA can dynamically crop the crowded regions in the input images; it is unsupervised and can be easily plugged into the networks. Additionally, we train the LSRN to enlarge the cropped images, providing more detailed information for finer-scale feature extraction and helping the detector distinguish foreground from background more easily. The SARSA and LSRN also contribute to data augmentation during network training, which makes the detector more robust. Extensive experiments and comprehensive evaluations on the VisDrone2019-DET benchmark dataset and the UAVDT dataset demonstrate the effectiveness and adaptivity of our method. Towards an industrial application, our network is also applied to a DroneBolts dataset with proven advantages. Our source code is available at https://github.com/dengsutao/glsan .
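The SARSA component described above selects crowded regions without supervision. As an illustrative sketch (not the paper's implementation), one simple unsupervised strategy is to bin the coarse detector's box centers into a grid and crop around the densest cell; the grid size and crop margin below are assumptions.

```python
# Hypothetical SARSA-style step: given the coarse detector's box centers,
# find the densest grid cell and return an expanded crop around it that the
# fine detector can re-process at higher resolution.

def densest_crop(centers, img_w, img_h, grid=4, margin=0.1):
    """centers: list of (x, y) box centers. Returns an (x1, y1, x2, y2) crop
    covering the densest grid cell, expanded by `margin` of the image size
    and clipped to the image bounds."""
    counts = {}
    for x, y in centers:
        cell = (min(int(x * grid / img_w), grid - 1),
                min(int(y * grid / img_h), grid - 1))
        counts[cell] = counts.get(cell, 0) + 1
    cx, cy = max(counts, key=counts.get)  # densest cell wins
    mw, mh = img_w * margin, img_h * margin
    x1 = max(0, cx * img_w / grid - mw)
    y1 = max(0, cy * img_h / grid - mh)
    x2 = min(img_w, (cx + 1) * img_w / grid + mw)
    y2 = min(img_h, (cy + 1) * img_h / grid + mh)
    return (x1, y1, x2, y2)
```

Because the crop comes purely from detector outputs, such a step needs no extra labels, which is what makes it easy to plug into an existing detection network.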
Detection and Tracking Meet Drones Challenge Zhu, Pengfei; Wen, Longyin; Du, Dawei ...
IEEE transactions on pattern analysis and machine intelligence,
11/2022, Volume:
44, Issue:
11
Journal Article
Peer reviewed
Drones, or general UAVs, equipped with cameras have been rapidly deployed across a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones has become highly demanded, bringing computer vision and drones closer together. To promote and track the development of object detection and tracking algorithms, we have organized three challenge workshops in conjunction with ECCV 2018, ICCV 2019, and ECCV 2020, attracting more than 100 teams around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. In this paper, we first present a thorough review of object detection and tracking datasets and benchmarks, and discuss the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. After that, we describe our VisDrone dataset, which is captured over various urban/suburban areas of 14 different cities across China from north to south. Being the largest such dataset ever published, VisDrone enables extensive evaluation and investigation of visual analysis algorithms for the drone platform. We provide a detailed analysis of the current state of the field of large-scale object detection and tracking on drones, conclude the challenge, and propose future directions. We expect the benchmark to largely boost research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from https://github.com/VisDrone/VisDrone-Dataset .
Owing to the advancements in deep learning, object detection has made significant progress in estimating the positions and classes of multiple objects within an image. However, detecting objects of various scales within a single image remains a challenging problem. In this study, we propose scale-aware token matching to predict the positions and classes of objects for transformer-based object detection. We train a model by matching detection tokens with ground truth while considering object size, unlike previous methods that performed matching without considering scale during the training process. We divide one detection token set into multiple sets based on scale and match each token set differently with ground truth, thereby training the model without additional computation costs. The experimental results demonstrate that scale information can be assigned to tokens. Scale-aware tokens can independently learn scale-specific information by using a novel loss function, which improves detection performance on small objects.
•Each token independently learns scale-specific information with our token matching method.
•Only small-scale information is processed separately to mitigate the heterogeneity of small objects.
•Small-object detection is improved by treating objects of different scales as negative samples.
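The scale-bucketed matching idea above can be sketched in a few lines. This is a hedged, simplified illustration (greedy one-to-one assignment, COCO-style area thresholds), not the paper's actual bipartite matching; all names and thresholds are assumptions.

```python
# Sketch of scale-aware token matching: ground-truth boxes are bucketed by
# area, and each bucket's detection tokens are only ever matched against
# ground truth of that scale, so other-scale objects act as negatives.

def scale_bucket(box, small_thresh=32 ** 2, large_thresh=96 ** 2):
    """Assign an (x1, y1, x2, y2) box to a scale bucket by its area."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < small_thresh:
        return "small"
    if area < large_thresh:
        return "medium"
    return "large"

def match_tokens_by_scale(token_sets, gt_boxes):
    """token_sets: dict bucket -> list of token ids.
    Returns dict bucket -> list of (token_id, gt_index) pairs, pairing each
    ground-truth box greedily with an unused token of its own scale."""
    matches = {bucket: [] for bucket in token_sets}
    used = {bucket: 0 for bucket in token_sets}
    for gi, box in enumerate(gt_boxes):
        bucket = scale_bucket(box)
        pool = token_sets.get(bucket, [])
        if used.get(bucket, 0) < len(pool):
            matches[bucket].append((pool[used[bucket]], gi))
            used[bucket] += 1
    return matches
```

Since the split only changes which token-to-ground-truth pairs are allowed, it adds no extra computation at training time, consistent with the abstract's claim.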
Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweaved fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation constrains the performance of convolution-based methods to a ceiling. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. Besides, considering the quadratic complexity w.r.t. the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when it is equipped with the proposed components.
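A minimal sketch of a parameter-free patch-wise token re-embedding, in the spirit of CAVER's strategy: neighbouring tokens are mean-pooled into one token before attention, shrinking the sequence (and the quadratic attention cost) by the patch size. The exact operator in the paper may differ; this version is an assumption for illustration.

```python
# Parameter-free token re-embedding: pool every `patch` consecutive tokens
# into their mean, so attention over N tokens becomes attention over N/patch
# tokens, cutting the quadratic cost by a factor of patch**2.

def reembed_tokens(tokens, patch=4):
    """tokens: list of feature vectors (lists of floats); len(tokens) should
    be divisible by `patch`. Returns len(tokens)//patch pooled tokens, with
    no learned parameters involved."""
    pooled = []
    for i in range(0, len(tokens), patch):
        group = tokens[i:i + patch]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled
```

Because pooling is parameter-free, the strategy reduces attention cost without adding anything that must be trained.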
Aerial object detection, i.e., object detection in aerial images captured from an overhead perspective, has been widely applied in urban management, industrial inspection, and other areas. However, the performance of existing aerial object detection algorithms is hindered by variations in object scales and orientations attributable to the aerial perspective. This survey presents a comprehensive review of recent advances in aerial object detection. We start with some basic concepts of aerial object detection and then summarize its five imbalance problems: scale imbalance, spatial imbalance, objective imbalance, semantic imbalance, and class imbalance. Moreover, we classify and analyze relevant methods and especially introduce the applications of aerial object detection in practical scenarios. Finally, performance evaluation is presented on two popular aerial object detection datasets, VisDrone-DET and DOTA, and we discuss several future directions that could facilitate the development of aerial object detection.
Thermal infrared (TIR) object detection plays a crucial role in diverse around-the-clock applications, such as search and rescue operations and wildlife protection. Achieving rapid and robust detection of small objects from an aerial perspective is particularly significant in these scenarios. However, the task is compounded by two interrelated challenges that render it even more difficult. For one, small objects occupy only a few pixels and contain limited information. For another, TIR sensors are typically low-resolution (LR) due to inherent challenges associated with the imaging mechanism of the TIR spectrum. In contrast, high-resolution (HR) RGB sensors are readily available due to their cost-effectiveness and widespread application. Recognizing the importance of HR information, especially in the context of small object detection, we propose a cross-modality high-resolution knowledge distillation framework (CMHRD), which leverages knowledge from the HR-RGB modality and provides a novel strategy for TIR small object detection. The proposed framework introduces three key components: a super-resolution generative distillation loss for cross-modal high-resolution representation learning, a cross-modality affinity distillation loss to extract scene-level cross-modality information, and a response distillation loss aimed at mimicking the HR prediction. To facilitate research on small object detection with HR-RGB and LR-TIR data, we have curated and annotated two datasets, namely NOAA-Seal and VTUAV-det-small. Experimental results on NOAA-Seal demonstrate that CMHRD yields significant improvements, achieving a remarkable 6.39 mAP50 increase over a strong baseline without introducing additional computational cost during inference. Experiments on the single-category VTUAV-det-small dataset and the multi-category RTDOD dataset also show consistent improvements brought by CMHRD. The project is available at https://github.com/NNNNerd/CMHRD.
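The three distillation terms named in the abstract would typically be combined into one weighted training objective. The sketch below is a hedged illustration of that structure only: the loss forms (plain mean-squared error) and weights are assumptions, not CMHRD's exact formulation.

```python
# Illustrative combination of CMHRD-style distillation terms, with features
# and logits flattened to plain Python lists for the example.

def l2(a, b):
    """Mean squared error between two equal-length flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cmhrd_loss(sr_feat, hr_feat, aff_s, aff_t, resp_s, resp_t,
               w_sr=1.0, w_aff=0.5, w_resp=1.0):
    """Student (TIR) terms vs. HR-RGB teacher terms:
    - sr_feat vs hr_feat: super-resolution generative term, pulling the
      student's upsampled TIR features toward the teacher's HR features;
    - aff_s vs aff_t: scene-level affinity (pairwise-similarity) matrices;
    - resp_s vs resp_t: detection responses mimicking the HR prediction."""
    return (w_sr * l2(sr_feat, hr_feat)
            + w_aff * l2(aff_s, aff_t)
            + w_resp * l2(resp_s, resp_t))
```

Since all three terms act only on the student's training signal, the deployed TIR detector keeps its original architecture, matching the claim of no extra inference cost.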
As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (named deep SOD). To enable in-depth understanding of deep SOD, in this paper, we provide a comprehensive survey covering various aspects, ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level of supervision, learning paradigm, and object-/instance-level detection. Following that, we summarize and analyze existing SOD datasets and evaluation metrics. Then, we benchmark a large group of representative SOD models, and provide detailed analyses of the comparison results. Moreover, we study the performance of SOD algorithms under different attribute settings, which has not been thoroughly explored previously, by constructing a novel SOD dataset with rich attribute annotations covering various salient object types, challenging factors, and scene categories. We further analyze, for the first time in the field, the robustness of SOD models to random input perturbations and adversarial attacks. We also look into the generalization and difficulty of existing SOD datasets. Finally, we discuss several open issues of SOD and outline future research directions. All the saliency prediction maps, our constructed dataset with annotations, and codes for evaluation are publicly available at https://github.com/wenguanwang/SODsurvey .
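Among the evaluation metrics such a benchmark relies on, mean absolute error (MAE) between a predicted saliency map and the binary ground-truth mask is one of the standard choices; a minimal sketch is shown below, with maps flattened to lists of pixel values in [0, 1] for simplicity.

```python
# MAE for saliency evaluation: average per-pixel absolute difference between
# the continuous prediction and the binary ground truth; lower is better.

def saliency_mae(pred, gt):
    """pred: flat list of saliency values in [0, 1];
    gt: flat list of ground-truth labels in {0, 1}, same length."""
    assert len(pred) == len(gt), "maps must have the same number of pixels"
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
```

MAE is threshold-free, which is why benchmarks usually report it alongside thresholded measures such as the F-measure.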
Human activity depends on the oceans for food, transportation, leisure, and many other purposes. Oceans cover 70% of the Earth’s surface, but most of them are unknown to humankind. This is why underwater imaging is a valuable asset to Marine Science. Images are acquired with observing systems, e.g. autonomous underwater vehicles or underwater observatories, that presently transmit all the raw data to land stations. However, the transfer of such an amount of data can be challenging, considering the limited power supply and transmission bandwidth of these systems. In this paper, we discuss these aspects, and in particular how it is possible to couple Edge and Cloud computing for effective management of the full processing pipeline according to the Compute Continuum paradigm.
•We present an object detection pipeline for Marine Science based on the Compute Continuum paradigm.
•The object detector has been trained on Mediterranean fishes, whereas related works normally consider tropical fishes.
•The pipeline runs on the Jetson Nano board to reduce the transmitted data size.
•The work analyzes the execution time and power consumption of YoloV3, ULO, and ULO Tiny.
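The Compute-Continuum idea above amounts to running detection at the edge and shipping only compact detection records to the cloud instead of raw frames. The sketch below illustrates that edge-side step; the record format and confidence threshold are assumptions for the example, not the paper's protocol.

```python
# Edge-side filtering for a detection pipeline: keep only confident
# detections and serialize them to a small text payload, so the limited
# uplink carries detection records rather than full-resolution imagery.

def edge_payload(detections, conf_thresh=0.5):
    """detections: list of (label, confidence, (x1, y1, x2, y2)) tuples.
    Returns one CSV-style line per detection above the threshold."""
    kept = [d for d in detections if d[1] >= conf_thresh]
    lines = ["%s,%.2f,%d,%d,%d,%d" % (lab, conf, *box)
             for lab, conf, box in kept]
    return "\n".join(lines)
```

A few bytes per detection versus megabytes per raw frame is what makes the edge/cloud split attractive under the power and bandwidth limits the abstract describes.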
Currently, a lot of work is focused on aerial object detection and has achieved good results. Though these methods have achieved promising results on conventional datasets, it is still challenging to locate objects in low-quality images captured in adverse weather conditions. There are limited approaches that combine aerial object detection with hazy conditions, and few publicly available datasets of real hazy weather based on aerial images. For this purpose, we propose HRSI, a dataset of hazy remote sensing images from the real world, which is mainly divided into three categories: airport, large vehicle, and ship. All images in HRSI come from real hazy conditions. In addition, we propose an object detection model, DFENet, a dehazing feature enhancement model for hazy remote sensing images that is suitable for hazy weather. DFENet consists of a two-branch structure and a dehazing module. The two-branch structure helps to fully learn hazy and dehazed features. To avoid the impact of noise caused by the dehazing module, we also design a haze-predict module (HPM) to predict the haze information contained in the image. We introduce a cross-fuse module (CFM) to utilize the haze information to guide the feature fusion of the two branches. By utilizing the haze information, DFENet can dynamically adjust the feature weights in the two branches to avoid the impact of noise generated by the dehazing module. Compared with traditional object detection methods, DFENet not only performs well in hazy conditions, but also improves performance in clear conditions. We tested DFENet on DOTA, HRSI, and Foggy-DOTA to demonstrate that DFENet performs better under hazy conditions.
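The haze-guided fusion described above can be pictured as a per-pixel blend steered by the predicted haze score. This is a hypothetical, heavily simplified view of the CFM idea on flat feature lists (the real module operates on convolutional feature maps); all names here are illustrative.

```python
# Haze-guided blending of the two branches: where haze is heavy, lean on the
# dehazing branch; where the scene is clear, trust the original hazy-branch
# features and avoid the dehazing module's noise.

def haze_guided_fuse(feat_hazy, feat_dehazed, haze_map):
    """feat_hazy, feat_dehazed: equal-length flat feature lists;
    haze_map: per-position haze scores in [0, 1] from a predictor like HPM.
    Returns the elementwise convex blend of the two branches."""
    return [h * d + (1.0 - h) * f
            for f, d, h in zip(feat_hazy, feat_dehazed, haze_map)]
```

Making the blend depend on the predicted haze is what lets such a model help in hazy scenes without degrading clear ones, mirroring the abstract's claim.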