Timely fire alarms are crucial because they can save lives and prevent major economic losses. However, owing to their structural complexity, current mainstream DETR-based fire detection models are impractical: they require large amounts of memory and long inference times. Meanwhile, high-quality fire detection datasets are scarce, which severely limits algorithm performance. To address these challenges and improve accuracy in complex fire environments, we first introduce a dataset quality enhancement framework based on a denoising diffusion probabilistic model (DDPM) to improve low-quality fire alarm datasets. Second, we propose a novel Deformable-DETR-based fire detection framework, FTA-DETR, with four optimizations. First, we introduce a trainable matrix in the encoder to compute features, which reduces the encoder's computational burden, highlights compelling features, and significantly reduces training time. Second, we improve the encoding block by alternately updating high-level and low-level features, greatly reducing the amount of feature computation required for effective detection. This encoder structure is compatible with any state-of-the-art transformer decoder. Third, to accommodate the multi-scale nature of fires and varying environmental complexity, we change the loss function to WIoU v3, which both speeds up convergence and improves performance. Finally, we combine FTA-DETR with an acceleration engine such as TensorRT to improve inference speed with little loss of accuracy. Experiments show that the diffusion-based dataset quality enhancement framework generates high-quality datasets, and that the enhanced dataset greatly improves the detection performance of FTA-DETR (mAP increased by 2.42%).
Meanwhile, FTA-DETR outperforms almost all current fire detection frameworks in detection accuracy and interference resistance, with accuracy reaching 98.32% and 99.21% on the Mivia and FireNet datasets, respectively, and precision reaching 94% on the BoWFire dataset. In addition, when paired with TensorRT, FTA-DETR achieves an inference speed of 76 FPS on the Jetson Orin Nano, a small embedded device with very limited computational power. The code is available at https://github.com/wanggoat/FTA-detr.
Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among 2D human pose estimation (HPE) methods are regression-based approaches, which have been shown to achieve excellent results. However, the ground-truth labels are often inherently ambiguous in challenging cases such as motion blur, occlusion, and truncation, leading to poor performance measurement and lower accuracy. In this paper, we propose Cofopose, a two-stage approach to 2D human pose estimation consisting of person and keypoint detection transformers. Cofopose is composed of conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and an encoder-decoder in the transformer framework, which together allow it to perform person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and a fine-tuned conditional DETR for person detection, and transformer encoder-decoders for keypoint detection. Cofopose was extensively evaluated on two benchmark datasets, MS COCO and MPII, and improved on existing state-of-the-art frameworks by significant margins.
This paper presents a comparative analysis of two state-of-the-art object detection models, DETR and YOLOv8, focusing on their effectiveness in fruit detection for yield prediction in agriculture. The study begins with data acquisition, using images and corresponding annotations to train and evaluate the models. Our approach is data-driven: the dataset is divided into training and testing sets, with rigorous validation to ensure robustness.
For DETR, evaluation results show promising performance across various IoU thresholds, indicating that it localizes fruits accurately within bounding boxes. YOLOv8 exhibits substantial improvements in detection performance, achieving high precision and recall, particularly for the "orange" and "sweet_orange" classes. Notably, the model remains proficient even in challenging scenarios.
In conclusion, both DETR and YOLOv8 offer valuable insights for precision farming, aiding farmers in yield prediction and harvest planning. While DETR demonstrates robustness and efficiency in fruit detection, YOLOv8 excels in high-precision detection, albeit with longer training times. These findings highlight the potential of advanced object detection models in revolutionizing agricultural practices, contributing to enhanced productivity and market equilibrium.
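As a rough illustration of what evaluating a detector "across various IoU thresholds" involves, the following minimal Python sketch computes precision and recall for one image at a chosen IoU threshold. It is not the evaluation code used in the study: the function names `iou` and `precision_recall`, the `(x1, y1, x2, y2)` box format, and the greedy one-to-one matching rule are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, thr):
    """Greedy matching: preds are assumed sorted by descending confidence,
    and each ground-truth box can be matched by at most one prediction."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


# Hypothetical boxes: one exact hit, one slightly shifted hit, one false alarm.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (21, 21, 31, 31), (50, 50, 60, 60)]
for thr in (0.5, 0.75):
    print(thr, precision_recall(preds, truths, thr))
```

Raising the threshold turns the slightly shifted prediction from a true positive into a false positive, which is why results are usually reported at several thresholds rather than one.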
To address common defects such as scratches, cracks, bumps, and indentations on the surface of metal bipolar plates, this thesis proposes HPRT-DETR, an algorithm for detecting defects on metal bipolar plates. The algorithm targets three difficulties: small defects, complex backgrounds, and low detection accuracy. To enhance its performance, we adopt deformable attention (DA) to improve the AIFI module, which enables the algorithm to focus on the defective region and capture more informative features. We also apply Zoom-cat scaling splicing and Spatially-Sensitive Features (SSF) in the CCFM module to strengthen the network's multi-scale feature fusion. Additionally, we introduce the Normalized Wasserstein Distance (NWD) metric loss to reduce sensitivity to the location of small targets, thereby improving detection accuracy and efficiency. Experimental validation shows that HPRT-DETR improves accuracy, recall, and average precision by 6.4, 1.7, and 4.7 percentage points, respectively, compared to the original model. These results indicate that the enhanced model lays a foundation for automated production and intelligent inspection of metal bipolar plates.
• HPRT-DETR: New algorithm for metal bipolar plate defect detection.
• Deformable attention mechanism boosts AIFI.
• Spatially-Sensitive Features (SSF) and Zoom-CAT improve CCFM.
• Introducing Normalized Wasserstein Distance (NWD) metric loss.
• HPRT-DETR: Foundation for intelligent detection of metal plates.
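To make the NWD idea concrete, here is a minimal Python sketch of the commonly used formulation, in which each box (cx, cy, w, h) is modelled as a 2D Gaussian and the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2). This is an illustration under assumptions, not the HPRT-DETR implementation: the function names are hypothetical, and the normalizing constant `c=12.8` is a dataset-dependent value chosen here only as a placeholder.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance similarity between two boxes.

    Boxes are (cx, cy, w, h). `c` is a dataset-dependent normalizer;
    12.8 is an assumed placeholder, not a value taken from the source.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 for distant boxes."""
    return 1.0 - nwd(box_a, box_b, c)


# A 4x4 box shifted by 4 pixels no longer overlaps its target (IoU = 0),
# yet NWD still gives a smooth, non-zero similarity to learn from.
target = (10.0, 10.0, 4.0, 4.0)
shifted = (14.0, 10.0, 4.0, 4.0)
print(nwd(target, shifted), nwd_loss(target, shifted))
```

Because the similarity decays smoothly with centre offset instead of dropping to zero when boxes stop overlapping, a small localization error on a tiny defect is penalized less abruptly than it would be with an IoU-based term.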
Remote sensing object detection is an important and challenging research topic in computer vision, with wide use in military and civilian fields. Recently, detection models that combine CNNs and Transformers have achieved good results, but poor detection performance on small objects remains an urgent problem. This letter proposes a deformable DETR-based framework for object detection in remote sensing images. First, Multi-Scale Split Attention (MSSA) is designed to extract more detailed feature information by grouping. Next, we propose a Multi-Scale Deformable Prescreening Attention (MSDPA) mechanism in the decoding layer, which performs pre-screening so that the encoder-decoder structure can obtain attention maps more efficiently. Finally, the A-D loss function is applied to the prediction layer, increasing the attention paid to small objects and optimizing the IoU function. Extensive experiments on the DOTA v1.5 and HRRSD datasets show that the reconstructed detection model is better suited to remote sensing objects, especially small ones. The average detection accuracy on the DOTA dataset improves by 4.4% (to 75.6%), and the accuracy on small objects rises by 5%.
Optical Music Recognition (OMR) is an important way to digitize score images and has broad application prospects in fields such as music document storage, music education, and digital creation. As a new paradigm for object detection, DETR (detection transformer) can associate contextual information, which can be exploited for the OMR task. However, the original DETR does not fit OMR well because of its high computational complexity and large number of parameters. To address these defects and improve OMR recognition accuracy, we propose a novel multi-scale DETR (M-DETR) with a multi-scale feature fusion mechanism and improved attention mechanisms. First, a new multi-scale feature fusion mechanism is designed so that the backbone network of M-DETR obtains rich multi-scale information. Then, a key-region attention mechanism is incorporated, exploiting the fact that the key information in a score image is concentrated in particular regions. Finally, a pre-context attention mechanism is introduced to make better use of the contextual association between notes in music scores. Experimental results show that M-DETR achieves a recognition accuracy of 90.6% for 7 typical small-sized notes, which is better than Faster R-CNN and YOLO v5, with an improvement rate of 10.02% over the original DETR algorithm. The results indicate that M-DETR is an effective approach to the OMR task and also offers a new solution for detecting small-sized objects with contextual association.
• A backbone network with a feature fusion mechanism.
• A key-region attention mechanism on the information of the head regions.
• A pre-context attention mechanism with correlation among the targets in music scores.
• A novel M-DETR algorithm to improve the recognition accuracy for OMR.
The ability to detect and track dynamic objects in different scenes is fundamental to real-world applications, e.g., autonomous driving and robot navigation. However, traditional Multi-Object Tracking (MOT) is limited to tracking objects that belong to pre-defined closed-set categories. Recently, Generic MOT (GMOT) has been proposed to track objects of interest beyond pre-defined categories; it can be divided into Open-Vocabulary MOT (OVMOT) and Template-Image-based MOT (TIMOT). Considering that training OVMOT models requires an expensive, well pre-trained (vision-)language model and fine-grained category annotations, in this paper we focus on TIMOT and propose a simple but effective method, Siamese-DETR. Only commonly used detection datasets (e.g., COCO) are required for training. Unlike existing TIMOT methods, which train a Single Object Tracking (SOT) based detector to detect objects of interest and then apply a data-association-based MOT tracker to obtain the trajectories, we leverage the inherent object queries in DETR variants. Specifically: 1) multi-scale object queries are designed based on the given template image, which are effective for detecting objects of different scales that share the template image's category; 2) a dynamic matching training strategy is introduced to train Siamese-DETR on commonly used detection datasets, taking full advantage of the provided annotations; 3) the online tracking pipeline is simplified in a tracking-by-query manner by incorporating the tracked boxes from the previous frame as additional query boxes. The complex data association is replaced with the much simpler Non-Maximum Suppression (NMS). Extensive experimental results show that Siamese-DETR surpasses existing MOT methods on the GMOT-40 dataset by a large margin.
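For readers unfamiliar with the NMS step mentioned above, the following is a minimal sketch of standard greedy Non-Maximum Suppression in plain Python. It illustrates the generic technique only and is not the Siamese-DETR tracking pipeline: the function names, the `(x1, y1, x2, y2)` box format, and the `iou_thr=0.5` default are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than iou_thr.
    Returns the indices of the kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


# Hypothetical frame: boxes 0 and 1 cover the same object, box 2 is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In a tracking-by-query setting, a box predicted from a previous-frame query and a box predicted from a fresh detection query may land on the same object; a suppression step of this kind would keep only the higher-scoring of the two.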