Small object detection is a challenging problem in computer vision with wide applications in defense, transportation, industry, and beyond. To facilitate an in-depth understanding of small object detection, we comprehensively review existing deep-learning-based small object detection methods from five aspects: multi-scale feature learning, data augmentation, training strategy, context-based detection, and GAN-based detection. We then thoroughly analyze the performance of several typical small object detection algorithms on popular datasets such as MS-COCO and PASCAL-VOC. Finally, possible future research directions are pointed out from five perspectives: emerging small object detection datasets and benchmarks, multi-task joint learning and optimization, information transmission, weakly supervised small object detection methods, and frameworks for the small object detection task.
Estimating the pose of multiple animals is a challenging computer vision problem: frequent interactions cause occlusions and complicate the association of detected keypoints with the correct individuals; moreover, the animals often look highly similar and interact more closely than in typical multi-human scenarios. To take up this challenge, we build on DeepLabCut, an open-source pose estimation toolbox, and provide high-performance animal assembly and tracking, features required for multi-animal scenarios. Furthermore, we integrate the ability to predict an animal's identity to assist tracking in case of occlusions. We illustrate the power of this framework on four datasets of varying complexity, which we release to serve as a benchmark for future algorithm development.
Learning-based multi-view stereo (MVS) methods have demonstrated promising results. However, very few existing networks explicitly take pixel-wise visibility into consideration, resulting in erroneous cost aggregation from occluded pixels. In this paper, we explicitly infer and integrate pixel-wise occlusion information in the MVS network via matching uncertainty estimation. The pair-wise uncertainty map is jointly inferred with the pair-wise depth map and is further used as weighting guidance during the multi-view cost volume fusion. As such, the adverse influence of occluded pixels is suppressed in the cost fusion. The proposed framework, Vis-MVSNet, significantly improves depth accuracy in reconstruction scenes with severe occlusion. Extensive experiments are performed on the DTU, BlendedMVS, Tanks and Temples, and ETH3D datasets to justify the effectiveness of the proposed framework.
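The uncertainty-weighted cost fusion described above can be sketched as follows; the exponential weighting form, array shapes, and function name are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fuse_pairwise_costs(cost_volumes, uncertainty_maps):
    """Fuse per-view cost volumes, down-weighting pixels with high
    matching uncertainty (e.g. occluded pixels).

    cost_volumes:     list of arrays, each (D, H, W)
    uncertainty_maps: list of arrays, each (H, W)
    """
    # Low uncertainty -> weight near 1; high uncertainty -> weight near 0.
    weights = [np.exp(-u) for u in uncertainty_maps]
    num = sum(w[None] * c for w, c in zip(weights, cost_volumes))
    den = sum(weights)[None] + 1e-8  # avoid division by zero
    return num / den
```

Views whose uncertainty map is large contribute little to the fused volume, which is the intended suppression of occluded pixels in the cost fusion.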
In this paper, we present a deep multi-task learning framework able to couple semantic segmentation and change detection using fully convolutional long short-term memory (LSTM) networks. In particular, we present a UNet-like architecture (L-UNet) that models the temporal relationship of spatial feature representations using integrated fully convolutional LSTM blocks on top of every encoding level. In this way, the network captures the temporal relationship of spatial feature vectors at all encoding levels without needing to downsample or flatten them, forming an end-to-end trainable framework. Moreover, we further enrich the L-UNet architecture with an additional decoding branch that performs semantic segmentation on the semantic categories present in the different input dates, forming a multi-task framework. Different loss quantities are also defined and combined in a circular way to boost the overall performance. The developed methodology has been evaluated on three different datasets: the challenging bi-temporal high-resolution ONERA Satellite Change Detection (OSCD) Sentinel-2 dataset, the very-high-resolution multitemporal dataset of the East Prefecture of Attica, Greece, and the multitemporal very-high-resolution SpaceNet7 dataset. Promising quantitative and qualitative results demonstrate that the synergy among the tasks can boost the achieved performance. In particular, the proposed multi-task framework contributed to a significant decrease in false positive detections, with the F1 rate outperforming other state-of-the-art methods by at least 2.1% and 4.9% in the Attica VHR and SpaceNet7 cases, respectively. Our models and code can be found at: https://github.com/mpapadomanolaki/multi-task-L-UNet
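A fully convolutional LSTM cell of the kind stacked on each encoding level can be sketched as follows (PyTorch; the channel counts and kernel size are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are computed with convolutions instead of
    matrix products, so spatial feature maps need not be flattened."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gates at once.
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)  # update cell state
        h = o * torch.tanh(c)          # new hidden state, same H x W as input
        return h, c
```

Because every operation is convolutional, the hidden state keeps the spatial resolution of the encoder features at that level, matching the abstract's point about avoiding downsampling or flattening.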
In recent years, computer vision has remained a consistently active area of research, drawing many scientific and technical personnel into the field. Deep learning and artificial intelligence are frequently discussed, and vision technology occupies an important place within artificial intelligence. This paper mainly studies vision-based recognition and localization of robots. The mobile robot is detected by the background difference method, with the background model established via Gaussian background modeling. The color and number tags on the robot are recognized to match the robot's features, enabling real-time positioning of the robot against a fixed background.
Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state of the art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems and have released software and evaluation code. We summarize important conclusions here: (1) Coarse pose estimation appears viable for scenes with isolated hands. However, the high-precision pose estimation required for immersive virtual reality and for cluttered scenes (where hands may be interacting with nearby objects and surfaces) remains a challenge. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define a consistent evaluation criterion, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets, and it reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
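The nearest-neighbor baseline in point (3) can be sketched as follows; the array shapes and the pixel-wise Euclidean retrieval metric are assumptions for illustration, not necessarily the paper's exact setup:

```python
import numpy as np

def nn_pose_baseline(query_depth, train_depths, train_poses):
    """Predict a hand pose by copying the pose of the most similar
    training depth image under pixel-wise Euclidean distance.

    query_depth:  (H, W) depth frame
    train_depths: (N, H, W) training depth frames
    train_poses:  (N, J, 3) 3D joint positions per training frame
    """
    diffs = train_depths - query_depth[None]
    dists = np.sqrt((diffs ** 2).reshape(len(train_depths), -1).sum(axis=1))
    return train_poses[np.argmin(dists)]  # copy nearest neighbor's pose
```

That such a memorization-style baseline can outperform trained systems is exactly why it exposes poor generalization beyond the training set.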