We propose GeoNet, an unsupervised learning framework that jointly learns monocular depth, optical flow and egomotion estimation from videos. The three components are coupled by the nature of 3D scene geometry and are learned jointly by our framework in an end-to-end manner. Specifically, geometric relationships are extracted over the predictions of individual modules and then combined as an image reconstruction loss, reasoning about static and dynamic scene parts separately. Furthermore, we propose an adaptive geometric consistency loss to increase robustness towards outliers and non-Lambertian regions, which effectively resolves occlusions and texture ambiguities. Experimentation on the KITTI driving dataset reveals that our scheme achieves state-of-the-art results in all three tasks, performing better than previous unsupervised methods and comparably with supervised ones.
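One way to picture an adaptive geometric consistency check is a forward-backward flow test: where the forward flow and the backward flow sampled at the forward-displaced position nearly cancel, the pixel is trusted; occluded or ambiguous pixels fail the test. The sketch below is an illustration under stated assumptions, not GeoNet's implementation: it uses nearest-neighbour lookup instead of bilinear sampling, and the thresholds `alpha` and `beta` as well as the function name `consistency_mask` are hypothetical. The "adaptive" part is that the tolerance grows with the flow magnitude.

```python
import math
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # flow[y][x] = (dx, dy)


def consistency_mask(fw: Flow, bw: Flow,
                     alpha: float = 3.0, beta: float = 0.05) -> List[List[bool]]:
    """Mark pixels whose forward flow is cancelled by the backward flow.

    Hypothetical criterion: pixel p is consistent if
    |fw(p) + bw(p + fw(p))| < max(alpha, beta * |fw(p)|).
    """
    h, w = len(fw), len(fw[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = fw[y][x]
            # Nearest-neighbour lookup of the target pixel (simplifies bilinear sampling).
            tx, ty = int(round(x + dx)), int(round(y + dy))
            if not (0 <= tx < w and 0 <= ty < h):
                continue  # flow leaves the image: treat as inconsistent (e.g. occluded)
            bx, by = bw[ty][tx]
            residual = math.hypot(dx + bx, dy + by)
            # Tolerance is at least alpha and scales with the forward flow magnitude.
            mask[y][x] = residual < max(alpha, beta * math.hypot(dx, dy))
    return mask
```

For a uniform one-pixel shift to the right with a matching backward flow, every pixel is consistent except those in the last column, whose forward flow points outside the image.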
Hierarchical Image Saliency Detection on Extended CSSD Shi, Jianping; Yan, Qiong; Xu, Li ...
IEEE Transactions on Pattern Analysis and Machine Intelligence,
2016-04-01, Volume: 38, Issue: 4
Journal Article
Peer reviewed
Complex structures commonly exist in natural images. When an image contains small-scale high-contrast patterns in either the background or the foreground, saliency detection can be adversely affected, resulting in erroneous and non-uniform saliency assignment. This issue is a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. Instead of varying patch sizes or downsizing images, we measure region-based scales. The final saliency values are inferred by optimally combining the saliency cues from all scales using hierarchical inference. Through our inference model, single-scale information is selected to obtain a saliency map. Our method improves detection quality on many images that traditional methods cannot handle well. We also construct an extended Complex Scene Saliency Dataset (ECSSD) to include complex but general natural images.
• A novel domain adaptation technique called Adaptive Batch Normalization (AdaBN).
• The effectiveness of AdaBN is validated for both single-source and multi-source domain adaptation tasks.
• Experiments on cloud detection for remote sensing images demonstrate the effectiveness of AdaBN in practical use.
Deep neural networks (DNN) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it remains a common annoyance during training that one has to prepare at least thousands of labeled images to fine-tune a network to a specific domain. A recent study (Tommasi et al., 2015) shows that a DNN depends strongly on its training dataset, and the learned features cannot easily be transferred to a different but related task without fine-tuning. In this paper, we propose a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN), to increase the generalization ability of a DNN. By modulating the statistics from the source domain to the target domain in all Batch Normalization layers across the network, our approach achieves a deep adaptation effect for domain adaptation tasks. In contrast to other deep learning domain adaptation methods, our method requires no additional components and is parameter-free. It achieves state-of-the-art performance despite its surprising simplicity. Furthermore, we demonstrate that our method is complementary to other existing methods: combining AdaBN with existing domain adaptation treatments may further improve model performance.
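The core AdaBN idea described above is to keep every learned weight, including each Batch Normalization layer's scale and shift, and only replace the normalization statistics with ones estimated on target-domain data. A minimal single-channel sketch in plain Python, assuming the learned `gamma` and `beta` are given; the function names and the `eps` value are illustrative, not taken from the paper.

```python
import math
from statistics import fmean, pvariance
from typing import List


def batch_norm(xs: List[float], mean: float, var: float,
               gamma: float, beta: float, eps: float = 1e-5) -> List[float]:
    """Standard BN transform for one channel using the supplied statistics."""
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def adabn(target_xs: List[float], gamma: float, beta: float) -> List[float]:
    """AdaBN sketch: re-estimate mean and variance on target-domain activations,
    while keeping the gamma/beta learned on the source domain unchanged."""
    mu_t = fmean(target_xs)
    var_t = pvariance(target_xs, mu_t)
    return batch_norm(target_xs, mu_t, var_t, gamma, beta)
```

With source statistics (mean 0, variance 1), shifted target activations such as `[10.0, 12.0, 14.0]` pass through almost unnormalized; `adabn` recentres them around `beta`. In a full network, the same re-estimation would be applied independently to every channel of every BN layer, which is why the method adds no parameters.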
Abnormal Event Detection at 150 FPS in MATLAB Lu, Cewu; Shi, Jianping; Jia, Jiaya
2013 IEEE International Conference on Computer Vision,
12/2013
Conference Proceeding, Journal Article
Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on the inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one that involves only a few inexpensive small-scale least-squares optimization steps. Our method reaches high detection rates on benchmark datasets at an average speed of 140-150 frames per second on an ordinary desktop PC using MATLAB.
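The speed argument above rests on the detection phase reducing to a few small least-squares fits. The sketch below assumes that each learned combination is a small set of basis vectors and that a feature vector is normal if at least one combination reconstructs it with squared error below a threshold. Orthonormalizing each combination once in advance (here with Gram-Schmidt, an implementation choice not stated in the abstract) turns every test-time least-squares fit into a handful of dot products. All names and the threshold are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(basis: Sequence[Vector]) -> List[Vector]:
    """Gram-Schmidt: orthonormal vectors spanning the same subspace as `basis`."""
    q: List[Vector] = []
    for v in basis:
        w = list(v)
        for u in q:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # drop linearly dependent vectors
            q.append([wi / norm for wi in w])
    return q


def residual(x: Vector, q: List[Vector]) -> float:
    """Squared least-squares reconstruction error of x within span(q)."""
    r = list(x)
    for u in q:
        c = dot(x, u)  # projection coefficient (valid because q is orthonormal)
        r = [ri - c * ui for ri, ui in zip(r, u)]
    return dot(r, r)


def is_abnormal(x: Vector, combos: List[List[Vector]], threshold: float) -> bool:
    """Normal if ANY pre-orthonormalized combination reconstructs x well enough."""
    return min(residual(x, q) for q in combos) > threshold
```

Because the orthonormalization happens once per combination, the per-frame cost is linear in the number of combinations and in the feature dimension.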
The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet), aiming to boost information flow in a proposal-based instance segmentation framework. Specifically, we enhance the entire feature hierarchy with accurate localization signals from lower layers by bottom-up path augmentation, which shortens the information path between lower layers and the topmost feature. We present adaptive feature pooling, which links the feature grid and all feature levels so that useful information in each level propagates directly to the following proposal subnetworks. A complementary branch capturing different views of each proposal is created to further improve mask prediction. These improvements are simple to implement and add only slight computational overhead. Yet they are effective, enabling PANet to reach 1st place in the COCO 2017 Challenge Instance Segmentation task and 2nd place in the Object Detection task without large-batch training. PANet is also state-of-the-art on MVD and Cityscapes.
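Bottom-up path augmentation can be summarized as N_2 = P_2 and N_{i+1} = f(downsample(N_i) + P_{i+1}), where P_i are the FPN outputs from finest to coarsest. This follows our reading of the PANet paper, in which the downsampling is a learned stride-2 3x3 convolution and f is another 3x3 convolution. The single-channel sketch below substitutes 2x2 average pooling for the learned downsampling and uses the identity for f, so it only shows the data flow, not the learned layers.

```python
from typing import List

Grid = List[List[float]]


def downsample2(g: Grid) -> Grid:
    """2x2 average pooling; a stand-in for PANet's learned stride-2 3x3 convolution."""
    return [[(g[2 * i][2 * j] + g[2 * i][2 * j + 1]
              + g[2 * i + 1][2 * j] + g[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(len(g[0]) // 2)]
            for i in range(len(g) // 2)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized grids (the lateral connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up_path(fpn_levels: List[Grid]) -> List[Grid]:
    """Build N_2..N_k from FPN outputs P_2..P_k, ordered finest first.

    N_2 = P_2; N_{i+1} = downsample(N_i) + P_{i+1}. The post-fusion 3x3
    convolution is omitted (identity) in this sketch.
    """
    n = [fpn_levels[0]]
    for p in fpn_levels[1:]:
        n.append(add(downsample2(n[-1]), p))
    return n
```

Each N_{i+1} thus receives the fine-level signal in a few steps instead of travelling through the whole backbone, which is the "shortened information path" the abstract refers to.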
We present a novel, high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. Our proposed method deeply integrates a 3D voxel Convolutional Neural Network (CNN) with PointNet-based set abstraction to learn more discriminative point cloud features. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of PointNet-based networks. Specifically, the proposed framework uses a 3D voxel CNN and a novel voxel set abstraction module to summarize the 3D scene into a small set of keypoints, which saves follow-up computation and encodes representative scene features. Given the high-quality 3D proposals generated by the voxel CNN, RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points via keypoint set abstraction. Compared with conventional pooling operations, the RoI-grid feature points encode much richer context information for accurately estimating object confidences and locations. Extensive experiments on both the KITTI dataset and the Waymo Open dataset show that our proposed PV-RCNN surpasses state-of-the-art 3D detection methods by remarkable margins.
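The abstract does not say how the small set of keypoints is chosen. The sketch below assumes farthest point sampling (FPS), a common way to spread a fixed number of keypoints evenly over a point cloud; it covers only keypoint selection, not the voxel set abstraction that would then attach voxel-CNN features to each keypoint. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def farthest_point_sampling(points: List[Point], k: int) -> List[int]:
    """Greedily select up to k point indices, each farthest from those already chosen."""
    if not points or k <= 0:
        return []
    chosen = [0]  # deterministic start at the first point
    # Distance from every point to its nearest already-chosen keypoint.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] == 0.0:
            break  # every remaining point coincides with a chosen keypoint
        chosen.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return chosen
```

This naive version costs O(N·k) distance evaluations; for large LiDAR scans, practical implementations usually run the same greedy loop on a GPU.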
Pyramid Scene Parsing Network Hengshuang Zhao; Jianping Shi; Xiaojuan Qi ...
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2017-July
Conference Proceeding
Open access
Scene parsing is challenging due to unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module, together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It ranked first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet sets new records of 85.4% mIoU on PASCAL VOC 2012 and 80.2% on Cityscapes.
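The pyramid pooling module pools a feature map at several grid sizes, upsamples each pooled map back to the input resolution and concatenates the results, so each pixel sees both its local features and region-level context. A single-channel sketch, assuming bin sizes 1, 2, 3 and 6 and replacing bilinear upsampling with nearest-neighbour for brevity; the per-branch 1x1 convolutions of PSPNet are omitted, and the bin boundaries follow the floor-start/ceil-end convention of common adaptive pooling layers.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(g: Grid, n: int) -> Grid:
    """Average-pool g into an n x n grid; bins may overlap when sizes don't divide evenly."""
    h, w = len(g), len(g[0])
    out: Grid = []
    for i in range(n):
        r0, r1 = i * h // n, -(-(i + 1) * h // n)  # floor start, ceil end
        row = []
        for j in range(n):
            c0, c1 = j * w // n, -(-(j + 1) * w // n)
            cells = [g[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def upsample_nearest(g: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour resize to h x w (PSPNet uses bilinear interpolation)."""
    n_r, n_c = len(g), len(g[0])
    return [[g[r * n_r // h][c * n_c // w] for c in range(w)] for r in range(h)]


def pyramid_pooling(g: Grid, bins: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map followed by one upsampled pooled map per bin size,
    i.e. the channels that would be concatenated before the final classifier."""
    h, w = len(g), len(g[0])
    return [g] + [upsample_nearest(adaptive_avg_pool(g, n), h, w) for n in bins]
```

The 1x1 bin acts as a global-context prior (every pixel receives the image-wide average), while the 6x6 bin keeps much of the spatial layout.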
Compared with model architectures, the training process, which is also crucial to the success of detectors, has received relatively little attention in object detection. In this work, we carefully revisit the standard training practice of detectors and find that detection performance is often limited by imbalance during the training process, which generally occurs at three levels: sample level, feature level and objective level. To mitigate the resulting adverse effects, we propose Libra R-CNN, a simple but effective framework for balanced learning in object detection. It integrates three novel components, IoU-balanced sampling, a balanced feature pyramid and a balanced L1 loss, which reduce imbalance at the sample, feature and objective level, respectively. Benefiting from the overall balanced design, Libra R-CNN significantly improves detection performance. Without bells and whistles, it achieves 2.5 and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet, respectively, on MS COCO.
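The balanced L1 loss is defined in the Libra R-CNN paper rather than in this abstract. The sketch below follows that definition as we understand it, with a transition point at |x| = 1 and the defaults alpha = 0.5 and gamma = 1.5 treated as assumptions. The constant b is fixed by alpha * ln(b + 1) = gamma so that the gradient is continuous at the transition, and the offset gamma / b - alpha makes the loss value continuous there as well.

```python
import math


def balanced_l1(x: float, alpha: float = 0.5, gamma: float = 1.5) -> float:
    """Balanced L1 loss for a single regression residual x (transition at |x| = 1).

    For |x| < 1 the gradient is alpha * ln(b|x| + 1), which rises quickly for
    small residuals (inliers); for |x| >= 1 the gradient is capped at gamma.
    """
    b = math.exp(gamma / alpha) - 1.0  # from alpha * ln(b + 1) = gamma
    ax = abs(x)
    if ax < 1.0:
        return alpha / b * (b * ax + 1.0) * math.log(b * ax + 1.0) - alpha * ax
    return gamma * ax + gamma / b - alpha  # offset keeps the loss continuous at |x| = 1
```

Compared with smooth L1, this shape increases the gradient contribution of accurate (inlier) samples without letting large outlier residuals dominate, which is the objective-level balancing the abstract describes.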
Recently, two-dimensional (2D) atomic sheets have inspired new ideas in nanoscience, including topologically protected charge transport, spatially separated excitons and strongly anisotropic heat transport. Here, we report the intriguing observation of stable nonvolatile resistance switching (NVRS) in single-layer atomic sheets sandwiched between metal electrodes. NVRS is observed in the prototypical semiconducting transition metal dichalcogenides (TMDs) MX2 (M = Mo, W; X = S, Se), which points to the universality of this phenomenon in TMD monolayers; these devices also offer forming-free switching. This observation of NVRS, a phenomenon widely attributed to ionic diffusion, filament formation and interfacial redox in bulk oxides and electrolytes, inspires new studies on defects, ion transport and energetics at the sharp interfaces between atomically thin sheets and conducting electrodes. Our findings overturn the contemporary thinking that nonvolatile switching is not scalable to subnanometre thicknesses owing to leakage currents. Emerging device concepts in nonvolatile flexible memory fabrics and brain-inspired (neuromorphic) computing could benefit substantially from the wide 2D materials design space. A major new application, zero-static-power radio frequency (RF) switching, is demonstrated with a monolayer switch operating up to 50 GHz.