Abnormal Event Detection at 150 FPS in MATLAB Lu, Cewu; Shi, Jianping; Jia, Jiaya
2013 IEEE International Conference on Computer Vision,
Dec. 2013
Conference Proceeding, Journal Article
Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Exploiting the inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves fast detection without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one involving only a few inexpensive small-scale least-squares optimization steps. Our method reaches high detection rates on benchmark datasets at a speed of 140-150 frames per second on average on an ordinary desktop PC running MATLAB.
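The speed claim rests on reducing detection to tiny least-squares problems: a test feature is considered normal if some learned combination of basis vectors reconstructs it cheaply and accurately. A minimal sketch of that test (toy dictionary and threshold, not the authors' MATLAB implementation):

```python
import numpy as np

def reconstruction_error(x, D):
    # min_c ||x - D c||^2 is a small closed-form least-squares problem,
    # which is why per-frame testing is fast.
    c, *_ = np.linalg.lstsq(D, x, rcond=None)
    return float(np.linalg.norm(x - D @ c))

def is_abnormal(x, dictionaries, threshold=1.0):
    # An event is normal if ANY learned combination reconstructs it well.
    return min(reconstruction_error(x, D) for D in dictionaries) > threshold

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 5))           # one toy learned combination
normal_event = D @ rng.standard_normal(5)  # lies in the span of D
odd_event = 10 * rng.standard_normal(20)   # far from the span of D
```

Learning the sparse combinations is the expensive offline step; at test time each frame only incurs a handful of these small solves, which is what makes the 140-150 FPS figure plausible.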
RMPE: Regional Multi-person Pose Estimation Hao-Shu Fang; Shuqin Xie; Yu-Wing Tai et al.
2017 IEEE International Conference on Computer Vision (ICCV),
Oct. 2017
Conference Proceeding
Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors demonstrate good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that depend solely on human detection results. In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. Our framework consists of three components: a Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and a Pose-Guided Proposals Generator (PGPG). Our method is able to handle inaccurate bounding boxes and redundant detections, allowing it to achieve 76.7 mAP on the MPII (multi-person) dataset. Our model and source code are made publicly available.
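Parametric Pose NMS suppresses redundant detections by comparing whole poses rather than bounding boxes. A simplified greedy version (the distance function below is an illustrative stand-in for the paper's parametric one):

```python
import numpy as np

def pose_distance(p, q, sigma=1.0):
    # Soft keypoint distance between two poses given as (K, 2) arrays;
    # near-identical poses score close to 0, unrelated poses close to 1.
    d = np.linalg.norm(p - q, axis=1)
    return float(np.mean(1.0 - np.exp(-d ** 2 / sigma)))

def pose_nms(poses, scores, dist_threshold=0.5):
    # Greedy NMS: keep the best-scoring pose, drop near-duplicates of it.
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(pose_distance(poses[i], poses[j]) > dist_threshold for j in keep):
            keep.append(int(i))
    return keep

p0 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
p1 = p0 + 0.01                     # near-duplicate detection of p0
p2 = p0 + 10.0                     # a different person
kept = pose_nms([p0, p1, p2], [0.9, 0.8, 0.7])   # -> [0, 2]
```

The duplicate `p1` is dropped because its pose distance to the already-kept `p0` is near zero, while `p2` survives as a genuinely distinct person.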
Toward weather condition recognition, we emphasize the importance of regional cues in this paper and address several key problems regarding appropriate representation, its differentiation among regions, and weather-condition feature construction. Our first major contribution is a multi-class benchmark dataset containing 65,000 images from six common categories: sunny, cloudy, rainy, snowy, haze, and thunder weather. This dataset also benefits weather classification and attribute recognition. Second, we propose a deep learning framework named the region selection and concurrency model (RSCM) to help discover regional properties and concurrency. We evaluate RSCM on our multi-class benchmark data and another public dataset for weather recognition.
We propose a fine-grained recognition system that incorporates part localization, alignment, and classification in one deep neural network. This is a nontrivial process, as the input to the classification module must consist of functions that enable back-propagation when constructing the solver. Our major contribution is a valve linkage function (VLF) for back-propagation chaining, which forms our deep localization, alignment, and classification (LAC) system. The VLF can adaptively balance the errors of classification and alignment when training the LAC model, which in turn helps update localization. Performance on fine-grained object data bears out the effectiveness of our LAC system.
Accurate whole-body multi-person pose estimation and tracking is an important yet challenging topic in computer vision. To capture the subtle actions of humans for complex behavior analysis, whole-body pose estimation, including the face, body, hand, and foot, is essential beyond conventional body-only pose estimation. In this article, we present AlphaPose, a system that can perform accurate whole-body pose estimation and tracking jointly while running in real time. To this end, we propose several new techniques: Symmetric Integral Keypoint Regression (SIKR) for fast and fine localization, Parametric Pose Non-Maximum-Suppression (P-NMS) for eliminating redundant human detections, and Pose Aware Identity Embedding for joint pose estimation and tracking. During training, we resort to the Part-Guided Proposal Generator (PGPG) and multi-domain knowledge distillation to further improve accuracy. Our method is able to localize whole-body keypoints accurately and track humans simultaneously given inaccurate bounding boxes and redundant detections. We show a significant improvement over current state-of-the-art methods in both speed and accuracy on COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose estimation dataset. Our model, source code, and dataset are made publicly available at https://github.com/MVIG-SJTU/AlphaPose .
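Integral keypoint regression of the kind SIKR builds on replaces the hard argmax over a heatmap with the expectation under its softmax, which is differentiable and sub-pixel accurate. A bare-bones sketch of that core operation (not AlphaPose's full SIKR):

```python
import numpy as np

def soft_argmax(heatmap):
    # Expected (x, y) under a softmax of the heatmap: differentiable and
    # sub-pixel, unlike a hard np.argmax.
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())   # subtract max for stability
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())

ys, xs = np.mgrid[0:8, 0:8]
hm = -5.0 * ((xs - 3) ** 2 + (ys - 2) ** 2)   # sharp peak at (3, 2)
x, y = soft_argmax(hm)                        # close to (3.0, 2.0)
```

Because the output is an expectation rather than an index, gradients flow through the whole heatmap, which is what makes end-to-end training of the localization head possible.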
Large repositories of 3D shapes provide valuable input for data-driven analysis and modeling tools. They are especially powerful once annotated with semantic information such as salient regions and functional parts. We propose a novel active learning method capable of enriching massive geometric datasets with accurate semantic region annotations. Given a shape collection and a user-specified region label, our goal is to correctly demarcate the corresponding regions with minimal manual work. Our active framework achieves this goal by cycling between manually annotating the regions, automatically propagating these annotations across the rest of the shapes, manually verifying both human and automatic annotations, and learning from the verification results to improve the automatic propagation algorithm. We use a unified utility function that explicitly models the time cost of human input across all steps of our method. This allows us to jointly optimize the set of models to annotate and the set of models to verify based on the predicted impact of these actions on human efficiency. We demonstrate that incorporating verification of all produced labelings within this unified objective improves both the accuracy and the efficiency of the active learning procedure. We automatically propagate human labels across a dynamic shape network using a conditional random field (CRF) framework, taking advantage of global shape-to-shape similarities, local feature similarities, and point-to-point correspondences. By combining these diverse cues we achieve higher accuracy than existing alternatives. We validate our framework on existing benchmarks, demonstrating it to be significantly more efficient at using human input than previous techniques. We further validate its efficiency and robustness by annotating a massive shape dataset, labeling over 93,000 shape parts across multiple model classes, and providing a labeled part collection more than one order of magnitude larger than existing ones.
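The propagation step can be approximated very simply: labeled shapes vote for their nearest unlabeled neighbors in some descriptor space. The sketch below is a nearest-neighbor stand-in for the paper's CRF (which additionally fuses local feature similarities and point-to-point correspondences):

```python
import numpy as np

def propagate_labels(features, labeled):
    # features: (N, d) array of per-shape descriptors.
    # labeled: {shape_index: region_label} from manual annotation.
    # Each unlabeled shape inherits the label of its nearest labeled shape.
    out = dict(labeled)
    anchors = list(labeled)
    for i in range(len(features)):
        if i not in out:
            j = min(anchors, key=lambda a: np.linalg.norm(features[i] - features[a]))
            out[i] = labeled[j]
    return out

feats = np.array([[0.0], [0.1], [5.0], [5.2]])
labels = propagate_labels(feats, {0: "wing", 2: "tail"})
# labels == {0: "wing", 1: "wing", 2: "tail", 3: "tail"}
```

In the active loop, shapes where such propagation is least confident are exactly the ones worth routing to human annotation or verification next.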
Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods have explored pose estimation in crowded scenes, even though it remains challenging and unavoidable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In this paper, we propose a novel and efficient method to tackle the problem of pose estimation in crowds, together with a new dataset to better evaluate algorithms. Our model consists of two key components: joint-candidate single-person pose estimation (SPPE) and global maximum joints association. With multi-peak prediction for each joint and global association using a graph model, our method is robust to the inevitable interference in crowded scenes and very efficient at inference. The proposed method surpasses state-of-the-art methods on the CrowdPose dataset by 5.2 mAP, and results on the MSCOCO dataset demonstrate the generalization ability of our method.
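Global association resolves which joint candidate belongs to which person by considering all pairs at once instead of committing per person. A greedy approximation of such a matching (the paper uses a graph model; the person "anchors" and Euclidean cost here are illustrative assumptions):

```python
import numpy as np

def associate(candidates, anchors):
    # Assign each joint candidate to at most one person anchor,
    # taking globally best (smallest-distance) pairs first.
    pairs = sorted(
        (float(np.linalg.norm(c - a)), ci, ai)
        for ci, c in enumerate(candidates)
        for ai, a in enumerate(anchors)
    )
    used_c, used_a, match = set(), set(), {}
    for _, ci, ai in pairs:
        if ci not in used_c and ai not in used_a:
            match[ai] = ci          # person ai gets candidate ci
            used_c.add(ci)
            used_a.add(ai)
    return match

anchors = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
candidates = [np.array([4.9, 5.1]), np.array([0.2, -0.1])]
match = associate(candidates, anchors)   # {1: 0, 0: 1}
```

A per-person greedy rule could easily steal a neighbor's joint in a crowd; ranking pairs globally is a small illustration of why joint association across people is more robust there.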
We propose a deep learning approach for directly estimating relative atmospheric visibility from outdoor photos, without relying on weather images or data that require expensive sensing or custom capture. Our data-driven approach capitalizes on a large collection of Internet images to learn rich scene and visibility varieties. The relative CNN-RNN coarse-to-fine model, where CNN stands for convolutional neural network and RNN for recurrent neural network, exploits the joint power of the relative support vector machine, which has a good ranking representation, and the data-driven deep features derived from our novel CNN-RNN model. The CNN-RNN model makes use of shortcut connections to bridge a CNN module and an RNN coarse-to-fine module. The CNN captures the global view while the RNN simulates human attention shift, namely from the whole image (global) to the farthest discerned region (local). The learned relative model can be adapted to predict absolute visibility in limited scenarios. Extensive experiments and comparisons are performed to verify our method. We have built an annotated dataset consisting of about 40,000 images with 0.2 million human annotations. This large-scale annotated visibility dataset will be made available to accompany this paper.
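Relative supervision of this kind is typically trained with a pairwise margin ranking loss: annotators say which of two photos has better visibility, and the model's scalar scores are pushed to respect that order. A minimal sketch (the margin of 1.0 is an assumption, and the CNN-RNN feature extraction is omitted):

```python
def ranking_loss(score_a, score_b, a_clearer, margin=1.0):
    # Hinge on the score gap: zero once the clearer image outscores the
    # other by at least `margin`, positive otherwise.
    sign = 1.0 if a_clearer else -1.0
    return max(0.0, margin - sign * (score_a - score_b))

correct = ranking_loss(2.0, 0.5, a_clearer=True)   # 0.0, order respected
wrong = ranking_loss(0.5, 2.0, a_clearer=True)     # 2.5, order violated
```

Pairwise labels like "image A is clearer than image B" are far easier for humans to provide consistently than absolute visibility values, which is why a relative/ranking formulation suits Internet-scale annotation.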
Given a single outdoor image, we propose a collaborative learning approach using novel weather features to label the image as either sunny or cloudy. Though limited, this two-class classification problem is by no means trivial given the great variety of outdoor images captured by different cameras, where the images may have been edited after capture. Our overall weather feature combines the data-driven convolutional neural network (CNN) feature and well-chosen weather-specific features. They work collaboratively within a unified optimization framework that is aware of the presence (or absence) of a given weather cue during learning and classification. In this paper we propose a new data augmentation scheme to substantially enrich the training data, which is used to train a latent SVM framework that makes our solution insensitive to global intensity transfer. Extensive experiments are performed to verify our method. Compared with our previous work and the sole use of a CNN classifier, this paper improves accuracy by up to 7-8 percent. Our weather image dataset is available together with the executable of our classifier.
Robust object grasping in cluttered scenes is vital to all robotic prehensile manipulation. In this paper, we present the GraspNet-1Billion benchmark, which contains rich real-world captured cluttered scenarios and abundant annotations. This benchmark aims to solve two critical problems for parallel-finger grasping in cluttered scenes: insufficient real-world training data and the lack of an evaluation benchmark. We first contribute a large-scale grasp pose detection dataset, captured with two different depth cameras based on structured-light and time-of-flight technologies. Our dataset contains 97,280 RGB-D images with over one billion grasp poses. In total, 190 cluttered scenes are collected, of which 100 are for training and 90 for testing. Meanwhile, we build an evaluation system that is general and user-friendly. It directly reports a predicted grasp pose's quality by analytic computation, which makes it possible to evaluate any kind of grasp representation without exhaustively labeling the ground truth. We further divide the test set into three difficulty levels to better evaluate algorithms' generalization ability. Our dataset, access API, and evaluation code are publicly available at www.graspnet.net.