Learning Deep Features for One-Class Classification Perera, Pramuditha; Patel, Vishal M.
IEEE Transactions on Image Processing,
November 2019, Volume 28, Issue 11
Journal Article
Peer reviewed
We present a novel deep-learning-based approach for one-class transfer learning in which labeled data from an unrelated task is used for feature learning in one-class classification. The proposed method operates on top of a convolutional neural network (CNN) of choice and produces descriptive features while maintaining a low intra-class variance in the feature space for the given class. For this purpose, two loss functions, compactness loss and descriptiveness loss, are proposed along with a parallel CNN architecture. A template matching-based framework is introduced to facilitate the testing process. Extensive experiments on publicly available anomaly detection, novelty detection, and mobile active authentication datasets show that the proposed deep one-class (DOC) classification method achieves significant improvements over the state-of-the-art.
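The abstract does not give the formula for the compactness loss, so the following is only a minimal sketch of one way to score "low intra-class variance" on a batch of feature vectors. It assumes compactness is measured as the mean squared distance of each vector from the mean of the other vectors in the batch; the function name `compactness_loss` and the toy batches are illustrative, not taken from the paper.

```python
def compactness_loss(features):
    """Mean squared distance of each vector from the mean of the others.

    Assumed definition for illustration; smaller means a tighter cluster.
    """
    n = len(features)
    k = len(features[0])
    if n < 2:
        raise ValueError("need at least two feature vectors")
    # Per-dimension sums over the whole batch, reused for leave-one-out means.
    totals = [sum(f[d] for f in features) for d in range(k)]
    loss = 0.0
    for f in features:
        for d in range(k):
            mean_of_others = (totals[d] - f[d]) / (n - 1)
            loss += (f[d] - mean_of_others) ** 2
    return loss / (n * k)


# Hypothetical feature batches: one collapsed to a point, one spread out.
tight = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
spread = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]

tight_loss = compactness_loss(tight)
spread_loss = compactness_loss(spread)
```

In a training loop this term would be combined with a descriptiveness term (for example, a classification loss on the labeled data from the unrelated task), so that the features stay informative rather than collapsing to a constant.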
Severe weather conditions, such as rain and snow, adversely affect the visual quality of images captured under such conditions, thus rendering them useless for further usage and sharing. In addition, such degraded images drastically affect the performance of vision systems. Hence, it is important to address the problem of single image de-raining. However, the inherent ill-posed nature of the problem presents several challenges. We attempt to leverage the powerful generative modeling capabilities of the recently introduced conditional generative adversarial networks (CGAN) by enforcing an additional constraint that the de-rained image must be indistinguishable from its corresponding ground truth clean image. The adversarial loss from the GAN provides additional regularization and helps to achieve superior results. In addition to presenting a new approach to de-rain images, we introduce a new refined loss function and architectural novelties in the generator-discriminator pair for achieving improved results. The loss function is aimed at reducing artifacts introduced by GANs and ensuring better visual quality. The generator sub-network is constructed using the recently introduced densely connected networks, whereas the discriminator is designed to leverage global and local information to decide if an image is real or fake. Based on this, we propose a novel single image de-raining method called image de-raining conditional generative adversarial network (ID-CGAN) that incorporates quantitative, visual, and discriminative performance into the objective function. Experiments on synthetic and real images show that the proposed method outperforms many recent state-of-the-art single image de-raining methods in terms of quantitative and visual performance. Furthermore, experimental results on object detection datasets using Faster-RCNN also demonstrate the effectiveness of the proposed method in improving detection performance on images degraded by rain.
We present an algorithm for simultaneous face detection, landmark localization, pose estimation, and gender recognition using deep convolutional neural networks (CNN). The proposed method, called HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks, which boosts their individual performances. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet, which builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace, which uses a high-recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and perform significantly better than many competitive algorithms for each of these four tasks.
•Survey on CNN-based approaches for crowd counting and density estimation.
•Discussion on recent hand-crafted representations-based methods.
•Recent datasets that pose various challenges are discussed.
•Detailed analysis and comparison of results of CNN-based and traditional methods.
•Discussion on future directions and trends for further progress.
Estimating count and density maps from crowd images has a wide range of applications such as video surveillance, traffic monitoring, public safety, and urban planning. In addition, techniques developed for crowd counting can be applied to related tasks in other fields of study such as cell microscopy, vehicle counting, and environmental survey. The task of crowd counting and density map estimation is riddled with many challenges such as occlusions, non-uniform density, and intra-scene and inter-scene variations in scale and perspective. Nevertheless, over the last few years, crowd count analysis has evolved from earlier methods that are often limited to small variations in crowd density and scale to the current state-of-the-art methods that perform successfully on a wide range of scenarios. The success of crowd counting methods in recent years can be largely attributed to deep learning and the publication of challenging datasets. In this paper, we provide a comprehensive survey of recent Convolutional Neural Network (CNN) based approaches that have demonstrated significant improvements over earlier methods that rely largely on hand-crafted representations. First, we briefly review the pioneering methods that use hand-crafted representations, and then we delve in detail into the deep learning-based approaches and recently published datasets. Furthermore, we discuss the merits and drawbacks of existing CNN-based approaches and identify promising avenues of research in this rapidly evolving field.
Sparse Representation-Based Open Set Recognition He Zhang; Patel, Vishal M.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
August 1, 2017, Volume 39, Issue 8
Journal Article
Peer reviewed
We propose a generalized Sparse Representation-based Classification (SRC) algorithm for open set recognition where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail part of the matched and sum of non-matched reconstruction error distributions, we model the tail of those two error distributions using the statistical Extreme Value Theory (EVT). Then we simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets, and it is shown that this method can perform significantly better than many competitive open set recognition algorithms.
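To make "class reconstruction errors" concrete, here is a minimal sketch of the closed-set SRC decision step only. It assumes a sparse coefficient vector has already been computed by some sparse solver (not shown), and it does not cover the EVT tail modeling or hypothesis testing described in the abstract. The helper name `class_residuals`, the two-atom dictionary, and the coefficients are all hypothetical.

```python
import math


def class_residuals(sample, atoms, atom_labels, coeffs):
    """Reconstruct the sample from each class's atoms alone and return
    the Euclidean reconstruction error per class."""
    residuals = {}
    for label in sorted(set(atom_labels)):
        recon = [0.0] * len(sample)
        for atom, atom_label, c in zip(atoms, atom_labels, coeffs):
            if atom_label == label:  # keep only this class's coefficients
                for d in range(len(sample)):
                    recon[d] += c * atom[d]
        residuals[label] = math.sqrt(
            sum((s - r) ** 2 for s, r in zip(sample, recon))
        )
    return residuals


# Toy dictionary with one training atom per class (hypothetical data).
atoms = [[1.0, 0.0], [0.0, 1.0]]
atom_labels = ["a", "b"]
sample = [0.9, 0.1]
coeffs = [0.9, 0.1]  # assumed output of a sparse solver

residuals = class_residuals(sample, atoms, atom_labels, coeffs)
predicted = min(residuals, key=residuals.get)  # smallest residual wins
```

In the open-set setting, the smallest residual alone is not enough: a sample from an unknown class still has some smallest residual, which is why the abstract models the tails of the matched and non-matched error distributions instead of taking a plain minimum.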
We present a novel convolutional neural network (CNN) based approach for one-class classification. The idea is to use zero-centered Gaussian noise in the latent space as the pseudo-negative class and train the network using the cross-entropy loss to learn a good representation as well as the decision boundary for the given class. A key feature of the proposed approach is that any pre-trained CNN can be used as the base network for one-class classification. The proposed one-class CNN is evaluated on the UMDAA-02 Face, Abnormality-1001, and FounderType-200 datasets. These datasets are related to a variety of one-class application problems such as user authentication, abnormality detection, and novelty detection. Extensive experiments demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art methods. The source code is available at: github.com/otkupjnoz/oc-cnn.
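The pseudo-negative idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the latent features are made-up 2-D vectors, the noise scale `sigma`, the helper names `make_batch` and `cross_entropy`, and the linear classifier weights are all hypothetical, and no CNN or training loop is shown.

```python
import math
import random


def make_batch(class_features, sigma=0.1, seed=0):
    """Pair each target-class feature with a zero-centered Gaussian noise
    vector that plays the role of the negative class."""
    rng = random.Random(seed)
    dim = len(class_features[0])
    noise = [[rng.gauss(0.0, sigma) for _ in range(dim)]
             for _ in class_features]
    samples = class_features + noise
    labels = [1] * len(class_features) + [0] * len(noise)
    return samples, labels


def cross_entropy(samples, labels, weights, bias):
    """Mean binary cross-entropy of a linear classifier with a sigmoid."""
    total = 0.0
    for x, y in zip(samples, labels):
        logit = sum(w * v for w, v in zip(weights, x)) + bias
        p = 1.0 / (1.0 + math.exp(-logit))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


# Hypothetical latent features of the single known class.
class_features = [[2.0, 2.0], [2.5, 1.5]]
samples, labels = make_batch(class_features)

# A boundary that separates the class from the noise scores a lower loss
# than one that points the wrong way.
good_loss = cross_entropy(samples, labels, weights=[5.0, 5.0], bias=-5.0)
bad_loss = cross_entropy(samples, labels, weights=[-5.0, -5.0], bias=5.0)
```

Minimizing this loss with respect to both the classifier and the feature extractor is what would push the target class away from the origin of the latent space, where the noise is concentrated.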
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations. In comparison to existing datasets, the proposed dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset. Additionally, the dataset consists of a rich set of annotations at both image-level and head-level. Several recent methods are evaluated and compared on this dataset. The dataset can be downloaded from http://www.crowd-counting.com . Furthermore, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. The proposed method uses VGG16 as the backbone network and employs the density map generated by the final layer as a coarse prediction to refine and generate finer density maps in a progressive fashion using residual learning. Additionally, the residual learning is guided by an uncertainty-based confidence weighting mechanism that permits the flow of only high-confidence residuals in the refinement path. The proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is evaluated on recent complex datasets, and it achieves significant improvements in errors.
Deep Multimodal Subspace Clustering Networks Abavisani, Mahdi; Patel, Vishal M.
IEEE Journal of Selected Topics in Signal Processing,
December 2018, Volume 12, Issue 6
Journal Article
Peer reviewed
We present convolutional neural network based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages: a multimodal encoder, a self-expressive layer, and a multimodal decoder. The encoder takes multimodal data as input and fuses them to a latent space representation. The self-expressive layer is responsible for enforcing the self-expressiveness property and acquiring an affinity matrix corresponding to the data points. The decoder reconstructs the original input data. The network uses the distance between the decoder's reconstruction and the original input in its training. We investigate early, late, and intermediate fusion techniques and propose three different encoders corresponding to them for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same for different spatial fusion-based approaches. In addition to various spatial fusion-based methods, an affinity fusion-based network is also proposed in which the self-expressive layer corresponding to different modalities is enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform the state-of-the-art multimodal subspace clustering methods.
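The self-expressiveness property says that each latent point can be written as a combination of the other points, ideally those from its own subspace. The sketch below assumes the usual formulation in which a coefficient matrix with a zero diagonal reconstructs the latent codes and the affinity is built from the absolute coefficients; the abstract does not spell out these details, and the function names, toy latents, and coefficients are hypothetical.

```python
def self_expressive_loss(latents, coeffs):
    """Squared error between each latent point and its reconstruction
    from the other points (a point never reconstructs itself)."""
    n = len(latents)
    d = len(latents[0])
    loss = 0.0
    for i in range(n):
        for k in range(d):
            recon = sum(coeffs[i][j] * latents[j][k]
                        for j in range(n) if j != i)
            loss += (latents[i][k] - recon) ** 2
    return loss


def affinity(coeffs):
    """Symmetric affinity matrix from absolute coefficient values."""
    n = len(coeffs)
    return [[abs(coeffs[i][j]) + abs(coeffs[j][i]) for j in range(n)]
            for i in range(n)]


# Two points on each of two lines through the origin (toy latent codes).
latents = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
# Each point is expressed only through its partner on the same line.
coeffs = [
    [0.0, 0.5, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 2.0, 0.0],
]

loss = self_expressive_loss(latents, coeffs)
aff = affinity(coeffs)
```

The resulting affinity matrix is block-diagonal here, with nonzero entries only between points on the same line; feeding such a matrix to a spectral clustering step is the standard way to recover the cluster assignments.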
Single image-based crowd counting has recently witnessed increased focus, but many leading methods are far from optimal, especially in highly congested scenes. In this paper, we present the Hierarchical Attention-based Crowd Counting Network (HA-CCN) that employs attention mechanisms at various levels to selectively enhance the features of the network. The proposed method, which is based on the VGG16 network, consists of a spatial attention module (SAM) and a set of global attention modules (GAM). SAM enhances low-level features in the network by infusing spatial segmentation information, whereas the GAM focuses on enhancing channel-wise information in the higher-level layers. The proposed method is a single-step training framework, is simple to implement, and achieves state-of-the-art results on different datasets. Furthermore, we extend the proposed counting network by introducing a novel set-up to adapt the network to different scenes and datasets via weak supervision using image-level labels. This new set-up reduces the burden of acquiring labor-intensive point-wise annotations for new datasets while improving the cross-dataset performance.
Deep Multitask Learning for Railway Track Inspection Gibert, Xavier; Patel, Vishal M.; Chellappa, Rama
IEEE Transactions on Intelligent Transportation Systems,
January 2017, Volume 18, Issue 1
Journal Article
Peer reviewed
Railroad tracks need to be periodically inspected and monitored to ensure safe transportation. Automated track inspection using computer vision and pattern recognition methods has recently shown the potential to improve safety by allowing for more frequent inspections while reducing human errors. Achieving full automation is still very challenging due to the number of different possible failure modes, as well as the broad range of image variations that can potentially trigger false alarms. In addition, the number of defective components is very small, so not many training examples are available for the machine to learn a robust anomaly detector. In this paper, we show that detection performance can be improved by combining multiple detectors within a multitask learning framework. We show that this approach results in improved accuracy for detecting defects on railway ties and fasteners.