Person Re-Identification by Iterative Re-Weighted Sparse Ranking Lisanti, Giuseppe; Masi, Iacopo; Bagdanov, Andrew D. ...
IEEE transactions on pattern analysis and machine intelligence,
2015-Aug.-1, 2015-Aug, 2015-8-1, 20150801, Volume:
37, Issue:
8
Journal Article
Peer reviewed
Open access
In this paper we introduce a method for person re-identification based on discriminative, sparse basis expansions of targets in terms of a labeled gallery of known individuals. We propose an ...iterative extension to sparse discriminative classifiers capable of ranking many candidate targets. The approach makes use of soft- and hard- re-weighting to redistribute energy among the most relevant contributing elements and to ensure that the best candidates are ranked at each iteration. Our approach also leverages a novel visual descriptor which we show to be discriminative while remaining robust to pose and illumination variations. An extensive comparative evaluation is given demonstrating that our approach achieves state-of-the-art performance on single- and multi-shot person re-identification scenarios on the VIPeR, i-LIDS, ETHZ, and CAVIAR4REID datasets. The combination of our descriptor and iterative sparse basis expansion improves state-of-the-art rank-1 performance by six percentage points on VIPeR and by 20 on CAVIAR4REID compared to other methods with a single gallery image per person. With multiple gallery and probe images per person our approach improves by 17 percentage points the state-of-the-art on i-LIDS and by 72 on CAVIAR4REID at rank-1. The approach is also quite efficient, capable of single-shot person re-identification over galleries containing hundreds of individuals at about 30 re-identifications per second.
With the establishment of Industry 4.0 , machines are now required to interact with workers. By observing biometrics they can assess if humans are authorized, or mentally and physically fit to work. ...Understanding body language, makes human-machine interaction more natural, secure, and effective. Nonetheless, traditional cameras have limitations; low frame rate and dynamic range hinder a comprehensive human understanding. This poses a challenge, since faces undergo frequent instantaneous microexpressions. In addition, this is privacy-sensitive information that must be protected. We propose to model expressions with event cameras, bio-inspired vision sensors that have found application within the Industry 4.0 scope. They capture motion at millisecond rates and work under challenging conditions like low illumination and highly dynamic scenes. Such cameras are also privacy-preserving, making them extremely interesting for industry. We show that using event cameras, we can understand human reactions by only observing facial expressions. Comparison with red-green-blue (RGB)-based modeling demonstrates improved effectiveness and robustness.
Object detection is one of the most important tasks of computer vision. It is usually performed by evaluating a subset of the possible locations of an image, that are more likely to contain the ...object of interest. Exhaustive approaches have now been superseded by object proposal methods. The interplay of detectors and proposal algorithms has not been fully analyzed and exploited up to now, although this is a very relevant problem for object detection in video sequences. We propose to connect, in a closed-loop, detectors and object proposal generator functions exploiting the ordered and continuous nature of video sequences. Different from tracking we only require a previous frame to improve both proposal and detection: no prediction based on local motion is performed, thus avoiding tracking errors. We obtain three to four points of improvement in mAP and a detection time that is lower than Faster Regions with CNN features (R-CNN), which is the fastest Convolutional Neural Network (CNN) based generic object detector known at the moment.
•Full pipeline for data augmentation for small object detection.•High-quality synthetic data combining a GAN with image inpainting and blending.•DS-GAN generates realistic small objects from larger ...ones.
Object detection accuracy on small objects, i.e., objects under 32 × 32 pixels, lags behind that of large ones. To address this issue, innovative architectures have been designed and new datasets have been released. Still, the number of small objects in many datasets does not suffice for training. The advent of the generative adversarial networks (GANs) opens up a new data augmentation possibility for training architectures without the costly task of annotating huge datasets for small objects. In this paper, we propose a full pipeline for data augmentation for small object detection which combines a GAN-based object generator with techniques of object segmentation, image inpainting, and image blending to achieve high-quality synthetic data. The main component of our pipeline is DS-GAN, a novel GAN-based architecture that generates realistic small objects from larger ones. Experimental results show that our overall data augmentation method improves the performance of state-of-the-art models up to 11.9% APs@.5 on UAVDT and by 4.7% APs@.5 on iSAID, both for the small objects subset and for a scenario where the number of training instances is limited.
Image compression is a need that arises in many circumstances. Unfortunately, whenever a lossy compression algorithm is used, artifacts will manifest. Image artifacts, caused by compression tend to ...eliminate higher frequency details and, in certain cases, may add noise or small image structures. There are two main drawbacks of this phenomenon. First, images appear much less pleasant to the human eye. Second, computer vision algorithms, such as object detectors, may be hindered and their performance reduced. Removing such artifacts means recovering the original image from a perturbed version of it. This means that one ideally should invert the compression process through a complicated nonlinear image transformation. We propose an image transformation approach based on a feedforward fully convolutional residual network model. We show that this model can be optimized either traditionally, directly optimizing an image similarity loss (SSIM), or using a generative adversarial approach (GAN). Our GAN is able to produce images with more photorealistic details than SSIM-based networks. We describe a novel training procedure based on subpatches and devise a novel testing protocol to evaluate restored images quantitatively. We show that our approach can be used as a preprocessing step for different computer vision tasks in case images are degraded by compression to a point that state-of-the art algorithms fail. In this case, our GAN-based approach obtains better performance than MSE or SSIM trained networks. Different from previously proposed approaches, we are able to remove artifacts generated at any QF by inferring the image quality directly from data.
In this paper, we present an efficient method for visual descriptors retrieval based on compact hash codes computed using a multiple k-means assignment. The method has been applied to the problem of ...approximate nearest neighbor (ANN) search of local and global visual content descriptors, and it has been tested on different datasets: three large scale standard datasets of engineered features of up to one billion descriptors (BIGANN) and, supported by recent progress in convolutional neural networks (CNNs), on CIFAR-10, MNIST, INRIA Holidays, Oxford 5K, and Paris 6K datasets; also, the recent DEEP1B dataset, composed by one billion CNN-based features, has been used. Experimental results show that, despite its simplicity, the proposed method obtains a very high performance that makes it superior to more complex state-of-the-art methods.
Compression artifacts arise in images whenever a lossy compression algorithm is applied. These artifacts eliminate details present in the original image, or add noise and small structures; because of ...these effects they make images less pleasant for the human eye, and may also lead to decreased performance of computer vision algorithms such as object detectors. To eliminate such artifacts, when decompressing an image, it is required to recover the original image from a disturbed version. To this end, we present a feed-forward fully convolutional residual network model trained using a generative adversarial framework. To provide a baseline, we show that our model can be also trained optimizing the Structural Similarity (SSIM), which is a better loss with respect to the simpler Mean Squared Error (MSE). Our GAN is able to produce images with more photorealistic details than MSE or SSIM based networks. Moreover we show that our approach can be used as a pre-processing step for object detection in case images are degraded by compression to a point that state-of-the art detectors fail. In this task, our GAN method obtains better performance than MSE or SSIM trained networks.
SMEMO: Social Memory for Trajectory Forecasting Marchetti, Francesco; Becattini, Federico; Seidenari, Lorenzo ...
IEEE transactions on pattern analysis and machine intelligence,
2024-June, 2024-Jun, 2024-6-00, 20240601, Volume:
46, Issue:
6
Journal Article
Peer reviewed
Open access
Effective modeling of human interactions is of utmost importance when forecasting behaviors such as future trajectories. Each individual, with its motion, influences surrounding agents since everyone ...obeys to social non-written rules such as collision avoidance or group following. In this paper we model such interactions, which constantly evolve through time, by looking at the problem from an algorithmic point of view, i.e., as a data manipulation task. We present a neural network based on an end-to-end trainable working memory, which acts as an external storage where information about each agent can be continuously written, updated and recalled. We show that our method is capable of learning explainable cause-effect relationships between motions of different agents, obtaining state-of-the-art results on multiple trajectory forecasting datasets.
The 3D Morphable Model (3DMM) is a powerful statistical tool for representing 3D face shapes. To build a 3DMM, a training set of face scans in full point-to-point correspondence is required, and its ...modeling capabilities directly depend on the variability contained in the training data. Thus, to increase the descriptive power of the 3DMM, establishing a dense correspondence across heterogeneous scans with sufficient diversity in terms of identities, ethnicities, or expressions becomes essential. In this manuscript, we present a fully automatic approach that leverages a 3DMM to transfer its dense semantic annotation across raw 3D faces, establishing a dense correspondence between them. We propose a novel formulation to learn a set of sparse deformation components with local support on the face that, together with an original non-rigid deformation algorithm, allow the 3DMM to precisely fit unseen faces and transfer its semantic annotation. We extensively experimented our approach, showing it can effectively generalize to highly diverse samples and accurately establish a dense correspondence even in presence of complex facial expressions. The accuracy of the dense registration is demonstrated by building a heterogeneous, large-scale 3DMM from more than 9,000 fully registered scans obtained by joining three large datasets together.