Efficient Training for Positive Unlabeled Learning
Sansone, Emanuele; De Natale, Francesco G. B.; Zhou, Zhi-Hua
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019-11-01, Volume 41, Issue 11
Journal Article
Peer-reviewed
Open access
Positive unlabeled (PU) learning is useful in various practical situations where there is a need to learn a classifier for a class of interest from an unlabeled data set, which may contain anomalies as well as samples from unknown classes. The learning task can be formulated as an optimization problem under the framework of statistical learning theory. Recent studies have theoretically analyzed its properties and generalization performance; nevertheless, little effort has been made to address scalability, especially when large sets of unlabeled data are available. In this work we propose a novel scalable PU learning algorithm that is theoretically proven to provide the optimal solution while showing superior computational and memory performance. Experimental evaluation confirms the theoretical evidence and shows that the proposed method can be successfully applied to a large variety of real-world problems involving PU learning.
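The abstract formulates PU learning as risk minimization but does not spell out the objective. As a hedged illustration only, here is the non-negative PU risk estimator known from the wider PU literature (not necessarily this paper's formulation), assuming a known class prior pi and a sigmoid loss:

```python
import numpy as np

def sigmoid_loss(z):
    # Sigmoid surrogate loss: small when z is large and positive.
    return 1.0 / (1.0 + np.exp(z))

def nn_pu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk estimate from classifier scores.

    scores_p : outputs f(x) on labeled-positive samples
    scores_u : outputs f(x) on unlabeled samples
    prior    : assumed class prior pi = P(y = +1)
    """
    # pi * E_P[ loss(f(x)) ] : positive-class part of the risk.
    risk_pos = prior * np.mean(sigmoid_loss(scores_p))
    # E_U[ loss(-f(x)) ] - pi * E_P[ loss(-f(x)) ] : negative-class part,
    # estimated from unlabeled data with the positive contribution removed.
    risk_neg = (np.mean(sigmoid_loss(-scores_u))
                - prior * np.mean(sigmoid_loss(-scores_p)))
    # Clip the negative-class part at zero to keep the estimate non-negative.
    return risk_pos + max(0.0, risk_neg)
```

In training, this quantity would be minimized over the parameters of the scoring function f; the max(0, ·) clip guards against the negative-risk overfitting that the plain unbiased estimator can exhibit.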
Deep Learning for Mobile Multimedia
Ota, Kaoru; Dao, Minh Son; Mezaris, Vasileios ...
ACM Transactions on Multimedia Computing, Communications, and Applications, 08/2017, Volume 13, Issue 3s
Journal Article
Peer-reviewed
Open access
Deep Learning (DL) has become a crucial technology for multimedia computing. It offers a powerful instrument to automatically produce high-level abstractions of complex multimedia data, which can be exploited in a number of applications, including object detection and recognition, speech-to-text, media retrieval, multimodal data analysis, and so on. The availability of affordable large-scale parallel processing architectures, together with the sharing of effective open-source implementations of the basic learning algorithms, has caused a rapid diffusion of DL methodologies, bringing a number of new technologies and applications that outperform, in most cases, traditional machine learning technologies. In recent years, the possibility of implementing DL technologies on mobile devices has attracted significant attention. Thanks to this technology, portable devices may become smart objects capable of learning and acting. The path toward these exciting future scenarios, however, entails a number of important research challenges. DL architectures and algorithms are not easily adapted to the storage and computation resources of a mobile device. Therefore, there is a need for new generations of mobile processors and chipsets, small-footprint learning and inference algorithms, new models of collaborative and distributed processing, and a number of other fundamental building blocks. This survey reports the state of the art in this exciting research area, looking back at the evolution of neural networks and arriving at the most recent results in terms of methodologies, technologies, and applications for mobile environments.
In this paper, we propose a forensic technique that is able to detect the application of a median filter to 1-D data. The method relies on deterministic mathematical properties of the median filter, which lead to the identification of specific relationships among the sample values that cannot be found in filtered sequences. Hence, their presence in the analyzed 1-D sequence allows excluding the application of the median filter. Owing to its deterministic nature, the method ensures 0% false negatives, and although false positives (sequences not filtered being classified as filtered) are theoretically possible, experimental results show that the false alarm rate is null for sufficiently long sequences. Furthermore, the proposed technique is able to locate with good precision a median-filtered part of 1-D data and provides a good estimate of the window size used.
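The abstract does not disclose the specific deterministic relationships. As a loose, purely illustrative sketch of the general idea that median filtering leaves measurable traces in 1-D data, the toy detector below uses a simple "streak" statistic (the fraction of equal adjacent samples), which tends to grow after median filtering; this is a statistical heuristic, not the paper's deterministic test:

```python
import random

def median_filter_1d(x, w=3):
    # Apply a 1-D median filter of odd window size w (edge samples kept unfiltered).
    k = w // 2
    y = list(x)
    for i in range(k, len(x) - k):
        y[i] = sorted(x[i - k:i + k + 1])[k]
    return y

def zero_diff_ratio(x):
    # Fraction of adjacent sample pairs that are exactly equal ("streaks").
    return sum(a == b for a, b in zip(x, x[1:])) / (len(x) - 1)
```

Because adjacent filter windows share all but one sample, consecutive outputs are often identical, so a filtered sequence typically shows a much higher streak ratio than a raw one.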
Mathematical morphology provides a large set of powerful non-linear image operators, widely used for feature extraction, noise removal, or image enhancement. Although morphological filters might be used to remove artifacts produced by image manipulations, both on binary and gray-level documents, little effort has been spent on their forensic identification. In this paper we propose a non-trivial extension of a deterministic approach originally designed to detect erosion and dilation of binary images. The proposed approach operates on grayscale images and is robust to image compression and other typical attacks. When the image is attacked, the method loses its deterministic nature and resorts to a properly trained SVM classifier, using the original detector as a feature extractor. Extensive tests demonstrate that the proposed method guarantees very high accuracy in filtering detection, providing 100% accuracy in discriminating the presence and the type of morphological filter in raw images of three different datasets. The achieved accuracy remains good after JPEG compression, equal to or above 76.8% on all datasets for quality factors above 80. The proposed approach is also able to determine the adopted structuring element for moderate compression factors. Finally, it is robust against noise addition and can distinguish morphological filters from other filters.
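One deterministic relation of the kind a binary-image detector can exploit (our illustration, not necessarily the exact test used in the paper) is that any dilation by a structuring element S is invariant under a morphological opening with the same S; a binary image that changes under opening therefore cannot be the output of a dilation by S. A minimal sketch on point-set images:

```python
def dilate(points, se):
    # Minkowski dilation: shift every foreground point by every SE offset.
    return {(r + dr, c + dc) for (r, c) in points for (dr, dc) in se}

def erode(points, se):
    # Erosion: keep points whose whole SE-translated neighborhood is foreground.
    return {p for p in points
            if all((p[0] + dr, p[1] + dc) in points for (dr, dc) in se)}

def could_be_dilated(points, se):
    # Necessary condition: a dilation by `se` equals its opening by `se`
    # (opening = erosion followed by dilation).
    return dilate(erode(points, se), se) == points
```

If `could_be_dilated` returns False, the image is provably not a dilation output for that structuring element; a True result is only consistent with it, which is why the forensic setting still needs further evidence.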
Camera calibration is a necessary preliminary step in computer vision for estimating the position of objects in the 3D world. Although the intrinsic camera parameters can be easily computed offline, the extrinsic parameters need to be computed each time a camera changes its position, thus preventing fast and dynamic network re-configuration. In this paper we present an unsupervised and automatic framework for the estimation of the extrinsic parameters of a camera network, which leverages optimised 3D human mesh recovery from a single image and does not require the use of additional markers. We show how it is possible to retrieve the real-world position of the cameras in the network together with the floor plane, exploiting regular RGB images and only weak prior knowledge of the internal parameters. Our framework can also work with a single camera and in real time, allowing the user to add, re-position, or remove cameras from the network in a dynamic fashion.
Understanding the subjective meaning of a visual query, by converting it into numerical parameters that can be extracted and compared by a computer, is the paramount challenge in the field of intelligent image retrieval, also referred to as the "semantic gap" problem. In this paper, an innovative approach is proposed that combines a relevance feedback (RF) approach with an evolutionary stochastic algorithm, called particle swarm optimizer (PSO), as a way to grasp the user's semantics through optimized iterative learning. The retrieval uses human interaction to achieve a twofold goal: 1) to guide the swarm particles in the exploration of the solution space towards the cluster of relevant images; 2) to dynamically modify the feature space by appropriately weighting the descriptive features according to the user's perception of relevance. Extensive simulations showed that the proposed technique outperforms traditional deterministic RF approaches of the same class, thanks to its stochastic nature, which allows a better exploration of complex, nonlinear, and high-dimensional solution spaces.
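As background, a minimal sketch of the standard PSO velocity/position update that such an RF loop would iterate (generic textbook form; in the paper's setting the personal and global bests would be driven by a fitness derived from user-marked relevant images, and the coefficients below are illustrative defaults, not the paper's values):

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update: inertia plus attraction toward each particle's
    personal best (pbest) and the swarm's global best (gbest)."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
    return positions, velocities
```

Each retrieval round would re-score the particles against the user's feedback, update pbest/gbest accordingly, and call this step again.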
Clustering images according to their acquisition devices is a well-known problem in multimedia forensics, which is typically addressed by means of camera sensor pattern noise (SPN). The task is challenging because SPN is a noise-like signal, hard to estimate, and easy to attenuate or destroy through many factors. Moreover, the high dimensionality of SPN hinders large-scale applications. Existing approaches are typically based on the correlation among SPNs in the pixel domain, which might not be able to capture the intrinsic data structure in the union of vector subspaces. In this paper, we propose an accurate clustering framework, which exploits linear dependences among SPNs in their intrinsic vector subspaces. Such dependences are encoded through sparse representations, obtained by solving a LASSO problem with a non-negativity constraint. The proposed framework is highly accurate in estimating the number of clusters and in associating images. Moreover, our framework is scalable in the number of images and robust against double JPEG compression as well as the presence of outliers, showing great potential for real-world applications. Experimental results on the Dresden and Vision databases show that our proposed framework adapts well to both medium-scale and large-scale contexts and outperforms the state-of-the-art methods.
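The abstract names a LASSO problem with a non-negativity constraint; a minimal sketch of one standard way to solve it (projected ISTA, our choice for illustration), where the columns of D would hold reference SPN vectors and y a query SPN:

```python
import numpy as np

def nonneg_lasso(D, y, lam=0.1, lr=None, iters=500):
    """Minimize 0.5 * ||y - D a||^2 + lam * ||a||_1 subject to a >= 0,
    via projected ISTA (gradient step, l1 shift, clip to the orthant)."""
    if lr is None:
        lr = 1.0 / np.linalg.norm(D, 2) ** 2   # step size from the Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - y)               # gradient of the quadratic term
        a = np.maximum(a - lr * grad - lr * lam, 0.0)
    return a
```

Since a >= 0 is enforced, the l1 penalty reduces to lam * sum(a), so the proximal step is simply a downward shift by lr * lam followed by clipping to the non-negative orthant.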
•This paper presents a prototype to assist blind people in indoor environments.
•The prototype incorporates recognition and guidance units.
•It also comprises a voice-user interface.
•Tests in a public indoor space demonstrate promising capabilities.
Assistive technologies for blind people are growing rapidly, providing useful tools to support daily activities and to improve social inclusion. Most of these technologies are mainly focused on helping blind people navigate and avoid obstacles. Other works emphasize providing assistance in recognizing surrounding objects. Very few of them, however, couple both aspects (i.e., navigation and recognition). To address these needs, we describe in this paper an innovative prototype, which offers the capabilities to (i) move autonomously and (ii) recognize multiple objects in public indoor environments. It incorporates lightweight hardware components (camera, IMU, and laser sensors), all mounted on a reasonably-sized integrated device to be placed on the chest. It requires the indoor environment to be ‘blind-friendly’, i.e., prior information about it should be prepared and loaded into the system beforehand. Its algorithms are mainly based on advanced computer vision and machine learning approaches. The interaction between the user and the system is performed through speech recognition and synthesis modules. The prototype offers the user the possibility to (i) walk across the site to reach the desired destination, avoiding static and mobile obstacles, and (ii) ask the system through vocal interaction to list the prominent objects in the user's field of view. We illustrate the performance of the proposed prototype through experiments conducted in a blind-friendly indoor space equipped at our Department premises.
In this paper we address the issue of photo gallery synchronization, where pictures related to the same event are collected by different users. Existing solutions to this problem are usually based on unrealistic assumptions, such as time consistency across photo galleries, and often rely heavily on heuristics, thereby limiting their applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; finally, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of the proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the synchronization quality.
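As a simplified, deterministic sketch of the third stage (offset propagation along a spanning tree; the paper uses a probabilistic graphical model, which this toy version replaces with fixed pairwise offset estimates and confidence weights):

```python
import heapq

def gallery_offsets(n, pair_offsets):
    """Propagate pairwise time offsets over a graph of n galleries.

    pair_offsets: {(i, j): (offset_ij, weight)} meaning gallery j's clock is
    offset_ij ahead of gallery i; weight is a confidence (higher = better).
    Returns the offset of every gallery relative to gallery 0, accumulated
    along a maximum-confidence spanning tree (Prim's algorithm).
    """
    adj = {i: [] for i in range(n)}
    for (i, j), (off, wgt) in pair_offsets.items():
        adj[i].append((j, off, wgt))
        adj[j].append((i, -off, wgt))    # the reverse edge negates the offset
    offsets = {0: 0.0}
    heap = [(-w, v, off) for (v, off, w) in adj[0]]
    heapq.heapify(heap)
    while heap and len(offsets) < n:
        negw, v, off = heapq.heappop(heap)
        if v in offsets:
            continue
        offsets[v] = off                 # offset relative to gallery 0
        for (u, o, w) in adj[v]:
            if u not in offsets:
                heapq.heappush(heap, (-w, u, offsets[v] + o))
    return offsets
```

Accumulating offsets along high-confidence tree edges lets the low-confidence (likely noisy) pairwise estimates be ignored entirely.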
Diversification of search results allows for better and faster search, gaining knowledge about different perspectives and viewpoints on the retrieved information sources. Recently, various methods for the diversification of image retrieval results have been proposed, mainly using textual information or techniques imported from the natural language processing domain. However, images contain much more information than their textual descriptions, and the use of visual features deserves special attention in this context. Visual saliency provides information about the parts of the image perceived as most important, which are instinctively targeted by humans when shooting a photo or looking at a picture. For this reason, we propose to exploit such information to improve the diversification of search results. To this purpose, we introduce a saliency-based method to re-rank the results of a query, and we show that it can achieve significantly better performance compared to the baseline approach. Experimental validation conducted on a number of queries applied to various datasets demonstrates the potential of saliency information for the diversification of image retrieval results.
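The abstract does not give the re-ranking rule; as a hedged sketch of one common diversification scheme (greedy MMR-style re-ranking, not necessarily the paper's method), where the feature vectors could be saliency-weighted descriptors (our assumption):

```python
import numpy as np

def diversify(features, relevance, k, trade_off=0.5):
    """Greedy MMR-style re-ranking: each step picks the item maximizing
    relevance minus its maximum cosine similarity to items already chosen."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    chosen = []
    candidates = list(range(len(features)))
    while candidates and len(chosen) < k:
        def score(i):
            # Redundancy = closest cosine similarity to an already-chosen item.
            div = max((f[i] @ f[j] for j in chosen), default=0.0)
            return trade_off * relevance[i] - (1 - trade_off) * div
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

With a higher trade_off the ranking stays close to pure relevance; lowering it pushes near-duplicate results down in favor of visually distinct ones.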