In this paper, we propose a novel multiclass classifier for the open-set recognition scenario, in which there are no a priori training samples for some classes that might appear during testing. Many real-world applications are inherently open set, so successful closed-set solutions in the literature are not always suitable for real-world recognition problems. The proposed open-set classifier extends the Nearest-Neighbor (NN) classifier, which is simple, parameter independent, multiclass, and widely used for closed-set problems. The proposed Open-Set NN (OSNN) method incorporates the ability to recognize samples belonging to classes that are unknown at training time, making it suitable for open-set recognition. In addition, we explore evaluation measures for open-set problems that properly measure the resilience of methods to unknown classes during testing. For validation, we consider large freely-available benchmarks with different open-set recognition regimes and demonstrate that the proposed OSNN significantly outperforms its counterparts in the literature.
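For intuition, here is a minimal Python sketch of one way a nearest-neighbor classifier can gain an "unknown" option: accept the nearest neighbor's class only when it is clearly closer than the nearest neighbor of any competing class, and reject otherwise. The ratio test, the threshold value, and the function name are illustrative assumptions, not necessarily the exact OSNN rule.

```python
import numpy as np

def osnn_like_predict(X_train, y_train, x_query, threshold=0.8):
    """Nearest-neighbor prediction with an 'unknown' rejection option.

    Illustrative sketch: the query is assigned to the class of its nearest
    neighbor only if that neighbor is sufficiently closer than the nearest
    neighbor of any *other* class; otherwise the query is rejected as unknown.
    The ratio test and threshold are assumptions, not the paper's exact rule.
    """
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argmin(dists)
    best_class = y_train[nearest]

    other = dists[y_train != best_class]
    if other.size == 0:                      # only one class in the training data
        return best_class
    ratio = dists[nearest] / other.min()     # in (0, 1]; close to 1 means ambiguous
    return best_class if ratio <= threshold else "unknown"

# toy usage: two known classes; a query far from both is rejected as unknown
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array(["a", "a", "b", "b"])
print(osnn_like_predict(X, y, np.array([0.05, 0.1])))   # -> "a"
print(osnn_like_predict(X, y, np.array([2.5, 2.5])))    # -> "unknown"
```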
A Survey on Learning to Hash
Wang, Jingdong; Zhang, Ting; Song, Jingkuan; et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 04/2018, Volume 40, Issue 4
Journal Article; Peer reviewed; Open access
Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning to hash algorithms, categorize them according to how they preserve similarities into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We separate quantization from pairwise similarity preserving because its objective function is very different, even though quantization, as we show, can be derived from preserving pairwise similarities. In addition, we present evaluation protocols and a general performance analysis, and point out that quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
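To make the retrieval pipeline that learning-to-hash methods plug into concrete, the sketch below encodes vectors into binary codes and ranks database items by Hamming distance. It uses random hyperplanes purely as a stand-in for a learned hash function; the methods in the survey learn the projection so that similarities are preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_random_hash(dim, n_bits=32):
    # Stand-in for a *learned* hash function: random hyperplanes.
    # A learning-to-hash method would optimize W to preserve similarities.
    return rng.standard_normal((dim, n_bits))

def encode(X, W):
    # sign of each projection -> {0, 1} binary codes
    return (X @ W > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, k=5):
    # rank database items by Hamming distance to the query code
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

# toy usage with an assumed 64-dimensional database
X_db = rng.standard_normal((1000, 64))
W = fit_random_hash(dim=64, n_bits=32)
codes = encode(X_db, W)
q = X_db[0] + 0.01 * rng.standard_normal(64)              # a query near item 0
print(hamming_search(encode(q[None, :], W)[0], codes))    # item 0 should rank near the top
```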
Nearest neighbor search is a fundamental and essential operation in applications from many domains, such as databases, machine learning, multimedia, and computer vision. Because exact search is not efficient in high-dimensional spaces, much effort has turned to approximate nearest neighbor search. Although many algorithms are proposed in the literature each year, there is no comprehensive evaluation and analysis of their performance. In this paper, we conduct a comprehensive experimental evaluation of many state-of-the-art methods for approximate nearest neighbor search. Our study (1) is cross-disciplinary (including 19 algorithms from different domains and from practitioners) and (2) evaluates a diverse range of settings, including 20 datasets, several evaluation metrics, and different query workloads. The experimental results are carefully reported and analyzed to understand their performance. Furthermore, we propose a new method that empirically achieves both high query efficiency and high recall on the majority of the datasets under a wide range of settings.
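The standard measure in such evaluations is recall@k against exact (brute-force) ground truth; a small sketch of that protocol, on synthetic data, follows.

```python
import numpy as np

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true k nearest neighbors that the approximate method returned.

    exact_ids, approx_ids: arrays of shape (n_queries, k) with database indices.
    This is the usual recall measure in ANN benchmarks; the data below is
    synthetic and only illustrates the protocol.
    """
    hits = [len(set(e) & set(a)) / len(e) for e, a in zip(exact_ids, approx_ids)]
    return float(np.mean(hits))

# brute-force ground truth for a toy dataset
rng = np.random.default_rng(1)
X, Q, k = rng.standard_normal((500, 16)), rng.standard_normal((10, 16)), 10
d = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)
exact = np.argsort(d, axis=1)[:, :k]

# pretend "approximate" results: the exact neighbors with a few entries perturbed
approx = exact.copy()
approx[:, -2:] = rng.integers(0, 500, size=(10, 2))
print(recall_at_k(exact, approx))   # roughly 0.8
```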
• The weighted k-nearest neighbor rule outperforms the 1-nearest neighbor classifier.
• Recommendations for choosing weighting schemes.
• Recommendations for choosing the constraint width r and the neighborhood size k.
Time-series classification has been addressed by a plethora of machine-learning techniques, including neural networks, support vector machines, Bayesian approaches, and others. It is an accepted fact, however, that the plain vanilla 1-nearest neighbor (1NN) classifier, combined with an elastic distance measure such as Dynamic Time Warping (DTW), is competitive and often superior to more complex classification methods, including the majority-voting k-nearest neighbor (kNN) classifier. With this paper we continue our investigation of the kNN classifier on time-series data and the impact of various classic distance-based vote weighting schemes by considering constrained versions of four common elastic distance measures: DTW, Longest Common Subsequence (LCS), Edit Distance with Real Penalty (ERP), and Edit Distance on Real sequence (EDR). By performing experiments on the entire UCR Time Series Classification Archive we show that weighted kNN is able to consistently outperform 1NN. Furthermore, we provide recommendations for the choices of the constraint width parameter r, neighborhood size k, and weighting scheme, for each mentioned elastic distance measure.
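A compact sketch of the combination studied here, constrained DTW (a Sakoe-Chiba band of width r) feeding a distance-weighted kNN vote (weight 1/d, one of the classic schemes), is shown below. The data and parameter values are illustrative only, and the band assumes the two series differ in length by at most r.

```python
import numpy as np
from collections import defaultdict

def dtw_constrained(a, b, r):
    """Dynamic Time Warping with a Sakoe-Chiba band of width r (a common
    constrained-DTW formulation; the parameter values here are illustrative)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - r), min(m, i + r)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def weighted_knn_predict(train_series, train_labels, query, k=5, r=10):
    """Distance-weighted kNN vote (weight 1/d) using constrained DTW as the
    elastic measure."""
    dists = np.array([dtw_constrained(s, query, r) for s in train_series])
    order = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for idx in order:
        votes[train_labels[idx]] += 1.0 / (dists[idx] + 1e-12)
    return max(votes, key=votes.get)

# toy usage: three labeled training series, one query
train = [np.sin(np.linspace(0, 6, 60)), np.sin(np.linspace(0, 6, 60)) + 0.1,
         np.cos(np.linspace(0, 6, 60))]
labels = ["sine", "sine", "cosine"]
print(weighted_knn_predict(train, labels, np.sin(np.linspace(0, 6, 60) + 0.05), k=3, r=10))
```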
Approaches to combine local manifold learning (LML) and the k-nearest-neighbor (kNN) classifier are investigated for hyperspectral image classification. Based on supervised LML (SLML) and kNN, a new SLML-weighted kNN (SLML-WkNN) classifier is proposed. This method is appealing as it does not require dimensionality reduction and only depends on the weights provided by the kernel function of the specific ML method. Performance of the proposed classifier is compared to that of unsupervised LML (ULML) and SLML for dimensionality reduction in conjunction with kNN (ULML-kNN and SLML-kNN). Three LML methods, locally linear embedding (LLE), local tangent space alignment (LTSA), and Laplacian eigenmaps, are investigated with these classifiers. In experiments with Hyperion and AVIRIS hyperspectral data, the proposed SLML-WkNN performed better than ULML-kNN and SLML-kNN, and the highest accuracies were obtained using weights provided by supervised LTSA and LLE.
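As a rough illustration of the ULML-kNN style baseline described above (unsupervised manifold learning for dimensionality reduction followed by kNN), the sketch below uses scikit-learn's LLE. The data are synthetic stand-ins for hyperspectral pixels; the proposed SLML-WkNN instead reuses the manifold method's kernel weights inside the kNN vote without reducing dimensionality.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import KNeighborsClassifier

# ULML-kNN style baseline: unsupervised LLE reduces dimensionality,
# then a plain kNN classifier operates in the embedded space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 50)), rng.normal(2, 1, (60, 50))])  # toy "spectra"
y = np.array([0] * 60 + [1] * 60)

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=5).fit(X)
knn = KNeighborsClassifier(n_neighbors=7).fit(lle.transform(X), y)

x_new = rng.normal(1, 1, (1, 50))
print(knn.predict(lle.transform(x_new)))
```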
The k-nearest neighbor (kNN) rule is a classical non-parametric classification algorithm in pattern recognition and has been widely used in many fields due to its simplicity, effectiveness, and intuitiveness. However, the classification performance of the kNN algorithm suffers from the choice of a single fixed value of k for all queries in the search stage and from the use of a simple majority voting rule in the decision stage.
In this paper, we propose a new kNN-based algorithm, called the locally adaptive k-nearest neighbor algorithm based on discrimination class (DC-LAKNN). In our method, the role of the second majority class in classification is considered for the first time. Firstly, the discrimination classes at different values of k are selected from the majority class and the second majority class in the k-neighborhood of the query. Then, the adaptive k value and the final classification result are obtained according to the quantity and distribution of the neighbors in the discrimination classes at each value of k.
Extensive experiments on eighteen real-world datasets from the UCI (University of California, Irvine) Machine Learning Repository and the KEEL (Knowledge Extraction based on Evolutionary Learning) Repository show that the DC-LAKNN algorithm achieves better classification performance than the standard kNN algorithm and nine other state-of-the-art kNN-based algorithms.
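The following sketch conveys only the general "locally adaptive k" idea: it scores each candidate neighborhood size by how decisively the majority class beats the second majority class and classifies with the most decisive one. The actual DC-LAKNN discrimination-class construction is more involved than this margin test.

```python
import numpy as np
from collections import Counter

def adaptive_k_predict(X_train, y_train, x_query, k_max=15):
    """Loose illustration of a locally adaptive k (not the exact DC-LAKNN rule):
    for each candidate k, measure how decisively the majority class beats the
    second majority class among the k nearest neighbors, then classify with the
    neighborhood size that gives the most decisive vote."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    order = np.argsort(dists)
    best_label, best_margin = None, -1
    for k in range(1, min(k_max, len(X_train)) + 1):
        counts = Counter(y_train[order[:k]].tolist()).most_common(2)
        top = counts[0][1]
        second = counts[1][1] if len(counts) > 1 else 0
        margin = top - second                     # decisiveness of this neighborhood
        if margin > best_margin:
            best_margin, best_label = margin, counts[0][0]
    return best_label

# toy usage with two Gaussian classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array(["a"] * 30 + ["b"] * 30)
print(adaptive_k_predict(X, y, np.array([0.5, 0.5])))   # -> "a"
```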
• Propose the GMDKNN method.
• Design categorical multi-generalized mean distances.
• Design a categorical nested generalized mean distance for classification.
The k-nearest neighbor (KNN) rule is a well-known non-parametric classifier that is widely used in pattern recognition. However, sensitivity to the neighborhood size k often seriously degrades KNN-based classification performance, especially for small sample sizes with outliers. To overcome this issue, in this article we propose a generalized mean distance-based k-nearest neighbor classifier (GMDKNN) by introducing multi-generalized mean distances and a nested generalized mean distance based on the characteristics of the generalized mean. In the proposed method, multiple local mean vectors of the given query sample are calculated in each class from its class-specific k nearest neighbors. Using the k local mean vectors per class, the corresponding k generalized mean distances are calculated and then used to construct the categorical nested generalized mean distance. In the classification phase, the categorical nested generalized mean distance is used as the classification decision rule, and the query sample is classified into the class with the minimum nested generalized mean distance among all the classes. Extensive experiments on the UCI and KEEL data sets, synthetic data sets, the KEEL noise data sets, and the UCR time series data sets are conducted by comparing the proposed method to state-of-the-art KNN-based methods. The experimental results demonstrate that the proposed GMDKNN performs better and is less sensitive to k. Thus, with its robust and effective classification performance, the proposed GMDKNN could be a promising method for pattern recognition in expert and intelligent systems.
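A hedged sketch of the generalized-mean-distance idea follows: per class, the query's distances to the local mean vectors of its class-specific nearest neighbors are collapsed with a power mean, and the class with the smallest value wins. The exponent p and the single (non-nested) power mean are assumptions; the published GMDKNN uses a nested construction.

```python
import numpy as np

def power_mean(values, p):
    # generalized (power) mean; p = 1 is the arithmetic mean, p -> -inf the minimum
    values = np.asarray(values, dtype=float)
    return (np.mean(values ** p)) ** (1.0 / p)

def gmd_knn_predict(X_train, y_train, x_query, k=5, p=-2.0):
    """Sketch of a generalized-mean-distance kNN rule: per class, build the local
    mean vectors of the query's class-specific nearest neighbors, take the query's
    distance to each, collapse them with a power mean, and pick the class with the
    smallest value. The exponent p and the non-nested combination are assumptions."""
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x_query, axis=1)
        nn = Xc[np.argsort(d)[:k]]                         # class-specific k nearest neighbors
        local_means = [nn[:j].mean(axis=0) for j in range(1, len(nn) + 1)]
        dists = [np.linalg.norm(m - x_query) + 1e-12 for m in local_means]
        scores[c] = power_mean(dists, p)
    return min(scores, key=scores.get)

# toy usage
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(2.5, 1, (40, 3))])
y = np.array([0] * 40 + [1] * 40)
print(gmd_knn_predict(X, y, np.array([0.2, 0.1, 0.0])))   # -> 0
```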
The k nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and significant classification performance. However, it is impractical for traditional kNN methods to assign a fixed k value (even one set by experts) to all test samples. Previous solutions assign different k values to different test samples via cross validation, but they are usually time-consuming. This paper proposes a kTree method to learn different optimal k values for different test/new samples by involving a training stage in kNN classification. Specifically, in the training stage, the kTree method first learns optimal k values for all training samples with a new sparse reconstruction model, and then constructs a decision tree (namely, kTree) using the training samples and the learned optimal k values. In the test stage, the kTree quickly outputs the optimal k value for each test sample, and then kNN classification is conducted using the learned optimal k value and all training samples. As a result, the proposed kTree method has a similar running cost but higher classification accuracy compared with traditional kNN methods, which assign a fixed k value to all test samples. Moreover, the proposed kTree method needs less running cost but achieves similar classification accuracy compared with the newer kNN methods that assign different k values to different test samples. This paper further proposes an improved version of the kTree method (namely, the k*Tree method) to speed up its test stage by additionally storing information about the training samples in the leaf nodes of the kTree, such as the training samples located in the leaf nodes, their kNNs, and the nearest neighbors of these kNNs. The resulting decision tree, called k*Tree, enables kNN classification to be conducted using a subset of the training samples in the leaf nodes rather than all training samples, as used in the newer kNN methods. This further reduces the running cost of the test stage. Finally, experimental results on 20 real data sets show that the proposed methods (i.e., kTree and k*Tree) are much more efficient than the compared methods in classification tasks.
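The pipeline can be sketched as follows, with a simple leave-one-out search standing in for the paper's sparse reconstruction model when learning each training sample's optimal k. That substitution, the candidate k values, and the toy data are assumptions made only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsClassifier

def learn_per_sample_k(X, y, candidate_ks=(1, 3, 5, 7, 9)):
    """Stand-in for the paper's sparse-reconstruction step: pick, for every
    training sample, the smallest k whose leave-one-out kNN vote classifies it
    correctly (an assumption made only for illustration)."""
    best = np.full(len(X), candidate_ks[-1])
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        d = np.linalg.norm(X[mask] - X[i], axis=1)
        order = np.argsort(d)
        for k in candidate_ks:
            votes = y[mask][order[:k]]
            if np.bincount(votes, minlength=y.max() + 1).argmax() == y[i]:
                best[i] = k
                break
    return best

# training stage: a decision tree ("kTree") maps a sample's features to a k value
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
k_tree = DecisionTreeRegressor(max_depth=4).fit(X, learn_per_sample_k(X, y))

# test stage: predict k for the query, then run ordinary kNN with that k
x_query = np.array([[1.5, 1.5]])
k_query = max(1, int(round(k_tree.predict(x_query)[0])))
knn = KNeighborsClassifier(n_neighbors=k_query).fit(X, y)
print(k_query, knn.predict(x_query))
```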
We consider decomposition of a controlled-R gate with a standard set of universal gates. For this problem, a method exists that uses a single ancillary qubit to reduce the number of gates. In this work, we extend this method to three ends. First, we find a method that can decompose a controlled-R gate into fewer gates than the best known results. We also confirm that the proposed method reduces the total number of gates of the quantum Fourier transform. Second, we propose another efficient decomposition that can be mapped to a nearest-neighbor architecture with only local CNOT gates. Finally, we find a method that can minimize the depth to 5 gate steps in a nearest-neighbor architecture with only local CNOT gates.
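As background for what "decomposition into a standard gate set" means, the snippet below verifies the well-known two-CNOT identity for a controlled z-rotation with NumPy. It is the textbook construction, not the paper's improved decomposition, which targets fewer gates and nearest-neighbor layouts.

```python
import numpy as np

def rz(theta):
    # single-qubit z-rotation, Rz(theta) = diag(e^{-i theta/2}, e^{i theta/2})
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)   # control = first qubit

def controlled_rz(theta):
    # the target gets Rz(theta) only when the control is |1>
    U = np.eye(4, dtype=complex)
    U[2:, 2:] = rz(theta)
    return U

theta = 0.7
# standard decomposition: Rz(theta/2) on target, CNOT, Rz(-theta/2) on target, CNOT
decomposed = CNOT @ np.kron(I2, rz(-theta / 2)) @ CNOT @ np.kron(I2, rz(theta / 2))
print(np.allclose(decomposed, controlled_rz(theta)))   # True
```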
• Efficient kNN search based on feature learning, clustering, and adaptive k values.
• Several proposals to automatically optimize the search parameters.
• Comprehensive experimentation with 10 datasets of different typology and size.
• Results demonstrate that the proposal outperforms state-of-the-art methods.
The k-Nearest Neighbor (kNN) algorithm is widely used in supervised learning and, particularly, in search and classification tasks, owing to its simplicity, competitive performance, and good statistical properties. However, its inherent inefficiency prevents its use in most modern applications due to the vast amount of data that current technology generates, making the optimization of kNN-based search strategies of particular interest. This paper introduces the caKD+ algorithm, which tackles this limitation by combining feature learning techniques, clustering methods, adaptive search parameters per cluster, and pre-calculated K-Dimensional Tree structures, resulting in a highly efficient search method. This proposal has been evaluated on 10 datasets, and the results show that caKD+ significantly outperforms 16 state-of-the-art efficient search methods while remaining as accurate as the exhaustive kNN search.
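A rough sketch of the clustering plus per-cluster KD-tree idea is given below, using scikit-learn's KMeans and SciPy's cKDTree. caKD+ additionally learns features and automatically tunes per-cluster search parameters; here those parameters are reduced to a fixed per-cluster k chosen by assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial import cKDTree

class ClusteredKDTreeSearch:
    """Rough sketch of the clustering + per-cluster KD-tree idea: partition the
    database, pre-build one KD-tree per cluster, and answer a query from the
    tree of its nearest centroid. caKD+ additionally learns features and tunes
    search parameters per cluster; here each cluster simply keeps its own k."""

    def __init__(self, X, n_clusters=8, per_cluster_k=None):
        self.km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
        self.trees, self.index_maps = [], []
        for c in range(n_clusters):
            idx = np.where(self.km.labels_ == c)[0]
            self.index_maps.append(idx)
            self.trees.append(cKDTree(X[idx]))
        # per-cluster neighborhood size (an assumption standing in for the
        # adaptive parameters caKD+ optimizes per cluster)
        self.per_cluster_k = per_cluster_k or {c: 5 for c in range(n_clusters)}

    def query(self, x):
        c = int(self.km.predict(x[None, :])[0])              # nearest centroid
        k = min(self.per_cluster_k[c], len(self.index_maps[c]))
        dist, local = self.trees[c].query(x, k=k)
        return self.index_maps[c][np.atleast_1d(local)], np.atleast_1d(dist)

# toy usage
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10))
searcher = ClusteredKDTreeSearch(X)
print(searcher.query(X[42]))   # indices and distances of approximate neighbors of item 42
```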