•In this paper, we study the embedding of labels together with group information with the objective of building an efficient multi-label classifier.
•We assume the existence of a low-dimensional space onto which the feature vectors and label vectors can be embedded.
•We ensure that labels belonging to the same group share the same sparsity pattern in their low-rank representations.
•The proposed method has three major stages, namely (1) identification of groups of labels; (2) sparsity-invariant embedding of label groups; and (3) embedding of the feature matrix into the same low-rank space.
•Extensive comparative studies validate the effectiveness of the proposed method against state-of-the-art multi-label learning approaches.
Multi-label learning is concerned with the classification of data with multiple class labels. This is in contrast to the traditional classification problem, where every data instance has a single label. Due to the exponential size of the output space, exploiting intrinsic information in the feature and label spaces has been the major thrust of research in recent years, and the use of parametrization and embedding has been the prime focus. Researchers have studied several aspects of embedding, including label embedding, input embedding, dimensionality reduction and feature selection. These approaches differ from one another in their capability to capture other intrinsic properties such as label correlation and local invariance. We assume here that the input data form groups; as a result, the label matrix exhibits a sparsity pattern, and the labels corresponding to objects in the same group have similar sparsity. In this paper, we study the embedding of labels together with group information with the objective of building an efficient multi-label classifier. We assume the existence of a low-dimensional space onto which the feature vectors and label vectors can be embedded. In order to achieve this, we address three sub-problems, namely: (1) identification of groups of labels; (2) embedding of label vectors into a low-rank space so that the sparsity characteristic of individual groups remains invariant; and (3) determination of a linear mapping that embeds the feature vectors onto the same set of points, as in stage 2, in the low-dimensional space. We compare our method with seven well-known algorithms on twelve benchmark data sets. Our experimental analysis manifests the superiority of the proposed method over state-of-the-art algorithms for multi-label learning.
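The following is a minimal sketch of the three-stage pipeline described above, under loudly simplified assumptions: k-means on label co-occurrence stands in for stage 1, a truncated SVD for the sparsity-invariant embedding of stage 2, and ridge-regularized least squares for the linear feature-to-embedding map of stage 3. All names are illustrative; the paper's actual group-sparsity formulation is more involved.

```python
# Sketch only: simple stand-ins for the paper's three stages.
import numpy as np
from sklearn.cluster import KMeans

def fit_pipeline(X, Y, n_groups=4, rank=10, lam=1.0):
    # Stage 1: group labels by their co-occurrence profiles (columns of Y).
    groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(Y.T)

    # Stage 2: embed the label matrix into a rank-`rank` space (rank <= min(n, L)).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Z = U[:, :rank] * s[:rank]          # low-rank representation of each row

    # Stage 3: linear map W with X @ W ~ Z (ridge-regularized least squares).
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)
    return groups, W, Vt[:rank]

def predict(X_new, W, V_r, threshold=0.5):
    # Decode: map features into the embedding, then back to label scores.
    return (X_new @ W @ V_r > threshold).astype(int)
```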
Recommender systems, especially collaborative filtering (CF) based recommender systems, have been playing an important role in many e-commerce applications. As the information searched over the internet rapidly increases, users often face difficulty in finding items of their own interest, and recommender systems often provide help in such tasks. Recent studies show that, as the item space increases and the number of items rated by the users becomes very small, issues like sparsity arise. To mitigate the sparsity problem, transfer learning techniques are used, wherein data from a dense domain (source) is considered in order to predict the missing entries in the sparse domain (target). In this paper, we propose a novel transfer learning approach called Transfer of Codebook via Hinge loss (TCH) for cross-domain recommendation when the two domains have no overlap of users and items. In our approach, constructing the codebook and transferring that knowledge from the source to the target domain are done in a novel way. We employ a formulation similar to a co-clustering technique to obtain the codebook (cluster-level rating pattern) of the source domain. By making use of the hinge loss function, we transfer the learnt codebook of the source domain to the target. The use of hinge loss as a loss function is novel and has not been tried before in transfer learning. We demonstrate that our technique improves the approximation of the target matrix on benchmark datasets.
•We propose a model for cross-domain recommendation to address data sparsity.
•The proposed method is useful when the domains do not share common users or items.
•A co-clustering technique is used to construct the codebook from the source domain.
•The learnt codebook is transferred to the target domain in a novel way via hinge loss.
•We validate the performance of the proposed method on benchmark datasets.
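As a concrete illustration of the codebook construction described in the abstract above, the sketch below builds a cluster-level rating pattern from a dense source rating matrix. K-means on rows and columns is a stand-in for the co-clustering formulation actually used, and the hinge-loss transfer step, which is the paper's novelty, is not reproduced; `build_codebook` and its parameters are illustrative names.

```python
# Sketch: codebook B[p, q] = mean rating of user-cluster p on item-cluster q.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(R_src, k_user=5, k_item=5):
    user_lab = KMeans(n_clusters=k_user, n_init=10).fit_predict(R_src)
    item_lab = KMeans(n_clusters=k_item, n_init=10).fit_predict(R_src.T)
    B = np.zeros((k_user, k_item))
    for p in range(k_user):
        for q in range(k_item):
            block = R_src[np.ix_(user_lab == p, item_lab == q)]
            B[p, q] = block.mean() if block.size else R_src.mean()
    return B
```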
•Multi-label learning deals with the classification of data with multiple labels.
•An output space with many labels is tackled by modeling inter-label correlations.
•The use of parametrization and embedding has been the prime focus.
•A piecewise-linear embedding using maximum margin matrix factorization is proposed.
•Our experimental analysis manifests the superiority of the proposed method.
Multi-label learning is concerned with the classification of data with multiple class labels. This is in contrast to the traditional classification problem, where every data instance has a single label. Multi-label classification (MLC) is a major research area in the machine learning community and finds application in several domains such as computer vision, data mining and text classification. Due to the exponential size of the output space, exploiting intrinsic information in the feature and label spaces has been the major thrust of research in recent years, and the use of parametrization and embedding has been the prime focus in MLC. Most of the existing methods learn a single linear parametrization using the entire training set and hence fail to capture nonlinear intrinsic information in the feature and label spaces. To overcome this, we propose a piecewise-linear embedding which uses maximum margin matrix factorization to model the linear parametrization. We hypothesize that feature vectors which conform to a similar embedding are similar in some sense. Combining the above concepts, we propose a novel hierarchical matrix factorization method for multi-label classification. Practical multi-label classification problems such as image annotation, text categorization and sentiment analysis can be directly solved by the proposed method. We compare our method with six well-known algorithms on twelve benchmark datasets. Our experimental analysis manifests the superiority of our proposed method over state-of-the-art algorithms for multi-label learning.
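The sketch below illustrates only the piecewise-linear intuition of the abstract above: instances are grouped, and a separate linear parametrization is learnt per group, making the overall feature-to-label map piecewise linear. Plain k-means and ridge regression are stand-ins for the hierarchical maximum margin matrix factorization machinery of the paper.

```python
# Sketch: one linear map per instance group = a piecewise-linear embedding.
import numpy as np
from sklearn.cluster import KMeans

def fit_piecewise(X, Y, n_pieces=3, lam=1.0):
    km = KMeans(n_clusters=n_pieces, n_init=10).fit(X)
    d = X.shape[1]
    maps = []
    for c in range(n_pieces):
        Xc, Yc = X[km.labels_ == c], Y[km.labels_ == c]
        maps.append(np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc))
    return km, maps

def predict_piecewise(X_new, km, maps, threshold=0.5):
    pieces = km.predict(X_new)
    scores = np.stack([x @ maps[c] for x, c in zip(X_new, pieces)])
    return (scores > threshold).astype(int)
```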
Recommender systems usually employ techniques like collaborative filtering for providing recommendations on items/services. Maximum Margin Matrix Factorization (MMMF) is an effective collaborative filtering approach. MMMF suffers from the data sparsity problem, i.e., the number of items rated by the users is very small compared to the very large item space. Recently, techniques like cross-domain collaborative filtering (transfer learning) have been suggested for addressing the data sparsity problem. In this paper, we propose a model for transfer learning in collaborative filtering through MMMF to address the data sparsity issue. The latent feature matrices involved in MMMF are clustered and combined to generate a cluster-level rating pattern called a codebook, and a codebook transfer is used for the transfer of information. Transferring the codebook and finding the predicted rating matrix are done in a novel way by introducing a softness constraint into the optimization function. We have experimented with our method at different levels of sparsity using benchmark datasets. Results from the experiments show that our model approximates the target matrix well.
•We propose a model for transfer learning in collaborative filtering.
•A latent factor model is used to extract knowledge from the source domain.
•We construct a cluster-level rating pattern using the latent factor matrices.
•The cluster-level rating pattern is transferred from the source to the target domain.
•We validate the performance of the proposed method on benchmark datasets.
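To make the transfer step of the abstract above concrete, the sketch below fills a sparse target matrix from a source codebook by alternately reassigning each target user and item to the codebook cluster that best fits its observed ratings. This hard-assignment variant is a simplification: the paper instead introduces a softness constraint into the optimization function.

```python
# Sketch: hard-assignment codebook transfer (the paper uses a softness constraint).
import numpy as np

def transfer_codebook(R_tgt, mask, B, n_iter=20, seed=0):
    n, m = R_tgt.shape
    k_u, k_i = B.shape
    rng = np.random.default_rng(seed)
    u = rng.integers(k_u, size=n)        # user-cluster assignments
    v = rng.integers(k_i, size=m)        # item-cluster assignments
    for _ in range(n_iter):
        for i in range(n):               # reassign users with items fixed
            errs = [((mask[i] * (R_tgt[i] - B[p, v])) ** 2).sum()
                    for p in range(k_u)]
            u[i] = int(np.argmin(errs))
        for j in range(m):               # reassign items with users fixed
            errs = [((mask[:, j] * (R_tgt[:, j] - B[u, q])) ** 2).sum()
                    for q in range(k_i)]
            v[j] = int(np.argmin(errs))
    R_hat = B[np.ix_(u, v)]              # codebook-expanded prediction
    return mask * R_tgt + (1 - mask) * R_hat
```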
•We propose an alternative and new MMMF scheme for discrete-valued rating matrices.
•Our work draws motivation from the recent advent of proximal support vector machines.
•The proposed method overcomes the problem of overfitting.
•We validate our hypothesis by conducting experiments on real and synthetic datasets.
Maximum Margin Matrix Factorization (MMMF) has been a successful learning method in collaborative filtering research. For a partially observed ordinal rating matrix, the focus is on determining low-norm latent factor matrices U (of users) and V (of items) so as to simultaneously approximate the observed entries under some loss measure and predict the unobserved entries. When the rating matrix contains only two levels (±1), the rows of V can be viewed as points in k-dimensional space and the rows of U as decision hyperplanes in this space separating +1 entries from −1 entries. The concept of optimizing a loss function to determine the separating hyperplane is prevalent in support vector machine (SVM) research, and when the hinge/smooth hinge loss is used, the hyperplanes act as maximum-margin separators. In MMMF, a rating matrix with multiple discrete values is treated by specially extending the hinge loss function to suit multiple levels. MMMF is an efficient technique for collaborative filtering, but it has several shortcomings. A prominent shortcoming is an overfitting problem: if the learning iterations are prolonged to decrease the training error, the generalization error grows. In this paper, we propose an alternative and new maximum margin factorization scheme for discrete-valued rating matrices to overcome the problem of overfitting. Our work draws motivation from recent work on proximal support vector machines (PSVMs), wherein two parallel hyperplanes are used for binary classification and points are classified by assigning them to the class corresponding to the closer of the two parallel hyperplanes. In other words, proximity to a decision hyperplane is used as the classifying criterion. We show that a similar concept can be used to factorize the rating matrix if the loss function is suitably defined. The present scheme of matrix factorization has advantages over MMMF (similar to the advantages of PSVM over standard SVM). We validate our hypothesis by carrying out experiments on real and synthetic datasets.
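For reference, the sketch below sets up the shared MMMF baseline for the binary (±1) case: gradient descent on the smooth hinge loss over observed entries. The proximity-based scheme proposed above would replace this loss with one penalizing distance from the nearer of two parallel hyperplanes; that variant is not reproduced here, and all names are illustrative.

```python
# Sketch: binary MMMF with the smooth hinge loss, plain gradient descent.
import numpy as np

def smooth_hinge_grad(z):
    # h(z) = 0.5 - z for z <= 0; (1 - z)^2 / 2 for 0 < z < 1; 0 for z >= 1
    return np.where(z >= 1, 0.0, np.where(z <= 0, -1.0, z - 1.0))

def mmmf_binary(R, mask, k=5, lam=0.1, lr=0.01, n_iter=500, seed=0):
    n, m = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(m, k))
    for _ in range(n_iter):
        Z = R * (U @ V.T)                    # margins y_ij * <u_i, v_j>
        G = mask * R * smooth_hinge_grad(Z)  # loss gradient on observed entries
        U, V = U - lr * (G @ V + lam * U), V - lr * (G.T @ U + lam * V)
    return U, V
```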
•We describe the virtual user strategy and then examine its properties.
•The experimental results show that the virtual user strategy achieves better precision and recall.
•We propose incremental algorithms to update the virtual user profile.
•A new measure called monotonicity is introduced to judge the efficiency of a recommender system.
•The virtual user profile together with monotonicity yields recommendations of higher accuracy on benchmark datasets.
In this paper, we propose a novel virtual user strategy using precedence relations and develop a new scheme for group recommender systems. User profiles are provided in terms of the precedence relations of items as used by group members. A virtual user for a group is constructed by taking the transitive precedence of items of all members into consideration. The profile of the virtual user represents the combined profile of the group. There has not been any earlier attempt to define a virtual user profile using precedence relations. We show that the proposed framework exhibits many interesting properties. Earlier approaches construct the virtual user profile by considering the set of common items used by all members of the group. In the present work, we propose a method of computing a weightage for each item, not necessarily common to all members, using transitive precedence. We also introduce a new measure called monotonicity to measure the performance of any recommender system. In a top-k recommendation, monotonicity measures the number of items that continue to be recommended when a technique is applied incrementally. We experimented extensively with different combinations of parameter settings and different group sizes on MovieLens data. We show that our framework has better performance in terms of precision and recall when compared with other methods. We show that our recommendation framework exhibits robust monotonicity.
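The sketch below illustrates one way the virtual-user construction described above can assign a weightage to every item from members' precedence relations. Each member's usage order makes every earlier item (transitively) precede every later one, and an item's weight is the number of items it precedes, summed over members. The exact weighting scheme of the paper may differ; this is only an illustration.

```python
# Sketch: item weightage for the virtual user from transitive precedence.
from collections import defaultdict

def virtual_user_profile(member_sequences):
    weight = defaultdict(float)
    for seq in member_sequences:
        n = len(seq)
        for pos, item in enumerate(seq):
            weight[item] += n - pos - 1   # items this one precedes in seq
    return dict(sorted(weight.items(), key=lambda kv: -kv[1]))

# Toy group of three members and their item usage orders.
group = [["a", "b", "c"], ["b", "a", "d"], ["a", "c", "d"]]
print(virtual_user_profile(group))        # {'a': 5.0, 'b': 3.0, 'c': 1.0, 'd': 0.0}
```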
Face recognition is one of the most important and widely applicable research problems in machine learning and computer vision. Extraction of features, local or holistic, is the fundamental step, and subspace methods have been a natural choice for facial feature extraction. Among these, methods like PCA, ICA and LDA aim to reduce the dimension of the data while retaining the statistical separation property between distinct classes. Unlike traditional ICA, in which the whole face image is stretched into a vector before calculating the independent components (ICs), Block ICA (B-ICA) partitions the facial images into blocks and takes each block as a training vector. Since the dimensionality of the training vector in B-ICA is much smaller than that in traditional ICA, a reduction in face recognition error is expected. The objective of ICA is to find a separation matrix, and this is achieved by a process of optimization, such as maximization of non-Gaussianity, maximum likelihood estimation, or minimization of mutual information. We observe here that gradient-based learning can be efficiently and effectively achieved by the application of swarm-based optimization. We propose here the application of our Gradient-based Swarm Optimization method for Block ICA, where gradient information is combined with conventional swarm search to optimize the contrast function. We compare our method with B-ICA on three benchmark image data sets and show that our method achieves a better recognition rate compared to B-ICA for different block sizes, with 70%, 80% and 90% of the data used for training the model.
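The sketch below shows only the Block ICA data preparation described above: each face image is partitioned into equally sized blocks, and every block, rather than the whole stretched image, becomes a training vector. scikit-learn's FastICA stands in for the gradient-based swarm optimization of the contrast function proposed in the paper; the toy images are random placeholders.

```python
# Sketch: B-ICA style block partitioning, with FastICA as a stand-in optimizer.
import numpy as np
from sklearn.decomposition import FastICA

def image_to_blocks(img, block_h, block_w):
    h, w = img.shape
    return np.array([img[r:r + block_h, c:c + block_w].ravel()
                     for r in range(0, h - block_h + 1, block_h)
                     for c in range(0, w - block_w + 1, block_w)])

rng = np.random.default_rng(0)
faces = rng.random((10, 32, 32))                     # toy stand-in images
X = np.vstack([image_to_blocks(f, 8, 8) for f in faces])
ics = FastICA(n_components=16, random_state=0).fit(X).components_
```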
•We propose an efficient algorithm that can compute skyline probability exactly for reasonably large databases.
•We introduce the concept of a zero-contributing set, which has zero effect in the signed aggregate of joint probabilities.
•We propose an incremental algorithm to compute skyline probability in dynamic scenarios wherein objects are added incrementally.
•The theoretical concepts developed help to devise an efficient technique to compute the skyline probability of all objects in the database.
•Detailed experimental analyses on real and synthetic datasets are reported to corroborate our findings.
Efficient computation of skyline probability over uncertain preferences has not received much attention in the database community, as compared to skyline probability computation over uncertain data. All known algorithms for probabilistic skyline computation over uncertain preferences attempt to find an inexact value of the skyline probability by resorting to sampling or to an approximation scheme. Exact computation of skyline probability for databases of even moderate size with uncertain preferences is not possible with any of the existing algorithms. In this paper, we propose an efficient algorithm that can compute skyline probability exactly for reasonably large databases. The inclusion–exclusion principle is used to express the skyline probability in terms of the joint probabilities of all possible combinations of events. In this regard, we introduce the concept of a zero-contributing set, which has zero effect in the signed aggregate of joint probabilities. Our algorithm employs a prefix-based k-level absorption to identify zero-contributing sets. It is shown empirically that only a very small portion of the exponential search space remains after level-wise application of prefix-based absorption. Thus it becomes possible to compute skyline probability with respect to large datasets. Detailed experimental analyses on real and synthetic datasets are reported to corroborate this claim. We also propose an incremental algorithm to compute skyline probability in dynamic scenarios wherein objects are added incrementally. Moreover, the theoretical concepts developed in this paper help to devise an efficient technique to compute the skyline probability of all objects in the database. We show that the exponential search space is pruned once, and then, for each individual object, the skyline probability can be derived by inspecting a portion of the pruned lattice. We also use a concept of revival of absorbed pairs. We believe that this process is more efficient than computing the skyline probabilities individually.
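The brute-force inclusion–exclusion that the above algorithm accelerates can be written directly, as sketched below: the skyline probability of an object o is 1 − P(union of the events "u dominates o" over all other objects u), expanded over all non-empty subsets. Here `joint_prob` is an assumed callback returning the joint probability of a set of dominance events; the paper's contribution, pruning this exponential sum via zero-contributing sets and prefix-based k-level absorption, is omitted.

```python
# Sketch: exponential inclusion-exclusion baseline for skyline probability.
from itertools import combinations

def skyline_probability(others, joint_prob):
    total = 0.0
    for r in range(1, len(others) + 1):
        sign = (-1.0) ** (r + 1)
        for subset in combinations(others, r):
            total += sign * joint_prob(subset)   # P(all events in subset)
    return 1.0 - total                           # P(no object dominates o)
```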
Clustering ensemble, also referred to as consensus clustering, has emerged as a method of combining an ensemble of different clusterings to derive a final clustering that is of better quality and more robust than any single clustering in the ensemble. Normally, clustering ensemble algorithms in the literature combine all the clusterings without learning the ensemble. But by learning the ensemble, one can define the merit of a clustering, or even of a cluster in it, in forming a quality consensus. In this work, we propose a cluster-level surprisal measure to define the merit of a clustering that reflects the levels of both agreement and disagreement among clusters. Using the proposed measure of merit, we devise a polynomial-time heuristic that judiciously selects a subset of clusterings from the ensemble that contribute positively to forming the consensus. We also show empirically that the consensus achieved by our proposed method performs better in terms of quality compared to well-known clustering ensemble algorithms on different benchmark datasets.
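A rough sketch of the "learning the ensemble by selection" idea follows. Pairwise normalized mutual information is used as a stand-in for the paper's cluster-level surprisal measure, and a simple greedy rule keeps clusterings whose average agreement with the rest of the ensemble exceeds the ensemble mean before the consensus step is run.

```python
# Sketch: select a subset of clusterings before forming the consensus.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

def select_clusterings(labelings):
    m = len(labelings)
    agree = np.array([[nmi(labelings[i], labelings[j]) for j in range(m)]
                      for i in range(m)])
    avg = (agree.sum(axis=1) - 1.0) / (m - 1)   # exclude self-agreement (= 1)
    return [i for i in range(m) if avg[i] >= avg.mean()]
```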
For determining skyline objects in an uncertain database with uncertain preferences, it is necessary to compute the skyline probability of a given object with respect to other objects. The problem boils down to computing the probability of the union of events from the joint probabilities of all possible combinations of events. The linear Bonferroni bound is concerned with computing bounds on the probability of the union of events with partial information. We use this technique to estimate the skyline probability of an object and propose a polynomial-time algorithm for computing a sharp upper bound. We show that the use of partial information does not affect the quality of the solution but helps in improving the efficiency. We formulate the problem as a linear programming problem (LPP) and characterize a set of feasible points that is believed to contain all extreme points of the LPP. The maximization of the objective function over this set of points is equivalent to a bipolar quadratic optimization problem. We use a spectral relaxation technique to solve the bipolar quadratic optimization problem. The proposed algorithm is of O(n³) time complexity and is the first polynomial-time algorithm to determine skyline probability. We show that the bounds computed by our proposed algorithm determine almost the same set of skyline objects as the deterministic algorithm. Experimental results are presented to corroborate this claim.
•In this paper, we propose three different bounds on skyline probability.
•Computation of skyline points using these bounds is more accurate than the earlier sampling-based technique.
•Our algorithm determines almost the same set of skyline objects as the exact method.
•We validate our hypothesis by experiments on real and synthetic datasets.
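The spectral relaxation step mentioned in the abstract above admits a compact sketch: maximizing xᵀAx over bipolar vectors x in {−1, +1}ⁿ is relaxed to the sphere ‖x‖² = n, whose maximizer is the leading eigenvector of the symmetrized A scaled by √n. The LPP construction that produces A is the paper's own and is not reproduced here.

```python
# Sketch: spectral relaxation of max x^T A x over x in {-1, +1}^n.
import numpy as np

def spectral_relaxation(A):
    n = A.shape[0]
    w, V = np.linalg.eigh((A + A.T) / 2)   # symmetrize, then eigendecompose
    x = np.sqrt(n) * V[:, -1]              # leading eigenvector on the sphere
    return x, n * w[-1]                    # relaxed maximizer and upper bound
```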