Many real-world problems deal with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
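For orientation, the convex relaxation described above is commonly written as an l1-minimization over self-representation coefficients. The block below is a sketch in the noise-free case; the symbols Y (data matrix with one point per column), C (coefficient matrix), and W (affinity matrix) are notation assumed here and do not appear in the abstract.

```latex
% Each point is written as a sparse combination of the other points;
% the diagonal constraint rules out the trivial solution C = I.
\min_{C} \; \|C\|_{1}
\quad \text{subject to} \quad Y = YC, \qquad \operatorname{diag}(C) = 0 .

% The coefficients then define a symmetric affinity matrix that is
% passed to spectral clustering.
W = |C| + |C|^{\top}
```

Handling noise, sparse outlying entries, or missing entries would add corresponding terms to this program; those extensions are not reproduced in this sketch.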
In this paper, we study the problem of segmenting tracked feature point trajectories of multiple moving objects in an image sequence. Using the affine camera model, this problem can be cast as the problem of segmenting samples drawn from multiple linear subspaces. In practice, due to limitations of the tracker, occlusions, and the presence of nonrigid objects in the scene, the obtained motion trajectories may contain grossly mistracked features, missing entries, or corrupted entries. In this paper, we develop a robust subspace separation scheme that deals with these practical issues in a unified mathematical framework. Our methods draw strong connections between lossy compression, rank minimization, and sparse representation. We test our methods extensively on the Hopkins155 motion segmentation database and other motion sequences with outliers and missing data. We compare the performance of our methods to state-of-the-art motion segmentation methods based on expectation-maximization and spectral clustering. For data without outliers or missing information, the results of our methods are on par with the state-of-the-art results and, in many cases, exceed them. In addition, our methods give surprisingly good performance in the presence of the three types of pathological trajectories mentioned above. All code and results are publicly available at http://perception.csl.uiuc.edu/coding/motion/.
• An estimate of the Bayes cost is proposed as the loss to train neural networks for ordinal classification of imbalanced data.
• The network parameters, as well as the decision thresholds, are updated during training to minimize the Bayes cost.
• The neural network architecture has a single neuron in the output layer (one-dimensional input space).
• Both shallow networks and deep networks can be used.
• Experiments with real data show the accuracy and flexibility of the proposed method, especially in imbalanced problems.
Ordinal classification of imbalanced data is a challenging problem that appears in many real-world applications. The challenge is to simultaneously consider the order of the classes and the class imbalance, which can notably improve the performance metrics. The Bayesian formulation makes it possible to deal with these two characteristics jointly: it takes into account the prior probability of each class and the decision costs, which can be used to include the imbalance and the ordinal information, respectively. We propose to use the Bayesian formulation to train neural networks, which have shown excellent results in many classification tasks. A loss function is proposed to train networks with a single neuron in the output layer and a threshold-based decision rule. The loss is an estimate of the Bayesian classification cost, based on the Parzen windows estimator, which is fitted for a thresholded decision. Experiments with several real datasets show that the proposed method provides competitive results in different scenarios, due to its high flexibility to specify the relative importance of the errors in the classification of patterns of different classes, considering the order and independently of the probability of each class.
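One way to picture a Parzen-smoothed Bayes cost for a thresholded scalar output is sketched below. It is only an illustration under stated assumptions: the Gaussian kernel, the function name `smoothed_bayes_cost`, the absolute-difference cost matrix, and the equal priors are all choices made here, not details taken from the abstract. Each sample's kernel mass is integrated over every decision interval, which gives a smooth estimate of the probability of deciding class j when the true class is k.

```python
import math


def gaussian_cdf(x):
    """Standard normal CDF, used to integrate a Gaussian Parzen kernel."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_bayes_cost(outputs, labels, thresholds, cost, priors, bandwidth):
    """Estimate sum_k priors[k] * sum_j cost[j][k] * P(decide j | class k).

    outputs    : scalar network outputs, one per sample
    labels     : true class index of each sample (0 .. K-1)
    thresholds : K-1 increasing thresholds splitting the real line
    cost[j][k] : cost of deciding class j when the true class is k
    bandwidth  : width of the (assumed) Gaussian Parzen kernel
    """
    n_classes = len(priors)
    edges = [-math.inf] + list(thresholds) + [math.inf]
    total = 0.0
    for k in range(n_classes):
        class_outputs = [z for z, y in zip(outputs, labels) if y == k]
        if not class_outputs:
            continue  # no samples of this class: it contributes nothing
        for j in range(n_classes):
            # Kernel mass of each class-k sample inside decision interval j.
            mass = sum(
                gaussian_cdf((edges[j + 1] - z) / bandwidth)
                - gaussian_cdf((edges[j] - z) / bandwidth)
                for z in class_outputs
            )
            total += priors[k] * cost[j][k] * mass / len(class_outputs)
    return total


# Hypothetical toy data: three ordered classes, well separated on the output axis.
outputs = [-2.0, -2.1, 0.0, 0.1, 2.0, 2.2]
labels = [0, 0, 1, 1, 2, 2]
ordinal_cost = [[abs(j - k) for k in range(3)] for j in range(3)]
equal_priors = [1.0 / 3.0] * 3

good = smoothed_bayes_cost(outputs, labels, [-1.0, 1.0], ordinal_cost, equal_priors, 0.01)
bad = smoothed_bayes_cost(outputs, labels, [-3.0, 1.0], ordinal_cost, equal_priors, 0.01)
```

Because the estimate is smooth in both the outputs and the thresholds, it could in principle be minimized by gradient descent over both, which matches the abstract's statement that thresholds are learned along with the network parameters. Setting the priors by hand (equal here) is what would decouple the cost from the class frequencies.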
Materials that exhibit high nonlinear optical (NLO) susceptibilities are considered promising candidates for a wide range of photonic and electronic applications. Here, we argue that the ferroelectric nematic ($N_\mathrm{F}$) materials have sufficient potential to become materials for the next generation of NLO devices. We have carried out a study of the efficiency of optical second-harmonic generation in a prototype $N_\mathrm{F}$ material, finding a nonlinear susceptibility of 5.6 pm V$^{-1}$ in the transparent regime, one of the highest ever reported in ferroelectric liquid crystals. Given the fact that the studied molecule was not specifically designed for NLO applications, we conclude there is still margin to obtain $N_\mathrm{F}$ materials with enhanced properties that should allow their practical use.
This paper presents a novel approach to visual saliency that relies on a contextually adapted representation produced through adaptive whitening of color and scale features. Unlike previous models, the proposal is grounded on the specific adaptation of the basis of low-level features to the statistical structure of the image. Adaptation is achieved through decorrelation and contrast normalization in several steps in a hierarchical approach, in compliance with coarse features described in biological visual systems. Saliency is simply computed as the square of the vector norm in the resulting representation. The performance of the model is compared with several state-of-the-art approaches in predicting human fixations, using three different eye-tracking datasets. When this measure is referred to the performance of human priority maps, the model proves to be the only one able to keep the same behavior across different datasets, showing itself to be free of biases. Moreover, it is able to predict a wide set of relevant psychophysical observations that, to our knowledge, have not been reproduced together by any other model before.
► Novel model of saliency based on the contextual adaptation of low-level features.
► Outperforms existing models in predicting fixations, in both performance and robustness.
► Comparison with single subject priority performance reveals strong design biases in other models.
► Improved capability of reproducing psychophysical results.
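The idea of computing saliency as a squared norm in a whitened representation can be illustrated with a deliberately reduced sketch. It assumes a single whitening step over two features per pixel, whereas the model described above whitens color and scale features hierarchically; the function name `whitened_saliency` and the toy data are hypothetical. For one whitening step, the squared norm of the whitened vector equals the squared Mahalanobis distance of the feature vector from the image mean, so no explicit whitening matrix is needed.

```python
def whitened_saliency(features):
    """Squared norm of each 2-D feature vector after whitening.

    features : list of (a, b) feature pairs, one per pixel.
    Returns (x - mean)^T Cov^{-1} (x - mean) for every pixel, using the
    population covariance (division by n).
    """
    n = len(features)
    mean_a = sum(a for a, _ in features) / n
    mean_b = sum(b for _, b in features) / n
    centered = [(a - mean_a, b - mean_b) for a, b in features]

    # Entries of the 2x2 covariance matrix.
    s_aa = sum(a * a for a, _ in centered) / n
    s_bb = sum(b * b for _, b in centered) / n
    s_ab = sum(a * b for a, b in centered) / n
    det = s_aa * s_bb - s_ab * s_ab
    if det <= 0.0:
        raise ValueError("features are degenerate; covariance is not invertible")

    # Closed-form inverse of a 2x2 matrix applied to each centered vector.
    return [
        (s_bb * a * a - 2.0 * s_ab * a * b + s_aa * b * b) / det
        for a, b in centered
    ]


# Hypothetical toy "image": eight pixels whose two features are strongly
# correlated, plus one pixel (index 8) that breaks the correlation.
pixels = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.1),
          (5, 4.9), (6, 6.1), (7, 6.9), (3.5, -1.0)]
saliency = whitened_saliency(pixels)
most_salient = max(range(len(pixels)), key=lambda i: saliency[i])
```

The pixel that departs from the dominant feature correlation receives the largest value, even though neither of its raw feature values is extreme. That is the sense in which the representation adapts to the statistics of the image.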
Combination approaches provide an interesting way to improve adaptive filter performance. In this paper, we study the mean-square performance of a convex combination of two transversal filters. The individual filters are independently adapted using their own error signals, while the combination is adapted by means of a stochastic gradient algorithm in order to minimize the error of the overall structure. General expressions are derived that show that the method is universal with respect to the component filters, i.e., in steady-state, it performs at least as well as the best component filter. Furthermore, when the correlation between the a priori errors of the components is low enough, their combination is able to outperform both of them. Using energy conservation relations, we specialize the results to a combination of least mean-square filters operating both in stationary and in nonstationary scenarios. We also show how the universality of the scheme can be exploited to design filters with improved tracking performance.
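The structure described above can be sketched as follows for two least mean-square (LMS) filters with different step sizes. Several details are assumptions made for illustration: the mixing weight is parameterized through a sigmoid of an auxiliary variable `a`, that variable is clipped to [-4, 4], and the step sizes, the two-tap system `w_true`, and the function name `run_combination` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def run_combination(n_steps=3000, mu_fast=0.05, mu_slow=0.005,
                    mu_a=1.0, noise_std=0.05, seed=0):
    """Identify a 2-tap system with a convex combination of two LMS filters."""
    rng = random.Random(seed)
    w_true = [0.7, -0.3]            # unknown system (hypothetical)
    w1, w2 = [0.0, 0.0], [0.0, 0.0]  # fast and slow component filters
    a = 0.0                          # mixing variable; lambda = sigmoid(a)
    x = [0.0, 0.0]                   # tapped delay line
    for _ in range(n_steps):
        x = [rng.uniform(-1.0, 1.0), x[0]]
        d = dot(w_true, x) + rng.gauss(0.0, noise_std)

        y1, y2 = dot(w1, x), dot(w2, x)
        lam = 1.0 / (1.0 + math.exp(-a))
        y = lam * y1 + (1.0 - lam) * y2   # convex combination of outputs

        # Each component adapts with its OWN error signal.
        e1, e2 = d - y1, d - y2
        w1 = [w + mu_fast * e1 * xi for w, xi in zip(w1, x)]
        w2 = [w + mu_slow * e2 * xi for w, xi in zip(w2, x)]

        # The mixing variable follows a stochastic gradient step on the
        # squared error of the OVERALL output.
        e = d - y
        a += mu_a * e * (y1 - y2) * lam * (1.0 - lam)
        a = max(-4.0, min(4.0, a))  # assumed clipping keeps adaptation alive

    return w_true, w1, w2, 1.0 / (1.0 + math.exp(-a))


w_true, w_fast, w_slow, lam_final = run_combination()
```

The sigmoid keeps the mixing weight strictly between 0 and 1, so the overall output is always a convex combination of the component outputs. Without clipping, the factor lam * (1 - lam) would shrink toward zero near either end and the mixing variable would stop adapting.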
Generalized principal component analysis (GPCA). Vidal, R.; Yi Ma; Sastry, S.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 12/2005, Volume 27, Issue 12
Journal Article. Peer reviewed. Open access.
This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing a certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a high-dimensional space and with an unknown number of subspaces are also presented. Our experiments on low-dimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as k-subspaces and expectation maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.
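The polynomial idea can be made concrete in the smallest possible setting: two lines through the origin in the plane, with noise-free points. This is a toy sketch, not the algorithm of the paper. It assumes the number of subspaces is known to be two, and it replaces the general linear estimation step with a cross product, which yields the null vector of two embedded points. All function names are hypothetical. The two lines are the zero set of one quadratic p(x, y) = c1 x^2 + c2 xy + c3 y^2, and the gradient of p at a data point is normal to the line through that point.

```python
import math
from itertools import combinations


def veronese2(p):
    """Degree-2 monomials of a 2-D point: (x^2, xy, y^2)."""
    x, y = p
    return (x * x, x * y, y * y)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def fit_quadratic(points):
    """Coefficients c with c . veronese2(p) = 0 for all (noise-free) points.

    Any two points with independent embeddings determine c up to scale;
    the pair with the largest cross product is used.
    """
    candidates = (cross(veronese2(p), veronese2(q))
                  for p, q in combinations(points, 2))
    return max(candidates, key=lambda c: sum(t * t for t in c))


def unit_normal(coeffs, p):
    """Normalized gradient of the quadratic at p (p must not be the origin)."""
    c1, c2, c3 = coeffs
    x, y = p
    gx, gy = 2.0 * c1 * x + c2 * y, c2 * x + 2.0 * c3 * y
    norm = math.hypot(gx, gy)
    return (gx / norm, gy / norm)


def segment_two_lines(points):
    coeffs = fit_quadratic(points)
    normals = [unit_normal(coeffs, p) for p in points]

    def align(u, v):
        return abs(u[0] * v[0] + u[1] * v[1])

    ref0 = normals[0]                                  # normal of first line
    ref1 = min(normals, key=lambda n: align(n, ref0))  # least aligned normal
    return [0 if align(n, ref0) >= align(n, ref1) else 1 for n in normals]


# Noise-free points on the lines y = x and y = -2x.
points = [(1, 1), (2, 2), (-1, -1), (1, -2), (2, -4), (-0.5, 1)]
labels = segment_two_lines(points)
```

With noisy data the cross product would no longer be adequate. The coefficient vector would instead be estimated as a least-squares null vector of the full embedded data matrix, and the choice of one representative point per subspace would matter, as the abstract notes.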
We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data point can be expressed as a linear combination of the representatives and formulate the problem of finding the representatives as a sparse multiple measurement vector problem. In our formulation, both the dictionary and the measurements are given by the data matrix, and the unknown sparse codes select the representatives via convex optimization. In general, we do not assume that the data are low-rank or distributed around cluster centers. When the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. We also analyze the geometry of the representatives and discuss their relationship to the vertices of the convex hull of the data. We show that our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using representatives.
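A sparse multiple measurement vector problem in which the data matrix serves as both dictionary and measurements can be sketched as a row-sparsity program. The form below is an assumed, noise-free illustration: the symbols Y (data matrix with one point per column), C (coefficient matrix), c^i (the i-th row of C), and the choice of norm q are notation introduced here, and the paper's exact program may include further constraints or a noise term.

```latex
% A mixed norm (sum of row norms, q > 1) drives entire rows of C to
% zero. The data points whose rows remain nonzero are the
% representatives, since every column of Y is rebuilt from them alone.
\min_{C} \; \sum_{i=1}^{N} \bigl\| c^{i} \bigr\|_{q}
\quad \text{subject to} \quad Y = YC .
```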
The number of host species infected by a mistletoe (host range) is critical in that it influences prevalence, virulence and overall distribution of the parasite; however, macroecological analyses of this life history feature are lacking for many regions. The Andean-Patagonian forest, found along the southern Andes from 35 °S to Tierra del Fuego at 55 °S, contains 12 mistletoe species in three families (Loranthaceae, Misodendraceae and Santalaceae). By tabulating herbarium records, the host ranges and geographical distributions of these mistletoes were explored. Our results show that these parasites occur on 43 plant species in 24 families but with varying degrees of specificity. All Misodendrum species and Desmaria mutabilis (Loranthaceae) are specialists that use Nothofagus as their primary hosts. Tristerix and Notanthera (Loranthaceae) and Antidaphne and Lepidoceras (Santalaceae) are generalists parasitizing more than six host species from several genera and families. Although many of the mistletoe species are sympatric, there is low overlap in host use. Our data show that in the southern South American bioregion, generalist mistletoes have smaller geographic ranges than specialists. This contrasts with a previous hypothesis that predicted mistletoes with large geographic ranges would also have large host ranges, and conversely, less diverse regions would have more specialised mistletoes.