Linear discriminant analysis (LDA) has been widely used for face recognition. However, when identifying faces in the wild, outliers that deviate significantly from the rest of the data can arbitrarily skew the desired solution. This usually degrades LDA's performance dramatically, preventing its mass deployment in real-world applications. To handle this problem, we propose an effective distance metric learning method based on LDA, namely Euler LDA-L21 (e-LDA-L21). e-LDA-L21 is carried out in two stages: in the first stage, each image is mapped into a complex space by the Euler transform, and in the second stage, the ℓ2,1-norm is adopted as the distance metric. This not only reveals nonlinear features but also exploits the geometric structure of the data. To solve e-LDA-L21 efficiently, we propose an iterative algorithm that has a closed-form solution at each iteration, with convergence guaranteed. Finally, we extend e-LDA-L21 to Euler 2DLDA-L21 (e-2DLDA-L21), which further exploits the spatial information embedded in image pixels. Experimental results on several face databases demonstrate its superiority over state-of-the-art algorithms.
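As a rough illustration of the two ingredients named in this abstract, the sketch below maps pixel intensities into the complex plane and evaluates an ℓ2,1-norm. The abstract does not give the formulas, so several things here are assumptions: the Euler map is taken to be z = exp(i·α·π·x)/√2 for intensities x in [0, 1], the default α = 1.9 is only a placeholder, and the ℓ2,1-norm is computed as the sum of the Euclidean norms of the rows (one row per sample). The function names are hypothetical.

```python
import cmath
import math


def euler_map(pixels, alpha=1.9):
    """Map real intensities in [0, 1] onto a circle in the complex plane.

    Assumed form (not stated in the abstract): z = exp(i*alpha*pi*x) / sqrt(2).
    """
    return [cmath.exp(1j * alpha * math.pi * x) / math.sqrt(2) for x in pixels]


def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the rows of `matrix`.

    Works for real or complex entries because abs() is used per entry.
    """
    return sum(math.sqrt(sum(abs(v) ** 2 for v in row)) for row in matrix)


# Every mapped value has the same modulus, so a single corrupted pixel
# cannot produce an arbitrarily large coordinate.
mapped = euler_map([0.0, 0.5, 1.0])

# Each row contributes its l2 norm once, without squaring it, which is
# what limits the influence of an outlying row compared with a squared norm.
residuals = [[3, 4], [0, 0], [5, 12]]
total = l21_norm(residuals)
```

Because row norms enter the sum unsquared, one outlying sample grows the objective linearly rather than quadratically, which is the usual motivation for this kind of norm in robust subspace learning.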
In this paper, we conduct a large-dimensional study of regularized discriminant analysis classifiers in their two popular variants, regularized LDA and regularized QDA. The analysis assumes that the data samples are drawn from a Gaussian mixture model with different means and covariances, and it relies on tools from random matrix theory (RMT). We consider the regime in which both the data dimension and the training size within each class tend to infinity with a fixed ratio. Under mild assumptions, we show that the probability of misclassification converges to a deterministic quantity that describes in closed form the performance of these classifiers in terms of the class statistics as well as the problem dimension. The result allows for a better understanding of the underlying classification algorithms in terms of their performance in practical large but finite dimensions. Further exploitation of the results permits optimal tuning of the regularization parameter with the aim of minimizing the probability of misclassification. The analysis is validated with numerical results on synthetic data as well as real data from the USPS dataset, yielding high accuracy in predicting performance and hence making an interesting connection between theory and practice.
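To make the role of the regularization parameter concrete, here is a minimal two-class, two-dimensional sketch of a regularized LDA rule. It is not the estimator analyzed in the paper: the ridge form (pooled covariance + γ·I), the equal class priors, and all function names are assumptions chosen for illustration.

```python
def mean(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_cov(class0, class1):
    """Pooled 2x2 sample covariance of two classes of 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (class0, class1):
        m = mean(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(class0) + len(class1) - 2
    return [[v / dof for v in row] for row in s]


def rlda_fit(class0, class1, gamma):
    """Return (w, mid) for the rule: class 0 if w . (x - mid) > 0, else class 1.

    Assumed regularization: invert (pooled covariance + gamma * I).
    """
    m0, m1 = mean(class0), mean(class1)
    s = pooled_cov(class0, class1)
    a, b = s[0][0] + gamma, s[0][1]
    c, d = s[1][0], s[1][1] + gamma
    det = a * d - b * c  # strictly positive whenever gamma > 0
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [m0[0] - m1[0], m0[1] - m1[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    return w, mid


def rlda_predict(x, w, mid):
    """Classify a 2-D point with equal class priors (an assumption)."""
    score = w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])
    return 0 if score > 0 else 1


# The pooled covariance of these points is singular (all points lie on a line),
# so plain LDA could not invert it; gamma > 0 makes the inverse well defined.
w, mid = rlda_fit([(0, 0), (1, 1)], [(3, 3), (4, 4)], gamma=0.5)
```

In this toy setting γ trades off trust in the estimated covariance against a plain nearest-mean rule, which is the quantity the abstract proposes to tune by minimizing the asymptotic misclassification probability.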
Recently, an absolute value inequalities discriminant analysis criterion with robustness and sparseness for supervised dimensionality reduction was studied. However, it obtains discriminant directions one by one through greedy search, which makes the sparseness of multiple discriminant directions difficult to interpret. In addition, it relaxes the original problem into a series of linear programming problems, which makes it time consuming. In this paper, we construct a novel linear discriminant analysis that achieves robustness and sparseness jointly through the L1-norm and the L2,1-norm. The proposed approach obtains all the discriminant directions simultaneously and, rather than solving linear programming problems, is solved by a more effective alternating direction method of multipliers. The effectiveness of the proposed method is supported by preliminary experimental results on two artificial datasets, some benchmark datasets, and two face image datasets.
Bearings are critical components in induction motors and brushless direct current motors. Bearing failure is the most common failure mode in these motors. By implementing health monitoring and fault diagnosis of bearings, unscheduled maintenance and economic losses caused by bearing failures can be avoided. This paper introduces trace ratio linear discriminant analysis (TR-LDA) to deal with high-dimensional non-Gaussian fault data for dimension reduction and fault classification. Motor bearing data with single-point faults and generalized-roughness faults are used to validate the effectiveness of the proposed method for fault diagnosis. Comparisons with other conventional methods, such as principal component analysis, locality preserving projection, canonical correlation analysis, maximum margin criterion, LDA, and marginal Fisher analysis, show the superiority of TR-LDA in fault diagnosis.
The main purpose of this study was to build multivariate classification models using water quality monitoring data for the hydrographic basin of the Gualaxo do Norte River, Minas Gerais state, Brazil, which was impacted in 2015 by the rupture of a containment structure for iron ore tailings. A total of 27 points were evaluated, covering areas affected and unaffected by the disaster, with monitoring of chemical, physical, and microbiological variables during the period from July 2016 to June 2017. Multivariate classification techniques were applied to the data, with the aim of developing models to determine when the impacted locations would present characteristics equivalent to those existing prior to the rupture. Classification models constructed using PLS-DA and LDA were able to predict three classes: unaffected main river, affected main river, and tributaries. The first technique was able to clearly differentiate the three classes for the data evaluated, achieving averages corresponding to 90% accuracy. The second method was consistent with the first, identifying the chloride content, conductivity, turbidity, and alkalinity as discriminatory variables, among those monitored, with the relationships among the parameters being coherent with the environmental conditions of the region. The model, with a correct classification rate of 91.67%, enabled identification of the behavior of new samples, using only these easily measured variables. In summary, application of the multivariate statistical tools allowed the development of models capable of providing information about the recovery process of an ecosystem impacted by the greatest environmental disaster to have occurred in Brazil.
• Assessment of the long-term environmental effects of the Fundão dam failure, Brazil.
• Multivariate classification models to assess the profiles of impacted and non-impacted areas.
• Samples collected in the dry season already showed pre-disaster characteristics.
Discriminant analysis, a popular supervised classification method, has been successfully used in fault diagnosis. However, it involves a linear combination of all variables and thus may result in poor model interpretability and inaccurate classification performance. In this paper, a sparse exponential discriminant analysis (SEDA) algorithm is proposed to address these issues. The sparse discriminant model is developed by introducing a lasso or elastic net penalty into the exponential discriminant analysis algorithm, so that the key variables responsible for the fault can be selected automatically. Since the formulated model is nonconvex, it is recast as an iterative convex optimization problem using the minorization-maximization algorithm. A feasible gradient direction method is then developed to solve the optimization problem effectively. The sparse solutions indicate the key faulty information, which improves classification performance and thus distinguishes different faults more accurately. A simulation process and a real industrial process are used to test the performance of the proposed method, and the experimental results show that the SEDA algorithm can isolate the faulty variables and simplify the discriminant model by discarding variables with little significance.
Geometric Mean for Subspace Selection
Dacheng Tao; Xuelong Li; Xindong Wu ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 02/2009, Volume 31, Issue 2
Journal Article
Peer reviewed
Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a promising discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions.
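The contrast between the arithmetic and geometric means of pairwise KL divergences can be seen in a small numerical sketch. It assumes one-dimensional projections and Gaussian classes with a shared unit variance, for which the KL divergence between two classes reduces to half the squared distance between their means. The two candidate projections and all names are made up for illustration and do not come from the paper.

```python
import math
from itertools import combinations


def kl_equal_variance(mu_i, mu_j, var=1.0):
    """KL divergence between two 1-D Gaussians sharing the variance `var`."""
    return 0.5 * (mu_i - mu_j) ** 2 / var


def pairwise_kl(projected_means, var=1.0):
    """KL divergences for every unordered pair of class means."""
    return [kl_equal_variance(a, b, var)
            for a, b in combinations(projected_means, 2)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean of non-negative values; zero if any value is zero."""
    if any(v == 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical projected class means for two candidate 1-D subspaces.
merged = [0.0, 0.1, 10.0]  # two classes almost coincide, one is far away
spread = [0.0, 4.0, 8.0]   # all three classes are reasonably separated

kl_merged = pairwise_kl(merged)
kl_spread = pairwise_kl(spread)
```

With these numbers the arithmetic mean is larger for `merged`, because the one distant class dominates the sum, while the geometric mean is larger for `spread`, since a near-zero divergence between any pair pulls the product down. This is the class-merging behavior the abstract attributes to the FLDA criterion and the reason a geometric-mean criterion is considered.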
• Discriminant analysis techniques are able to classify infrared spectra from unknown colon cells with excellent accuracy.
• The classification accuracy does not change when different spectral ranges are analysed.
• The excellent accuracy supports clinical translation of infrared spectroscopy and multivariate analysis.
Colorectal cancer is one of the most diagnosed types of cancer in developed countries. Current diagnostic methods are partly dependent on pathologist experience and laboratory instrumentation. In this study, we used Fourier Transform Infrared (FTIR) spectroscopy in transflection mode, combined with Principal Components Analysis followed by Linear Discriminant Analysis (PCA-LDA) and Partial Least Squares – Discriminant Analysis (PLS-DA), to build a classification algorithm to diagnose colon cancer in cell samples, based on absorption spectra measured in two spectral ranges of the mid-infrared spectrum. In particular, the PCA technique highlights small biochemical differences between healthy and cancerous cells: these are related to the larger lipid content in the former compared with the latter and to the larger relative amount of protein and nucleic acid components in the cancerous cells compared with the healthy ones. Comparison of the classification accuracy of the PCA-LDA and PLS-DA methods applied to FTIR spectra measured in the 1000–1800 cm⁻¹ (low wavenumber range, LWR) and 2700–3700 cm⁻¹ (high wavenumber range, HWR) regions shows that both algorithms are able to classify hidden-class FTIR spectra with excellent accuracy (100%) in both spectral regions. This is a promising result for clinical translation of infrared spectroscopy: it makes predictions obtained from FTIR measurements carried out only in the HWR reliable, and the HWR is the range in which the glass slides used in clinical laboratories are transparent to IR radiation.
It has been recognized that wildfire, followed by large precipitation events, triggers both flooding and debris flows in mountainous regions. The ability to predict and mitigate these hazards is crucial in protecting public safety and infrastructure. A need for advanced modeling techniques was highlighted by re-evaluating existing prediction models from the literature. Data from 15 individual burn basins in the intermountain western United States, which contained 388 instances and 26 variables, were obtained from the United States Geological Survey (USGS). After randomly selecting a subset of the data to serve as a validation set, advanced predictive modeling techniques based on machine learning were implemented using the remaining training data. Tenfold cross-validation was applied to the training data to ensure nearly unbiased error estimation and to avoid model over-fitting. Linear, nonlinear, and rule-based predictive models, including naïve Bayes, mixture discriminant analysis, classification trees, and logistic regression models, were developed and tested on the validation dataset. The new nonlinear approaches were nearly twice as successful as the linear models previously published in the debris flow prediction literature. The new prediction models advance the current state of the art of debris flow prediction and improve the ability to accurately predict debris flow events in the wildfire-prone intermountain western United States.
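The evaluation protocol described here (a random hold-out validation set, then tenfold cross-validation on the remaining training data) can be sketched with index bookkeeping alone. The abstract does not state the size of the validation subset, so the 20% fraction below is an assumption, as are the seed and the function names; only the 388 instances and the ten folds come from the text.

```python
import random


def holdout_split(n_instances, holdout_fraction, seed=0):
    """Randomly split instance indices into (training, validation) lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_instances * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(indices, k=10):
    """Yield (fit, assess) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        assess = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, assess


# 388 instances as in the abstract; the 20% hold-out share is an assumption.
training, validation = holdout_split(388, holdout_fraction=0.2)
splits = list(k_fold(training, k=10))
```

Model selection would use only the ten `(fit, assess)` pairs, and the untouched `validation` indices would be scored once at the end, which is what keeps the final error estimate separate from tuning.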
Abstract
Introduction
Individuals who report having insomnia may or may not display quantitative sleep impairment. Furthermore, possessing an insomnia identity is associated with a range of health difficulties, regardless of actual sleep pattern. The present study examined which facets of health and functioning are most relevant in differentiating individuals who have an insomnia identity from those who do not.
Methods
Community-dwelling adults from an epidemiological survey (N = 608) were classified into four groups based on whether they had good or poor sleep (determined from two weeks of sleep diaries using validated criteria), and whether they had a complaint of insomnia. Stepwise discriminant analysis (Wilks’ Λ, input p = .05) was conducted to investigate which of 17 demographic, health, substance use, and daytime functioning measures significantly maximized separation among the four sleep groups.
Results
On the first discriminant function extracted by the stepwise analysis (λ = .35, Rc = .51, p < .001), group centroids were separated according to the following sequence: non-complaining good sleepers (-.47; n = 320), non-complaining poor sleepers (-.06; n = 88), complaining good sleepers (.37; n = 69), complaining poor sleepers (1.00; n = 131). Structural coefficients suggested that adults with greater total count of diseases (r = .71), depression (r = .69), anxiety (r = .61), and age (r = .38) tended to be classified as having an insomnia complaint at the first order, and as having quantitatively impaired sleep at the second order.
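As a quick consistency check on the reported statistics, the snippet below applies the standard relation between a discriminant function's eigenvalue λ and its canonical correlation, Rc = sqrt(λ / (1 + λ)). It assumes that the reported λ = .35 is that eigenvalue (rather than Wilks' Λ), which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant-function eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


rc = canonical_correlation(0.35)   # compare with the reported Rc = .51
shared_variance = rc ** 2          # share of function variance explained by group membership
```

Under that reading the reported values agree to two decimals, and roughly a quarter of the variance in the first discriminant function is attributable to differences among the four sleep groups.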
Conclusion
On age and daytime functioning measures, participants were distinguishable principally by whether they endorsed an insomnia identity, and secondarily by whether they displayed actual deficits in sleep pattern. Given that the most relevant discriminators included disease count, depression, and anxiety, future studies might examine whether improvements in overall health or subjective psychological distress have repercussions for holding onto an insomnia identity, and whether changes in insomnia complaint can occur irrespective of quantitative sleep. Research might also continue exploring other potentially salient discriminators of insomnia identity, such as trait neuroticism or sleep-related social comparisons.
Support (If Any)
Research supported by National Institute on Aging: #AG12136, #AG14738.