Heavy metal pollution can degrade the quality of soil, air, and water bodies, affecting the health of all living organisms. We analyze the spatial distribution of soil heavy metal concentrations and their relationship with natural and anthropogenic sources. The analysis was performed in the Principality of Asturias (a mountainous region in NW Spain), where soil heavy metal pollution has become a severe problem. First, a standard Principal Component Analysis (PCA) was performed on a population of 334 soil samples to identify the sources of fourteen heavy metals and metalloids (Ag, As, Ba, Hg, Cd, Co, Cr, Cu, Mn, Mo, Ni, Pb, Sb, Zn). Due to the high geological heterogeneity of the territory, the PCA was refined using a variant known as Geographically Weighted Principal Component Analysis (GWPCA). The first six principal components of the standard PCA account for about 57% of soil heavy metal variability, but when GWPCA is performed this figure rises above 80% in some areas. We conclude that GWPC1 corresponds to a geogenic component whose winning variables change with the geological characteristics of the territory (lithology and mining), while GWPC2 corresponds to a factor related to atmospheric pollution, including heavy metals released from the vegetation cover via wildfires.
•GWPCA method improves the explanation of the spatial distribution of soil heavy metals.
•Winning variables from the GWPCA analysis are related to geogenic and atmospheric sources.
•Ashes from wildfires could be a source of soil heavy metal pollution.
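The local weighting behind GWPCA can be sketched compactly: at each target location, samples are down-weighted by geographic distance before the covariance is eigendecomposed, and the "winning variable" of a local component is the variable with the largest absolute loading. The sketch below is a minimal illustration with hypothetical inputs (coords, X, bandwidth) and an assumed Gaussian kernel; it is not the authors' implementation.

```python
# Minimal GWPCA sketch: locally weighted covariance + eigendecomposition.
# coords: (n, 2) sample coordinates; X: (n, p) standardized concentrations.
import numpy as np

def gwpca_at(coords, X, target, bandwidth, n_components=2):
    d = np.linalg.norm(coords - target, axis=1)      # distances to target point
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
    w /= w.sum()
    mu = w @ X                                       # weighted mean
    Xc = X - mu
    cov = (Xc * w[:, None]).T @ Xc                   # weighted covariance
    evals, evecs = np.linalg.eigh(cov)               # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    # "winning variable" of each local PC = variable with largest |loading|
    winners = np.abs(evecs[:, order]).argmax(axis=0)
    return evals[order], evecs[:, order], winners

# Example: local PCs at the first sample's location (synthetic data)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, (334, 2))
X = rng.standard_normal((334, 14))                   # 14 metals/metalloids
evals, evecs, winners = gwpca_at(coords, X, coords[0], bandwidth=20.0)
```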
We develop the methodology needed to conduct principal component analysis at high frequency. We construct estimators of realized eigenvalues, eigenvectors, and principal components, and provide the asymptotic distribution of these estimators. Empirically, we study the high-frequency covariance structure of the constituents of the S&P 100 Index using as little as one week of high-frequency data at a time, and examine whether it is compatible with the evidence accumulated over decades of lower-frequency returns. We find a surprising consistency between the low- and high-frequency structures. During the recent financial crisis, the first principal component becomes increasingly dominant, explaining up to 60% of the variation on its own, while the second principal component drives the common variation of financial-sector stocks. Supplementary materials for this article are available online.
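As a rough illustration of the estimation target, a realized covariance matrix built from intraday returns can be eigendecomposed directly; the sketch below does exactly that, without the microstructure-noise corrections and asymptotic theory the paper develops. The names and the simulated data are assumptions.

```python
# Naive "realized PCA": eigendecompose a realized covariance matrix built
# from intraday log-returns (m observations of p stocks over, e.g., a week).
import numpy as np

def realized_pca(returns):
    rcov = returns.T @ returns                   # realized covariance
    evals, evecs = np.linalg.eigh(rcov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    explained = evals / evals.sum()              # share of variation per PC
    pcs = returns @ evecs                        # realized principal components
    return evals, evecs, explained, pcs

rng = np.random.default_rng(1)
r = rng.standard_normal((390 * 5, 100)) * 1e-3   # 5 days of 1-min returns, 100 names
evals, evecs, explained, pcs = realized_pca(r)
print(f"first PC explains {explained[0]:.1%} of variation")
```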
Quantum principal component analysis. Lloyd, Seth; Mohseni, Masoud; Rebentrost, Patrick. Nature Physics, Volume 10, Issue 9, 09/2014.
The usual way to reveal properties of an unknown quantum state, given many copies of a system in that state, is to perform measurements of different observables and to analyse the results statistically [1,2]. For non-sparse but low-rank quantum states, revealing eigenvectors and corresponding eigenvalues in classical form scales super-linearly with the system dimension [3-6]. Here we show that multiple copies of a quantum system with density matrix ρ can be used to construct the unitary transformation e^{-iρt}. As a result, one can perform quantum principal component analysis of an unknown low-rank density matrix, revealing in quantum form the eigenvectors corresponding to the large eigenvalues in time exponentially faster than any existing algorithm. We discuss applications to data analysis, process tomography and state discrimination.
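A classical toy check of the central object: for a density matrix ρ, the unitary e^{-iρt} carries ρ's eigenvalues in its eigenphases, which a quantum device would read out via phase estimation. The snippet below only verifies this spectral relationship numerically; it is not the quantum algorithm and gains none of its speedup.

```python
# Build a random rank-2 density matrix rho, form U = exp(-i*rho*t), and check
# that U's eigenphases recover rho's largest eigenvalues ("principal components").
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
rho = A @ A.conj().T
rho /= np.trace(rho).real                    # rank-2 density matrix, trace 1

t = 1.0
U = expm(-1j * rho * t)                      # the unitary e^{-i rho t}
evals = np.linalg.eigvalsh(rho)              # eigenvalues of rho
phases = np.angle(np.linalg.eigvals(U))      # eigenphases of U: -lambda * t
print(np.sort(evals)[::-1][:2])              # two largest eigenvalues of rho
print(np.sort(-phases)[::-1][:2] / t)        # same values, read from U
```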
Data may often contain noise or irrelevant information, which negatively affects the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms, such as principal component analysis (PCA), non-negative matrix factorization (NMF), random projection (RP), and auto-encoders (AE), is to reduce the noise or irrelevant information in the data. The features of PCA (eigenvectors) and of linear AEs are not able to represent data as parts (e.g. the nose in a face image). On the other hand, NMF and non-linear AEs are hampered by slow learning, and RP only represents a subspace of the original data. This paper introduces a dimension reduction framework which to some extent represents data as parts, has fast learning speed, and learns the between-class scatter subspace. To this end, this paper investigates a linear and a non-linear dimension reduction framework, referred to as extreme learning machine AE (ELM-AE) and sparse ELM-AE (SELM-AE). In contrast to tied-weight AEs, the hidden neurons in ELM-AE and SELM-AE need not be tuned, and their parameters (e.g., input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on the USPS handwritten digit recognition, CIFAR-10 object recognition, and NORB object recognition data sets show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error.
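A minimal ELM-AE sketch, assuming a sigmoid hidden layer and a ridge-regularized closed-form solution for the output weights (the hidden weights are random, orthogonalized, and never tuned). The function and parameter names are hypothetical, not the paper's exact setup.

```python
# ELM-AE sketch: fixed random orthogonal input weights, closed-form output
# weights beta; beta then projects data to the reduced space.
import numpy as np

def elm_ae(X, n_hidden, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, n_hidden))
    W, _ = np.linalg.qr(W)                       # orthogonal input weights
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid hidden activations
    # ridge regression: beta maps hidden activations back to the input X
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    return X @ beta.T                            # reduced representation

Z = elm_ae(np.random.default_rng(3).standard_normal((200, 64)), n_hidden=16)
print(Z.shape)                                   # (200, 16)
```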
Robust principal component analysis? Candès, Emmanuel J.; Li, Xiaodong; Ma, Yi; et al. Journal of the ACM, Volume 58, Issue 3, 05/2011.
This article is about a curious phenomenon. Suppose we have a data matrix which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit: among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the ℓ₁ norm. This suggests the possibility of a principled approach to robust principal component analysis, since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
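The convex program is simple enough to sketch: Principal Component Pursuit minimizes ||L||_* + λ||S||_1 subject to L + S = M, and a basic augmented-Lagrangian loop alternates singular value thresholding for L with entrywise soft thresholding for S. The λ below is the weight suggested in the article; the step-size heuristic and iteration count are common defaults, not the authors' exact solver.

```python
# Bare-bones Principal Component Pursuit via an augmented-Lagrangian loop.
import numpy as np

def svt(X, tau):                                 # singular value thresholding
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def shrink(X, tau):                              # entrywise soft thresholding
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def pcp(M, n_iter=200):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))               # weight suggested in the paper
    mu = m * n / (4 * np.abs(M).sum())           # common step-size heuristic
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1 / mu)          # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)     # sparse update
        Y += mu * (M - L - S)                    # dual update
    return L, S

rng = np.random.default_rng(4)
M = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 60))       # rank 5
M += (rng.random((60, 60)) < 0.05) * rng.standard_normal((60, 60)) * 10
L, S = pcp(M)                                    # recovered low-rank + sparse parts
```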
Sleep scoring is used as a diagnostic technique in the diagnosis and treatment of sleep disorders. Automated sleep scoring is crucial, since the large volume of data would otherwise have to be analyzed visually by sleep specialists, which is burdensome, time-consuming, tedious, subjective, and error-prone. Therefore, automated sleep stage classification is a crucial step in sleep research and sleep disorder diagnosis. In this paper, a robust system consisting of three modules is proposed for automated classification of sleep stages from the single-channel electroencephalogram (EEG). In the first module, signals taken from the Pz-Oz electrode are denoised using multiscale principal component analysis. In the second module, the most informative features are extracted using the discrete wavelet transform (DWT), and statistical values of the DWT subbands are calculated. In the third module, the extracted features are fed into an ensemble classifier called the rotational support vector machine (RotSVM). The proposed classifier combines the advantages of principal component analysis and the SVM to improve the classification performance of the traditional SVM. The sensitivity and accuracy values across all subjects were 84.46% and 91.1%, respectively, for five-stage sleep classification, with a Cohen's kappa coefficient of 0.88. The obtained classification performance indicates that an efficient sleep monitoring system is possible with a single-channel EEG and can be used effectively in medical and home-care applications.
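The second module's feature extraction can be sketched with PyWavelets: decompose a 30 s epoch into DWT subbands and summarize each with simple statistics. The wavelet ('db4'), decomposition level, and choice of statistics are assumptions; the multiscale PCA denoising and the RotSVM classifier are not reproduced here.

```python
# DWT subband statistics for one EEG epoch, as features for a classifier.
import numpy as np
import pywt

def dwt_features(epoch, wavelet="db4", level=5):
    coeffs = pywt.wavedec(epoch, wavelet, level=level)   # [cA5, cD5, ..., cD1]
    feats = []
    for c in coeffs:
        feats += [c.mean(), c.std(), np.abs(c).max(), (c ** 2).sum()]
    return np.array(feats)

epoch = np.random.default_rng(5).standard_normal(3000)   # 30 s at 100 Hz
print(dwt_features(epoch).shape)                         # (24,) = 6 subbands x 4 stats
```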
To deal with the crowding problem caused by incipient faults, this brief develops a new fault detection and diagnosis (FDD) scheme, called probability-relevant principal component analysis, from the probability viewpoint. The proposed methodology combines the Kullback-Leibler divergence from information theory with Bayesian inference from machine learning. Compared with standard FDD methods under the framework of multivariate statistical analysis, the new FDD scheme is more sensitive to faults under an acceptable false alarm ratio, especially to incipient faults; moreover, it is more accurate in diagnosing faults thanks to the improved fault detectability. The effectiveness of the proposed FDD method is illustrated by mathematical analysis and geometric descriptions, and validated via a numerical example and a real experimental setup on the electric drive system of a high-speed train.
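A toy rendering of the probability viewpoint, assuming Gaussian score distributions: compare a monitored statistic's distribution under test data with its training distribution via the Kullback-Leibler divergence, and flag a fault when the divergence exceeds a calibrated limit. This is our loose illustration, not the brief's actual scheme.

```python
# KL divergence between two univariate Gaussians as a fault indicator.
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

rng = np.random.default_rng(9)
train = rng.normal(0.0, 1.0, 5000)               # PC score, normal operation
test = rng.normal(0.3, 1.0, 500)                 # small mean drift: incipient fault
d = kl_gauss(test.mean(), test.var(), train.mean(), train.var())
print(f"KL divergence = {d:.3f}")                # compare against a calibrated limit
```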
Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only is it known that the underlying structure of the clean data is low-rank, but the exact rank of the clean data is also known. Yet, when conventional rank minimization is applied to these problems, the objective function is formulated in a way that does not fully utilize this a priori target rank information. This observation motivates us to investigate whether there is a better alternative when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g., high dynamic range imaging, motion edge detection, photometric stereo, and image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear-norm rank minimization method.
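The change relative to nuclear-norm minimization is localized in the proximal step: with the target rank N known, the first N singular values are left intact and only the tail is soft-thresholded, whereas standard singular value thresholding shrinks them all. A sketch of that operator alone follows; the surrounding RPCA solver is omitted.

```python
# Partial singular value thresholding: shrink only singular values beyond rank N.
import numpy as np

def partial_svt(X, tau, N):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_thr = s.copy()
    s_thr[N:] = np.maximum(s[N:] - tau, 0)       # leading N values kept intact
    return U @ np.diag(s_thr) @ Vt

X = np.random.default_rng(6).standard_normal((20, 20))
X_hat = partial_svt(X, tau=1.0, N=3)             # respects a known target rank of 3
```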
Traditional monitoring algorithms use only normal data for modeling, which makes them universal across different types of faults. However, these algorithms may sometimes perform poorly because of the lack of fault information. In order to further increase the fault detection rate while preserving the universality of the algorithm, a novel dynamic weight principal component analysis (DWPCA) algorithm and a hierarchical monitoring strategy are proposed. In the first layer, dynamic PCA (DPCA) is used for fault detection and diagnosis; if no fault is detected, the DWPCA-based second-layer monitoring is triggered. In the second layer, the principal components (PCs) are weighted according to their ability to distinguish between normal and faulty conditions, and the PCs with larger weights are selected to construct the monitoring model. Compared to the DPCA method, the proposed DWPCA algorithm establishes the monitoring model by incorporating fault information. Afterward, the DWPCA-based variable relative contribution and a novel control limit for the variable relative contribution are presented for fault diagnosis. Finally, the superiority of the proposed method is demonstrated on a numerical case and the Tennessee Eastman process.
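The PC-weighting idea can be caricatured as follows: score each principal component by how well its scores separate normal data from historical fault data, and keep the most discriminative PCs for the second-layer model. The Fisher-style weight below is our own assumption; the paper defines its own weighting.

```python
# Weight PCs by a simple normal-vs-fault separation score, keep the top ones.
import numpy as np

def weighted_pcs(X_normal, X_fault, n_keep):
    mu = X_normal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_normal - mu, full_matrices=False)
    t_n = (X_normal - mu) @ Vt.T                 # PC scores of normal data
    t_f = (X_fault - mu) @ Vt.T                  # PC scores of fault data
    # weight per PC: separation of means over pooled spread (Fisher-style)
    w = (t_n.mean(0) - t_f.mean(0)) ** 2 / (t_n.var(0) + t_f.var(0))
    keep = np.argsort(w)[::-1][:n_keep]
    return Vt[keep], w[keep]

rng = np.random.default_rng(7)
Xn = rng.standard_normal((500, 10))
Xf = Xn[:100] + np.array([0.0] * 8 + [3.0, 3.0])  # fault shifts two variables
P, w = weighted_pcs(Xn, Xf, n_keep=3)             # monitoring model directions
```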
This paper estimates the association of financial development with energy poverty in Latin America through the entropy method, Principal Component Analysis (PCA), and an econometric analysis. Energy poverty scores of Latin American countries are less than unity, making up 17.54 percent, which implies that 17.45 percent of residents did not attain the efficiency frontier of adequate energy consumption. Across all quantiles, the results elucidate that increases in energy poverty are attributed to low levels of financial development. The study provides policy recommendations for alleviating energy poverty through financial development.
•We analysed the association of multidimensional energy poverty and financial advancement in the Latin American region.
•Financial development and FDI are the key variables in alleviating energy poverty.
•Energy poverty scores of Latin American countries are less than unity.
•17.45 percent of residents did not attain the efficiency frontier of adequate energy consumption.
•Increases in energy poverty are attributed to low levels of financial development.
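For reference, the entropy method mentioned in the abstract weights indicators by their cross-country variation: indicators whose country shares have lower entropy (more dispersion) receive larger weights in the composite score. A minimal sketch with a hypothetical indicator matrix, assuming strictly positive indicator values:

```python
# Entropy weighting of indicators for a composite energy-poverty score.
import numpy as np

def entropy_weights(X):                          # X: (countries, indicators), all > 0
    P = X / X.sum(axis=0)                        # each country's share per indicator
    k = 1.0 / np.log(X.shape[0])
    e = -k * (P * np.log(P)).sum(axis=0)         # entropy of each indicator
    d = 1.0 - e                                  # degree of diversification
    return d / d.sum()                           # normalized weights

rng = np.random.default_rng(8)
X = rng.random((20, 5)) + 0.01                   # 20 countries, 5 energy indicators
w = entropy_weights(X)
scores = X @ w                                   # composite score per country
```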