Biosignal measurement and processing are increasingly being deployed in ambulatory situations, particularly in connected health applications. Such an environment dramatically increases the likelihood of artifacts, which can occlude features of interest and reduce the quality of information available in the signal. If multichannel recordings are available for a given signal source, a considerable range of methods can currently suppress or, in some cases, remove the distorting effect of such artifacts. Considerably fewer techniques are available, however, when only a single-channel measurement is available, and yet single-channel measurements are important where minimal instrumentation complexity is required. This paper describes a novel artifact removal technique for use in such a context. The technique, known as ensemble empirical mode decomposition with canonical correlation analysis (EEMD-CCA), is capable of operating on single-channel measurements. The EEMD technique is first used to decompose the single-channel signal into a multidimensional signal. The CCA technique is then employed to isolate the artifact components from the underlying signal using second-order statistics. The new technique is tested against the currently available wavelet denoising and EEMD-ICA techniques, using both electroencephalography and functional near-infrared spectroscopy data, and is shown to produce significantly improved results.
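The CCA stage in such single-channel pipelines typically exploits second-order statistics by seeking maximally autocorrelated sources (BSS-CCA). The sketch below illustrates that separation and artifact-rejection step on an already-decomposed multichannel signal; it is not the paper's exact EEMD-CCA implementation, and the `keep_ratio` threshold is an illustrative assumption.

```python
import numpy as np

def cca_artifact_removal(X, keep_ratio=0.5):
    """BSS-CCA sketch: rotate a multichannel decomposition (e.g. EEMD IMFs)
    into sources ordered by lag-1 autocorrelation, zero the least
    autocorrelated ones (treated as artifact), and reconstruct.
    X: (n_samples, n_channels). Returns the cleaned signal matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Whiten via SVD: Z = Xc @ W has orthonormal columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt.T / s
    Z = Xc @ W
    # Symmetrized lag-1 covariance; its eigenvectors give the CCA rotation
    C1 = Z[:-1].T @ Z[1:]
    C1 = (C1 + C1.T) / 2
    evals, evecs = np.linalg.eigh(C1)
    order = np.argsort(evals)[::-1]          # most autocorrelated first
    evecs = evecs[:, order]
    S = Z @ evecs                            # estimated sources
    n_keep = max(1, int(keep_ratio * S.shape[1]))
    S[:, n_keep:] = 0.0                      # drop low-autocorrelation sources
    # Undo the rotation and the whitening
    return (S @ evecs.T) @ np.linalg.pinv(W) + mean
```

On a two-channel toy mix of a sinusoid and white noise, keeping only the most autocorrelated source recovers the sinusoid while suppressing the noise channel.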
A steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) can either achieve high classification accuracy given sufficient training data or dispense with the training stage at the cost of low accuracy. Although some studies have attempted to resolve this dilemma between performance and practicality, a highly effective approach has not yet been established. In this paper, we propose a canonical correlation analysis (CCA)-based transfer learning framework for improving the performance of an SSVEP BCI and reducing its calibration effort. Three spatial filters are optimized by a CCA algorithm using intra- and inter-subject EEG data (IISCCA); two template signals are estimated separately from the EEG data of the target subject and of a set of source subjects; and six coefficients are obtained by correlation analysis between a testing signal and each of the two templates after filtering by each of the three spatial filters. The feature used for classification is the sum of the squared coefficients multiplied by their signs, and the frequency of the testing signal is recognized by template matching. To reduce individual discrepancies between subjects, an accuracy-based subject selection (ASS) algorithm is developed to screen for source subjects whose EEG data are more similar to those of the target subject. The proposed ASS-IISCCA integrates both subject-specific models and subject-independent information for the frequency recognition of SSVEP signals. The performance of ASS-IISCCA was evaluated on a benchmark data set with 35 subjects and compared with the state-of-the-art algorithm, task-related component analysis (TRCA). The results show that ASS-IISCCA can significantly improve the performance of SSVEP BCIs with a small number of training trials from a new user, thus helping to facilitate their applications in the real world.
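The training-free baseline this line of work builds on is standard CCA frequency recognition: correlate the EEG segment with sine/cosine reference signals at each candidate frequency and pick the best match. A minimal sketch (function names and the harmonic count are illustrative, not from the paper):

```python
import numpy as np

def max_canon_corr(X, Y):
    """Largest canonical correlation between the column spaces of X and Y,
    computed from QR factorizations (numerically stable)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def ssvep_cca_classify(eeg, freqs, fs, n_harmonics=2):
    """Training-free SSVEP recognition: for each candidate frequency, build
    sine/cosine references (with harmonics) and return the index of the
    frequency with the highest canonical correlation.
    eeg: (n_samples, n_channels)."""
    t = np.arange(eeg.shape[0]) / fs
    scores = []
    for f in freqs:
        ref = np.column_stack(
            [fn(2 * np.pi * (h + 1) * f * t)
             for h in range(n_harmonics) for fn in (np.sin, np.cos)]
        )
        scores.append(max_canon_corr(eeg, ref))
    return int(np.argmax(scores))
```

Transfer-learning variants such as the one described above replace the artificial references with templates estimated from target- and source-subject data, but the correlation-and-argmax structure is the same.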
•CCA with regularization is used for quality and process monitoring concurrently.
•The method retains the efficiency of CCA for quality prediction from process data.
•CCA is enhanced to exploit the variance using subsequent PCA decompositions.
•Monitoring statistics are developed for monitoring in the respective subspaces.
•Numerical simulations are used to demonstrate the effectiveness of the method.
Canonical correlation analysis (CCA) is a well-known data analysis technique that extracts the multidimensional correlation structure between two sets of variables. CCA focuses on maximizing the correlation between quality and process data, which leads to the efficient use of latent dimensions. However, CCA does not exploit the variance, or the magnitude of variations, in the data, making it rarely used for quality and process monitoring. In addition, it suffers from the collinearity problems that often exist in process data. To overcome this shortcoming of CCA, a modified CCA method with regularization is developed to extract the correlation between process variables and quality variables. Next, to address the issue that CCA focuses only on correlation and ignores variance information, a new concurrent CCA (CCCA) modeling method with regularization is proposed to exploit the variance and covariance in the process-specific and quality-specific spaces. The CCCA method retains CCA's efficiency in predicting the quality while exploiting the variance structure for quality and process monitoring using subsequent principal component decompositions. The corresponding monitoring statistics and control limits are then developed in the decomposed subspaces. Numerical simulation examples and the Tennessee Eastman process are used to demonstrate the effectiveness of the CCCA-based monitoring method.
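A rough sketch of the regularized-CCA building block described here (the ridge parameter and function name are illustrative assumptions, not the paper's exact formulation): adding small ridge terms to the covariance matrices keeps the whitening step well-conditioned under collinearity.

```python
import numpy as np

def regularized_cca(X, Y, reg=1e-3):
    """Regularized CCA between process data X and quality data Y.
    Returns canonical weight matrices Wx, Wy and the canonical
    correlations. Ridge terms on Cxx and Cyy guard against collinearity."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance gives the canonical structure
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U, Ky @ Vt.T, np.clip(s, 0.0, 1.0)
```

With weak regularization, the leading canonical correlation recovers the shared latent factor between the two blocks.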
We propose an efficient algorithm for solving orthogonal canonical correlation analysis (OCCA) in the form of a trace-fractional structure with orthogonal linear projections. Even though orthogonality has been widely used and proven to be a useful criterion for visualization, pattern recognition, and feature extraction, existing methods for solving the OCCA problem are either numerically unstable, relying on a deflation scheme, or less efficient, directly using generic optimization methods. In this paper, we propose an alternating numerical scheme whose core is the sub-maximization problem in trace-fractional form with an orthogonality constraint. A customized self-consistent-field (SCF) iteration for this sub-maximization problem is devised. It is proved that the SCF iteration is globally convergent to a KKT point and that the alternating numerical scheme always converges. We further formulate a new trace-fractional maximization problem for orthogonal multiset CCA and propose an efficient algorithm with either a Jacobi-style or a Gauss-Seidel-style updating scheme based on the SCF iteration. Extensive experiments are conducted to evaluate the proposed algorithms against existing methods, including real-world applications of multi-label classification and multi-view feature extraction. Experimental results show that our methods not only perform competitively with or better than the existing methods but are also more efficient.
Fault detection based on canonical correlation analysis (CCA) has received increased attention due to its efficiency in exploring the relationship between input and output. However, traditional CCA may generate redundant features in both the input and output projections while maximizing the correlations. In this paper, sparse dynamic canonical correlation analysis (SDCCA) is developed for the fault detection of dynamic processes. By imposing sparsity in the extraction of features, the interpretability of the canonical variates is enhanced thanks to the sparsity of the canonical weights. Based on the SDCCA model, the T2 monitoring metric is established for fault detection. Moreover, the upper control limit (UCL) of the T2 monitoring metric is determined by the kernel density estimation (KDE) method to avoid violating the Gaussian assumption. The superiority of the proposed SDCCA-based fault detection method is illustrated through a comparative study on the Tennessee Eastman process benchmark.
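The monitoring side of this scheme can be sketched as follows, assuming the canonical-variate scores have already been computed by some CCA variant (this is not the paper's SDCCA model; the 99% confidence level is an illustrative choice): a T2 statistic per sample, with the UCL taken as a quantile of a KDE fitted to training-phase T2 values rather than from a Gaussian formula.

```python
import numpy as np
from scipy.stats import gaussian_kde

def t2_statistic(scores, cov_inv):
    """Hotelling-style T2 for each row of the canonical-variate scores."""
    return np.einsum('ij,jk,ik->i', scores, cov_inv, scores)

def kde_ucl(t2_train, alpha=0.99, grid=2000):
    """Upper control limit as the alpha-quantile of a KDE fitted to the
    training T2 values, avoiding the Gaussian assumption."""
    kde = gaussian_kde(t2_train)
    xs = np.linspace(0.0, t2_train.max() * 2, grid)
    cdf = np.cumsum(kde(xs))
    cdf /= cdf[-1]                      # normalize the approximate CDF
    return xs[np.searchsorted(cdf, alpha)]
```

At monitoring time, a sample is flagged as faulty whenever its T2 value exceeds the KDE-derived UCL.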
A Survey of Multi-View Representation Learning
Li, Yingming; Yang, Ming; Zhang, Zhongfei
IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 10, October 2019. Journal article; peer reviewed; open access.
Recently, multi-view representation learning has become a rapidly growing direction in the machine learning and data mining areas. This paper introduces two categories of multi-view representation learning: multi-view representation alignment and multi-view representation fusion. We first review the representative methods and theories of multi-view representation learning from the perspective of alignment, such as correlation-based alignment; representative examples are canonical correlation analysis (CCA) and its several extensions. Then, from the perspective of representation fusion, we investigate the advancement of multi-view representation learning, ranging from generative methods, including multi-modal topic learning, multi-view sparse coding, and multi-view latent space Markov networks, to neural network-based methods, including multi-modal autoencoders, multi-view convolutional neural networks, and multi-modal recurrent neural networks. Further, we investigate several important applications of multi-view representation learning. Overall, this survey aims to provide an insightful overview of the theoretical foundations and state-of-the-art developments in the field of multi-view representation learning and to help researchers find the most appropriate tools for particular applications.
Generalized Canonical Correlation Analysis (GCCA) is an important tool that finds numerous applications in data mining, machine learning, and artificial intelligence. It aims at finding 'common' random variables that are strongly correlated across multiple feature representations (views) of the same set of entities. CCA, and to a lesser extent GCCA, have been studied from the statistical and algorithmic points of view, but not as much from the standpoint of linear algebra. This paper offers a fresh algebraic perspective on GCCA based on a (bi-)linear generative model that naturally captures its essence. It is shown that, from a linear algebra point of view, GCCA is tantamount to subspace intersection, and conditions under which the common subspace of the different views is identifiable are provided. A novel GCCA algorithm is proposed based on subspace intersection, which scales up to handle large GCCA tasks. Synthetic as well as real-data experiments are provided to showcase the effectiveness of the proposed approach.
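The subspace-intersection view can be made concrete with a classical MAXVAR-style construction (an assumption of this sketch, not necessarily the authors' scalable algorithm): directions lying in every view's column space accumulate the largest total projection, so the shared subspace consists of the top eigenvectors of the sum of the views' orthogonal projectors.

```python
import numpy as np

def common_subspace(views, dim):
    """GCCA as subspace intersection (MAXVAR flavor): return an orthonormal
    basis (n_samples x dim) of the subspace most nearly contained in all
    views' column spaces, via the top eigenvectors of the summed projectors.
    Note: forms an n x n matrix, so this sketch only suits small n."""
    n = views[0].shape[0]
    P = np.zeros((n, n))
    for V in views:
        Q, _ = np.linalg.qr(V - V.mean(axis=0))   # orthonormal basis of the view
        P += Q @ Q.T                              # add its orthogonal projector
    w, U = np.linalg.eigh(P)
    return U[:, np.argsort(w)[::-1][:dim]]
```

When the views are generated from a shared latent factor plus view-specific columns, the recovered subspace aligns with the span of that factor (principal-angle cosines near 1).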
The potential to study and improve different aspects of our lives is ever growing thanks to the abundance of data available in today's modern society. Scientists and researchers often need to analyze data from different sources; the observations, which share only a subset of the variables, cannot always be paired to identify common individuals. This is the case, for example, when the information required to study a certain phenomenon comes from different sample surveys. Statistical matching is a common practice for combining such data sets. In this paper, we investigate and extend to statistical matching two methods based on Kernel Canonical Correlation Analysis (KCCA) and the Super-Organizing Map (Super-OM). These methods are designed to deal with various variable types, sample weights, and incompatibilities among categorical variables. In the first case, we use KCCA, a non-linear extension of CCA, to create canonical variables that can be compared across the two data sets. In the second case, Super-OM uses organizing maps to create subgroups of individuals who share the same characteristics. We use the 2017 Belgian Statistics on Income and Living Conditions (SILC) and compare the performance of the proposed statistical matching methods by means of a cross-validation technique, as if the data were available from two separate sources. The results indicate that our proposed methods are superior to existing methods because they preserve the distribution of the generated variables while also providing good predictions; existing methods typically achieve only one or the other. These new techniques open the door to improving statistical matching in other applications such as medicine, economics, …
•Two novel statistical matching techniques are proposed.
•They handle most incompatibilities between categorical features.
•These techniques handle sample weights.
•Experimental comparisons are done with three different data sets using several criteria.
•Our criterion on distributions is 8 times better on small data than the regression method.
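The KCCA ingredient can be sketched with one common regularized formulation (Hardoon-style; the RBF kernel, bandwidth, and ridge scaling here are illustrative assumptions, not the paper's exact setup): the squared kernel canonical correlations are eigenvalues of (Kx + kI)^-1 Ky (Ky + kI)^-1 Kx, where Kx and Ky are centered kernel matrices.

```python
import numpy as np

def rbf_kernel(A, gamma=1.0):
    """Gaussian (RBF) kernel matrix of the rows of A."""
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kcca_first_corr(X, Y, gamma=1.0, reg=0.1):
    """First kernel canonical correlation between X and Y under a common
    regularized formulation: sqrt of the top eigenvalue of
    (Kx + kI)^-1 Ky (Ky + kI)^-1 Kx, with centered kernel matrices."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering in feature space
    Kx = H @ rbf_kernel(X, gamma) @ H
    Ky = H @ rbf_kernel(Y, gamma) @ H
    k = reg * n                                     # ridge scaled with sample size
    M = (np.linalg.solve(Kx + k * np.eye(n), Ky)
         @ np.linalg.solve(Ky + k * np.eye(n), Kx))
    lam = np.linalg.eigvals(M).real.max()
    return float(np.sqrt(max(lam, 0.0)))
```

Unlike linear CCA, this picks up nonlinear dependence: for y = x² the linear correlation is near zero, but the kernel canonical correlation is clearly larger than for independent data.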
Droughts often evolve gradually and cover large areas, and therefore affect many people and activities. This motivates developing techniques that integrate different satellite observations to cover large areas and understand the spatial and temporal variability of droughts. In this study, we apply probabilistic techniques to generate satellite-derived meteorological, hydrological, and hydro-meteorological drought indices for the world's 156 major river basins covering 2003–2016. The data include Terrestrial Water Storage (TWS) estimates from the Gravity Recovery And Climate Experiment (GRACE) mission, along with soil moisture, precipitation, and evapotranspiration reanalyses. Different drought characteristics, i.e., trends, occurrences, areal extent, and frequencies corresponding to 3-, 6-, 12-, and 24-month timescales, are extracted from these indices. Drought evolution within selected basins of Africa, America, and Asia is interpreted. Canonical Correlation Analysis (CCA) is then applied to find the relationship between global hydro-meteorological droughts and satellite-derived Sea Surface Temperature (SST) changes. This relationship is then used to identify regions where droughts and teleconnections are strongly interrelated. Our numerical results indicate that 3- to 6-month hydrological droughts occur more frequently than those at the other timescales. The longer memory of water storage changes (compared with water fluxes) has been found to be the reason for detecting extended hydrological droughts in regions such as the Middle East and Northern Africa. Through CCA, we show that the El Niño Southern Oscillation (ENSO) had a major impact on the magnitude and evolution of hydrological droughts in regions such as the northern parts of Asia and most of the Australian continent between 2006 and 2011, as well as on droughts in the Amazon basin, South Asia, and North Africa between 2010 and 2012.
The Indian ocean Dipole (IOD) and North Atlantic Oscillation (NAO) are found to have regional influence on the evolution of hydrological droughts.
•Using GRACE TWS results in more intense drought indices than soil-moisture reanalysis.
•The areal extent of the 2003–2016 hydrological droughts is generally increasing.
•Droughts in the Middle East, America, and South Asia are intense and worsening.
•SST and CCA are efficient for exploring teleconnections and drought hot spots.
•The 2006 and 2011 droughts in Asia and Australia are largely correlated with ENSO.
A Survey on Canonical Correlation Analysis
Yang, Xinghao; Liu, Weifeng; Liu, Wei; et al.
IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 6, June 1, 2021. Journal article; peer reviewed.
In recent years, advances in data collection and statistical analysis have made canonical correlation analysis (CCA) available for more advanced research. CCA is the principal technique for two-set data dimensionality reduction, in which the correlation between the paired variables in the common subspace is mutually maximized. Over 80 years of development, a number of CCA models have been proposed based on different machine learning mechanisms. However, the field lacks an insightful review of the state-of-the-art developments. This survey aims to provide a well-organized overview of CCA and its extensions. Specifically, we first review CCA theory from the perspectives of both model formulation and model optimization. The association between two popular solution methods, i.e., eigenvalue decomposition (EVD) and singular value decomposition (SVD), is discussed. Following that, we present a taxonomy of current progress and classify it into seven groups: 1) multi-view CCA, 2) probabilistic CCA, 3) deep CCA, 4) kernel CCA, 5) discriminative CCA, 6) sparse CCA, and 7) locality-preserving CCA. For each group, we present two or three representative mathematical models, identifying their strengths and limitations. We summarize the representative applications and numerical results of these seven groups in real-world practice, collecting the data sets and open-source implementations. Finally, we provide several promising future research directions that could improve the current state of the art.
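The EVD-SVD association mentioned above can be made concrete: the singular values of the whitened cross-covariance and the square roots of the eigenvalues of Cxx^-1 Cxy Cyy^-1 Cyx are the same canonical correlations. A small numerical sketch (routine names are illustrative):

```python
import numpy as np

def cca_corrs_svd(X, Y):
    """Canonical correlations via the SVD route: singular values of the
    product of orthonormal bases of the two centered column spaces."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

def cca_corrs_evd(X, Y):
    """Same correlations via the EVD route: the eigenvalues of
    Cxx^-1 Cxy Cyy^-1 Cyx are the squared canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Cxx, Cyy, Cxy = X.T @ X, Y.T @ Y, X.T @ Y
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    ev = np.clip(np.linalg.eigvals(M).real, 0.0, None)
    return np.sort(np.sqrt(ev))[::-1]
```

On random data the two routes agree to machine precision (the EVD route returns extra structural zeros when the two views have different dimensions).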