Canonical correlation analysis (CCA) has been widely used to detect steady-state visual evoked potentials (SSVEPs) in brain-computer interfaces (BCIs). The standard CCA method, which uses sinusoidal signals as reference signals, was first proposed for SSVEP detection without calibration. However, its detection performance can be degraded by interference from spontaneous EEG activity. Recently, various extended methods have been developed to incorporate individual EEG calibration data in CCA to improve the detection performance. Although the advantages of the extended CCA methods have been demonstrated in separate studies, a comprehensive comparison between these methods is still missing. This study compared the existing CCA-based SSVEP detection methods using a 12-class SSVEP dataset recorded from 10 subjects in a simulated online BCI experiment. Classification accuracy and information transfer rate (ITR) were used for performance evaluation. The results suggest that individual calibration data can significantly improve the detection performance. Furthermore, the combination method based on the standard CCA and the individual template based CCA (IT-CCA) achieved the highest performance.
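As a rough illustration of the calibration-free standard CCA detector described above, the following Python sketch scores each candidate stimulus frequency by the largest canonical correlation between a multichannel EEG segment and sine/cosine reference signals. The function names, the number of harmonics, the sampling rate, and the synthetic data are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y (rows are samples)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces; the singular values of
    # Qx^T Qy are the canonical correlations (assumes full column rank).
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))

def sine_reference(freq, n_samples, fs, n_harmonics=2):
    """Sine and cosine reference signals at freq and its harmonics."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * h * freq * t))
        cols.append(np.cos(2 * np.pi * h * freq * t))
    return np.column_stack(cols)

def detect_ssvep(eeg, candidate_freqs, fs, n_harmonics=2):
    """Return the candidate frequency with the highest canonical correlation."""
    scores = [
        canonical_correlation(eeg, sine_reference(f, eeg.shape[0], fs, n_harmonics))
        for f in candidate_freqs
    ]
    return candidate_freqs[int(np.argmax(scores))], scores

# Synthetic demo (hypothetical data): 8 channels, 2 s at 250 Hz, 10 Hz response.
rng = np.random.default_rng(0)
fs, n_samples = 250, 500
t = np.arange(n_samples) / fs
mixing = np.linspace(0.5, 1.5, 8)  # assumed channel gains
eeg = np.outer(np.sin(2 * np.pi * 10.0 * t + 0.3), mixing)
eeg += rng.normal(size=(n_samples, 8))  # stand-in for spontaneous EEG
detected, scores = detect_ssvep(eeg, [9.0, 10.0, 11.0, 12.0], fs)
print(detected, np.round(scores, 3))
```

The extended methods compared in the study replace or supplement the sinusoidal references with templates from individual calibration data; that step is not shown here.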
Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome and the GWAS sample size is 20,000, the average power difference between the ACAT combined test of sCCA features and single-tissue models, versus single-tissue models combined with the Generalized Berk-Jones (GBJ) method, single-tissue models combined with S-MultiXcan, UTMOST, or cross-tissue expression patterns summarized using Principal Component Analysis (PCA), was 5%, 8%, 5% and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average 400 additional gene-trait associations that single-trait TWAS missed.
Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.
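The ACAT combination step mentioned above can be sketched in Python as follows. This is a minimal version of the generic Cauchy combination rule, in which each p-value is mapped to a Cauchy-distributed score, the scores are averaged with weights, and the average is mapped back to a p-value. Equal weights and the example p-values are assumptions for illustration and are not the settings or results of the study.

```python
import math

def acat(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Assumes every p-value lies strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights (assumption)
    total_weight = sum(weights)
    # Each p-value becomes tan((0.5 - p) * pi), which is standard Cauchy
    # under the null; the weighted mean is again standard Cauchy.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi

# Hypothetical per-gene p-values from several single-tissue tests and sCCA features.
example_p = [0.20, 0.03, 0.47, 0.008]
print(acat(example_p))
```

Because the combined p-value is dominated by the smallest inputs, a single strong tissue or sCCA feature can drive the gene-level result.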
It has been shown that the deregulation of miRNAs is associated with the development and progression of many human diseases. To reduce time and cost of biological experiments, a number of algorithms have been proposed for predicting miRNA-disease associations. However, the existing methods rarely investigated the cause-and-effect mechanism behind these associations, which hindered further biomedical follow-ups.
In this study, we presented a CCA-based model in which the possible molecular causes of miRNA-disease associations were comprehensively revealed by extracting correlated sets of genes and diseases based on the co-occurrence of miRNAs in target gene profiles and disease profiles. Our method directly suggested the underlying genes involved, which could be used for experimental tests and confirmation. The inference of associated diseases of a new miRNA was made by taking into account the weight vectors of the extracted sets. We extracted 60 pairs of correlated sets from 404 miRNAs with two profiles for 2796 target genes and 362 diseases. The extracted diseases could be considered as possible outcomes of miRNAs regulating the target genes which appeared in the same set, some of which were supported by independent sources of information. Furthermore, we tested our method on the 404 miRNAs under 5-fold cross-validation and obtained an AUC value of 0.84606. Finally, we extensively inferred miRNA-disease associations for 100 new miRNAs, and some interesting prediction results were validated by established databases.
The encouraging results demonstrated that our method could provide a biologically relevant prediction and interpretation of associations between miRNAs and diseases, which is useful for guiding biological experiments in scientific research.
Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools to integrate high-dimensional data from different sources. Canonical correlation analysis (CCA) is a statistical tool for finding linear associations between different types of information. Previous extensions of CCA used to capture nonlinear associations, such as kernel CCA, did not allow feature selection or capturing of multiple canonical components. Here we propose a novel method, two-stage kernel CCA (TSKCCA), to select appropriate kernels in the framework of multiple kernel learning.
TSKCCA first selects relevant kernels based on the HSIC criterion in the multiple kernel learning framework. Weights are then derived by non-negative matrix decomposition with L1 regularization. Using artificial datasets and nutrigenomic datasets, we show that TSKCCA can extract multiple, nonlinear associations among high-dimensional data and multiplicative interactions among variables.
TSKCCA can identify nonlinear associations among high-dimensional data more reliably than previous nonlinear CCA methods.
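To make the HSIC criterion used in the first stage more concrete, here is a small Python sketch of the standard biased empirical HSIC estimate between two kernel matrices. The Gaussian kernel, its bandwidth, and the synthetic data are assumptions for this example; the sketch does not reproduce the kernel selection or the L1-regularized decomposition of TSKCCA.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for samples x (1-D or 2-D array, rows are samples)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Synthetic check (hypothetical data): a purely nonlinear dependence y = x^2.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y_dependent = x ** 2 + 0.1 * rng.normal(size=200)
y_independent = rng.normal(size=200)

dep = hsic(gaussian_kernel(x), gaussian_kernel(y_dependent))
ind = hsic(gaussian_kernel(x), gaussian_kernel(y_independent))
print(dep, ind)
```

A kernel whose HSIC with the other data view is clearly above the level seen for unrelated variables would be a natural candidate to keep, which is the intuition behind using HSIC for kernel selection.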
Recently, brain-computer interface (BCI) systems developed based on steady-state visual evoked potential (SSVEP) have attracted much attention due to their high information transfer rate (ITR) and increasing number of targets. However, SSVEP-based methods can be improved in terms of their accuracy and target detection time. We propose a new method based on canonical correlation analysis (CCA) to integrate subject-specific models and subject-independent information and enhance BCI performance. We propose to use training data of other subjects to optimize hyperparameters for CCA-based model of a specific subject. An ensemble version of the proposed method is also developed for a fair comparison with ensemble task-related component analysis (TRCA). The proposed method is compared with TRCA and extended CCA methods. A publicly available, 35-subject SSVEP benchmark dataset is used for comparison studies and performance is quantified by classification accuracy and ITR. The ITR of the proposed method is higher than those of TRCA and extended CCA. The proposed method outperforms extended CCA in all conditions and TRCA for time windows greater than 0.3 s. The proposed method also outperforms TRCA when there are limited training blocks and electrodes. This study illustrates that adding subject-independent information to subject-specific models can improve performance of SSVEP-based BCIs.
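Since performance is reported as ITR, a short Python sketch of the commonly used ITR formula for BCIs may help: bits per selection are log2(N) + P·log2(P) + (1 − P)·log2((1 − P)/(N − 1)), scaled to bits per minute by the time per selection. The function name and the example values are assumptions for illustration, and the abstract does not state whether this exact variant was used.

```python
import math

def itr_bits_per_minute(n_targets, accuracy, selection_time_s):
    """Information transfer rate in bits/min for N targets, accuracy P, and time T per selection."""
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if selection_time_s <= 0:
        raise ValueError("selection_time_s must be positive")
    bits = math.log2(n_targets)
    # The P*log2(P) and (1-P)*log2(...) terms tend to 0 at the boundaries,
    # so they are skipped there to avoid log2(0).
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / selection_time_s

# Hypothetical example: 12 targets, 90% accuracy, 1.5 s per selection.
print(itr_bits_per_minute(12, 0.90, 1.5))
```

The formula shows why both accuracy and detection time matter: shortening the time window raises ITR only as long as accuracy does not drop too far.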
Autism spectrum disorder (autism) is a complex neurodevelopmental condition with pronounced behavioral, cognitive, and neural heterogeneities across individuals. Here, our goal was to characterize heterogeneity in autism by identifying patterns of neural diversity as reflected in BOLD fMRI in the way individuals with autism engage with a varied array of cognitive tasks.
All analyses were based on the EU-AIMS/AIMS-2-TRIALS multisite Longitudinal European Autism Project (LEAP) with participants with autism (n = 282) and typically developing (TD) controls (n = 221) between 6 and 30 years of age. We employed a novel task potency approach which combines the unique aspects of both resting state fMRI and task-fMRI to quantify task-induced variations in the functional connectome. Normative modelling was used to map atypicality of features on an individual basis with respect to their distribution in neurotypical control participants. We applied robust out-of-sample canonical correlation analysis (CCA) to relate connectome data to behavioral data.
Deviation from the normative ranges of global functional connectivity was greater for individuals with autism compared to TD in each fMRI task paradigm (all tasks p < 0.001). The similarity across individuals of the deviation pattern was significantly increased in autistic relative to TD individuals (p < 0.002). The CCA identified significant and robust brain-behavior covariation between functional connectivity atypicality and autism-related behavioral features.
Individuals with autism engage with tasks in a globally atypical way, but the particular spatial pattern of this atypicality is nevertheless similar across tasks. Atypicalities in the tasks originate mostly from prefrontal cortex and default mode network regions, but also speech and auditory networks. We show how sophisticated modeling methods such as task potency and normative modeling can be used toward unravelling complex heterogeneous conditions like autism.
Genetic variability and diversity of genotypes are very important for all living organisms. Knowledge of genetic diversity is a potential tool for pre-breeding parental selection. The present experiment was conducted at two locations (Isfahan, Khuzestan) under field conditions during the 2017–2018 growing season, with fifteen short-day onion genotypes evaluated by multivariate methods. Nine quantitative traits were studied. MANOVA showed that the effects of location, variety and the location × variety interaction were significant for all nine traits. Significant positive correlations were observed at both locations between yield and single weight (0.85 in Khuzestan and 0.61 in Isfahan), yield and bulb height (0.52 in Khuzestan and 0.55 in Isfahan), bulb height and index shape (0.68 in Khuzestan and 0.70 in Isfahan), and bulb diameter and single weight (0.81 in Khuzestan and 0.66 in Isfahan). Further, yield was significantly correlated with dry matter: positively in Isfahan (0.62), and negatively in Khuzestan (–0.54). In Khuzestan, the first two canonical variables explained 79.19% of the total variation between the varieties; the greatest variation was found for the Saba and Behbahan improved population. In Isfahan, the first two canonical variables explained 86.76% of the total variation between the varieties. Saba and Behbahan improved population varieties were the smallest, while Paliz and Early Super Select were the largest. The Saba and Behbahan improved population, as the most diverse genotypes, were recommended for inclusion in future crop improvement programs.
Early prediction of the potential for neurological recovery after resuscitation from cardiac arrest is difficult but important. Currently, no clinical finding or combination of findings is sufficient to accurately predict or preclude favorable recovery of comatose patients in the first 24 to 48 hours after resuscitation. Thus, life-sustaining therapy is often continued for several days in patients whose irrecoverable injury is not yet recognized. Conversely, early withdrawal of life-sustaining therapy increases mortality among patients who otherwise might have gone on to recover. In this work, we present Canonical Autocorrelation Analysis (CAA) and Canonical Autocorrelation Embeddings (CAE), novel methods suitable for identifying complex patterns in high-resolution multivariate data often collected in highly monitored clinical environments such as intensive care units. CAE embeds sets of datapoints onto a space that characterizes their latent correlation structures and allows direct comparison of these structures through the use of a distance metric. The methodology may be particularly suitable when the unit of analysis is not just an individual datapoint but a dataset, as for instance in patients for whom physiological measures are recorded over time, and where changes of correlation patterns in these datasets are informative for the task at hand. We present a proof of concept to illustrate the potential utility of CAE by applying it to characterize electroencephalographic recordings from 80 comatose survivors of cardiac arrest, aiming to identify patients who will survive to hospital discharge with favorable functional recovery. Our results show that with very low probability of making a Type 1 error, we are able to identify 32.5% of patients who are likely to have a good neurological outcome, some of whom have otherwise unfavorable clinical characteristics.
Importantly, some of these had 5% predicted chance of favorable recovery based on initial illness severity measures alone. Providing this information to support clinical decision-making could motivate the continuation of life-sustaining therapies for these patients.
One of the critical stages in drug development is the identification of potential side effects for promising drug leads. Large-scale clinical experiments aimed at discovering such side effects are very costly and may miss subtle or rare side effects. Previous attempts to systematically predict side effects are sparse and consider each side effect independently. In this work, we report on a novel approach to predict the side effects of a given drug, taking into consideration information on other drugs and their side effects. Starting from a query drug, a combination of canonical correlation analysis and network-based diffusion is applied to predict its side effects. We evaluate our method by measuring its performance in a cross validation setting using a comprehensive data set of 692 drugs and their known side effects derived from package inserts. For 34% of the drugs, the top scoring side effect matches a known side effect of the drug. Remarkably, even on unseen data, our method is able to infer side effects that highly match existing knowledge. In addition, we show that our method outperforms a prediction scheme that considers each side effect separately. Our method thus represents a promising step toward shortcutting the process and reducing the cost of side effect elucidation.
Knowledge of contagion among economies is a relevant issue in economics. The canonical model of contagion is an alternative in this case. Given the existence of endogenous variables in the model, instrumental variables can be used to decrease the bias of the OLS estimator. In the presence of heteroskedastic disturbances, this paper proposes the use of conditional volatilities as instruments. Simulation is used to show that the homoskedastic and heteroskedastic estimators which use them as instruments have small bias. These estimators are preferable to the OLS estimator, and their asymptotic distribution can be used to construct confidence intervals.
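The bias reduction from instrumental variables can be illustrated with a generic two-stage least squares simulation in Python. This is only a sketch of the general idea: the instrument below is a synthetic exogenous variable rather than a conditional volatility, the data-generating process is made up for the example, and the code is not the estimator proposed in the paper.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(X, Z, y):
    """2SLS: regress X on the instruments Z, then regress y on the fitted X."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Hypothetical simulation: x is endogenous because it shares the shock u with y.
rng = np.random.default_rng(0)
n = 20000
true_beta = 0.5
z = rng.normal(size=n)  # instrument: affects x, unrelated to the error in y
u = rng.normal(size=n)  # common shock that makes x endogenous
x = z + u + rng.normal(size=n)
y = true_beta * x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_ols = ols(X, y)[1]
beta_iv = two_stage_least_squares(X, Z, y)[1]
print(beta_ols, beta_iv)
```

In this setup the OLS slope is pulled away from the true value by the shared shock, while the instrumented estimate stays close to it, which mirrors the small-bias behaviour the abstract reports for its estimators.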