We propose a new sparsification method for the singular value decomposition, called the constrained singular value decomposition (CSVD), that can incorporate multiple constraints, such as sparsification and orthogonality, for the left and right singular vectors. The CSVD can combine different constraints because it implements each constraint as a projection onto a convex set, and because it integrates these constraints as projections onto the intersection of multiple convex sets. We show that, with appropriate sparsification constants, the algorithm is guaranteed to converge to a stable point. We also propose and analyze the convergence of an efficient algorithm for the specific case of the projection onto the balls defined by the L1 and L2 norms. We illustrate the CSVD and compare it to the standard singular value decomposition and to a related non-orthogonal sparsification method with: 1) a simulated example, 2) a small set of face images (corresponding to a configuration with a number of variables much larger than the number of observations), and 3) a psychometric application with a large number of observations and a small number of variables. The companion R package, csvd, which implements the algorithms described in this paper, along with reproducible examples, is available for download from https://github.com/vguillemot/csvd.
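The projection-based machinery described in this abstract can be illustrated with a minimal sketch. This is not the authors' csvd implementation: it is a generic alternating-projections (POCS) sketch for the intersection of an L1 ball and the unit L2 ball, with hypothetical function names and radius values. Note that plain alternating projections converges to a point in the intersection, not necessarily the nearest one; an exact projection would require Dykstra's algorithm.

```python
import numpy as np

def proj_l2_ball(x, r=1.0):
    """Project x onto the L2 ball of radius r."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

def proj_l1_ball(x, r=1.0):
    """Project x onto the L1 ball of radius r (sort-based algorithm)."""
    if np.abs(x).sum() <= r:
        return x
    u = np.sort(np.abs(x))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(x) + 1) > css - r)[0][-1]
    theta = (css[k] - r) / (k + 1.0)
    # Soft-thresholding at theta sparsifies the vector
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def proj_intersection(x, r1, n_iter=100, tol=1e-10):
    """Alternating projections (POCS) onto the intersection of the
    L1 ball of radius r1 and the unit L2 ball."""
    for _ in range(n_iter):
        x_new = proj_l2_ball(proj_l1_ball(x, r1))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Choosing the L1 radius `r1` between 1 and sqrt(len(x)) trades off sparsity against fidelity, which is the role the sparsification constants play in the CSVD.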
Introduction
Post-traumatic stress disorder (PTSD) is associated with hippocampal system structural and functional impairments. Neurobiological models of PTSD posit that contextual memory for traumatic events is impaired due to hippocampal system dysfunction whilst memory of sensory details is enhanced due to amygdalar impact on sensory cortices. If hippocampal system dysfunction is a core feature of PTSD, then non-traumatic hippocampal-dependent cognitive functions such as scene construction, spatial processing, and memory should also be impaired in individuals with PTSD.
Methods
Forty-six trauma survivors, half diagnosed with PTSD, performed two tasks that involved spatial processing. The first, completed by all participants, was a scene construction task that requires conjuring up spatially coherent multimodal scenarios. Twenty-six participants (PTSD: n = 13) also completed a navigation task in a virtual environment, and underwent structural T1, T2 and diffusion-tensor MRI to quantify gray and white matter integrity. We examined the relationship between spatial processing, neural integrity, and symptom severity in a multiple factor analysis.
Results
Overall, patients with PTSD showed impaired performance in both tasks compared to controls. Scenes imagined by patients were less vivid, less detailed, and generated less sense of presence; importantly, they had disproportionally reduced spatial coherence between details. Patients also made more errors during virtual navigation. Two components of the multiple factor analysis captured group differences. The first component explained 25% of the shared variance: participants who constructed less spatially coherent scenes also made more navigation errors and had reduced white matter integrity of long association tracts and of tracts connecting the hippocampus, thalamus, and cingulate. The second component explained 20% of the variance: participants who generated fewer scene details, with less spatial coherence between them, had smaller hippocampal, parahippocampal and isthmus cingulate volumes. These participants also had increased white matter integrity of the right hippocampal cingulum bundle.
Conclusion
Our results suggest that patients with PTSD are impaired at imagining even neutral spatially coherent scenes and at navigating through a complex spatial environment. Patients who showed reduced spatial processing more broadly had reduced hippocampal system volumes and abnormal white matter integrity of tracts implicated in multisensory integration.
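The multiple factor analysis used in the study above can be sketched in a few lines: each data block (e.g., behaviour and imaging measures) is scaled by its first singular value so that no block dominates, and a global PCA is then run on the concatenated, weighted blocks. This is a generic sketch with hypothetical block sizes, not the study's analysis pipeline.

```python
import numpy as np

def mfa(blocks):
    """Multiple factor analysis sketch: weight each block
    (observations x variables) by 1 / its first singular value,
    concatenate, and run a global PCA via the SVD."""
    Z = []
    for X in blocks:
        Xc = X - X.mean(axis=0)            # column-center each block
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]
        Z.append(Xc / s1)                  # block weight = 1 / first SV
    Z = np.hstack(Z)
    U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * sv                        # global factor scores
    explained = sv**2 / (sv**2).sum()      # proportion of variance
    return scores, explained

rng = np.random.default_rng(1)
behaviour = rng.standard_normal((26, 4))   # hypothetical block sizes
imaging = rng.standard_normal((26, 10))
scores, explained = mfa([behaviour, imaging])
```

The 1/first-singular-value weighting is what distinguishes MFA from a plain PCA of the concatenated table: it equalizes the leading variance of each block before the global decomposition.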
Lower extremity open revascularization is a treatment option for peripheral artery disease that carries significant peri-operative risks; however, outcome prediction tools remain limited. Using machine learning (ML), we developed automated algorithms that predict 30-day outcomes following lower extremity open revascularization. The National Surgical Quality Improvement Program targeted vascular database was used to identify patients who underwent lower extremity open revascularization for chronic atherosclerotic disease between 2011 and 2021. Input features included 37 pre-operative demographic/clinical variables. The primary outcome was 30-day major adverse limb event (MALE; composite of untreated loss of patency, major reintervention, or major amputation) or death. Our data were split into training (70%) and test (30%) sets. Using tenfold cross-validation, we trained 6 ML models. Overall, 24,309 patients were included. The primary outcome of 30-day MALE or death occurred in 2349 (9.3%) patients. Our best performing prediction model was XGBoost, achieving an area under the receiver operating characteristic curve of 0.93 (95% CI, 0.92-0.94). The calibration plot showed good agreement between predicted and observed event probabilities with a Brier score of 0.08. Our ML algorithm has potential for important utility in guiding risk mitigation strategies for patients being considered for lower extremity open revascularization to improve outcomes.
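The evaluation design described above (70/30 train/test split, 10-fold cross-validation, AUROC, Brier score) can be sketched on synthetic data. This is a stand-in, not the study's pipeline: scikit-learn's GradientBoostingClassifier substitutes for XGBoost, and the sample size, feature count, and event rate are only loosely modeled on the abstract, not the NSQIP data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: 37 "preoperative" features, ~9% event rate.
X, y = make_classification(n_samples=3000, n_features=37,
                           weights=[0.91], random_state=0)

# 70/30 split, stratified to preserve the event rate in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0)
cv_auc = cross_val_score(model, X_tr, y_tr, cv=10, scoring="roc_auc")

model.fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]   # predicted event probabilities
auc = roc_auc_score(y_te, p)          # discrimination on held-out set
brier = brier_score_loss(y_te, p)     # calibration summary

print(f"10-fold CV AUROC: {cv_auc.mean():.2f}")
print(f"Test AUROC: {auc:.2f}, Brier score: {brier:.3f}")
```

The same skeleton applies to the endovascular and carotid endarterectomy models described later in this collection; only the cohort, feature set, and outcome definition change.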
Large and complex studies are now routine, and quality assurance and quality control (QC) procedures ensure reliable results and conclusions. Standard procedures may comprise manual verification and double entry, but these labour-intensive methods often leave errors undetected. Outlier detection uses a data-driven approach to identify patterns exhibited by the majority of the data and highlights data points that deviate from these patterns. Univariate methods consider each variable independently, so observations that appear odd only when two or more variables are considered simultaneously remain undetected. We propose a data quality evaluation process that emphasizes the use of multivariate outlier detection for identifying errors, and show that univariate approaches alone are insufficient. Further, we establish an iterative process that uses multiple multivariate approaches, communication between teams, and visualization for other large-scale projects to follow.
We illustrate this process with preliminary neuropsychology and gait data for the vascular cognitive impairment cohort from the Ontario Neurodegenerative Disease Research Initiative, a multi-cohort observational study that aims to characterize biomarkers within and between five neurodegenerative diseases. Each dataset was evaluated four times: with and without covariate adjustment using two validated multivariate methods - Minimum Covariance Determinant (MCD) and Candès' Robust Principal Component Analysis (RPCA) - and results were assessed in relation to two univariate methods. Outlying participants identified by multiple multivariate analyses were compiled and communicated to the data teams for verification.
Of 161 and 148 participants in the neuropsychology and gait data, 44 and 43 were flagged by one or both multivariate methods, and errors were identified for 8 and 5 participants, respectively. MCD identified all participants with errors, while RPCA identified 6/8 and 3/5 for the neuropsychology and gait data, respectively. Both outperformed univariate approaches. Adjusting for covariates had a minor effect on the participants identified as outliers, though it did affect error detection.
Manual QC procedures are insufficient for large studies as many errors remain undetected. In these data, the MCD outperforms the RPCA for identifying errors, and both are more successful than univariate approaches. Therefore, data-driven multivariate outlier techniques are essential tools for QC as data become more complex.
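The kind of multivariate outlier this abstract targets, unremarkable on each variable alone but anomalous jointly, can be demonstrated with scikit-learn's MinCovDet (an MCD estimator). The data below are simulated for illustration, not the ONDRI data, and the cutoff choice is a common convention rather than the study's.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)

# Correlated "clean" data, plus one injected point that sits well
# within each variable's univariate range but breaks the correlation.
cov = [[1.0, 0.9], [0.9, 1.0]]
X = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X = np.vstack([X, [1.5, -1.5]])   # index 200: off-pattern, not extreme

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)           # squared robust Mahalanobis distances

# Flag points beyond the 99.9% quantile of a chi-squared(df=2) law.
cutoff = chi2.ppf(0.999, df=2)
outliers = np.nonzero(d2 > cutoff)[0]
print(outliers)
```

A univariate z-score rule would miss the injected point (each coordinate is only 1.5 SD from its mean), which is exactly the failure mode of univariate QC that the abstract describes.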
Lower extremity endovascular revascularization for peripheral artery disease carries nonnegligible perioperative risks; however, outcome prediction tools remain limited. Using machine learning, we developed automated algorithms that predict 30-day outcomes following lower extremity endovascular revascularization.
The National Surgical Quality Improvement Program targeted vascular database was used to identify patients who underwent lower extremity endovascular revascularization (angioplasty, stent, or atherectomy) for peripheral artery disease between 2011 and 2021. Input features included 38 preoperative demographic/clinical variables. The primary outcome was 30-day postprocedural major adverse limb event (composite of major reintervention, untreated loss of patency, or major amputation) or death. Data were split into training (70%) and test (30%) sets. Using 10-fold cross-validation, 6 machine learning models were trained using preoperative features. The primary model evaluation metric was area under the receiver operating characteristic curve. Overall, 21 886 patients were included, and 30-day major adverse limb event/death occurred in 1964 (9.0%) individuals. The best performing model for predicting 30-day major adverse limb event/death was extreme gradient boosting, achieving an area under the receiver operating characteristic curve of 0.93 (95% CI, 0.92-0.94). In comparison, logistic regression had an area under the receiver operating characteristic curve of 0.72 (95% CI, 0.70-0.74). The calibration plot showed good agreement between predicted and observed event probabilities with a Brier score of 0.09. The top 3 predictive features in our algorithm were (1) chronic limb-threatening ischemia, (2) tibial intervention, and (3) congestive heart failure.
Our machine learning models accurately predict 30-day outcomes following lower extremity endovascular revascularization using preoperative data with good discrimination and calibration. Prospective validation is warranted to assess for generalizability and external validity.
Background Carotid endarterectomy (CEA) is a major vascular operation for stroke prevention that carries significant perioperative risks; however, outcome prediction tools remain limited. The authors developed machine learning algorithms to predict outcomes following CEA. Methods and Results The National Surgical Quality Improvement Program targeted vascular database was used to identify patients who underwent CEA between 2011 and 2021. Input features included 36 preoperative demographic/clinical variables. The primary outcome was 30-day major adverse cardiovascular events (composite of stroke, myocardial infarction, or death). The data were split into training (70%) and test (30%) sets. Using 10-fold cross-validation, 6 machine learning models were trained using preoperative features. The primary metric for evaluating model performance was area under the receiver operating characteristic curve. Model robustness was evaluated with calibration plot and Brier score. Overall, 38 853 patients underwent CEA during the study period. Thirty-day major adverse cardiovascular events occurred in 1683 (4.3%) patients. The best performing prediction model was XGBoost, achieving an area under the receiver operating characteristic curve of 0.91 (95% CI, 0.90-0.92). In comparison, logistic regression had an area under the receiver operating characteristic curve of 0.62 (95% CI, 0.60-0.64), and existing tools in the literature demonstrate area under the receiver operating characteristic curve values ranging from 0.58 to 0.74. The calibration plot showed good agreement between predicted and observed event probabilities with a Brier score of 0.02. The strongest predictive feature in our algorithm was carotid symptom status. Conclusions The machine learning models accurately predicted 30-day outcomes following CEA using preoperative data and performed better than existing tools.
They have potential for important utility in guiding risk-mitigation strategies to improve outcomes for patients being considered for CEA.
Regional changes to cortical thickness in individuals with neurodegenerative and cerebrovascular diseases (CVD) can be estimated using specialized neuroimaging software. However, the presence of cerebral small vessel disease, focal atrophy, and cortico-subcortical stroke lesions poses significant challenges that increase the likelihood of misclassification errors and segmentation failures.
The main goal of this study was to examine a correction procedure developed for enhancing FreeSurfer's (FS's) cortical thickness estimation tool, particularly when applied to the most challenging MRI obtained from participants with chronic stroke and CVD, with varying degrees of neurovascular lesions and brain atrophy.
In 155 CVD participants enrolled in the Ontario Neurodegenerative Disease Research Initiative (ONDRI), FS outputs were compared between a fully automated, unmodified procedure and a corrected procedure that accounted for potential sources of error due to atrophy and neurovascular lesions. Quality control (QC) measures were obtained from both procedures. Association between cortical thickness and global cognitive status as assessed by the Montreal Cognitive Assessment (MoCA) score was also investigated from both procedures.
Corrected procedures increased "Acceptable" QC ratings from 18 to 76% for the cortical ribbon and from 38 to 92% for tissue segmentation. Corrected procedures reduced "Fail" ratings from 11 to 0% for the cortical ribbon and from 62 to 8% for tissue segmentation. FS-based segmentations of T1-weighted white matter hypointensities were significantly greater in the corrected procedure (5.8 mL vs. 15.9 mL, p < 0.001). The unmodified procedure yielded no significant associations with global cognitive status, whereas the corrected procedure yielded positive associations between MoCA total score and clusters of cortical thickness in the left superior parietal (p = 0.018) and left insula (p = 0.04) regions. Further analyses with the corrected cortical thickness results and MoCA subscores showed a positive association between left superior parietal cortical thickness and Attention (p < 0.001).
These findings suggest that correction procedures which account for brain atrophy and neurovascular lesions can significantly improve FS's segmentation results and reduce failure rates, thus maximizing statistical power by preventing the loss of study participants. Future work will examine relationships between cortical thickness, cerebral small vessel disease, and cognitive dysfunction due to neurodegenerative disease in the ONDRI study.
Psychological research often involves complex datasets that cannot easily be analyzed using traditional statistical methods. Multiblock Discriminant Correspondence Analysis (multiblock dica, also called mudica) examines group differences in large, structured categorical datasets and identifies blocks of variables that contribute to these differences. Data for this illustration were obtained from a study on mental health literacy (N = 648) that included 33 questions that were arranged into four blocks: etiology, symptoms, treatment, and general knowledge of psychological disorders. With non-parametric inference tests and results displayed as intuitive maps, mudica revealed differences in performance across groups not readily detectable using standard methods.
• Correspondence Analysis (CA) of Temporal Check-All-That-Apply (TCATA) data can lead to misinterpretation.
• Canonical CA and Conditional CA are evaluated as alternative approaches.
• Canonical CA emphasizes temporal effects common to all products.
• Conditional CA removes common temporal effects.
• Together they facilitate a richer and more accurate interpretation.
Temporal Check-All-That-Apply (TCATA) extends classical Check-All-That-Apply (CATA) by adding a temporal dimension to the evaluation. Because TCATA extends CATA, an obvious visualization of product-attribute associations over time is to treat product × time combinations as individual observations and then use classical Correspondence Analysis (CA) to visualize the associations. Often the CA results and visualization emphasize the chronological features. However, this approach could lead to misinterpretations because time is not just a feature but also a confound. For example, over time all products might appear to converge toward off flavor, even though this impression is produced by only a few observations that show a relative, but not an absolute, peak in this attribute.
Therefore, we suggest alternative CA approaches to analyze TCATA data that emphasize (Canonical CA, CanCA) or remove (Escofier's Conditional CA, ConCA) temporal effects. Generally, CanCA was designed to analyze CA data in the presence of row and column covariates; it is related to canonical correlation analysis. When there is only one set of covariates (e.g., row), CanCA is more akin to redundancy analysis. Here, we use external row information – time and product – to emphasize the overall temporal profile applying to all products. CanCA nicely displays the main product differences within the attribute space, and it emphasizes the unique properties of each product over time better than CA does. Escofier's Conditional CA (ConCA) removes confounding effects such as time. ConCA provides two features for TCATA: (1) effects adjusted for time and (2) more appropriate measures of strength of association that can be used with CA for better visualization.
We exemplify the proposed methods by means of data from a study on orange squashes. The relevance of off flavor is (correctly) found to be largely de-emphasized compared to standard CA: CanCA shows off flavor as an average effect because of time and ConCA shows off flavor does not contribute to the overall effect. Together CanCA and ConCA facilitate a richer, more detailed, and potentially more accurate interpretation of the data. The approaches can be equally used for Temporal Dominance of Sensations (TDS) data.
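For readers unfamiliar with plain CA, the classical algorithm underlying all of these variants can be sketched as an SVD of the table of standardized residuals. The CanCA and ConCA extensions are not implemented here, and the small product-by-attribute count table is hypothetical, standing in for the TCATA table of product × time rows.

```python
import numpy as np

def correspondence_analysis(N):
    """Classical CA of a contingency table N via the SVD of the
    standardized residuals (a sketch of the textbook algorithm)."""
    P = N / N.sum()                        # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return F, G, sv**2                     # coordinates and inertias

# Hypothetical counts: 4 product-x-time rows, 3 sensory attributes.
N = np.array([[20, 5, 2],
              [15, 10, 4],
              [5, 12, 18],
              [2, 8, 25]], dtype=float)
F, G, inertia = correspondence_analysis(N)
```

A useful sanity check is that the total inertia (sum of squared singular values) equals the chi-square statistic of the table divided by its grand total, which ties the CA map directly to the strength-of-association measures mentioned above.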
We present a generalization of mean-centered partial least squares correlation called multiblock barycentric discriminant analysis (MUBADA) that integrates multiple regions of interest (ROIs) to analyze functional brain images of cerebral blood flow or metabolism obtained with SPECT or PET. To illustrate MUBADA we analyzed data from 104 participants comprising Alzheimer's disease (AD) patients, frontotemporal dementia (FTD) patients, and elderly normal controls. Brain images were analyzed via 28 ROIs (59,845 voxels) selected for clinical relevance. This is a discriminant analysis (DA) question with several blocks (one per ROI) and with more variables than observations, a configuration that precludes using DA. MUBADA revealed two factors explaining 74% and 26% of the total variance: Factor 1 isolated FTD, and Factor 2 isolated AD. A random effects model correctly classified 64% (chance = 33%) of "new" participants (p < 0.0001). MUBADA identified ROIs that best discriminated groups: ROIs separating FTD were bilateral inferior, middle frontal, left inferior, and middle temporal gyri, while ROIs separating AD were bilateral thalamus, inferior parietal gyrus, inferior temporal gyrus, left precuneus, middle frontal, and middle temporal gyri. MUBADA classified participants at levels comparable to standard methods (i.e., SVM, PCA-LDA, and PLS-DA) but provided information (e.g., discriminative ROIs and voxels) not easily accessible to these methods.