Computerized assessments are already used to derive accurate and reliable measures of cognitive function. Web-based cognitive assessment could improve the accessibility and flexibility of research and clinical assessment, widen participation, and promote research recruitment while simultaneously reducing costs. However, differences in context may influence task performance.
This study aims to determine the comparability of an unsupervised, web-based administration of the Cambridge Neuropsychological Test Automated Battery (CANTAB) against a typical in-person lab-based assessment, using a within-subjects counterbalanced design. The study aims to test (1) reliability, quantifying the relationship between measurements across settings using correlational approaches; (2) equivalence, the extent to which test results in different settings produce similar overall results; and (3) agreement, by quantifying acceptable limits to bias and differences between measurement environments.
A total of 51 healthy adults (32 women and 19 men; mean age 36.8, SD 15.6 years) completed 2 testing sessions, on average 1 week apart (SD 4.5 days). Assessments included equivalent tests of emotion recognition (emotion recognition task, ERT), visual recognition (pattern recognition memory, PRM), episodic memory (paired associate learning, PAL), working memory and spatial planning (spatial working memory, SWM, and One Touch Stockings of Cambridge), and sustained attention (rapid visual information processing, RVP). Participants were randomly allocated to one of two groups, assessed either in person in the laboratory first (n=33) or with unsupervised web-based assessments on their personal computing systems first (n=18). Performance indices (errors, correct trials, and response sensitivity) and median reaction times were extracted. Intraclass and bivariate correlations examined intersetting reliability, linear mixed models and Bayesian paired-sample t tests tested for equivalence, and Bland-Altman plots examined agreement.
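The agreement analysis mentioned above can be sketched in a few lines; this is a minimal Bland-Altman computation (mean bias and 95% limits of agreement), with illustrative paired scores that are not study data.

```python
from statistics import mean, stdev

def bland_altman_limits(lab_scores, web_scores):
    """Bland-Altman agreement: mean bias between paired measurements
    and the 95% limits of agreement (bias +/- 1.96 SD of differences)."""
    diffs = [a - b for a, b in zip(lab_scores, web_scores)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Illustrative paired scores (not study data): errors on the same task
# measured in the lab and in an unsupervised web session.
lab = [12, 8, 15, 10, 9, 14, 11, 7]
web = [13, 9, 14, 12, 9, 15, 12, 8]

bias, (lo, hi) = bland_altman_limits(lab, web)
print(f"bias={bias:.2f}, limits of agreement=({lo:.2f}, {hi:.2f})")
```

In a Bland-Altman plot these differences are plotted against the pairwise means; agreement is judged acceptable when the differences fall within pre-specified limits and show no systematic trend.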
Intraclass correlation (ICC) coefficients ranged from ρ=0.23 to 0.67, with high correlations in 3 performance indices (from the PAL, SWM, and RVP tasks; ρ≥0.60). High ICC values were also seen for reaction time measures from 2 tasks (PRM and ERT; ρ≥0.60). However, reaction times were slower during web-based assessments, which undermined both equivalence and agreement for reaction time measures. Performance indices did not differ between assessment settings and generally showed satisfactory agreement.
Our findings support the comparability of CANTAB performance indices (errors, correct trials, and response sensitivity) in unsupervised, web-based assessments with in-person and laboratory tests. Reaction times are not as easily translatable from in-person to web-based testing, likely due to variations in computer hardware. The results underline the importance of examining more than one index to ascertain comparability, as high correlations can present in the context of systematic differences, which are a product of differences between measurement environments. Further work is now needed to examine web-based assessments in clinical populations and in larger samples to improve sensitivity for detecting subtler differences between test settings.
Background
Verbal Paired Associates (VPA) is a widely used measure of memory. However, the memorability characteristics of the word-pairs are not well understood, and the task currently suffers from ceiling effects (Uttl et al. 2002). Here we describe a data-driven process for word-pair selection, yielding precise estimates of memorability at the item and list level. This allows tuning of task difficulty to participant characteristics, and the development of many well-calibrated parallel forms for longitudinal automated testing.
Method
For model development, we recruited a total of 185 participants aged 18-40 years without psychiatric diagnoses, chronic medication, or a history of head injury, using the Prolific online testing platform. We used Cambridge Cognition's NeuroVocalix software to automatically deliver and score the Learning and Recall phases of the VPA.
In a series of experiments, we explored the contributions of item-level, pair-level, list-level, and study-level properties to the probability of subsequent recall. At the item level, we considered word concreteness and word frequency. To capture the semantic relatedness of word-pairs, we trained the GloVe neural network model (Pennington et al., 2014) to predict the co-occurrence of words from a corpus of transcribed English. This yields high-dimensional vectors, from which we could calculate the semantic cosine distance between items in a word-pair. At the list level we considered order effects (primacy and recency), and at the study level we considered practice effects.
Data from these experiments were combined in a linear mixed-effects logistic regression model to yield predicted memorability estimates for new sets of word-pairs. These test sets were then deployed with a cohort of 43 older adults (aged 55 years and upwards) to evaluate these predictions.
Result
Modelling showed significant contributions of item-level, pair-level, list-level, and study-level factors to probability of recall. When applied to the new set of words and tested with our older cohort, we found that this model-derived measure was a robust predictor of recall (r = .931, p < 10⁻⁶).
Conclusion
We have a robust method of predicting, for a given word‐pair, what the likelihood of recall will be. This enables the generation of sets of word‐pairs with well‐understood memorability characteristics, for use in automated, repeat, remote assessment.
Background
Cognitive load is the mental demand a task imposes on a specific person. Performance declines when demand exceeds capacity; therefore, increased mental effort may precede measurable cognitive decline. Physiological indices of load (e.g., heart rate, skin conductance) are sensitive to task demand (e.g., subtracting three vs seven), show increased cognitive load with ageing, and are elevated in mild cognitive impairment (MCI) compared to healthy ageing. Voice features hold promise as non-invasive and scalable indicators of mental effort. Here, we aim to classify serial subtraction at high and low cognitive load using voice recordings captured with an automated remote data collection system.
Method
Participants (aged 17-86) completed serial subtraction via the NeuroVocalix web-app on their own devices. From a pool of 5,742 participants, 100 were randomly selected for manual review; seven were excluded for audio or performance issues. Responses were transcribed and the start and end of each subtraction attempt marked, producing 3,254 attempts for analysis. Low-level acoustic features were extracted and aggregated over each attempt, then normalized within participant. Random Forest classifiers were trained and evaluated using leave-one-subject-out cross-validation (LOSOCV) to predict high vs low load. LOSOCV repeatedly splits the dataset by subject, with one participant at a time used for testing and the remainder used for training; this produces model predictions for each participant and attempt.
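The LOSOCV scheme described above can be sketched generically. To keep the sketch self-contained, a toy nearest-class-mean classifier stands in for the Random Forest, and the single "acoustic feature" and labels are invented for illustration; only the subject-wise splitting logic mirrors the method.

```python
from statistics import mean

def losocv_predictions(samples):
    """Leave-one-subject-out CV: for each subject, train on all other
    subjects' attempts, then predict that subject's attempts.
    `samples` is a list of (subject_id, feature_value, label) tuples.
    The stand-in classifier assigns the label whose training-set mean
    feature value is closest (a Random Forest is used in practice)."""
    subjects = sorted({s for s, _, _ in samples})
    predictions = []  # (subject_id, true_label, predicted_label)
    for held_out in subjects:
        train = [(x, y) for s, x, y in samples if s != held_out]
        test = [(x, y) for s, x, y in samples if s == held_out]
        means = {lab: mean(x for x, y in train if y == lab)
                 for lab in {y for _, y in train}}
        for x, y in test:
            pred = min(means, key=lambda lab: abs(x - means[lab]))
            predictions.append((held_out, y, pred))
    return predictions

# Invented data: one normalized acoustic feature per attempt,
# labelled "low" (subtract by 3) or "high" (subtract by 7) load.
data = [("s1", 0.2, "low"), ("s1", 0.9, "high"),
        ("s2", 0.3, "low"), ("s2", 0.8, "high"),
        ("s3", 0.1, "low"), ("s3", 1.0, "high")]

preds = losocv_predictions(data)
accuracy = sum(y == p for _, y, p in preds) / len(preds)
print(f"LOSOCV accuracy: {accuracy:.2f}")
```

Because no attempt from the held-out participant ever appears in training, accuracy estimated this way reflects generalization to unseen speakers rather than memorization of individual voices.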
Result
Average cross-validation accuracy was 0.81 (95% CI 0.78 to 0.84), with an average area under the curve (AUC) of 0.87 (95% CI 0.85 to 0.89). We tested predictions for specific numbers which appeared in both subtraction by seven and by three; accuracy was 0.78, suggesting that predictions were not driven by specific numeric responses. We observed a significant negative correlation between behavioural performance on the task (response rate) and the per-utterance load probability (ρ=−0.32, p<0.001), suggesting that participants who were more fluent in serial subtraction exhibited lower cognitive load.
Conclusion
Acoustic features of voice can distinguish between utterances generated under conditions of high and low cognitive load during serial subtraction, adding a novel, independent and sensitive outcome measure to a cognitive task with established utility in the context of neurodegeneration.
The ability of remote research tools to collect granular, high-frequency data on symptoms and digital biomarkers is an important strength because it circumvents many limitations of traditional clinical trials and improves the ability to capture clinically relevant data. This approach allows researchers to capture more robust baselines and derive novel phenotypes for improved precision in diagnosis and accuracy in outcomes. The process for developing these tools, however, is complex because data need to be collected at a frequency that is meaningful but not burdensome for the participant or patient. Furthermore, traditional techniques, which rely on fixed conditions to validate assessments, may be inappropriate for validating tools that are designed to capture data under flexible conditions. This paper discusses the process for determining whether a digital assessment is suitable for remote research and offers suggestions on how to validate these novel tools.
Background
Decline in episodic memory is a strong marker of neurodegenerative diseases, making it an important target for cognitive assessment. We present a new remote, repeatable, and brief assessment of episodic memory and visual short-term memory (VSTM) for smartphone deployment. We evaluated this task for sensitivity to age and for association with established cognitive measures of memory and attention.
Method
The task consists of a learning phase in which participants see a sequence of items and are then asked to replicate the item order and locations by dragging and dropping the items on-screen. A recall phase follows after a minimum 12-hour delay, with replication of the earlier response. Each session takes less than 2 minutes. The first phase supplies metrics of VSTM spatial and order precision, with the second phase characterising long-term episodic memory.
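One plausible way to score such a drag-and-drop trial is sketched below. The abstract does not specify the exact formulas, so both metrics here are assumptions for illustration: spatial precision as mean Euclidean placement error, and order precision as the fraction of items recalled in their correct serial position.

```python
from math import hypot

def spatial_precision(true_positions, placed_positions):
    """Mean Euclidean error between true and placed item locations
    (lower = more precise). Assumed scoring, for illustration only."""
    errs = [hypot(tx - px, ty - py)
            for (tx, ty), (px, py) in zip(true_positions, placed_positions)]
    return sum(errs) / len(errs)

def order_precision(true_order, recalled_order):
    """Fraction of items recalled in their correct serial position.
    Assumed scoring, for illustration only."""
    return sum(t == r for t, r in zip(true_order, recalled_order)) / len(true_order)

# Illustrative trial: three items studied, then dragged back on-screen.
true_xy = [(10, 10), (50, 20), (30, 40)]
placed_xy = [(12, 9), (48, 24), (30, 40)]
print(spatial_precision(true_xy, placed_xy))
print(order_precision(["A", "B", "C"], ["A", "C", "B"]))
```

Scoring the immediate response gives the VSTM metrics; re-scoring the post-delay replication against the studied array gives the long-term episodic measures.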
Experiment 1 (n=133) compared the task with CANTAB assessments of visuospatial memory. In Experiment 2 (n=80), the novel task was repeated twice daily for a week, alongside momentary mood ratings, with a more comprehensive battery of CANTAB tasks at baseline.
In Experiment 1, multiple regression and dominance analyses explored the independent and shared variance of measures in explaining age. In Experiment 2, clustering and multivariate autoregressive models outlined differences in timeseries relationships between mood, memory, and age.
Result
Experiment 1 revealed that CANTAB measures and the novel spatial and order metrics were correlated with age (r=.20 to r=.37, adjusted P<.05). Multiple regression showed that models with CANTAB Paired Associate Learning (PAL) score and delayed novel spatial metrics were significantly predictive of age (t=−4.08 and 3.17, P<.005). Dominance analysis revealed that novel immediate precision overlapped with existing measures in predicting age, while delayed precision carried more unique variance (4.6% vs 0.62% unique sample variance explained in age, accounting for PAL).
Experiment 2 characterised variance in individual learning curves over time and differing autoregressive processes associated with age and CANTAB measures.
Conclusion
We present a brief novel episodic memory task that can be deployed remotely, with delayed spatial precision significantly predictive of healthy ageing. Additionally, we show this test's suitability as a repeated assessment, compatible with high-frequency study designs in older populations.
Normative cognitive data can help to distinguish pathological decline from normal aging. This study presents normative data from the Cambridge Neuropsychological Test Automated Battery, using linear regression and nonlinear quantile regression approaches.
Heinz Nixdorf Recall study participants completed Cambridge Neuropsychological Test Automated Battery tests: paired-associate learning, spatial working memory, and reaction time. Data were available for 1349-1529 healthy adults aged 57-84 years. Linear and nonlinear quantile regression analyses examined age-related changes, adjusting for sex and education. Quantile regression differentiated seven performance bands (percentiles: 97.7, 93.3, 84.1, 50, 15.9, 6.7, and 2.3).
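The core of quantile regression is that it minimises the tilted absolute ("pinball") loss rather than squared error, so each tau yields a different performance band. The sketch below shows this loss and checks that minimising it over a constant recovers an empirical quantile of the sample; the scores are illustrative, not Heinz Nixdorf Recall data.

```python
def pinball_loss(y_true, y_pred, tau):
    """Tilted absolute loss: residuals above the prediction are weighted
    by tau, residuals below by (1 - tau)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        err = y - q
        total += tau * err if err >= 0 else (tau - 1) * err
    return total / len(y_true)

# Illustrative test scores. The constant minimising the tau-pinball loss
# is an empirical tau-quantile of the sample, which is why fitting this
# loss at tau = 0.50, 0.841, ... traces out the percentile bands.
scores = [4, 7, 9, 12, 15, 18, 22, 30, 41]
candidates = [c / 10 for c in range(0, 500)]  # coarse grid search
median_fit = min(candidates,
                 key=lambda c: pinball_loss(scores, [c] * len(scores), 0.5))
p841_fit = min(candidates,
               key=lambda c: pinball_loss(scores, [c] * len(scores), 0.841))
print(median_fit, p841_fit)
```

In the normative models, the constant is replaced by a (possibly nonlinear) function of age, sex, and education, fitted separately at each of the seven percentiles, which is what allows the bands to diverge with age.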
Normative data show age-related cognitive decline across all tests, but with quantile regression revealing heterogeneous trajectories of cognitive aging, particularly for the test of episodic memory function (paired-associate learning).
This study presents normative data from Cambridge Neuropsychological Test Automated Battery in mid-to-late life. Quantile regression can model heterogeneity in age-related cognitive trajectories as seen in the paired-associate learning episodic memory measure.
•The study presents normative cognitive data from the Cambridge Neuropsychological Test Automated Battery in mid-to-late life.
•Most tasks showed similar decline across performance bands with increasing age.
•Quantile regression is sensitive for evaluating diverging trajectories with age.
•Episodic memory showed accelerated decline in the average performance range.