The ability of remote research tools to collect granular, high-frequency data on symptoms and digital biomarkers is an important strength because it circumvents many limitations of traditional clinical trials and improves the ability to capture clinically relevant data. This approach allows researchers to establish more robust baselines and derive novel phenotypes for improved precision in diagnosis and accuracy in outcomes. The process of developing these tools, however, is complex because data need to be collected at a frequency that is meaningful but not burdensome for the participant or patient. Furthermore, traditional techniques, which rely on fixed conditions to validate assessments, may be inappropriate for validating tools that are designed to capture data under flexible conditions. This paper discusses the process for determining whether a digital assessment is suitable for remote research and offers suggestions on how to validate these novel tools.
Introduction
Biomarkers of mental effort may help to identify subtle cognitive impairments in the absence of task performance deficits. Here, we aim to detect mental effort on a verbal task, using automated voice analysis and machine learning.
Methods
Audio data from the digit span backwards task were recorded and scored with automated speech recognition using the online platform NeuroVocalix™, yielding usable data from 2,764 healthy adults (1,022 male, 1,742 female; mean age 31.4 years). Acoustic features were aggregated across each trial and normalized within each subject. Cognitive load was dichotomized for each trial by categorizing trials at >0.6 of each participant's maximum span as "high load." Data were divided into training (60%), test (20%), and validation (20%) datasets, each containing different participants. Training and test data were used in model building and hyper-parameter tuning. Five classification models (Logistic Regression, Naive Bayes, Support Vector Machine, Random Forest, and Gradient Boosting) were trained to predict cognitive load ("high" vs. "low") based on acoustic features. Analyses were limited to correct responses. The best model was evaluated using the validation dataset, across all span lengths and within the subset of trials with a four-digit span. Classifier discriminant power was examined with receiver operating characteristic (ROC) curve analysis.
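The pipeline described above (within-subject feature normalization, load dichotomization at >0.6 of each participant's maximum span, a subject-wise train/validation split, and AUC evaluation) can be sketched as follows. This is a minimal illustrative sketch using synthetic data and scikit-learn, not the authors' code; all variable names and the feature dimensions are assumptions.

```python
# Illustrative sketch of the classification pipeline (synthetic data, not the study's).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 subjects x 10 trials, 5 acoustic features per trial.
n_subjects, n_trials, n_features = 200, 10, 5
X = rng.normal(size=(n_subjects * n_trials, n_features))
subject = np.repeat(np.arange(n_subjects), n_trials)
span = rng.integers(2, 9, size=n_subjects * n_trials)  # span length per trial

# Normalize acoustic features within each subject (z-score), as in the abstract.
for s in np.unique(subject):
    m = subject == s
    X[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-9)

# Dichotomize load: trials above 0.6 of the subject's maximum span are "high".
max_span = np.array([span[subject == s].max() for s in np.unique(subject)])
y = (span > 0.6 * max_span[subject]).astype(int)

# Subject-wise split so no participant appears in both training and validation.
train = subject < int(0.8 * n_subjects)

clf = GradientBoostingClassifier(random_state=0).fit(X[train], y[train])
auc = roc_auc_score(y[~train], clf.predict_proba(X[~train])[:, 1])
print(f"validation AUC: {auc:.2f}")
```

Because the split is by participant rather than by trial, the validation AUC reflects generalization to unseen speakers, which matters when features are speaker-normalized.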
Results
Participants reached a mean span of 6.34 out of 8 items (SD = 1.38). The Gradient Boosting classifier provided the best-performing model on test data (AUC = 0.98) and showed excellent discriminant power for cognitive load on the validation dataset, both across all span lengths (AUC = 0.99) and for four-digit-only utterances (AUC = 0.95).
Discussion
A sensitive biomarker of mental effort can be derived from vocal acoustic features in remotely administered verbal cognitive tests. The use-case of this biomarker for improving sensitivity of cognitive tests to subtle pathology now needs to be examined.
Cognitive symptoms are an underrecognized aspect of depression and often go untreated. High-frequency cognitive assessment holds promise for improving disease and treatment monitoring. Although we have previously found it feasible to remotely assess cognition and mood in this capacity, further work is needed to ascertain the optimal methodology to implement and synthesize these techniques.
The objective of this study was to examine (1) longitudinal changes in mood, cognition, activity levels, and heart rate over 6 weeks; (2) diurnal and weekday-related changes; and (3) co-occurrence of fluctuations between mood, cognitive function, and activity.
A total of 30 adults with current mild-moderate depression stabilized on antidepressant monotherapy responded to testing delivered through an Apple Watch (Apple Inc) for 6 weeks. Outcome measures included cognitive function, assessed with 3 brief n-back tasks daily; self-reported depressed mood, assessed once daily; daily total step count; and average heart rate. Change over a 6-week duration, diurnal and day-of-week variations, and covariation between outcome measures were examined using nonlinear and multilevel models.
Participants showed initial improvement in Cognition Kit n-back performance, followed by a learning plateau. Performance reached 90% of individual learning levels on average 10 days after study onset. N-back performance was typically better earlier and later in the day, and step counts were lower at the beginning and end of each week. Higher overall step counts were associated with faster n-back learning, and an increased daily step count was associated with better mood on the same (P<.001) and following day (P=.02). Daily n-back performance covaried with self-reported mood after participants reached their learning plateau (P=.01).
The current results support the feasibility and sensitivity of high-frequency cognitive assessments for disease and treatment monitoring in patients with depression. Methods to model the individual plateau in task learning can be used as a sensitive approach to better characterize changes in behavior and improve the clinical relevance of cognitive data. Wearable technology allows assessment of activity levels, which may influence both cognition and mood.
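One way to model an individual learning plateau of the kind discussed above is to fit a saturating exponential curve to each participant's daily scores and solve for the day at which the fitted curve reaches 90% of its plateau. The sketch below is illustrative only (synthetic data and an assumed functional form, not the authors' model).

```python
# Illustrative sketch: fitting a saturating exponential learning curve and
# locating the day on which performance reaches 90% of the fitted plateau.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(day, plateau, gain, rate):
    """Performance rises from (plateau - gain) toward plateau as day increases."""
    return plateau - gain * np.exp(-rate * day)

days = np.arange(42)  # 6 weeks of daily sessions
rng = np.random.default_rng(1)
# Synthetic participant: true plateau 0.85, initial deficit 0.30, rate 0.25.
perf = learning_curve(days, 0.85, 0.30, 0.25) + rng.normal(0, 0.02, days.size)

(plateau, gain, rate), _ = curve_fit(learning_curve, days, perf, p0=(0.8, 0.3, 0.1))

# Solve plateau - gain*exp(-rate*t) = 0.9*plateau for t:
# gain*exp(-rate*t) = 0.1*plateau  =>  t = ln(gain / (0.1*plateau)) / rate
t90 = np.log(gain / (0.1 * plateau)) / rate
print(f"fitted plateau={plateau:.2f}, 90% reached around day {t90:.1f}")
```

Post-plateau residuals from such a fit could then serve as the "performance after learning" signal examined for covariation with mood.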
Background: Digital measures offer an unparalleled opportunity to create a more holistic picture of how people who are patients behave in their real-world environments, thereby establishing a better connection between patients, caregivers, and the clinical evidence used to drive drug development and disease management. Reaching this vision will require a new level of co-creation between the stakeholders who design, develop, use, and make decisions using evidence from digital measures. Summary: In September 2022, the second in a series of meetings, entitled "Reverse Engineering of Digital Measures," hosted by the Swiss Federal Institute of Technology in Zürich and the Foundation for the National Institutes of Health Biomarkers Consortium and sponsored by Wellcome Trust, was held in Zurich, Switzerland. A broad range of stakeholders shared their experience across four case studies to examine how patient centricity is essential in shaping the development and validation of digital evidence generation tools. Key Messages: In this paper, we discuss progress and the remaining barriers to widespread use of digital measures for evidence generation in clinical development and care delivery. We also present key discussion points and takeaways to continue the discourse and provide a basis for dissemination and outreach to the wider community and other stakeholders. The work presented here offers a blueprint for how and why the patient voice can be thoughtfully integrated into digital measure development and shows that continued multistakeholder engagement is critical for further progress.
Several app-based studies share the characteristics of a light touch approach: they recruit, enroll, and onboard via a smartphone app and attempt to minimize burden through low-friction active study tasks while emphasizing the collection of passive data with minimal human contact. However, engagement is a common challenge across these studies, with low retention and adherence frequently reported.
This study aims to describe an alternative to a light touch digital health study that involved a participant-centric design including high friction app-based assessments, semicontinuous passive data from wearable sensors, and a digital engagement strategy centered on providing knowledge and support to participants.
The Stress and Recovery in Frontline COVID-19 Health Care Workers Study included US frontline health care workers followed between May and November 2020. The study comprised 3 main components: (1) active and passive assessments of stress and symptoms from a smartphone app, (2) objectively measured assessments of acute stress from wearable sensors, and (3) a participant-codriven engagement strategy that centered on providing knowledge and support to participants. The daily participant time commitment was an average of 10 to 15 minutes. Retention and adherence are described both quantitatively and qualitatively.
A total of 365 participants enrolled and started the study, and 81.0% (n=297) of them completed the full 4-month study. Participants wore the sensor on an average of 90.6% of days across the study duration. App-based daily, weekly, and every-other-week surveys were completed on average 69.18%, 68.37%, and 72.86% of the time, respectively.
This study found evidence for the feasibility and acceptability of a participant-centric digital health study approach that involved building trust with participants and providing support through regular phone check-ins. In addition to high retention and adherence, large volumes of objectively measured data were collected alongside contextual self-reported subjective data, which is often missing from light touch digital health studies.
ClinicalTrials.gov NCT04713111; https://clinicaltrials.gov/ct2/show/NCT04713111.
Background
More sensitive and less burdensome efficacy end points are urgently needed to improve the effectiveness of clinical drug development for Alzheimer disease (AD). Although conventional end points lack sensitivity, digital technologies hold promise for amplifying the detection of treatment signals and capturing cognitive anomalies at earlier disease stages. Using digital technologies and combining several test modalities allow for the collection of richer information about cognitive and functional status, which is not ascertainable via conventional paper-and-pencil tests.
Objective
This study aimed to assess the psychometric properties, operational feasibility, and patient acceptance of 10 promising technologies that are to be used as efficacy end points to measure cognition in future clinical drug trials.
Methods
The Method for Evaluating Digital Endpoints in Alzheimer Disease study is an exploratory, cross-sectional, noninterventional study that will evaluate 10 digital technologies’ ability to accurately classify participants into 4 cohorts according to the severity of cognitive impairment and dementia. Moreover, this study will assess the psychometric properties of each of the tested digital technologies, including the acceptable range to assess ceiling and floor effects, concurrent validity to correlate digital outcome measures to traditional paper-and-pencil tests in AD, reliability to compare test and retest, and responsiveness to evaluate the sensitivity to change in a mild cognitive challenge model. This study included 50 eligible male and female participants (aged between 60 and 80 years), of whom 13 (26%) were amyloid-negative, cognitively healthy participants (controls); 12 (24%) were amyloid-positive, cognitively healthy participants (presymptomatic); 13 (26%) had mild cognitive impairment (predementia); and 12 (24%) had mild AD (mild dementia). This study involved 4 in-clinic visits. During the initial visit, all participants completed all conventional paper-and-pencil assessments. During the following 3 visits, the participants underwent a series of novel digital assessments.
Results
Participant recruitment and data collection began in June 2020 and continued until June 2021; data collection therefore occurred during the COVID-19 pandemic. Data were successfully collected from all digital technologies to evaluate statistical and operational performance and patient acceptance. This paper reports the baseline demographics and characteristics of the population studied, as well as the study's progress during the pandemic.
Conclusions
This study was designed to generate feasibility insights and validation data to help advance novel digital technologies in clinical drug development. The learnings from this study will help guide future methods for assessing novel digital technologies and inform clinical drug trials in early AD, aiming to enhance clinical end point strategies with digital technologies.
International Registered Report Identifier (IRRID)
DERR1-10.2196/35442
Abstract
Background
Paired Associates Learning (PAL) is a well-known and trusted task with good sensitivity to deficiencies in memory capabilities, as shown in hundreds of studies over the past two decades. There has been considerable interest in a smartphone version of PAL that can be used remotely, and Cambridge Cognition has now completed work on the first production version of this task. Here, we show how the task compares with classic PAL in an online within-subjects study. PAL may be used in repeated assessments, and one notable feature is that performance tends to improve most strongly between a first and second attempt at the task. We therefore included a first smartphone 'Familiarisation' session on Day One, before the Day Two and Day Three web and smartphone sessions.
Method
We obtained PAL scores from 76 adults (43 male, 33 female) aged 50+ years (M = 56.3, SD = 5.46) using the Prolific online cognitive testing platform for this three-day crossover-design study. Participants were divided into two groups. In a first 'Familiarisation' session, all participants used their smartphones to complete a PAL assessment. On Day Two, one group used the web-based assessment and the other used the smartphone assessment; on the last day, each group switched device from the previous day's session and completed a questionnaire about their experience and a comparison of the task on the different devices.
Results
We found that test-retest reliability between the tasks was very good (Pearson's r > .7) between the first ('Familiarisation') session and Day Two, and excellent (Pearson's r > .8) between mobile and web devices on Day Two and Day Three of the study. When comparing performance across the two groups (web-first and smartphone-first), bias was not significant and numerically very small (0.2 adjusted error points). A significant improvement was seen between Familiarisation and Day Two, but not between the second and third sessions, supporting the value of a familiarisation session.
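The reliability and bias checks reported above can be computed in a few lines. The sketch below uses simulated scores purely for illustration (not the study's data); it shows the form of the analysis: Pearson's r between two sessions and mean score difference as the bias estimate.

```python
# Minimal sketch (simulated scores, not the study's data): test-retest
# reliability via Pearson's r, and cross-device bias as a mean difference.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
web = rng.normal(10, 2, 76)            # e.g., Day Two scores on web
mobile = web + rng.normal(0, 1, 76)    # e.g., Day Three scores on smartphone

r, p = pearsonr(web, mobile)
bias = mobile.mean() - web.mean()
print(f"r = {r:.2f} (p = {p:.3g}), bias = {bias:.2f} points")
```

A paired t-test on the two score vectors would give the significance test for the bias term.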
Conclusion
We found that the new smartphone version is highly comparable to the classic version of the task, and that familiarisation using PAL on smartphone is effective, generalising across both task versions.
Background
Learning over repeated exposure (LORE) over periods of days or months has been shown in recent studies to be sensitive to amyloid load where short-term learning tested in immediate recall was not (Samaroo et al., 2020). Prior LORE studies have used visual presentations of words, non-words, and faces. We addressed the feasibility of using validated verbal paired associate (VPA) stimuli in an automated LORE task, with verbal delivery and scoring performed by our NeuroVocalix system using text-to-speech (TTS) and automated speech recognition (ASR). Previously, we have shown that using a word-pair memorability model based on natural language characteristics to generate sets of VPA word-pairs of equivalent memorability allows for repeated testing with interchangeable stimulus sets. Here we explore the use of one such set in an adaptation of VPA that assesses LORE rather than immediate recall, with promising results.
Method
In this pilot study, we assessed memory performance in 20 older adults (10 male, 10 female) aged 65+ years (M = 70.7, SD = 4.65), recruited using the Prolific online platform, with the test delivered using our proprietary system. The task for participants was to learn a set of eight word-pair associations over a burst of five days. On the first day, participants heard the word-pair set and were immediately tested for recall of that set. On days two, three, and four of the study, participants first attempted cued recall of the eight word-pairs, after which the set was presented again, and the session ended. On the final day, cued recall was tested for the last time.
Result
On the first day, immediate recall scores were in line with first-attempt VPA scores observed in validation work. On the second day of the series, scores fell due to forgetting, and learning over three further exposures then showed monotonically increasing recall scores, with neither ceiling nor floor effects across the group. Adherence was excellent, with 99% of sessions successfully completed, and participants reported high enjoyment of the task and eager anticipation of sessions.
Conclusion
VPA‐form LORE using our validated VPA word‐pair stimuli performs well and shows promising task characteristics in a representative sample.