Summary Have state-of-the-art clinical trials failed to deliver treatments for neurodegenerative diseases because of shortcomings in the rating scales used? This Review assesses two methodological limitations of rating scales that might help to answer this question. First, the numbers generated by most rating scales do not satisfy the criteria for rigorous measurements. Second, we do not really know which variables most rating scales measure. We use clinical examples to highlight concerns about the limitations of rating scales, examine their underlying rationales, clarify their implications, explore potential solutions, and make some recommendations for future research. We show that improvements in the scientific rigour of rating scales can improve the chances of reaching the correct conclusions about the effectiveness of treatments.
Multiple sclerosis (MS) is associated with chronic symptoms, including muscle stiffness, spasms, pain and insomnia. Here we report the results of the Multiple Sclerosis and Extract of Cannabis (MUSEC) study that aimed to substantiate the patient-based findings of previous studies.
Patients with stable MS at 22 UK centres were randomised to oral cannabis extract (CE) (N=144) or placebo (N=135), stratified by centre, walking ability and use of antispastic medication. This double blind, placebo controlled, phase III study had a screening period, a 2 week dose titration phase from 5 mg to a maximum of 25 mg of tetrahydrocannabinol daily and a 10 week maintenance phase. The primary outcome measure was a category rating scale (CRS) measuring patient reported change in muscle stiffness from baseline. Further CRSs assessed body pain, spasms and sleep quality. Three validated MS specific patient reported outcome measures assessed aspects of spasticity, physical and psychological impact, and walking ability.
The rate of relief from muscle stiffness after 12 weeks was almost twice as high with CE as with placebo (29.4% vs. 15.7%; OR 2.26; 95% CI 1.24 to 4.13; p=0.004, one sided). Similar results were found after 4 weeks and 8 weeks, and also for all further CRSs. Results from the MS scales supported these findings.
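As a rough plausibility check on the figures above, the unadjusted odds ratio can be recomputed from the two response rates. This is only a sketch: `odds_ratio` is a hypothetical helper, not part of the study's analysis, and the reported OR of 2.26 presumably comes from a model that accounts for the stratification factors, so the crude value should be close but not identical.

```python
def odds_ratio(p_treatment: float, p_control: float) -> float:
    """Unadjusted odds ratio from two response proportions (each in 0..1)."""
    odds_treatment = p_treatment / (1.0 - p_treatment)
    odds_control = p_control / (1.0 - p_control)
    return odds_treatment / odds_control


# Response rates quoted in the abstract: 29.4% (CE) vs. 15.7% (placebo).
crude_or = odds_ratio(0.294, 0.157)
print(round(crude_or, 2))  # about 2.24, slightly below the reported 2.26
```

The small gap between the crude and reported values is what one would expect if the published estimate was adjusted, but the abstract does not state the model used.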
The study met its primary objective to demonstrate the superiority of CE over placebo in the treatment of muscle stiffness in MS. This was supported by results for secondary efficacy variables. Adverse events in participants treated with CE were consistent with the known side effects of cannabinoids. No new safety concerns were observed.
NCT00552604.
The Alzheimer's Disease Assessment Scale Cognitive Behavior Section (ADAS-cog), a measure of cognitive performance, has been used widely in Alzheimer's disease trials. Its key role in clinical trials should be supported by evidence that it is both clinically meaningful and scientifically sound. Its conceptual and neuropsychological underpinnings are well-considered, but its performance as an instrument of measurement has received less attention. Objective To examine the traditional psychometric properties of the ADAS-cog in a large sample of people with Alzheimer's disease.
Data from three clinical trials of donepezil (Aricept) in mild-to-moderate Alzheimer's disease (n=1421; MMSE 10-26) were analysed at both the scale and component level. Five psychometric properties were examined using traditional psychometric methods. These methods of examination underpin upcoming Food and Drug Administration recommendations for patient rating scale evaluation.
At the scale level, criteria tested for data completeness, scaling assumptions (eg, component total correlations: 0.39-0.67), targeting (no floor or ceiling effects), reliability (eg, Cronbach's α = 0.84; test-retest intraclass correlations: 0.93) and validity (correlation with MMSE: -0.63) were satisfied. At the component level, 7 of 11 ADAS-cog components had substantial ceiling effects (range 40-64%).
Performance was satisfactory at the scale level, but most ADAS-cog components were too easy for many patients in this sample and did not reflect the expected depth and range of cognitive performance. The clinical implication of this finding is that the ADAS-cog's estimate of cognitive ability, and its potential ability to detect differences in cognitive performance under treatment, could be improved. However, because of the limitations of traditional psychometric methods, further evaluations would be desirable using additional rating scale analysis techniques to pinpoint specific improvements.
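Two of the traditional psychometric criteria named above, internal consistency (Cronbach's α) and ceiling effects, can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for illustration; they are not the trial data or the authors' code. The `best` argument is whatever score represents the best possible performance on a component (assumed here to be 0, on the understanding that ADAS-cog components count errors).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


def ceiling_pct(scores, best):
    """Percentage of respondents who obtained the best possible score."""
    return 100.0 * sum(1 for s in scores if s == best) / len(scores)


# Toy data (hypothetical): three items that rank four respondents identically,
# so the items are perfectly consistent and alpha equals 1.
toy_rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(toy_rows)

# Toy component scores (hypothetical): half the respondents make no errors.
ceiling = ceiling_pct([0, 0, 1, 2], best=0)
```

A component on which 40-64% of respondents obtain the best score, as reported above, cannot distinguish between those respondents, which is why satisfactory scale-level statistics can coexist with poorly targeted components.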
Rating scales are increasingly used in neurologic research and trials. A key question relating to their use across the range of neurologic diseases, both common and rare, is what sample sizes provide meaningful estimates of reliability and validity. Here, we address two questions: (1) to what extent does sample size influence the stability of reliability and validity estimates; and (2) to what extent does sample size influence the inferences made from reliability and validity testing? We examined data from two studies. In Study 1, we retrospectively reduced the total sample randomly and nonrandomly by decrements of approximately 50% to generate sub-samples from n = 713–20. In Study 2, we prospectively generated sub-samples from n = 20–320, by entry time into study. In all samples we estimated reliability (internal consistency, item total correlations, test–retest) and validity (within scale correlations, convergent and discriminant construct validity). Reliability estimates were stable in magnitude and interpretation in all sub-samples of both studies. Validity estimates were stable in samples of n ≥ 80, for 75% of scales in samples of n = 40, and for 50% of scales in samples of n = 20. In this study, sample sizes of a minimum of 20 for reliability and 80 for validity provided estimates highly representative of the main study samples. These findings should be considered provisional and more work is needed to determine if these estimates are generalisable, consistent, and useful.
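The random arm of the Study 1 design, repeatedly reducing the sample by roughly 50%, can be sketched as nested random subsampling. This is an assumption-laden illustration: `halving_subsamples` is a hypothetical helper, the seed is arbitrary, and the abstract does not give the exact sub-sample sizes, so exact halving here ends at 22 rather than 20.

```python
import random


def halving_subsamples(ids, min_size, seed=0):
    """Return nested random subsamples, each half the size of the previous one.

    Halving stops before a subsample would fall below min_size.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    current = list(ids)
    subsamples = [current]
    while len(current) // 2 >= min_size:
        current = rng.sample(current, len(current) // 2)
        subsamples.append(current)
    return subsamples


subsamples = halving_subsamples(range(713), min_size=20)
sizes = [len(s) for s in subsamples]
print(sizes)  # [713, 356, 178, 89, 44, 22]
```

Reliability and validity statistics would then be recomputed on each subsample and compared with the full-sample values to judge stability.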
Abstract Introduction We present international consensus recommendations for improving diagnosis, management and treatment access in multiple sclerosis (MS). Our vision is that these will be used widely among those committed to creating a better future for people with MS and their families. Methods Structured discussions and literature searches conducted in 2015 examined the personal and economic impact of MS, current practice in diagnosis, treatment and management, definitions of disease activity and barriers to accessing disease-modifying therapies (DMTs). Results Delays often occur before a person with symptoms suggestive of MS sees a neurologist. Campaigns to raise awareness of MS are needed, as are initiatives to improve access to MS healthcare professionals and services. We recommend a clear treatment goal: to maximize neurological reserve, cognitive function and physical function by reducing disease activity. Treatment should start early, with DMT and lifestyle measures. All parameters that predict relapses and disability progression should be included in the definition of disease activity and monitored regularly when practical. On suboptimal control of disease activity, switching to a DMT with a different mechanism of action should be considered. A shared decision-making process that embodies dialogue and considers all appropriate DMTs should be implemented. Monitoring data should be recorded formally in registries to generate real-world evidence. In many jurisdictions, access to DMTs is limited. To improve treatment access the relevant bodies should consider all costs to all parties when conducting economic evaluations and encourage the continuing investigation, development and use of cost-effective therapeutic strategies and alternative financing models. Conclusions The consensus findings of an international author group recommend a therapeutic strategy based on proactive monitoring and shared decision-making in MS. Early diagnosis and improved treatment access are also key components.
Abstract An outcome assessment, the patient assessment used in an endpoint, is the measuring instrument that provides a rating or score (categorical or continuous) that is intended to represent some aspect of the patient’s health status. Outcome assessments are used to define efficacy endpoints when developing a therapy for a disease or condition. Most efficacy endpoints are based on specified clinical assessments of patients. When clinical assessments are used as clinical trial outcomes, they are called clinical outcome assessments (COAs). COAs include any assessment that may be influenced by human choices, judgment, or motivation. COAs must be well-defined and possess adequate measurement properties to demonstrate (directly or indirectly) the benefits of a treatment. In contrast, a biomarker assessment is one that is subject to little, if any, patient motivational or rater judgmental influence. This is the first of two reports by the ISPOR Clinical Outcomes Assessment – Emerging Good Practices for Outcomes Research Task Force. This report provides foundational definitions important for an understanding of COA measurement principles. The foundation provided in this report includes what it means to demonstrate a beneficial effect, how assessments of patients relate to the objective of showing a treatment’s benefit, and how these assessments are used in clinical trial endpoints. In addition, this report describes intrinsic attributes of patient assessments and clinical trial factors that can affect the properties of the measurements. These factors should be considered when developing or refining assessments. These considerations will aid investigators designing trials in their choice of using an existing assessment or developing a new outcome assessment. Although the focus of this report is on the development of a new COA to define endpoints in a clinical trial, these principles may be applied more generally.
A critical element in appraising or developing a COA is to describe the treatment’s intended benefit as an effect on a clearly identified aspect of how a patient feels or functions. This aspect must have importance to the patient and be part of the patient’s typical life. This meaningful health aspect can be measured directly or measured indirectly when it is impractical to evaluate it directly or when it is difficult to measure. For indirect measurement, a concept of interest (COI) can be identified. The COI must be related to how a patient feels or functions. Procedures are then developed to measure the COI. The relationship of these measurements with how a patient feels or functions in the intended setting and manner of use of the COA (the context of use) could then be defined. A COA has identifiable attributes or characteristics that affect the measurement properties of the COA when used in endpoints. One of these features is whether judgment can influence the measurement, and if so, whose judgment. This attribute defines four categories of COAs: patient reported outcomes, clinician reported outcomes, observer reported outcomes, and performance outcomes. A full description as well as explanation of other important COA features is included in this report. The information in this report should aid in the development, refinement, and standardization of COAs, and, ultimately, improve their measurement properties.
Commentary On: Fekete TF, Haschtmann D, Kleinstück FS, Porchet F, Jeszenszky D, Mannion AF. What level of pain are patients happy to live with after surgery for lumbar degenerative disorders? Spine J 2016;16:S12–18 (in this issue).
Introduction
Poorly developed patient-reported outcome measures (PROs) risk type-II errors (i.e. false negatives) in clinical trials, resulting in erroneous failure to achieve trial endpoints. Validity is a fundamental requirement of fit-for-purpose PROs, with the main determinant of validity being the PRO's items, i.e. content validity. Here, we sought to identify fatigue PRO instruments used in multiple sclerosis (MS) studies and to assess the extent to which their development satisfied current content validity standards.
Methods
We searched Embase® and Medline® for MS studies using fatigue-based PROs. Abstracts were screened, PROs identified, and their relevant development papers assessed against seven Consensus Standards for Measurement Instruments (COSMIN) criteria for content development.
Results
From 3814 abstracts, 18 fatigue PROs met our inclusion criteria. Most PROs did not satisfy at least one COSMIN content validity standard. Frequent omissions during PRO development include: clearly defined constructs; conceptual frameworks; qualitative research in representative samples; and literature reviews. PRO development quality has improved significantly since FDA guidance was published (U = 10.0, p = 0.02). However, scatterplots and correlations between PRO COSMIN scores and citation frequency (rho = −0.62) and clinical trials usage (rho = +0.18) implied that PRO quality is unrelated to choice. COSMIN scores implied that the Fatigue Symptoms and Impact Questionnaire—Relapsing Multiple Sclerosis (FSIQ-RMS) and Neurological Fatigue Index—Multiple Sclerosis (NFI-MS) had the strongest evidence for adequate content validity.
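The rank correlations quoted above (rho) are Spearman coefficients, which can be computed as a Pearson correlation on ranks. The following is a minimal sketch under that standard definition; the function names and the toy scores are hypothetical and are not the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): higher quality scores paired with fewer citations.
quality_scores = [1, 2, 3, 4, 5]
citation_counts = [50, 40, 30, 20, 10]
rho = spearman_rho(quality_scores, citation_counts)  # perfectly inverse ranking
```

A negative rho means that instruments with higher quality scores tended to rank lower on the other variable; a value near zero means no monotonic relationship.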
Conclusion
Most existing fatigue PROs do not meet COSMIN content validity requirements. Although two PROs scored well on aggregate (NFI-MS and FSIQ-RMS), our subsequent evaluation of the item sets that generated their scores implied that both PROs have weaker content validity than COSMIN suggests. This indicates that COSMIN criteria require further development, and raises significant concerns about how we have measured one of the most common and burdensome MS symptoms. A detailed head-to-head psychometric evaluation is needed to determine the impact of different PRO development qualities and the implications of the problems implied by our analyses, on measurement performance.
Plain Language Summary
In MS clinical trials, impacts such as fatigue, walking ability, and quality of life, are measured using questionnaires—called patient-reported outcome measures—completed by people living with MS. The quality of these measures is fundamentally important. If poor quality patient-reported outcome measures are used, treatment benefits are easily missed or underestimated.
We studied the quality of 18 fatigue patient-reported outcome measures previously used in MS studies. Specifically, we studied how the questionnaire questions were developed and scored them against recognised quality control standards. In general, the patient-reported outcome measures were poor. Only two scored reasonably well. One common weakness was that people living with MS were not involved during patient-reported outcome measure development. We also conducted novel examinations that went beyond the quality control standards. These test how well the questions relate back to the MS impacts they claim to measure. We found even the two best patient-reported outcome measures were poor.
Our study had two findings. First, patient-reported outcome measures of MS fatigue are poor. Second, current standards for testing patient-reported outcome measure development are too easy to satisfy, overestimate patient-reported outcome measure quality, and need updating. Therefore, the ways we measure MS fatigue, one of the most common and burdensome MS symptoms, are scientifically weak.
Summary Background Laboratory evidence has shown that cannabinoids might have a neuroprotective action. We investigated whether oral dronabinol (Δ9-tetrahydrocannabinol) might slow the course of progressive multiple sclerosis. Methods In this multicentre, parallel, randomised, double-blind, placebo-controlled study, we recruited patients aged 18–65 years with primary or secondary progressive multiple sclerosis from 27 UK neurology or rehabilitation departments. Patients were randomly assigned (2:1) to receive dronabinol or placebo for 36 months; randomisation was by stochastic minimisation, using a computer-generated randomisation sequence, balanced according to expanded disability status scale (EDSS) score, centre, and disease type. Maximum dose was 28 mg per day, titrated against bodyweight and adverse effects. Primary outcomes were EDSS score progression (masked assessor, time to progression of ≥1 point from a baseline score of 4·0–5·0 or ≥0·5 points from a baseline score of ≥5·5, confirmed after 6 months) and change from baseline in the physical impact subscale of the 29-item multiple sclerosis impact scale (MSIS-29-PHYS). All patients who received at least one dose of study drug were included in the intention-to-treat analyses. This trial is registered as an International Standard Randomised Controlled Trial (ISRCTN 62942668). Findings Of the 498 patients randomly assigned to a treatment group, 329 received at least one dose of dronabinol and 164 received at least one dose of placebo (five did not receive the allocated intervention). 145 patients in the dronabinol group had EDSS score progression (0·24 first progression events per patient-year; crude rate) compared with 73 in the placebo group (0·23 first progression events per patient-year; crude rate); HR for prespecified primary analysis was 0·92 (95% CI 0·68–1·23; p=0·57).
Mean yearly change in MSIS-29-PHYS score was 0·62 points (SD 3·29) in the dronabinol group versus 1·03 points (3·74) in the placebo group. Primary analysis with a multilevel model gave an estimated between-group difference (dronabinol–placebo) of −0·9 points (95% CI −2·0 to 0·2). We noted no serious safety concerns (114 [35%] patients in the dronabinol group had at least one serious adverse event, compared with 46 [28%] in the placebo group). Interpretation Our results show that dronabinol has no overall effect on the progression of multiple sclerosis in the progressive phase. The findings have implications for the design of future studies of progressive multiple sclerosis, because lower than expected progression rates might have affected our ability to detect clinical change. Funding UK Medical Research Council, National Institute for Health Research Efficacy and Mechanism Evaluation programme, Multiple Sclerosis Society, and Multiple Sclerosis Trust.
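The EDSS progression threshold quoted in this abstract depends on the baseline score, which is easy to misread in prose. A minimal sketch of the threshold rule follows; `edss_progressed` is a hypothetical helper, and it deliberately omits the requirement that progression be confirmed after 6 months, which would need the sequence of visits.

```python
def edss_progressed(baseline: float, follow_up: float) -> bool:
    """Apply the baseline-dependent EDSS threshold quoted in the abstract.

    Baseline 4.0-5.0: an increase of at least 1.0 point counts as progression.
    Baseline 5.5 or above: an increase of at least 0.5 points counts.
    The 6-month confirmation step is not modelled here.
    """
    change = follow_up - baseline
    if 4.0 <= baseline <= 5.0:
        return change >= 1.0
    if baseline >= 5.5:
        return change >= 0.5
    raise ValueError("baseline is outside the ranges quoted in the abstract")


print(edss_progressed(4.5, 5.0))  # False: a 0.5-point rise is below the 1.0 threshold
print(edss_progressed(6.0, 6.5))  # True: a 0.5-point rise meets the threshold
```

EDSS scores move in half-point steps, so the floating-point comparisons are exact for valid scores.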
Abstract A clinician-reported outcome (ClinRO) assessment is a type of clinical outcome assessment (COA). ClinRO assessments, like all COAs (patient-reported, observer-reported, or performance outcome assessments), are used to 1) measure patients’ health status and 2) define end points that can be interpreted as treatment benefits of medical interventions on how patients feel, function, or survive in clinical trials. Like other COAs, ClinRO assessments can be influenced by human choices, judgment, or motivation. A ClinRO assessment is conducted and reported by a trained health care professional and requires specialized professional training to evaluate the patient’s health status. This is the second of two reports by the ISPOR Clinical Outcomes Assessment—Emerging Good Practices for Outcomes Research Task Force. The first report provided an overview of COAs including definitions important for an understanding of COA measurement practices. This report focuses specifically on issues related to ClinRO assessments. In this report, we define three types of ClinRO assessments (readings, ratings, and clinician global assessments) and describe emerging good measurement practices in their development and evaluation.
The good measurement practices include 1) defining the context of use; 2) identifying the concept of interest measured; 3) defining the intended treatment benefit on how patients feel, function, or survive reflected by the ClinRO assessment and evaluating the relationship between that intended treatment benefit and the concept of interest; 4) documenting content validity; 5) evaluating other measurement properties once content validity is established (including intra- and inter-rater reliability); 6) defining study objectives and end point(s) objectives, and defining study end points and placing study end points within the hierarchy of end points; 7) establishing interpretability in trial results; and 8) evaluating operational considerations for the implementation of ClinRO assessments used as end points in clinical trials. Applying good measurement practices to ClinRO assessment development and evaluation will lead to more efficient and accurate measurement of treatment effects. This is important beyond regulatory approval in that it provides evidence for the uptake of new interventions into clinical practice and provides justification to payers for reimbursement on the basis of the clearly demonstrated added value of the new intervention.