Summary: Have state-of-the-art clinical trials failed to deliver treatments for neurodegenerative diseases because of shortcomings in the rating scales used? This Review assesses two methodological limitations of rating scales that might help to answer this question. First, the numbers generated by most rating scales do not satisfy the criteria for rigorous measurements. Second, we do not really know which variables most rating scales measure. We use clinical examples to highlight concerns about the limitations of rating scales, examine their underlying rationales, clarify their implications, explore potential solutions, and make some recommendations for future research. We show that improvements in the scientific rigour of rating scales can improve the chances of reaching the correct conclusions about the effectiveness of treatments.
Multiple sclerosis (MS) is associated with chronic symptoms, including muscle stiffness, spasms, pain and insomnia. Here we report the results of the Multiple Sclerosis and Extract of Cannabis (MUSEC) study that aimed to substantiate the patient-based findings of previous studies.
Patients with stable MS at 22 UK centres were randomised to oral cannabis extract (CE) (N=144) or placebo (N=135), stratified by centre, walking ability and use of antispastic medication. This double blind, placebo controlled, phase III study had a screening period, a 2 week dose titration phase from 5 mg to a maximum of 25 mg of tetrahydrocannabinol daily and a 10 week maintenance phase. The primary outcome measure was a category rating scale (CRS) measuring patient reported change in muscle stiffness from baseline. Further CRSs assessed body pain, spasms and sleep quality. Three validated MS specific patient reported outcome measures assessed aspects of spasticity, physical and psychological impact, and walking ability.
The rate of relief from muscle stiffness after 12 weeks was almost twice as high with CE as with placebo (29.4% vs. 15.7%; OR 2.26; 95% CI 1.24 to 4.13; p=0.004, one sided). Similar results were found after 4 weeks and 8 weeks, and also for all further CRSs. Results from the MS scales supported these findings.
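As a rough illustration of how the two response rates above relate to an odds ratio, the following minimal sketch computes a crude (unadjusted) odds ratio from the reported percentages. The function name is hypothetical, and the calculation is an assumption about the arithmetic only: the published OR of 2.26 may come from exact counts or a stratified model, so the crude value need not match it exactly.

```python
def odds_ratio(p_treatment: float, p_control: float) -> float:
    """Crude odds ratio from two response proportions (each in the open interval 0..1)."""
    odds_treatment = p_treatment / (1.0 - p_treatment)
    odds_control = p_control / (1.0 - p_control)
    return odds_treatment / odds_control


# Relief rates quoted in the abstract: 29.4% with CE, 15.7% with placebo.
crude_or = odds_ratio(0.294, 0.157)
print(f"crude odds ratio: {crude_or:.2f}")
```

The crude figure lands close to, but not exactly on, the reported estimate, which is what one would expect if the trial analysis adjusted for the stratification factors.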
The study met its primary objective to demonstrate the superiority of CE over placebo in the treatment of muscle stiffness in MS. This was supported by results for secondary efficacy variables. Adverse events in participants treated with CE were consistent with the known side effects of cannabinoids. No new safety concerns were observed.
NCT00552604.
Rating scales are increasingly used in neurologic research and trials. A key question relating to their use across the range of neurologic diseases, both common and rare, is what sample sizes provide meaningful estimates of reliability and validity. Here, we address two questions: (1) to what extent does sample size influence the stability of reliability and validity estimates; and (2) to what extent does sample size influence the inferences made from reliability and validity testing? We examined data from two studies. In Study 1, we retrospectively reduced the total sample randomly and nonrandomly by decrements of approximately 50 % to generate sub-samples from n = 713–20. In Study 2, we prospectively generated sub-samples from n = 20–320, by entry time into study. In all samples we estimated reliability (internal consistency, item total correlations, test–retest) and validity (within scale correlations, convergent and discriminant construct validity). Reliability estimates were stable in magnitude and interpretation in all sub-samples of both studies. Validity estimates were stable in samples of n ≥ 80, for 75 % of scales in samples of n = 40, and for 50 % of scales in samples of n = 20. In this study, sample sizes of a minimum of 20 for reliability and 80 for validity provided estimates highly representative of the main study samples. These findings should be considered provisional and more work is needed to determine if these estimates are generalisable, consistent, and useful.
The Alzheimer's Disease Assessment Scale Cognitive Behavior Section (ADAS-cog), a measure of cognitive performance, has been used widely in Alzheimer's disease trials. Its key role in clinical trials should be supported by evidence that it is both clinically meaningful and scientifically sound. Its conceptual and neuropsychological underpinnings are well-considered, but its performance as an instrument of measurement has received less attention. Objective: To examine the traditional psychometric properties of the ADAS-cog in a large sample of people with Alzheimer's disease.
Data from three clinical trials of donepezil (Aricept) in mild-to-moderate Alzheimer's disease (n=1421; MMSE 10-26) were analysed at both the scale and component level. Five psychometric properties were examined using traditional psychometric methods. These methods of examination underpin upcoming Food and Drug Administration recommendations for patient rating scale evaluation.
At the scale level, criteria tested for data completeness, scaling assumptions (eg, component total correlations: 0.39-0.67), targeting (no floor or ceiling effects), reliability (eg, Cronbach's α = 0.84; test-retest intraclass correlations: 0.93) and validity (correlation with MMSE: -0.63) were satisfied. At the component level, 7 of 11 ADAS-cog components had substantial ceiling effects (range 40-64%).
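For readers unfamiliar with the internal consistency statistic quoted above, here is a minimal sketch of the standard Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores). The data are hypothetical and purely illustrative; nothing here reproduces the ADAS-cog analysis itself.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    Each inner list holds one respondent's scores on the k items.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents, three items.
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 4],
]
print(f"alpha = {cronbach_alpha(toy_scores):.2f}")
```

Values closer to 1 indicate that the items vary together; the statistic says nothing about ceiling effects, which is why the component-level finding above is reported separately.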
Performance was satisfactory at the scale level, but most ADAS-cog components were too easy for many patients in this sample and did not reflect the expected depth and range of cognitive performance. The clinical implication of this finding is that the ADAS-cog's estimate of cognitive ability, and its potential ability to detect differences in cognitive performance under treatment, could be improved. However, because of the limitations of traditional psychometric methods, further evaluations would be desirable using additional rating scale analysis techniques to pinpoint specific improvements.
The problem with health measurement. Cano, Stefan J; Hobart, Jeremy C
Patient Preference and Adherence, 01/2011, Volume 5
Journal article
Peer reviewed
Open access
In this review we discuss health measurement with a focus on psychometric methods and methodology. In particular, we examine some of the key issues currently facing the use of clinician and patient rating scales to measure the health outcomes of disease and treatment. We present three key facts and flag one crucial problem. First, the numbers generated by scales are increasingly used as the measurements of the central dependent variables upon which clinical decisions are frequently made. The rising profile of rating scales has significant implications for scale construction, evaluation, and selection, as well as for interpreting studies. Second, rating scale science is well established. Therefore, it is important to learn the lessons from those who have built and established the science over the last century. Finally, the goal of a rating scale is to measure. As such, over the last half century, developments in rating scale (psychometric) methods have caused a refocus in the way we should be measuring health. In particular, newer methods have significant clinical advantages over traditional approaches. These should be seriously considered for inclusion in everyday practice. This leads us to the central problem with health measurement, which is that we cannot currently be sure what most rating scales are measuring. This is because the methods we have in place to ensure the validity of rating scales fall short of what is actually required. We expand on this point, and provide some potential routes forward to help address this important problem.
There is a need for greater understanding of the impact of multiple sclerosis (MS) from the perspective of individuals with the condition. The South West Impact of MS Project (SWIMS) has been designed to improve understanding of disease impact using a patient-centred approach. The purpose is to (1) develop improved measurement instruments for clinical trials, (2) evaluate longitudinal performance of a variety of patient-reported outcome measures, and (3) develop prognostic predictors for use in individualising drug treatment for patients, particularly early on in the disease course.
This is a patient-centred, prospective, longitudinal study of multiple sclerosis and clinically isolated syndrome (CIS) in south west England. The study area comprises two counties with a population of approximately 1.7 million and an estimated 1,800 cases of MS. Self-completion questionnaires are administered to participants every six months (for people with MS) or 12 months (CIS). Here we present descriptive statistics of the baseline data provided by 967 participants with MS.
Seventy-five percent of those approached consented to participate. The male:female ratio was 1.00:3.01 (n = 967). Average (standard deviation) age at time of entry to SWIMS was 51.6 (11.5) years (n = 961) and median (interquartile range) time since first symptom was 13.3 (6.8 to 24.5) years (n = 934). Fatigue was the most commonly reported symptom, with 80% of participants experiencing fatigue at baseline. Although medication use for symptom control was common, there was little evidence of effectiveness, particularly for fatigue. Nineteen percent of participants were unable to classify their subtype of MS. When patient-reported subtype was compared to neurologist assessment for a sample of participants (n = 396), agreement in disease sub-type was achieved in 63% of cases. There were 836 relapses, reported by 931 participants, in the twelve months prior to baseline. Twenty-three percent of the relapsing-remitting group and 12% of the total sample were receiving disease-modifying therapy at baseline.
Demographics of this sample were similar to published data for the UK. Overall, the results broadly reflect clinical experience in confirming high symptom prevalence, with relatively little complete symptom relief. Participants often had difficulty in defining MS relapses and their own MS type.
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Previous comparisons of the ability to detect change in the Barthel Index (BI) and Functional Independence Measure motor scale (FIMm) have implied these two scales are equally responsive when examined using traditional effect size statistics. Clinically, this is counterintuitive, as the FIMm has greater potential to detect change than the BI, and it raises concerns about the validity of effect size statistics as indicators of rating scale responsiveness. To examine these concerns, in this study a sophisticated psychometric analysis, Rasch measurement, was applied to BI and FIMm data.
BI and FIMm data were examined from 976 people at a single neurorehabilitation unit. Rasch analysis was used to compare the responsiveness of the BI and FIMm at the group comparison level (effect sizes, relative efficiency, relative precision) and for each individual person in the sample by computing the significance of their change.
Group level analyses from both interval measurements and ordinal scores implied the BI and FIMm had equivalent responsiveness (BI and FIMm effect size ranges -0.82 to -1.12 and -0.77 to -1.05, respectively). However, individual person level analyses indicated that the FIMm detected significant improvement in almost twice as many people as the BI (50%, n=496 vs 31%, n=298), and recorded fewer people as unchanged on discharge (FIMm=4%, n=38; BI=12%, n=115). This difference was found to be statistically significant (χ²=273.81; p<0.001).
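The contrast above between group-level effect sizes and person-level change can be illustrated with a minimal sketch. It assumes one common way of testing individual change on interval-level (Rasch) measures: dividing the change by the standard error of the difference, sqrt(SE1² + SE2²), and comparing the result with ±1.96. The function names, the example numbers, and the convention that higher measures mean better function are all assumptions for illustration, not details taken from the study.

```python
import math


def significance_of_change(m1: float, se1: float, m2: float, se2: float) -> float:
    """Standardised change for one person: difference over its standard error."""
    return (m2 - m1) / math.sqrt(se1 ** 2 + se2 ** 2)


def classify_change(z: float, critical: float = 1.96) -> str:
    """Label a standardised change (assumes higher measures mean better function)."""
    if z >= critical:
        return "significant improvement"
    if z <= -critical:
        return "significant worsening"
    return "no significant change"


# Hypothetical person: admission measure 0.0 (SE 0.3), discharge measure 1.0 (SE 0.4).
z = significance_of_change(0.0, 0.3, 1.0, 0.4)
print(round(z, 2), classify_change(z))
```

Because each person's change is judged against that person's own measurement error, a scale with smaller standard errors can register more people as significantly changed even when the group-level effect sizes of two scales look alike.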
These findings demonstrate that effect size calculations are limited and potentially misleading indicators of rating scale responsiveness at the group comparison level. Rasch analysis at the individual person level showed the superior responsiveness of the FIMm, supporting clinical expectation, and its added value as a method for examining and comparing rating scale responsiveness.
The Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) is widely used to measure health status after stroke. However, a fundamental assumption for its valid use after stroke has not been comprehensively tested: is it legitimate to generate scores for 8 scales and 2 summary measures using the standard algorithms? We tested this assumption.
SF-36 data from 177 people after stroke were examined (71% male; mean age, 62). We tested 6 scaling criteria to determine the legitimacy of generating the 8 SF-36 scale scores using Likert's method of summed ratings, and we tested 2 scaling criteria to determine the appropriateness of the standard SF-36 algorithms for weighting and combining scale scores to generate 2 summary measures (physical and mental).
Scaling assumptions were fully satisfied for 6 of the 8 scales, but 3 of these 6 scales had notable floor and/or ceiling effects. Assumptions for generating 2 SF-36 summary measures were not satisfied.
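Floor and ceiling effects, mentioned above, are conventionally reported as the percentage of respondents scoring at the minimum or maximum possible value of a scale. The following minimal sketch shows that calculation on hypothetical scores; the function name and the 0 to 100 scoring range are assumptions for illustration, and no threshold for what counts as "notable" is implied.

```python
def floor_ceiling(scores: list[float], min_score: float, max_score: float) -> tuple[float, float]:
    """Return (floor %, ceiling %): share of respondents at the scale extremes."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100.0 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical scale scores on a 0 to 100 range.
example_scores = [0, 0, 25, 50, 75, 100, 100, 100]
floor_pct, ceiling_pct = floor_ceiling(example_scores, 0, 100)
print(f"floor: {floor_pct:.1f}%  ceiling: {ceiling_pct:.1f}%")
```

A large share of respondents at either extreme means the scale cannot distinguish among them, nor register change in that direction.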
In this sample, 5 of the 8 SF-36 scales had limited validity as outcome measures after stroke, and the reporting of physical and mental summary scores was not supported. Results raise questions about the use of the SF-36 in stroke, and the SF-12 that is developed from it, and highlight the importance of testing scaling assumptions when applying existing scales to new populations.
Intravenous steroids are routinely used to treat disabling relapses in multiple sclerosis (MS). Theoretically, the infusion could take place at home, rather than in hospital. Findings from other patient populations suggest that patients may find the experience of home relapse management more desirable. However, formal comparison of these two settings, from the patients' point of view, was prevented by the lack of a clinical scale. We report the development of a rating scale to measure patients' experiences of relapse management that allowed this question to be answered confidently.
Scale development had three stages. First, in-depth interviews of 21 MS patients generated a conceptual model and pool of potential scale items. Second, these items were administered to 160 people with relapsing-remitting MS. Standard psychometric techniques were used to develop a scale. Third, the psychometric properties of the scale were evaluated in a randomised controlled trial of 138 patients whose relapses were managed either at home or hospital.
A preliminary conceptual model with eight dimensions and a pool of 154 items were generated. From this we developed the MS Relapse Management Scale (MSRMS), a 42-item scale with four subscales: access to care (6 items), coordination of care (11 items), information (7 items), interpersonal care (18 items). The MSRMS subscales satisfied most psychometric criteria but had notable floor effects.
The MSRMS is a reliable and valid measure of patients' experiences of MS relapse management. The high floor effects suggest most respondents had positive care experiences. Results demonstrate that patients' experiences of relapse management can be measured, and that the MSRMS is a powerful tool for determining which services to develop, support and ultimately commission.