Background Content validity is the most important measurement property of a patient-reported outcome measure (PROM) and the most challenging to assess. Our aims were to: (1) develop standards for evaluating the quality of PROM development; (2) update the original COSMIN standards for assessing the quality of content validity studies of PROMs; (3) develop criteria for what constitutes good content validity of PROMs; and (4) develop a rating system for summarizing the evidence on a PROM's content validity and grading the quality of the evidence in systematic reviews of PROMs. Methods An online 4-round Delphi study was performed among 159 experts from 21 countries. Panelists rated the degree to which they (dis)agreed with proposed standards, criteria, and rating issues on 5-point rating scales ('strongly disagree' to 'strongly agree'), and provided arguments for their ratings. Results Discussion focused on sample size requirements, recording and field notes, transcribing cognitive interviews, and data coding. After four rounds, the required 67% consensus was reached on all standards, criteria, and rating issues. After pilot-testing, the steering committee made some final changes. Ten criteria for good content validity were defined regarding item relevance, appropriateness of response options and recall period, comprehensiveness, and comprehensibility of the PROM. Discussion The consensus-based COSMIN methodology for content validity is more detailed, standardized, and transparent than earlier published guidelines, including the previous COSMIN standards. This methodology can contribute to the selection and use of high-quality PROMs in research and clinical practice.
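The 67% consensus rule on a 5-point scale can be illustrated with a minimal sketch. The abstract does not say how agreement was counted, so the code assumes (hypothetically) that consensus means at least 67% of panelists chose 'agree' (4) or 'strongly agree' (5); the function name and the example ratings are illustrative only.

```python
def consensus_reached(ratings, threshold=0.67):
    """Return (share_agreeing, reached) for one proposed standard.

    ratings: integers 1..5, where 1 = 'strongly disagree' and
    5 = 'strongly agree'.
    Assumption (not stated in the abstract): a rating of 4 or 5
    counts as agreement.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    agreeing = sum(1 for r in ratings if r >= 4)
    share = agreeing / len(ratings)
    return share, share >= threshold


# Hypothetical ratings from ten panelists for a single standard.
example_ratings = [5, 4, 4, 3, 2, 5, 4, 4, 1, 5]
share, reached = consensus_reached(example_ratings)
```

In a multi-round design, items that miss the threshold would typically be revised and re-rated in the next round; that loop is omitted here.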
Summary Objective To conduct a systematic review and meta-analysis to synthesise evidence regarding measurement properties of the Knee injury and Osteoarthritis Outcome Score (KOOS). Design A comprehensive literature search identified 37 eligible papers evaluating KOOS measurement properties in participants with knee injuries and/or osteoarthritis. Methodological quality was evaluated using the COSMIN checklist. Where possible, meta-analysis of extracted data was conducted for all studies and stratified by age and knee condition; otherwise narrative synthesis was performed. Results KOOS has adequate internal consistency, test-retest reliability and construct validity in young and old adults with knee injuries and/or osteoarthritis. The ADL subscale has better content validity for older patients and Sport/Rec for younger patients with knee injuries, while the Pain subscale is more relevant for painful knee conditions. The five-factor structure of the original KOOS is unclear. There is some evidence that the KOOS subscales demonstrate sufficient unidimensionality, but this requires confirmation. Although measurement error requires further evaluation, the minimal detectable change for KOOS subscales ranges from 14.3 to 19.6 for younger individuals, and ≥20 for older individuals. Evidence of responsiveness comes from larger effect sizes following surgical (especially total knee replacement) than non-surgical interventions. Conclusions KOOS demonstrates adequate content validity, internal consistency, test-retest reliability, construct validity and responsiveness for age- and condition-relevant subscales. Structural validity, cross-cultural validity and measurement error require further evaluation, as well as construct validity of KOOS-PS. Suggested order of subscales for different knee conditions can be applied in hierarchical testing of endpoints in clinical trials.
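For readers unfamiliar with the minimal detectable change (MDC), the sketch below shows one common way to derive it from test-retest data: the standard error of measurement is taken as SEM = SD × √(1 − ICC), and MDC95 = 1.96 × √2 × SEM. The abstract does not state which formula the included studies used, so this is an assumption, and the SD and ICC values in the example are hypothetical, not KOOS data.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), a common reliability-based estimate."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%).

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two assessments.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical subscale: between-subject SD of 15 points, ICC of 0.85.
mdc95 = minimal_detectable_change(sd=15.0, icc=0.85)
```

A change in an individual's score smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level.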
Scores on an outcome measurement instrument depend on the type and settings of the instrument used, how instructions are given to patients, how professionals administer and score the instrument, etc. The impact of all these sources of variation on scores can be assessed in studies on reliability and measurement error, if properly designed and analyzed. The aim of this study was to develop standards to assess the quality of studies on reliability and measurement error of clinician-reported outcome measurement instruments, performance-based outcome measurement instruments, and laboratory values.
We conducted a 3-round Delphi study involving 52 panelists.
Consensus was reached on how a comprehensive research question can be deduced from the design of a reliability study to determine how the results of a study inform us about the quality of the outcome measurement instrument at issue. Consensus was reached on components of outcome measurement instruments, i.e. the potential sources of variation. Next, we reached consensus on standards on design requirements (n = 5), standards on preferred statistical methods for reliability (n = 3) and measurement error (n = 2), and their ratings on a four-point scale. No consensus was reached on one term for a component and on one rating of one standard; these therefore required a decision by the steering committee.
We developed a tool that enables researchers with and without thorough knowledge on measurement properties to assess the quality of a study on reliability and measurement error of outcome measurement instruments.
Purpose The original COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist was developed to assess the methodological quality of single studies on measurement properties of Patient-Reported Outcome Measures (PROMs). Now it is our aim to adapt the COSMIN checklist and its four-point rating system into a version exclusively for use in systematic reviews of PROMs, aiming to assess risk of bias of studies on measurement properties. Methods For each standard (i.e., a design requirement or preferred statistical method), it was discussed within the COSMIN steering committee if and how it should be adapted. The adapted checklist was pilot-tested to strengthen content validity in a systematic review on the quality of PROMs for patients with hand osteoarthritis. Results The most important changes were the reordering of the measurement properties to be assessed in a systematic review of PROMs; the deletion of standards that concerned reporting issues and standards that do not necessarily lead to biased results; the integration of standards on general requirements for studies on item response theory with standards for specific measurement properties; the recommendation to the review team to specify hypotheses for construct validity and responsiveness in advance, and subsequently the removal of the standards about formulating hypotheses; and the change in the labels of the four-point rating system. Conclusions The COSMIN Risk of Bias checklist was developed exclusively for use in systematic reviews of PROMs to distinguish this application from other purposes of assessing the methodological quality of studies on measurement properties, such as guidance for designing or reporting a study on the measurement properties.
Purpose Systematic reviews of patient-reported outcome measures (PROMs) differ from reviews of interventions and diagnostic test accuracy studies and are complex. In fact, conducting a review of one or more PROMs comprises multiple reviews (i.e., one review for each measurement property of each PROM). In the absence of guidance specifically designed for reviews on measurement properties, our aim was to develop a guideline for conducting systematic reviews of PROMs. Methods Based on literature reviews and expert opinions, and in concordance with existing guidelines, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) steering committee developed a guideline for systematic reviews of PROMs. Results A consecutive ten-step procedure for conducting a systematic review of PROMs is proposed. Steps 1-4 concern preparing and performing the literature search, and selecting relevant studies. Steps 5-8 concern the evaluation of the quality of the eligible studies, the measurement properties, and the interpretability and feasibility aspects. Steps 9 and 10 concern formulating recommendations and reporting the systematic review. Conclusions The COSMIN guideline for systematic reviews of PROMs includes methodology to combine the methodological quality of studies on measurement properties with the quality of the PROM itself (i.e., its measurement properties). This enables reviewers to draw transparent conclusions and make evidence-based recommendations on the quality of PROMs, and supports the evidence-based selection of PROMs for use in research and in clinical practice.
To investigate the reliability and validity of the SQUASH physical activity (PA) questionnaire in a multi-ethnic population living in the Netherlands.
We included participants from the HELIUS study, a population-based cohort study. The sample comprised Dutch (n = 114), Turkish (n = 88), Moroccan (n = 74), South-Asian Surinamese (n = 98) and African Surinamese (n = 91) adults, aged 18-70 years. The SQUASH was self-administered twice to assess test-retest reliability (mean interval 6-7 weeks) and participants wore an accelerometer and heart rate monitor (Actiheart) to enable assessment of construct validity.
We observed low test-retest reliability; intraclass correlation coefficients ranged from low (0.05 for moderate/high intensity PA in African Surinamese women) to acceptable (0.78 for light intensity PA in Moroccan women). The discrepancy between self-reported and measured PA differed on the basis of the intensity of activity: self-reported light intensity PA was lower than measured but self-reported moderate/high intensity PA was higher than measured, with wide limits of agreement. The discrepancy between questionnaire and Actiheart measures of moderate intensity PA did not differ between ethnic minority and Dutch participants with correction for relevant confounders. Additionally, the SQUASH overestimated the number of participants meeting the Dutch PA norm; Cohen's kappas for the agreement were poor, the highest being 0.30 in Dutch women.
We found considerable variation in the test-retest reliability and validity of self-reported PA with no consistency based on ethnic origin. Our findings imply that the SQUASH does not provide a valid basis for comparison of PA between ethnic groups.
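The two agreement statistics mentioned above, limits of agreement and Cohen's kappa, can be sketched as follows. This is a generic illustration under stated assumptions, not the study's analysis: the limits of agreement use the usual Bland-Altman form (mean difference ± 1.96 × SD of the differences), kappa is the unweighted version for two ratings per participant, and all function names and data are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(self_reported, measured):
    """Bland-Altman bias and 95% limits of agreement.

    Returns (bias, lower, upper), where bias is the mean of
    self_reported - measured and the limits are bias +/- 1.96 * SD.
    """
    if len(self_reported) != len(measured) or len(measured) < 2:
        raise ValueError("need two equal-length series with >= 2 pairs")
    diffs = [s - m for s, m in zip(self_reported, measured)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two categorical ratings per subject."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty, equal-length series")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    # Chance agreement from the marginal proportions of each category.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: 1 = meets the PA norm, 0 = does not.
questionnaire = [1, 1, 1, 0, 0, 0, 1, 0]
device = [1, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(questionnaire, device)
```

Wide limits of agreement indicate that individual questionnaire values can differ substantially from the device values even when the average bias is small.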
Summary Objectives To systematically review the measurement properties of performance-based measures to assess physical function in people with hip and/or knee osteoarthritis (OA). Methods Electronic searches were performed in MEDLINE, CINAHL, Embase, and PsycINFO up to the end of June 2012. Two reviewers independently rated measurement properties using the consensus-based standards for the selection of health status measurement instruments (COSMIN). “Best evidence synthesis” was made using COSMIN outcomes and the quality of findings. Results Twenty-four out of 1792 publications were eligible for inclusion. Twenty-one performance-based measures were evaluated including 15 single-activity measures and six multi-activity measures. Measurement properties evaluated included internal consistency (three measures), reliability (16 measures), measurement error (14 measures), validity (nine measures), responsiveness (12 measures) and interpretability (three measures). A positive rating was given to only 16% of possible measurement ratings. Evidence for the majority of measurement properties of tests reported in the review has yet to be determined. On balance of the limited evidence, the 40 m self-paced test was the best rated walk test, the 30 s-chair stand test and timed up and go test were the best rated sit to stand tests, and the Stratford battery, Physical Activity Restrictions and Functional Assessment System were the best rated multi-activity measures. Conclusion Further good quality research investigating measurement properties of performance measures, including responsiveness and interpretability in people with hip and/or knee OA, is needed. Consensus on which combination of measures will best assess physical function in people with hip and/or knee OA is urgently required.
Background The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, with 5-18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties. Methods The scoring system was developed based on discussions among experts and testing of the scoring system on 46 articles from a systematic review. Four response options were defined for each COSMIN item (excellent, good, fair, and poor). A quality score per measurement property is obtained by taking the lowest rating of any item in a box ("worst score counts"). Results Specific criteria for excellent, good, fair, and poor quality for each COSMIN item are described. In defining the criteria, the "worst score counts" algorithm was taken into consideration. This means that only fatal flaws were defined as poor quality. The scores of the 46 articles show how the scoring system can be used to provide an overview of the methodological quality of studies included in a systematic review of measurement properties. Conclusions Based on experience in testing this scoring system on 46 articles, the COSMIN checklist with the proposed scoring system seems to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.
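The "worst score counts" rule described above reduces to taking the minimum over an ordered rating scale. The sketch below illustrates that idea only; the function name, the box labels and the item ratings are hypothetical and are not taken from the checklist itself.

```python
# Ratings ordered from worst to best, as named in the abstract.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_quality_score(item_ratings):
    """Return the lowest rating among the items of one checklist box."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    unknown = set(item_ratings) - set(RATING_ORDER)
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    # The item with the smallest index in RATING_ORDER decides the score.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for two boxes of a single study.
study_ratings = {
    "reliability": ["excellent", "good", "fair", "excellent"],
    "internal consistency": ["excellent", "poor", "good"],
}
scores = {box: box_quality_score(items) for box, items in study_ratings.items()}
```

Because a single low item determines the score of the whole box, the abstract's point that only fatal flaws should be rated as poor follows directly from this rule.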
Choosing an adequate measurement instrument depends on the proposed use of the instrument, the concept to be measured, the measurement properties (e.g. internal consistency, reproducibility, content and construct validity, responsiveness, and interpretability), the requirements, the burden for subjects, and costs of the available instruments. As far as measurement properties are concerned, there are no sufficiently specific standards for the evaluation of measurement properties of instruments to measure health status, and also no explicit criteria for what constitutes good measurement properties. In this paper we describe the protocol for the COSMIN study, the objective of which is to develop a checklist that contains COnsensus-based Standards for the selection of health Measurement INstruments, including explicit criteria for satisfying these standards. We will focus on evaluative health related patient-reported outcomes (HR-PROs), i.e. patient-reported health measurement instruments used in a longitudinal design as an outcome measure, excluding health care related PROs, such as satisfaction with care or adherence. The COSMIN standards will be made available in the form of an easily applicable checklist.
An international Delphi study will be performed to reach consensus on which and how measurement properties should be assessed, and on criteria for good measurement properties. Two sources of input will be used for the Delphi study: (1) a systematic review of properties, standards and criteria of measurement properties found in systematic reviews of measurement instruments, and (2) an additional literature search of methodological articles presenting a comprehensive checklist of standards and criteria. The Delphi study will consist of four (written) Delphi rounds, with approximately 30 expert panel members with different backgrounds in clinical medicine, biostatistics, psychology, and epidemiology. The final checklist will subsequently be field-tested by assessing the inter-rater reproducibility of the checklist.
Since the study will mainly be anonymous, problems that are commonly encountered in face-to-face group meetings, such as the dominance of certain persons in the communication process, will be avoided. By performing a Delphi study and involving many experts, the likelihood that the checklist will have sufficient credibility to be accepted and implemented will increase.
Summary
Bioelectrical impedance analysis (BIA) is a practical method to estimate percentage body fat (%BF). In this systematic review, we aimed to assess validity, responsiveness, reliability and measurement error of BIA methods in estimating %BF in children and adolescents. We searched for relevant studies in PubMed, Embase and Cochrane through November 2012. Two reviewers independently screened titles and abstracts for inclusion, extracted data and rated methodological quality of the included studies. We performed a best evidence synthesis to synthesize the results, thereby excluding studies of poor quality. We included 50 published studies. Mean differences between BIA and reference methods (gold standard [criterion validity] and convergent measures of body composition [convergent validity]) were considerable and ranged from negative to positive values, resulting in conflicting evidence for criterion validity. We found strong evidence for a good reliability, i.e. (intra‐class) correlations ≥0.82. However, test‐retest mean differences ranged from 7.5% to 13.4% of total %BF in the included study samples, indicating considerable measurement error. Our systematic review suggests that BIA is a practical method to estimate %BF in children and adolescents. However, validity and measurement error are not satisfactory.