Scores on an outcome measurement instrument depend on the type and settings of the instrument used, how instructions are given to patients, how professionals administer and score the instrument, etc. The impact of all these sources of variation on scores can be assessed in studies on reliability and measurement error, if properly designed and analyzed. The aim of this study was to develop standards to assess the quality of studies on reliability and measurement error of clinician-reported outcome measurement instruments, performance-based outcome measurement instruments, and laboratory values.
We conducted a 3-round Delphi study involving 52 panelists.
Consensus was reached on how a comprehensive research question can be deduced from the design of a reliability study to determine how the results of a study inform us about the quality of the outcome measurement instrument at issue. Consensus was reached on components of outcome measurement instruments, i.e. the potential sources of variation. Next, we reached consensus on standards on design requirements (n = 5), standards on preferred statistical methods for reliability (n = 3) and measurement error (n = 2), and their ratings on a four-point scale. No consensus was reached on one term for a component and on one rating of one standard; these therefore required a decision by the steering committee.
We developed a tool that enables researchers with and without thorough knowledge on measurement properties to assess the quality of a study on reliability and measurement error of outcome measurement instruments.
Background Content validity is the most important measurement property of a patient-reported outcome measure (PROM) and the most challenging to assess. Our aims were to: (1) develop standards for evaluating the quality of PROM development; (2) update the original COSMIN standards for assessing the quality of content validity studies of PROMs; (3) develop criteria for what constitutes good content validity of PROMs, and (4) develop a rating system for summarizing the evidence on a PROM's content validity and grading the quality of the evidence in systematic reviews of PROMs. Methods An online 4-round Delphi study was performed among 159 experts from 21 countries. Panelists rated the degree to which they (dis)agreed with proposed standards, criteria, and rating issues on 5-point rating scales ('strongly disagree' to 'strongly agree'), and provided arguments for their ratings. Results Discussion focused on sample size requirements, recording and field notes, transcribing cognitive interviews, and data coding. After four rounds, the required 67% consensus was reached on all standards, criteria, and rating issues. After pilot-testing, the steering committee made some final changes. Ten criteria for good content validity were defined regarding item relevance, appropriateness of response options and recall period, comprehensiveness, and comprehensibility of the PROM. Discussion The consensus-based COSMIN methodology for content validity is more detailed, standardized, and transparent than earlier published guidelines, including the previous COSMIN standards. This methodology can contribute to the selection and use of high-quality PROMs in research and clinical practice.
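As an illustration of the 67% consensus rule mentioned above, the following is a minimal sketch of how agreement on a single proposed standard could be checked. It assumes that "agreement" means a rating of 4 or 5 on the 5-point scale; the abstract does not state how agreement was operationalized, and the function name `consensus_reached` and its parameters are hypothetical.

```python
def consensus_reached(ratings, threshold=0.67, agree_levels=(4, 5)):
    """Return (proportion_agreeing, reached) for one proposed standard.

    ratings      -- panelists' ratings on a 1-5 scale
    threshold    -- required proportion of agreement (67% in the abstract)
    agree_levels -- ratings counted as agreement (ASSUMPTION: 4 and 5)
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    agreeing = sum(1 for r in ratings if r in agree_levels)
    proportion = agreeing / len(ratings)
    return proportion, proportion >= threshold


# Five of six hypothetical panelists agree, so the threshold is met.
proportion, reached = consensus_reached([5, 4, 4, 3, 5, 5])
print(f"{proportion:.2f}", reached)
```

Note that with a strict 0.67 cut-off, four out of six agreeing panelists (0.667) would fall just short; whether such a case counts as consensus is a rounding decision the abstract does not address.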
Purpose The original COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist was developed to assess the methodological quality of single studies on measurement properties of Patient-Reported Outcome Measures (PROMs). Now it is our aim to adapt the COSMIN checklist and its four-point rating system into a version exclusively for use in systematic reviews of PROMs, aiming to assess risk of bias of studies on measurement properties. Methods For each standard (i.e., a design requirement or preferred statistical method), it was discussed within the COSMIN steering committee if and how it should be adapted. The adapted checklist was pilot-tested to strengthen content validity in a systematic review on the quality of PROMs for patients with hand osteoarthritis. Results Most important changes were the reordering of the measurement properties to be assessed in a systematic review of PROMs; the deletion of standards that concerned reporting issues and standards that do not necessarily lead to biased results; the integration of standards on general requirements for studies on item response theory with standards for specific measurement properties; the recommendation to the review team to specify hypotheses for construct validity and responsiveness in advance, and subsequently the removal of the standards about formulating hypotheses; and the change in the labels of the four-point rating system. Conclusions The COSMIN Risk of Bias checklist was developed exclusively for use in systematic reviews of PROMs to distinguish this application from other purposes of assessing the methodological quality of studies on measurement properties, such as guidance for designing or reporting a study on the measurement properties.
Purpose Systematic reviews of patient-reported outcome measures (PROMs) differ from reviews of interventions and diagnostic test accuracy studies and are complex. In fact, conducting a review of one or more PROMs comprises multiple reviews (i.e., one review for each measurement property of each PROM). In the absence of guidance specifically designed for reviews on measurement properties, our aim was to develop a guideline for conducting systematic reviews of PROMs. Methods Based on literature reviews and expert opinions, and in concordance with existing guidelines, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) steering committee developed a guideline for systematic reviews of PROMs. Results A consecutive ten-step procedure for conducting a systematic review of PROMs is proposed. Steps 1-4 concern preparing and performing the literature search, and selecting relevant studies. Steps 5-8 concern the evaluation of the quality of the eligible studies, the measurement properties, and the interpretability and feasibility aspects. Steps 9 and 10 concern formulating recommendations and reporting the systematic review. Conclusions The COSMIN guideline for systematic reviews of PROMs includes methodology to combine the methodological quality of studies on measurement properties with the quality of the PROM itself (i.e., its measurement properties). This enables reviewers to draw transparent conclusions and make evidence-based recommendations on the quality of PROMs, and supports the evidence-based selection of PROMs for use in research and in clinical practice.
Choosing an adequate measurement instrument depends on the proposed use of the instrument, the concept to be measured, the measurement properties (e.g. internal consistency, reproducibility, content and construct validity, responsiveness, and interpretability), the requirements, the burden for subjects, and costs of the available instruments. As far as measurement properties are concerned, there are no sufficiently specific standards for the evaluation of measurement properties of instruments to measure health status, and also no explicit criteria for what constitutes good measurement properties. In this paper we describe the protocol for the COSMIN study, the objective of which is to develop a checklist that contains COnsensus-based Standards for the selection of health Measurement INstruments, including explicit criteria for satisfying these standards. We will focus on evaluative health related patient-reported outcomes (HR-PROs), i.e. patient-reported health measurement instruments used in a longitudinal design as an outcome measure, excluding health care related PROs, such as satisfaction with care or adherence. The COSMIN standards will be made available in the form of an easily applicable checklist.
An international Delphi study will be performed to reach consensus on which and how measurement properties should be assessed, and on criteria for good measurement properties. Two sources of input will be used for the Delphi study: (1) a systematic review of properties, standards and criteria of measurement properties found in systematic reviews of measurement instruments, and (2) an additional literature search of methodological articles presenting a comprehensive checklist of standards and criteria. The Delphi study will consist of four (written) Delphi rounds, with approximately 30 expert panel members with different backgrounds in clinical medicine, biostatistics, psychology, and epidemiology. The final checklist will subsequently be field-tested by assessing the inter-rater reproducibility of the checklist.
Since the study will mainly be anonymous, problems that are commonly encountered in face-to-face group meetings, such as the dominance of certain persons in the communication process, will be avoided. By performing a Delphi study and involving many experts, the likelihood that the checklist will have sufficient credibility to be accepted and implemented will increase.
Background The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, with 5-18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties. Methods The scoring system was developed based on discussions among experts and testing of the scoring system on 46 articles from a systematic review. Four response options were defined for each COSMIN item (excellent, good, fair, and poor). A quality score per measurement property is obtained by taking the lowest rating of any item in a box ("worst score counts"). Results Specific criteria for excellent, good, fair, and poor quality for each COSMIN item are described. In defining the criteria, the "worst score counts" algorithm was taken into consideration. This means that only fatal flaws were defined as poor quality. The scores of the 46 articles show how the scoring system can be used to provide an overview of the methodological quality of studies included in a systematic review of measurement properties. Conclusions Based on experience in testing this scoring system on 46 articles, the COSMIN checklist with the proposed scoring system seems to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.
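The "worst score counts" rule described above can be sketched in a few lines. This is an illustrative sketch only: the rating labels come from the abstract, while the function name `box_score` and the example ratings are hypothetical.

```python
# Rating labels from the abstract, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Return the quality score for one COSMIN box.

    The score is the lowest rating given to any item in the box
    ("worst score counts").
    """
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # list.index raises ValueError for labels outside RATING_ORDER.
    return min(item_ratings, key=RATING_ORDER.index)


# One "fair" item pulls the whole box down to "fair".
print(box_score(["excellent", "good", "fair", "excellent"]))
```

Because a single low rating determines the box score, the abstract's remark that only fatal flaws were defined as poor quality follows naturally: any "poor" item makes the whole box "poor".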
Research on depression stigma is needed to gain more insight into the underlying construct and to reduce the level of stigma in the community. However, few validated measurements of depression stigma are available in the Netherlands. Therefore, this study first sought to examine the psychometric properties of the Dutch translation of the Depression Stigma Scale (DSS). Second, we examined which demographic (gender, age, education, partner status) and other variables (anxiety and knowledge of depression) are associated with personal and perceived stigma within these samples.
The study population consisted of an adult convenience sample (n = 253) (study 1) and a community adult sample with elevated depressive symptoms (n = 264) (study 2). Factor structure, internal consistency, and validity were assessed. The associations between stigma, demographic variables and anxiety level were examined with regression analyses.
Confirmatory factor analysis supported the validity and internal consistency of the DSS personal stigma scale. Internal consistency was sufficient (Cronbach's alpha = .70 (study 1) and .77 (study 2)). The results regarding the perceived stigma scale revealed no clear factor structure. Regression analyses showed that personal stigma was higher in younger people, those with no experience with depression, and those with lower education.
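For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy data are hypothetical; this is not the authors' analysis code, and it assumes complete data with no missing item scores.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows -- one list of item scores per respondent (no missing values)
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(rows[0])
    # Sample variance of each item (column) and of the total score.
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, two items (hypothetical scores).
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))
```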
This study established the validity and internal consistency of the DSS personal scale in the Netherlands, in a community sample and in people with elevated depressive symptoms. However, additional research is needed to examine the factor structure of the DSS perceived scale and its use in other samples.
Purpose
The Patient and Observer Scar Assessment Scale (POSAS) is widely used for measurements of scar quality. Scar quality encompasses visual, tactile and sensory characteristics of the scar. The Patient Scale of previous POSAS versions was lacking input from patients. Therefore, the aim of this study was to develop the POSAS3.0 Patient Scale with involvement of adult patients with all scar types, complying with the highest clinimetric standards.
Methods
From February 2018 to April 2019, a series of six focus group interviews were performed in the Netherlands and Australia to identify scar quality characteristics that adults with scars consider to be important. All focus groups were transcribed, anonymized and analysed using a thematic analysis. Relevant characteristics were formulated into items, resulting in a Dutch and English version of the Patient Scale. These drafts were pilot tested in Australia, the Netherlands and the United Kingdom, and refined accordingly.
Results
A total of 21 relevant scar quality characteristics were identified during the focus groups. Two distinct versions of the POSAS3.0 Patient Scale were developed. The Generic version contains 16 items and can be used for all scar types, except linear scars. The Linear Scar version of the Patient Scale contains the same 16 items, with an extra item referring to the widening of scar margins. All included items are rated on a verbal rating scale with five response options.
Conclusion
Two versions of the POSAS3.0 Patient Scale were developed. Further field tests are being performed to establish the measurement properties and scoring algorithm of the scales.
To clarify and elaborate on the choices that were made in the development of the Patient Scale of the Patient and Observer Scar Assessment Scale 3.0 (POSAS 3.0), based upon the rich information obtained from patients during focus groups and pilot tests.
The discussions described in this paper are a reflection of the focus group study and pilot tests that were conducted in order to develop the Patient Scale of the POSAS3.0. The focus groups took place in the Netherlands and Australia and included 45 participants. Pilot tests were performed with 15 participants in Australia, the Netherlands, and the United Kingdom.
We discussed the selection, wording and merging of 17 included items. Additionally, the reasons for exclusion of 23 characteristics are given.
Based upon the unique and rich material of patient input obtained, two versions of the Patient Scale of the POSAS3.0 were developed: the Generic version, and the Linear scar version. The discussions and decisions taken during the development are informative for a good understanding of the POSAS 3.0 and are indispensable as a background for future translations and cross-cultural adaptations.
• The Patient Scale was meant to be a concise, readily assessable, quick, and user-friendly instrument assessing the key attributes of scar quality.
• Two versions have been developed: the Generic version (total of 16 items) and the Linear scar version (total of 17 items).
• Eight of the identified characteristics were merged into three included items.
• Twenty-three characteristics were excluded during different phases of the development process for multiple reasons.
Summary
Background
The OVAMA (Outcome Measures for Vascular Malformations) project determined quality of life (QoL) as a core outcome domain for patients with vascular malformations. In order to measure how current therapeutic strategies alter QoL in these patients, a patient‐reported outcome measurement (PROM) responsive to changes in QoL is required.
Objectives
To assess the responsiveness of two widely used generic QoL PROMs, the Medical Outcomes Study Short Form 36 (SF‐36) and Skindex‐29, in adult patients with vascular malformations.
Methods
In an international multicentre prospective study, treated and untreated patients completed the SF‐36 and Skindex‐29 at baseline and after a follow‐up period of 6–8 weeks. Global rating of change (GRC) scales assessing various QoL‐related outcome domains were additionally completed. Per subscale, responsiveness was assessed using two methods: by testing hypotheses on expected correlation strength between change scores of the questionnaires and the GRC scales, and by calculating the area under the receiver operating characteristics curve (AUC). The questionnaires were considered responsive if ≥ 75% of the hypotheses were confirmed or if the AUC was ≥ 0·7.
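The two responsiveness criteria described above can be sketched as follows. This is a hedged illustration, not the study's analysis code: it assumes that the GRC scale has been dichotomized into "changed" and "unchanged" patients (the abstract does not say how), it computes the AUC as the probability that a randomly chosen changed patient has a larger change score than a randomly chosen unchanged patient (ties count as one half), and the names `auc` and `is_responsive` are hypothetical.

```python
def auc(changed_scores, unchanged_scores):
    """Area under the ROC curve via pairwise comparison.

    changed_scores   -- questionnaire change scores of patients rated
                        as changed on the GRC (ASSUMED dichotomization)
    unchanged_scores -- change scores of patients rated as unchanged
    """
    n_pairs = len(changed_scores) * len(unchanged_scores)
    if n_pairs == 0:
        raise ValueError("both groups need at least one patient")
    wins = sum(
        1.0 if c > u else 0.5 if c == u else 0.0
        for c in changed_scores
        for u in unchanged_scores
    )
    return wins / n_pairs


def is_responsive(n_confirmed, n_hypotheses, auc_value):
    """Apply the abstract's rule: >= 75% of hypotheses confirmed OR AUC >= 0.7."""
    if n_hypotheses <= 0:
        raise ValueError("at least one hypothesis is required")
    return n_confirmed / n_hypotheses >= 0.75 or auc_value >= 0.7


# Hypothetical change scores for one subscale.
subscale_auc = auc([3, 4, 5], [1, 2, 3])
print(round(subscale_auc, 3), is_responsive(2, 4, subscale_auc))
```

The pairwise definition is equivalent to the Mann-Whitney formulation of the AUC and keeps the sketch free of external dependencies; for real data a statistics package would normally be used.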
Results
Eighty‐nine participants were recruited in three centres in the Netherlands and the U.S.A., of whom 67 completed all baseline and follow‐up questionnaires. For all subscales of the SF‐36 and Skindex‐29, < 75% of the hypotheses were confirmed and the AUC was < 0·7.
Conclusions
The SF‐36 and Skindex‐29 seemed unresponsive to change in QoL. This suggests that alternative PROMs are needed to measure – and ultimately improve – QoL in patients with vascular malformations.
What's already known about this topic?
Quality of life is often impaired in patients with vascular malformations.
Quality of life is considered a core outcome domain for evaluating treatment of vascular malformations.
To measure the effect of treatment on quality of life, a patient‐reported outcome measure is required that is responsive to changes in quality of life.
What does this study add?
This is the first study assessing the responsiveness of quality‐of‐life measures in patients with vascular malformations.
The results seem to indicate that the Medical Outcomes Study Short Form 36 (SF‐36) and Skindex‐29 are not responsive to changes in quality of life in patients with vascular malformations.
What are the clinical implications of this work?
The Medical Outcomes Study Short Form 36 (SF‐36) and Skindex‐29 are not ideal for assessing the effect of treatment strategies for peripheral vascular malformations on quality of life over time.