Scores on an outcome measurement instrument depend on the type and settings of the instrument used, how instructions are given to patients, how professionals administer and score the instrument, etc. The impact of all these sources of variation on scores can be assessed in studies on reliability and measurement error, if properly designed and analyzed. The aim of this study was to develop standards to assess the quality of studies on reliability and measurement error of clinician-reported outcome measurement instruments, performance-based outcome measurement instruments, and laboratory values.
We conducted a 3-round Delphi study involving 52 panelists.
Consensus was reached on how a comprehensive research question can be deduced from the design of a reliability study to determine how the results of a study inform us about the quality of the outcome measurement instrument at issue. Consensus was reached on components of outcome measurement instruments, i.e. the potential sources of variation. Next, we reached consensus on standards on design requirements (n = 5), standards on preferred statistical methods for reliability (n = 3) and measurement error (n = 2), and their ratings on a four-point scale. No consensus was reached on one term for a component and on one rating of one standard; these therefore required a decision by the steering committee.
We developed a tool that enables researchers with and without thorough knowledge on measurement properties to assess the quality of a study on reliability and measurement error of outcome measurement instruments.
Background: Content validity is the most important measurement property of a patient-reported outcome measure (PROM) and the most challenging to assess. Our aims were to: (1) develop standards for evaluating the quality of PROM development; (2) update the original COSMIN standards for assessing the quality of content validity studies of PROMs; (3) develop criteria for what constitutes good content validity of PROMs; and (4) develop a rating system for summarizing the evidence on a PROM's content validity and grading the quality of the evidence in systematic reviews of PROMs. Methods: An online 4-round Delphi study was performed among 159 experts from 21 countries. Panelists rated the degree to which they (dis)agreed with proposed standards, criteria, and rating issues on 5-point rating scales ('strongly disagree' to 'strongly agree'), and provided arguments for their ratings. Results: Discussion focused on sample size requirements, recording and field notes, transcribing cognitive interviews, and data coding. After four rounds, the required 67% consensus was reached on all standards, criteria, and rating issues. After pilot-testing, the steering committee made some final changes. Ten criteria for good content validity were defined regarding item relevance, appropriateness of response options and recall period, comprehensiveness, and comprehensibility of the PROM. Discussion: The consensus-based COSMIN methodology for content validity is more detailed, standardized, and transparent than earlier published guidelines, including the previous COSMIN standards. This methodology can contribute to the selection and use of high-quality PROMs in research and clinical practice.
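The 67% consensus rule mentioned in the abstract above can be illustrated with a short sketch. It assumes, since the abstract does not spell this out, that consensus on an item means at least 67% of panelists chose 'agree' or 'strongly agree' (ratings 4 or 5 on the 5-point scale). The function name and the example ratings are hypothetical.

```python
def consensus_reached(ratings, threshold=0.67, agree_from=4):
    """Return True if the share of ratings >= agree_from meets the threshold.

    Assumption (not stated in the abstract): ratings are coded 1..5 from
    'strongly disagree' to 'strongly agree', and 4 or 5 counts as agreement.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    agreeing = sum(1 for r in ratings if r >= agree_from)
    return agreeing / len(ratings) >= threshold


# Hypothetical ratings from ten panelists for one proposed standard.
example_ratings = [5, 4, 4, 3, 5, 4, 2, 4, 5, 4]
print(consensus_reached(example_ratings))  # 8 of 10 panelists agree
```

Items that fall short of the threshold would be carried into the next Delphi round or, as in the studies summarized here, referred to the steering committee.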
Purpose: The original COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist was developed to assess the methodological quality of single studies on measurement properties of Patient-Reported Outcome Measures (PROMs). Our aim was to adapt the COSMIN checklist and its four-point rating system into a version exclusively for use in systematic reviews of PROMs, aiming to assess risk of bias of studies on measurement properties. Methods: For each standard (i.e., a design requirement or preferred statistical method), it was discussed within the COSMIN steering committee if and how it should be adapted. The adapted checklist was pilot-tested to strengthen content validity in a systematic review on the quality of PROMs for patients with hand osteoarthritis. Results: The most important changes were the reordering of the measurement properties to be assessed in a systematic review of PROMs; the deletion of standards that concerned reporting issues and standards that do not necessarily lead to biased results; the integration of standards on general requirements for studies on item response theory with standards for specific measurement properties; the recommendation to the review team to specify hypotheses for construct validity and responsiveness in advance, and subsequently the removal of the standards about formulating hypotheses; and the change in the labels of the four-point rating system. Conclusions: The COSMIN Risk of Bias checklist was developed exclusively for use in systematic reviews of PROMs to distinguish this application from other purposes of assessing the methodological quality of studies on measurement properties, such as guidance for designing or reporting a study on the measurement properties.
Low-field MRI scanners are significantly less expensive than their high-field counterparts, which gives them the potential to make MRI technology more accessible all around the world. In general, images acquired using low-field MRI scanners tend to be of a relatively low resolution, as signal-to-noise ratios are lower. The aim of this work is to improve the resolution of these images. To this end, we present a deep learning-based approach to transform low-resolution low-field MR images into high-resolution ones. A convolutional neural network was trained to carry out single image super-resolution reconstruction using pairs of noisy low-resolution images and their noise-free high-resolution counterparts, which were obtained from the publicly available NYU fastMRI database. This network was subsequently applied to noisy images acquired using a low-field MRI scanner. The trained convolutional network yielded sharp super-resolution images in which most of the high-frequency components were recovered. In conclusion, we showed that a deep learning-based approach has great potential when it comes to increasing the resolution of low-field MR images.
Purpose: Systematic reviews of patient-reported outcome measures (PROMs) differ from reviews of interventions and diagnostic test accuracy studies and are complex. In fact, conducting a review of one or more PROMs comprises multiple reviews (i.e., one review for each measurement property of each PROM). In the absence of guidance specifically designed for reviews on measurement properties, our aim was to develop a guideline for conducting systematic reviews of PROMs. Methods: Based on literature reviews and expert opinions, and in concordance with existing guidelines, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) steering committee developed a guideline for systematic reviews of PROMs. Results: A consecutive ten-step procedure for conducting a systematic review of PROMs is proposed. Steps 1-4 concern preparing and performing the literature search, and selecting relevant studies. Steps 5-8 concern the evaluation of the quality of the eligible studies, the measurement properties, and the interpretability and feasibility aspects. Steps 9 and 10 concern formulating recommendations and reporting the systematic review. Conclusions: The COSMIN guideline for systematic reviews of PROMs includes methodology to combine the methodological quality of studies on measurement properties with the quality of the PROM itself (i.e., its measurement properties). This enables reviewers to draw transparent conclusions and make evidence-based recommendations on the quality of PROMs, and supports the evidence-based selection of PROMs for use in research and in clinical practice.
Patients with rheumatoid arthritis (RA) have an increased cardiovascular risk, but the magnitude of this risk is not known precisely. A study was undertaken to investigate the associations between RA and type 2 diabetes (DM2), a well-established cardiovascular risk factor, on the one hand, and cardiovascular disease (CVD) on the other.
The prevalence of CVD (coronary, cerebral and peripheral arterial disease) was determined in 353 randomly selected outpatients with RA (diagnosed between 1989 and 2001, aged 50-75 years; the CARRE study) and in participants of a population-based cohort study on diabetes and CVD (the Hoorn study). Patients with RA with normal fasting glucose levels from the CARRE study (RA, n = 294) were compared with individuals from the Hoorn study with normal glucose metabolism (non-diabetic, n = 258) and individuals with DM2 (DM2, n = 194).
The prevalence of CVD was 5.0% (95% CI 2.3% to 7.7%) in the non-diabetic group, 12.4% (95% CI 7.5% to 17.3%) in the DM2 group and 12.9% (95% CI 8.8% to 17.0%) in those with RA. With non-diabetic individuals as the reference category, the age- and gender-adjusted prevalence odds ratio (OR) for CVD was 2.3 (95% CI 1.1 to 4.7) for individuals with DM2 and 3.1 (95% CI 1.6 to 6.1) for those with RA. These odds ratios were attenuated after adjustment for conventional cardiovascular risk factors (OR 2.0 (95% CI 0.9 to 4.5) and 2.7 (95% CI 1.2 to 5.9), respectively).
The prevalence of CVD in RA is increased to an extent that is at least comparable to that of DM2. This should have implications for primary cardiovascular prevention strategies in RA.
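The prevalence intervals reported above can be approximated with a short sketch. The abstract does not state which interval method was used, so the normal-approximation (Wald) interval below is an assumption, and the function name is hypothetical. The prevalences and group sizes are taken from the abstract.

```python
import math


def wald_ci(prevalence, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(prevalence * (1.0 - prevalence) / n)
    return prevalence - z * se, prevalence + z * se


# Prevalences and group sizes as reported in the abstract.
groups = {
    "non-diabetic": (0.050, 258),
    "DM2": (0.124, 194),
    "RA": (0.129, 294),
}

for name, (p, n) in groups.items():
    low, high = wald_ci(p, n)
    print(f"{name}: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under this approximation the non-diabetic interval matches the reported 2.3% to 7.7%, while the DM2 and RA intervals come out slightly narrower than reported (roughly 7.8% to 17.0% and 9.1% to 16.7%). This suggests the authors worked from unrounded counts or used a different interval method.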
The Motor Activity Log (MAL) is a semistructured interview for hemiparetic stroke patients to assess the use of their paretic arm and hand (amount of use [AOU] and quality of movement [QOM]) during activities of daily living. Scores range from 0 to 5. The following clinimetric properties of the MAL were quantified: internal consistency (Cronbach alpha), test-retest agreement (Bland and Altman method), cross-sectional construct validity (correlation between AOU and QOM and with the Action Research Arm [ARA] test), longitudinal construct validity (correlation of change on the MAL during the intervention with a global change rating [GCR] and with change on the ARA), and responsiveness (effect size).
Two baseline measurements 2 weeks apart and 1 follow-up measurement immediately after 2 weeks of intensive exercise therapy either with or without immobilization of the unimpaired arm (forced use) were performed in 56 chronic stroke patients.
Internal consistency was high (AOU: alpha=0.88; QOM: alpha=0.91). The limits of agreement were -0.70 to 0.85 and -0.61 to 0.71 for AOU and QOM, respectively. The correlation with the ARA score (Spearman rho) was 0.63 for both AOU and QOM. However, the improvement on the MAL during the intervention was only weakly related to the GCR and to the improvement on the ARA (Spearman rho between 0.16 and 0.22). The responsiveness ratio was 1.9 (AOU) and 2.0 (QOM).
The MAL is internally consistent and relatively stable in chronic stroke patients not undergoing an intervention. The cross-sectional construct validity of the MAL is reasonable, but the results raise doubt about its longitudinal construct validity.
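The test-retest agreement analysis named above (Bland and Altman method) can be sketched as follows. This is a minimal illustration, assuming the conventional definition of the 95% limits of agreement as the mean difference plus or minus 1.96 times the standard deviation of the differences. The function name and the example scores are hypothetical and are not MAL data.

```python
import statistics


def limits_of_agreement(first, second, z=1.96):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


# Hypothetical baseline scores on a 0-5 scale, measured twice.
baseline_1 = [1.0, 2.0, 3.0, 4.0]
baseline_2 = [1.5, 2.0, 3.5, 4.0]
lower, upper = limits_of_agreement(baseline_1, baseline_2)
print(f"limits of agreement: {lower:.2f} to {upper:.2f}")
```

Read in the other direction, and under the same assumption, the reported AOU limits of -0.70 to 0.85 would correspond to a mean difference of about 0.08 and a standard deviation of the differences of about 0.40 on the 0 to 5 scale.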
Robot-assisted laparoscopic staging (RALS) is increasingly used for staging epithelial ovarian cancer (EOC). Evidence of its safety is limited. The aim of this review is to compare the efficacy and safety of RALS in clinical early-stage EOC to conventional laparoscopy and laparotomy and to assess the level of evidence that is currently available to adopt this surgical technique.
Only retrospective studies comparing staging by minimally invasive surgery (MIS) to laparotomy are available. Both RALS and conventional laparoscopic staging shorten length of hospital stay (LHS, mean -2.9 days) and decrease estimated blood loss (EBL, mean -79 ml) compared to laparotomy. Complication rates and number of lymph nodes collected are similar in all surgical staging techniques. Survival outcomes after staging by MIS cannot be compared to staging by laparotomy because of the lack of evidence, but RALS is probably noninferior to conventional laparoscopic staging.
RALS probably improves perioperative outcomes in patients with clinical early-stage EOC to a similar extent as conventional laparoscopic staging. Whether oncologic outcomes of RALS are comparable to open and conventional approaches is uncertain, as there is only level C evidence. Randomized controlled trials are urgently needed to confirm the current retrospective findings.
Choosing an adequate measurement instrument depends on the proposed use of the instrument, the concept to be measured, the measurement properties (e.g. internal consistency, reproducibility, content and construct validity, responsiveness, and interpretability), the requirements, the burden for subjects, and costs of the available instruments. As far as measurement properties are concerned, there are no sufficiently specific standards for the evaluation of measurement properties of instruments to measure health status, and also no explicit criteria for what constitutes good measurement properties. In this paper we describe the protocol for the COSMIN study, the objective of which is to develop a checklist that contains COnsensus-based Standards for the selection of health Measurement INstruments, including explicit criteria for satisfying these standards. We will focus on evaluative health related patient-reported outcomes (HR-PROs), i.e. patient-reported health measurement instruments used in a longitudinal design as an outcome measure, excluding health care related PROs, such as satisfaction with care or adherence. The COSMIN standards will be made available in the form of an easily applicable checklist.
An international Delphi study will be performed to reach consensus on which measurement properties should be assessed and how, and on criteria for good measurement properties. Two sources of input will be used for the Delphi study: (1) a systematic review of properties, standards and criteria of measurement properties found in systematic reviews of measurement instruments, and (2) an additional literature search of methodological articles presenting a comprehensive checklist of standards and criteria. The Delphi study will consist of four (written) Delphi rounds, with approximately 30 expert panel members with different backgrounds in clinical medicine, biostatistics, psychology, and epidemiology. The final checklist will subsequently be field-tested by assessing the inter-rater reproducibility of the checklist.
Since the study will mainly be anonymous, problems that are commonly encountered in face-to-face group meetings, such as the dominance of certain persons in the communication process, will be avoided. By performing a Delphi study and involving many experts, the likelihood that the checklist will have sufficient credibility to be accepted and implemented will increase.
Objective: To identify all available shoulder disability questionnaires designed to measure physical functioning and to evaluate evidence for the clinimetric quality of these instruments. Methods: Systematic literature searches were performed to identify self-administered shoulder disability questionnaires. A checklist was developed to evaluate and compare the clinimetric quality of the instruments. Results: Two reviewers identified 16 questionnaires and evaluated them using our checklist. Most studies were found for the Disability of the Arm, Shoulder, and Hand scale (DASH), the Shoulder Pain and Disability Index (SPADI), and the American Shoulder and Elbow Surgeons Standardised Shoulder Assessment Form (ASES). None of the questionnaires demonstrated satisfactory results for all properties. Most questionnaires claim to measure several domains (for example, pain, physical, emotional, and social functioning), yet dimensionality was studied in only three instruments. The internal consistency was calculated for seven questionnaires and only one received an adequate rating. Twelve questionnaires received positive ratings for construct validity, although depending on the population studied, four of these questionnaires received poor ratings too. Seven questionnaires were shown to have adequate test-retest reliability (ICC >0.70), but five questionnaires were tested inadequately. In most clinimetric studies only small sample sizes (n<43) were used. Nearly all publications lacked information on the interpretation of scores. Conclusion: The DASH, SPADI, and ASES have been studied most extensively, and yet even published validation studies of these instruments have limitations in study design, sample sizes, or evidence for dimensionality. Overall, the DASH received the best ratings for its clinimetric properties.
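The ICC >0.70 criterion for test-retest reliability can be made concrete with a short sketch. The abstract does not say which ICC form the reviewed studies used, so the two-way random-effects, absolute-agreement, single-measurement form below is an assumption. The function name is hypothetical and the scores are illustrative only, not taken from any reviewed questionnaire study.

```python
def icc_two_way_agreement(scores):
    """ICC for absolute agreement, two-way random effects, single measurement.

    scores: list of rows, one row per subject, one column per occasion/rater.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of occasions or raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores: 6 subjects (rows) measured on 4 occasions (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_two_way_agreement(example_scores)
print(f"ICC = {icc:.2f}; meets the 0.70 criterion: {icc > 0.70}")
```

In this illustrative data set the large systematic differences between columns pull the agreement ICC well below 0.70, which is why the choice between agreement and consistency forms matters when a fixed cut-off is applied.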