In many countries attention for fostering research integrity started with a misconduct case that got a lot of media exposure. But there is an emerging consensus that questionable research practices ...are more harmful due to their high prevalence. QRPs have in common that they can help to make study results more exciting, more positive and more statistically significant. That makes them tempting to engage in. Research institutions have the duty to empower their research staff to steer away from QRPs and to explain how they realize that in a Research Integrity Promotion Plan. Avoiding perverse incentives in assessing researchers for career advancement is an important element in that plan. Research institutions, funding agencies and journals should make their research integrity policies as evidence-based as possible. The dilemmas and distractions researchers face are real and universal. We owe it to society to collaborate and to do our utmost best to prevent QRPs and to foster research integrity.
Prevalence of research misconduct, questionable research practices (QRPs) and their associations with a range of explanatory factors has not been studied sufficiently among academic researchers. The ...National Survey on Research Integrity targeted all disciplinary fields and academic ranks in the Netherlands. It included questions about engagement in fabrication, falsification and 11 QRPs over the previous three years, and 12 explanatory factor scales. We ensured strict identity protection and used the randomized response method for questions on research misconduct. 6,813 respondents completed the survey. Prevalence of fabrication was 4.3% (95% CI: 2.9, 5.7) and of falsification 4.2% (95% CI: 2.8, 5.6). Prevalence of QRPs ranged from 0.6% (95% CI: 0.5, 0.9) to 17.5% (95% CI: 16.4, 18.7) with 51.3% (95% CI: 50.1, 52.5) of respondents engaging frequently in at least one QRP. Being a PhD candidate or junior researcher increased the odds of frequently engaging in at least one QRP, as did being male. Scientific norm subscription (odds ratio (OR) 0.79; 95% CI: 0.63, 1.00) and perceived likelihood of detection by reviewers (OR 0.62, 95% CI: 0.44, 0.88) were associated with engaging in less research misconduct. Publication pressure was associated with more often engaging in one or more QRPs frequently (OR 1.22, 95% CI: 1.14, 1.30). We found higher prevalence of misconduct than earlier surveys. Our results suggest that greater emphasis on scientific norm subscription, strengthening reviewers in their role as gatekeepers of research quality and curbing the "publish or perish" incentive system promotes research integrity.
...health care and public health may suffer....I shall explore what biomedical research can learn about fostering responsible research practices from other disciplines, social sciences first and ...foremost....the putative determinants of research misconduct and questionable research practices will be discussed, and the actions different stakeholders can take shall be explored....it started with the case of data fabrication by Diederik Stapel that deeply shocked both the general public and scientists.
The COSMIN checklist (COnsensus-based Standards for the selection of health status Measurement INstruments) was developed in an international Delphi study to evaluate the methodological quality of ...studies on measurement properties of health-related patient reported outcomes (HR-PROs). In this paper, we explain our choices for the design requirements and preferred statistical methods for which no evidence is available in the literature or on which the Delphi panel members had substantial discussion.
The issues described in this paper are a reflection of the Delphi process in which 43 panel members participated.
The topics discussed are internal consistency (relevance for reflective and formative models, and distinction with unidimensionality), content validity (judging relevance and comprehensiveness), hypotheses testing as an aspect of construct validity (specificity of hypotheses), criterion validity (relevance for PROs), and responsiveness (concept and relation to validity, and (in) appropriate measures).
We expect that this paper will contribute to a better understanding of the rationale behind the items, thereby enhancing the acceptance and use of the COSMIN checklist.
Our objective was to develop an instrument to assess the methodological quality of systematic reviews, building upon previous tools, empirical evidence and expert consensus.
A 37-item assessment tool ...was formed by combining 1) the enhanced Overview Quality Assessment Questionnaire (OQAQ), 2) a checklist created by Sacks, and 3) three additional items recently judged to be of methodological importance. This tool was applied to 99 paper-based and 52 electronic systematic reviews. Exploratory factor analysis was used to identify underlying components. The results were considered by methodological experts using a nominal group technique aimed at item reduction and design of an assessment tool with face and content validity.
The factor analysis identified 11 components. From each component, one item was selected by the nominal group. The resulting instrument was judged to have face and content validity.
A measurement tool for the 'assessment of multiple systematic reviews' (AMSTAR) was developed. The tool consists of 11 items and has good face and content validity for measuring the methodological quality of systematic reviews. Additional studies are needed with a focus on the reproducibility and construct validity of AMSTAR, before strong recommendations can be made on its use.
Thousands of systematic reviews have been conducted in all areas of health care. However, the methodological quality of these reviews is variable and should routinely be appraised. AMSTAR is a ...measurement tool to assess systematic reviews.
AMSTAR was used to appraise 42 reviews focusing on therapies to treat gastro-esophageal reflux disease, peptic ulcer disease, and other acid-related diseases. Two assessors applied the AMSTAR to each review. Two other assessors, plus a clinician and/or methodologist applied a global assessment to each review independently.
The sample of 42 reviews covered a wide range of methodological quality. The overall scores on AMSTAR ranged from 0 to 10 (out of a maximum of 11) with a mean of 4.6 (95% CI: 3.7 to 5.6) and median 4.0 (range 2.0 to 6.0). The inter-observer agreement of the individual items ranged from moderate to almost perfect agreement. Nine items scored a kappa of >0.75 (95% CI: 0.55 to 0.96). The reliability of the total AMSTAR score was excellent: kappa 0.84 (95% CI: 0.67 to 1.00) and Pearson's R 0.96 (95% CI: 0.92 to 0.98). The overall scores for the global assessment ranged from 2 to 7 (out of a maximum score of 7) with a mean of 4.43 (95% CI: 3.6 to 5.3) and median 4.0 (range 2.25 to 5.75). The agreement was lower with a kappa of 0.63 (95% CI: 0.40 to 0.88). Construct validity was shown by AMSTAR convergence with the results of the global assessment: Pearson's R 0.72 (95% CI: 0.53 to 0.84). For the AMSTAR total score, the limits of agreement were -0.19+/-1.38. This translates to a minimum detectable difference between reviews of 0.64 'AMSTAR points'. Further validation of AMSTAR is needed to assess its validity, reliability and perceived utility by appraisers and end users of reviews across a broader range of systematic reviews.
Publications determine to a large extent the possibility to stay in academia ("publish or perish"). While some pressure to publish may incentivise high quality research, too much publication pressure ...is likely to have detrimental effects on both the scientific enterprise and on individual researchers. Our research question was: What is the level of perceived publication pressure in the four academic institutions in Amsterdam and does the pressure to publish differ between academic ranks and disciplinary fields? Investigating researchers in Amsterdam with the revised Publication Pressure Questionnaire, we find that a negative attitude towards the current publication climate is present across academic ranks and disciplinary fields. Postdocs and assistant professors (M = 3.42) perceive the greatest publication stress and PhD-students (M = 2.44) perceive a significant lack of resources to relieve publication stress. Results indicate the need for a healthier publication climate where the quality and integrity of research is rewarded.
Background The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, ...with 5-18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties. Methods The scoring system was developed based on discussions among experts and testing of the scoring system on 46 articles from a systematic review. Four response options were defined for each COSMIN item (excellent, good, fair, and poor). A quality score per measurement property is obtained by taking the lowest rating of any item in a box ("worst score counts"). Results Specific criteria for excellent, good, fair, and poor quality for each COSMIN item are described. In defining the criteria, the "worst score counts" algorithm was taken into consideration. This means that only fatal flaws were defined as poor quality. The scores of the 46 articles show how the scoring system can be used to provide an overview of the methodological quality of studies included in a systematic review of measurement properties. Conclusions Based on experience in testing this scoring system on 46 articles, the COSMIN checklist with the proposed scoring system seems to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.
Changes in scores on health status questionnaires are difficult to interpret. Several methods to determine minimally important changes (MICs) have been proposed which can broadly be divided in ...distribution-based and anchor-based methods. Comparisons of these methods have led to insight into essential differences between these approaches. Some authors have tried to come to a uniform measure for the MIC, such as 0.5 standard deviation and the value of one standard error of measurement (SEM). Others have emphasized the diversity of MIC values, depending on the type of anchor, the definition of minimal importance on the anchor, and characteristics of the disease under study. A closer look makes clear that some distribution-based methods have been merely focused on minimally detectable changes. For assessing minimally important changes, anchor-based methods are preferred, as they include a definition of what is minimally important. Acknowledging the distinction between minimally detectable and minimally important changes is useful, not only to avoid confusion among MIC methods, but also to gain information on two important benchmarks on the scale of a health status measurement instrument. Appreciating the distinction, it becomes possible to judge whether the minimally detectable change of a measurement instrument is sufficiently small to detect minimally important changes.
Abstract Objective Our purpose was to measure the agreement, reliability, construct validity, and feasibility of a measurement tool to assess systematic reviews (AMSTAR). Study Design and Setting We ...randomly selected 30 systematic reviews from a database. Each was assessed by two reviewers using: (1) the enhanced quality assessment questionnaire (Overview of Quality Assessment Questionnaire OQAQ); (2) Sacks' instrument; and (3) our newly developed measurement tool (AMSTAR). We report on reliability (interobserver kappas of the 11 AMSTAR items), intraclass correlation coefficients (ICCs) of the sum scores, construct validity (ICCs of the sum scores of AMSTAR compared with those of other instruments), and completion times. Results The interrater agreement of the individual items of AMSTAR was substantial with a mean kappa of 0.70 (95% confidence interval CI: 0.57, 0.83) (range: 0.38–1.0). Kappas recorded for the other instruments were 0.63 (95% CI: 0.38, 0.78) for enhanced OQAQ and 0.40 (95% CI: 0.29, 0.50) for the Sacks' instrument. The ICC of the total score for AMSTAR was 0.84 (95% CI: 0.65, 0.92) compared with 0.91 (95% CI: 0.82, 0.96) for OQAQ and 0.86 (95% CI: 0.71, 0.94) for the Sacks' instrument. AMSTAR proved easy to apply, each review taking about 15 minutes to complete. Conclusions AMSTAR has good agreement, reliability, construct validity, and feasibility. These findings need confirmation by a broader range of assessors and a more diverse range of reviews.