The contingent valuation (CV) method is used to estimate the willingness to pay (WTP) for services and products to inform cost-benefit analyses (CBA). A long-standing criticism, that stated WTP estimates may be poor indicators of actual WTP, calls into question their validity and the use of such estimates for welfare evaluation, especially in the health sector. Available evidence on the validity of CV studies so far is inconclusive. We systematically reviewed the literature to (1) synthesize the evidence on the criterion validity of WTP/willingness to accept (WTA), (2) undertake a meta-analysis, pooling evidence on the extent of variation between stated and actual WTP values, and (3) explore the reasons for the variation.
Eight electronic databases were searched, along with citations and reference reviews. Fifty papers detailing 159 comparisons were identified and reviewed using a standard proforma. Two reviewers were involved in each stage of paper selection, review and data extraction. Meta-analysis was conducted using random effects models for ratios of means and percentage differences separately. Meta-bias was investigated using funnel plots.
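The random-effects pooling of ratios of means described above can be illustrated with a minimal sketch. The abstract does not state which estimator was used, so the DerSimonian-Laird method here is an assumption, and the log ratios and variances are invented purely for illustration.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird random-effects estimator.

    effects   -- per-comparison effect sizes (here: log ratios of means)
    variances -- their within-study sampling variances
    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical data: ratios of hypothetical to actual mean WTP (not from the review)
example_ratios = [1.2, 2.5, 1.8, 3.0]
example_variances = [0.04, 0.09, 0.05, 0.12]         # variances on the log scale
log_ratios = [math.log(r) for r in example_ratios]

pooled_log, se_log, tau2 = dersimonian_laird(log_ratios, example_variances)
pooled_ratio = math.exp(pooled_log)                  # back-transform to a ratio
ci_low = math.exp(pooled_log - 1.96 * se_log)
ci_high = math.exp(pooled_log + 1.96 * se_log)
print(f"pooled ratio {pooled_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), tau^2 {tau2:.3f}")
```

Pooling on the log scale and back-transforming keeps the confidence interval for a ratio strictly positive, which is the usual reason for working with log ratios.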
Hypothetical WTP was on average 3.2 times greater than actual WTP, with a range of 0.7–11.8 and 5.7 (0.0–13.6) for ratios of means and percentage differences respectively. However, key methodological differences between surveys of hypothetical and actual values were found. In the meta-analysis, high levels of heterogeneity existed. The overall effect size for mean summaries was 1.79 (1.56–2.04) and 2.37 (1.93–2.80) for percent summaries. Regression analyses identified mixed results on the influence of the different experimental protocols on the variation between stated and actual WTP values. Results indicating publication bias did not account for differences in study design.
The evidence on the criterion validity for CV studies is more mixed than authors are representing because substantial differences in study design between hypothetical and actual WTP/WTA surveys are not accounted for.
• The debate on the criterion validity of CV-WTP is a subject of ongoing concern.
• The majority of published papers confirm the presence of hypothetical bias.
• The assessment of the drivers of criterion validity is largely exploratory.
• Estimates are reported in varied ways, which limits analyses and clouds comparison.
• The evidence on the criterion validity of CV-WTP is more mixed than is reported.
Theories about the emotion elicitation process have been proposed as a scoring rationale for tests of emotional understanding (EU) – a subcomponent of emotional intelligence (EI). Theory-based scoring represents a considerable improvement over approaches that rely on rather subjective group judgements. The aim of this article is twofold: Firstly, we discuss an important limitation of appraisal theories for scoring EU tests. We argue that theory-based scoring is only unambiguous if the cognitive appraisals of the target persons are presented in the situational descriptions. Secondly, we provide a theory-based situational judgement test of EU, the Theory-Based Test of Emotional Understanding (TBEU), which takes this limitation into account. In a study of N = 200 we present initial validity evidence of this new measure with regard to its intended one-dimensional structure, its relations to classical intelligence (in terms of convergent validity evidence), and the Big Five personality traits (in terms of discriminant validity evidence) at the level of latent variables. Overall, the results support the usefulness of emotion theories for the assessment of EU.
• Development of a situational judgement test of emotional understanding
• Stronger theoretical foundation of scoring keys
• Latent variable correlations with intelligence and the Big-Five personality traits
Despite the wide use of the Strengths and Difficulties Questionnaire (SDQ) to assess adolescent mental health, its psychometric functionality is still under debate. This study investigated the structural validity and reliability of the SDQ scores, and the resemblance of the SDQ sum scores and factor scores. Factor one-dimensionality and competing multifactor structures were tested against the data. With the best acceptable models, measurement invariance was tested between genders and over time. Subscale reliability and correspondence between subscale sum scores and factor scores were estimated. The nationally representative self-report data from 23,980 Finnish early (12-13 years) and mid- (15-16 years) adolescents (50.4% girls) were collected from two cohorts in 2008 and 2013. The results showed that among early adolescents, the revised SDQ with a controlled method effect had an excellent fit. In contrast, none of the tested models had an acceptable fit among the mid-adolescents. Among early adolescents, strong measurement invariance was achieved between genders and over time. Three of the five subscales were one-dimensional, and all subscales had low reliability. The resemblance between the subscale sum scores and factor scores was alarmingly low. Researchers should be cautious when using the SDQ Total Difficulties sum score or the subscale scores as they may be substantially biased, and practitioners should desist from using the SDQ as a screening tool in its current form. This study strongly supports the revision of the SDQ. In line with the previous findings, we suggest rewording the worst functioning items and revising the reverse-worded difficulties items.
Public Significance Statement
The self-reported SDQ contains method effects which can and should be controlled when the SDQ is used in research, and more research is needed to guarantee the reliable use of the SDQ sum scores for assessing adolescent mental health, because the sum scores in their current form may be substantially biased.
Nostalgia is a mixed emotion. Recent empirical research, however, has highlighted positive effects of nostalgia, suggesting it is a predominantly positive emotion. When measured as an individual difference, nostalgia-prone individuals report greater meaning in life and approach temperament. When manipulated in an experimental paradigm, nostalgia increases meaning in life, self-esteem, optimism, and positive affect. These positive effects may result from the specific experimental procedures used and little is known about daily experiences that covary with nostalgia. To address this gap, we aimed to measure nostalgia in ecologically valid contexts. We created and validated the Personal Inventory of Nostalgic Experiences (PINE) scale (Studies 1a-1d) to assess both trait and state-based nostalgic experiences. When measured as an individual difference, the nomological net was generally negative (Study 2). When measured in daily life (Studies 3 and 4), nostalgia as a state variable was negatively related to well-being. Lagged analyses showed that state nostalgia had mixed effects on well-being at a later moment that day and negative effects on well-being on the following day. To reconcile the discrepancies between these studies and the positive effects of nostalgia from previous research, we showed that experimentally induced nostalgic recollections were rated more positively and less negatively than daily experiences of nostalgia (Study 5). These studies show that nostalgia is a mixed emotion; although it may be predominantly positive when nostalgic memories are generated on request, it seems predominantly negative when nostalgia is experienced in the course of everyday life.
Expanding eligibility for Medicaid was a central goal of the Affordable Care Act (ACA), which continues to be debated and discussed at the state and federal levels as further reforms are considered. In an effort to provide a synthesis of the available research, we systematically reviewed the peer-reviewed scientific literature on the effects of Medicaid expansion on the original goals of the ACA. After analyzing seventy-seven published studies, we found that expansion was associated with increases in coverage, service use, quality of care, and Medicaid spending. Furthermore, very few studies reported that Medicaid expansion was associated with negative consequences, such as increased wait times for appointments, and those studies tended to use study designs not suited for determining cause and effect. Thus, there is evidence to document improvements in several areas of health care delivery following the ACA Medicaid expansion. We outline areas for future research that can further reduce current knowledge gaps.
Aims and objectives
To assess the concurrent validity between logbooks and a single‐item rehabilitation adherence measurement for patients with stroke. Agreement between caregivers and patients and between caregivers and physical therapists regarding a single‐item measurement was investigated, and its predictive validity was explored.
Background
Adherence to therapy is a primary determinant of treatment success. There are no standard instruments for measuring rehabilitation adherence available for stroke patients.
Design
Prospective longitudinal study.
Methods
Seventy‐five patients with stroke were recruited, measured four times and followed for 6 months. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist was used to ensure comprehensive reporting. Adherence was documented in logbooks, and single‐item measurements were compared. Predictive validity was explored by assessing associations between adherence levels, self‐care ability and health‐related quality of life. The Spearman's correlation coefficients, weighted kappa, and generalised estimating equations statistics were used to explore the concurrent validity, measurement agreement, and predictive validity, respectively.
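The weighted kappa used for measurement agreement can be sketched as follows. The abstract does not say whether linear or quadratic weights were applied, so the linear default is an assumption, and the ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weight="linear"):
    """Cohen's weighted kappa for two raters scoring the same subjects.

    categories -- ordered list of all possible rating levels
    weight     -- "linear" or "quadratic" disagreement weights
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two rating categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A level, rater B level)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]                        # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]   # rater B marginals

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    obs_dis = sum(disagreement(i, j) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row[i] * col[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs_dis / exp_dis

# Hypothetical adherence levels (1 = low, 2 = medium, 3 = high), not study data
caregiver = [3, 3, 2, 1, 2, 3, 1, 2]
patient = [3, 2, 2, 1, 3, 3, 1, 1]
print(round(weighted_kappa(caregiver, patient, [1, 2, 3]), 3))
```

Weighting penalises a high-versus-low disagreement more than a disagreement between adjacent levels, which suits ordinal adherence ratings better than unweighted kappa.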
Results
Logbook records had a fair correlation (rs = .23, p = .04) with the single‐item rehabilitation adherence measurements. There was moderate agreement (kappa = 0.42, p < .001) between caregiver and patient assessments and fair agreement (kappa = 0.29, p = .017) between caregiver and physical therapist assessments of patients' rehabilitation adherence levels. Perfect rehabilitation adherence, based on the logbook and single‐item measurements, predicted better scores for self‐care ability and quality of life than imperfect rehabilitation adherence during 6 months after inclusion.
Conclusions
There was fair concurrent validity between logbooks and single‐item rehabilitation adherence measurements and moderate and fair adherence measure agreement between caregivers and patients and caregivers and physical therapists, respectively. Logbooks and single‐item rehabilitation adherence measurements had adequate predictive validity.
Relevance to clinical practice
Single‐item rehabilitation adherence measurement is a workable and straightforward method to assess stroke patients' rehabilitation adherence in busy clinical care settings. Caregivers can represent stroke patients regarding their reported rehabilitation adherence.
Patient or public contribution
Patients were diagnosed with stroke in the study hospital. Rehabilitation physicians transferred patients to a research nurse who then screened them for the inclusion criteria and invited them and their family caregivers to participate in this study if they met the requirements. We also recruited seven physical therapists responsible for the physical therapy of the study participants. After participants signed informed consent, the research nurse encouraged participants to respond to research questions face to face, including rehabilitation adherence data, daily physical function, and quality of life. Each participant was measured four times at baseline and at 1, 3, and 6 months after inclusion in this study. Physical therapists had to score their patients' rehabilitation adherence levels before discharge.
Trial registration details
Not applicable.
The authors evaluated the reliability and validity of a set of 7 behavioral decision-making tasks, measuring different aspects of the decision-making process. The tasks were administered to individuals from diverse populations. Participants showed relatively consistent performance within and across the 7 tasks, which were then aggregated into an Adult Decision-Making Competence (A-DMC) index that showed good reliability. The validity of the 7 tasks and of overall A-DMC emerges in significant relationships with measures of socioeconomic status, cognitive ability, and decision-making styles. Participants who performed better on the A-DMC were less likely to report negative life events indicative of poor decision making, as measured by the Decision Outcomes Inventory. Significant predictive validity remains when controlling for demographic measures, measures of cognitive ability, and constructive decision-making styles. Thus, A-DMC appears to be a distinct construct relevant to adults' real-world decisions.
The Static-99, Static-99R, and STABLE-2007 are internationally well-established instruments for predicting static and dynamic risks of sexual recidivism in individuals convicted of sexual offenses. Previous meta-analyses assessed their predictive and incremental validity, but none has yet compared the two Static versions and the Static-STABLE combinations. Here, we implemented diagnostic test accuracy network meta-analysis (DTA-NMA) to compare all tests and identify optimal cutoffs in one comprehensive analysis. The DTA-NMA included 32 samples comprising 45,224 adult male individuals. More information was available on the Static-99 (22 samples; 34,316 individuals) and the Static-99R (13 samples; 27,243 individuals), compared to the Static-99/STABLE-2007 (three samples; 762 individuals), the Static-99R/STABLE-2007 (two samples; 2,972 individuals), and the STABLE-2007 (three samples; 816 individuals). The primary outcome was the area under the receiver operating characteristic curve (AUC). The secondary outcomes were sensitivity and specificity. Optimal cutoffs were determined using the Youden index. The AUC suggested moderate predictive validity for Static-99 and Static-99R, whereas STABLE-2007 had no predictive value. The optimal cutoff of Static-99R was suggested to have higher specificity than that of Static-99, whereas sensitivity was comparable between instruments. The notion of incremental validity for STABLE-2007 could not be confirmed. This work represents the first meta-analysis to compare Static-99, Static-99R, STABLE-2007, and their combinations in one analysis. Static-99R demonstrated the highest specificity in predicting the risk of sexual recidivism, indicating a potential advantage in detecting true nonrecidivists. The findings are discussed, considering the current recommendations for assessing the risk of sexual recidivism in the criminal justice system.
Public Significance Statement
This meta-analysis suggests an advantage of the Static-99R over the Static-99 in predictive validity and no incremental validity of the STABLE-2007 in assessing the risk of sexual recidivism in adult male individuals convicted of sexual offenses.
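The Youden index mentioned in the abstract above selects the cutoff that maximises J = sensitivity + specificity - 1. The sketch below shows the idea for a single sample only. It does not reproduce the pooled network meta-analysis, and the scores and outcomes are hypothetical.

```python
def youden_optimal_cutoff(scores, outcomes):
    """Return (cutoff, sensitivity, specificity, J) for the cutoff maximising
    Youden's J = sensitivity + specificity - 1.

    scores   -- risk scores, higher meaning higher predicted risk
    outcomes -- 1 for an observed event (e.g. recidivism), 0 otherwise
    A case counts as test-positive when score >= cutoff.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative outcome")

    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s < cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        # Strict comparison keeps the lowest cutoff when several tie on J
        if best is None or j > best[3]:
            best = (cutoff, sensitivity, specificity, j)
    return best

# Hypothetical risk scores and outcomes, not taken from the meta-analysis
example_scores = [1, 2, 2, 3, 4, 4, 5, 6, 7, 8]
example_outcomes = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
print(youden_optimal_cutoff(example_scores, example_outcomes))
```

Because J weights sensitivity and specificity equally, two instruments can reach similar J values with different trade-offs, which is consistent with the abstract reporting specificity and sensitivity separately.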
A child's relationship with his or her nonresident father has been found to be related to that child's development in important ways. However, validated measures of the relationship between nonresident fathers and their children are rare, particularly for low-income nonresident fathers. To provide guidance for researchers and practitioners evaluating nonresident fatherhood programs, this study uses a sample of 420 primarily low-income nonresident fathers to examine the reliability, convergent validity, and predictive validity of measures of father-child closeness and conflict contained in the Child-Parent Relationship Scale-Short Form (CPRS-SF). Validity was examined across 3 child age groups: preschool, middle childhood, and adolescence. The CPRS-SF closeness scale demonstrated measurement equivalence across time (conflict did not) and had excellent reliability and validity. Compared to the closeness scale, the CPRS-SF conflict scale was related to fewer validity items but still showed both convergent and predictive validity, including predicting child behavior problems (which the closeness scale did not). Both the closeness and conflict scales are recommended for use with low-income nonresident fathers. Age differences in validity findings are discussed.
Background: Recently, new software to quantify muscle mass with ultrasonography has been developed. Although this software has been validated in healthy adults, athletes and critically ill patients, this has not been the case in a population with obesity. The aim of this study was to validate the measurement of lean mass with ultrasound (US) by comparing it to the reference of DXA measurements in a population with class 2/3 obesity. Methods: This prospective cross-sectional study consisted of participants between 18 and 65 years of age with a BMI above 35 kg/m2. US and DXA measurements were performed during the same visit. The US equation was based on a seven-point body measurement and carried out by a trained researcher. The body composition DXA scan was performed by trained technicians. The validity was examined by a Bland-Altman plot to assess proportional bias, and Intraclass Correlation Coefficient (ICC) to estimate the correlation between the two methods with absolute agreement in a two-way mixed model. Results: The population consisted of 60 participants, of whom 51 (85%) were female, with a mean age of 44 years (± 10) and a mean BMI of 41 kg/m2 (± 3.8). The mean lean mass measured with DXA was 66.8 kg (± 11.7), compared to 65 kg (± 11) measured with ultrasound. The mean difference between the ultrasound and DXA measurement of lean mass was 1.4 kg (± 4.6). The Bland-Altman plot of the measurements showed no apparent bias between the two techniques. The ICC of the lean mass difference was 0.916 (0.855 - 0.951), demonstrating excellent validity. Conclusions: The results of this study show that the measurement of lean mass in a population with obesity with this ultrasound technique has a high validity when compared to lean mass by DXA scan. Ultrasound can be used as a more accessible and cheaper way to monitor muscle mass in obesity during weight loss strategies.
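The quantities behind a Bland-Altman comparison such as the one described above can be sketched in a few lines. This is a minimal illustration with invented lean-mass values, not the study's analysis. The 1.96 multiplier for the limits of agreement and the slope-based check for proportional bias are conventional choices assumed here.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Summarise agreement between two measurement methods.

    Returns a dict with the mean difference (bias), the 95% limits of
    agreement, and the slope of the differences regressed on the pairwise
    means (a slope near zero suggests no proportional bias).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)

    # Least-squares slope of difference on mean, computed by hand
    mean_of_means = mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx if sxx > 0 else 0.0

    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "slope": slope,
    }

# Hypothetical lean mass values in kg (not the study data)
dxa_kg = [60.2, 71.5, 55.8, 80.1, 66.0, 62.4]
ultrasound_kg = [58.9, 70.0, 57.1, 77.5, 65.2, 60.8]
result = bland_altman(dxa_kg, ultrasound_kg)
print({name: round(value, 2) for name, value in result.items()})
```

In a plot, each point is placed at (pairwise mean, difference), with horizontal lines at the bias and at the two limits of agreement.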