Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO measurement to a new level of precision and standardization across diseases through item development and use of item response theory (IRT).
With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents.
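The calibration and item-selection ideas in the paragraph above can be sketched in a few lines. This is a minimal illustration, assuming Samejima's graded response model (the model named in the PROMIS calibration work) and a maximum-information selection rule, which is one common CAT criterion and not necessarily the exact rule used by PROMIS software. The function names and the three-item `EXAMPLE_BANK` with its discrimination and threshold values are hypothetical, chosen only for illustration; they are not PROMIS calibrations.

```python
import math


def _cumulative_probs(theta, a, thresholds):
    """P(response >= k) for each category boundary, padded with 1.0 and 0.0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Probability of each response category under the graded response model."""
    cum = _cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Fisher information of one polytomous item at ability level theta."""
    cum = _cumulative_probs(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info


def pick_next_item(theta, bank, administered):
    """Return the unused item that is most informative at the current estimate."""
    candidates = {name: params for name, params in bank.items()
                  if name not in administered}
    return max(candidates,
               key=lambda name: grm_item_information(theta, *candidates[name]))


# Hypothetical bank: name -> (discrimination a, ordered thresholds b_k)
EXAMPLE_BANK = {
    "easy_item": (2.0, [-3.0, -2.0, -1.0]),
    "mid_item": (2.0, [-1.0, 0.0, 1.0]),
    "hard_item": (2.0, [1.0, 2.0, 3.0]),
}

if __name__ == "__main__":
    # A respondent with low estimated function is routed to the easy item.
    print(pick_next_item(-2.0, EXAMPLE_BANK, administered=set()))
```

In a full CAT the ability estimate would be updated after each response and the selection repeated until a stopping rule (for example, a fixed number of items) is met; that loop is omitted here.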
In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change.
Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
Objectives: Patient-reported outcomes (PROs) are essential when evaluating many new treatments in health care; yet, current measures have been limited by a lack of precision, standardization, and comparability of scores across studies and diseases. The Patient-Reported Outcomes Measurement Information System (PROMIS) provides item banks that offer the potential for efficient (minimizes item number without compromising reliability), flexible (enables optional use of interchangeable items), and precise (has minimal error in estimate) measurement of commonly studied PROs. We report results from the first large-scale testing of PROMIS items. Study Design and Setting: Fourteen item pools were tested in the U.S. general population and clinical groups using an online panel and clinic recruitment. A scale-setting subsample was created reflecting demographics proportional to the 2000 U.S. census. Results: Using item-response theory (graded response model), 11 item banks were calibrated on a sample of 21,133, measuring components of self-reported physical, mental, and social health, along with a 10-item Global Health Scale. Short forms from each bank were developed and compared with the overall bank and with other well-validated and widely accepted (“legacy”) measures. All item banks demonstrated good reliability across most of the score distributions. Construct validity was supported by moderate to strong correlations with legacy measures. Conclusion: PROMIS item banks and their short forms provide evidence that they are reliable and precise measures of generic symptoms and functional reports comparable to legacy instruments. Further testing will continue to validate and test PROMIS items and banks in diverse clinical populations.
Objective: To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. Study Design and Setting: The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n > 2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD] = 10) in a US general population sample. Results: The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. Conclusion: The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range.
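The norming to a mean of 50 and SD of 10 described above corresponds to a linear rescaling of the latent score. The sketch below assumes the IRT ability estimate (here called `theta`) is standardized to mean 0 and SD 1 in the reference sample; that assumption and both function names are illustrative and are not taken from the source.

```python
def theta_to_t_score(theta, mean=50.0, sd=10.0):
    """Rescale a standardized latent score onto a T-score metric."""
    return mean + sd * theta


def t_score_to_theta(t_score, mean=50.0, sd=10.0):
    """Inverse transform: T-score back to the standardized latent metric."""
    return (t_score - mean) / sd


if __name__ == "__main__":
    # A respondent 1.5 SD below the reference-population mean
    print(theta_to_t_score(-1.5))
```

On this metric, a difference of 10 points is one reference-population SD, which makes scores from different short forms or a CAT directly comparable.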
The ability to effectively measure health-related quality-of-life longitudinally is central to describing the impacts of disease, treatment, or other insults, including normal aging, upon the patient. Over the last two decades, assessment of patient health status has undergone a dramatic paradigm shift, evolving from a predominant reliance on biochemical and physical measurements, such as erythrocyte sedimentation rate, lipid profiles, or radiographs, to an emphasis upon health outcomes based on the patient's personal appreciation of their illness. The Health Assessment Questionnaire (HAQ), published in 1980, was among the first instruments based on generic, patient-centered dimensions. The HAQ was designed to represent a model of patient-oriented outcome assessment and has played a major role in many diverse areas such as prediction of successful aging, inversion of the therapeutic pyramid in rheumatoid arthritis (RA), quantification of NSAID gastropathy, development of risk factor models for osteoarthrosis, and examination of mortality risks in RA. Evidenced by its use over the past two decades in diverse settings, the HAQ has established itself as a valuable, effective, and sensitive tool for measurement of health status. It is available in more than 60 languages and is supported by a bibliography of more than 500 references. It has increased the credibility and use of validated self-report measurement techniques as a quantifiable set of hard data endpoints and has contributed to a new appreciation of outcome assessment. In this article, information regarding the HAQ's development, content, dissemination and reference sources for its uses, translations, and validations are provided.
To estimate responsiveness (sensitivity to change) and minimally important difference (MID) for the Patient-Reported Outcomes Measurement Information System (PROMIS) 20-item physical functioning scale (PROMIS PF-20).
The PROMIS PF-20, Short Form-36 (SF-36) physical functioning scale, and Health Assessment Questionnaire (HAQ) were administered at baseline and at 6 and 12 months to a sample of 451 persons with rheumatoid arthritis. A retrospective change (anchor) item was administered at the 12-month follow-up. We estimated responsiveness between 12 months and baseline, and between 12 months and 6 months, using one-way analysis of variance F-statistics. We estimated the MID for the PROMIS PF-20 using prospective change for people reporting getting 'a little better' or 'a little worse' on the anchor item.
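The anchor-based responsiveness analysis described above compares prospective change scores across anchor-item categories; a larger F-statistic means the instrument's change scores separate the categories more cleanly. The following is a minimal pure-Python sketch of a one-way analysis of variance F-statistic. The function name, the three anchor categories, and the change scores are made up for illustration and are not study data.

```python
def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA over a list of groups of observations."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2
        for group in groups
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical change scores (follow-up minus baseline) by anchor response
change_by_anchor = {
    "worse": [-4.0, -3.0, -2.0],
    "about the same": [-1.0, 0.0, 1.0],
    "better": [2.0, 3.0, 4.0],
}

if __name__ == "__main__":
    print(one_way_anova_f(list(change_by_anchor.values())))
```

Computing the same statistic for each instrument on the same respondents and anchor item allows the instruments to be ranked by responsiveness.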
F-statistics for prospective change on the PROMIS PF-20, SF-36 and HAQ by the anchor item over 12 and 6 months (in parentheses) were 16.64 (14.98), 12.20 (7.92) and 10.36 (12.90), respectively. The MID for the PROMIS PF-20 was 2 points (about 0.20 of an SD).
The PROMIS PF-20 is more responsive than two widely used ('legacy') measures. The MID is a small effect size. The measure can be useful for assessing physical functioning in clinical trials and observational studies.
The Health Assessment Questionnaire Disability Index (HAQ) and the SF-36 PF-10, among other instruments, yield sensitive and valid Disability (Physical Function) endpoints. Modern techniques, such as Item Response Theory (IRT), now enable development of more precise instruments using improved items. The NIH Patient Reported Outcomes Measurement Information System (PROMIS) is charged with developing improved IRT-based tools. We compared the ability to detect change in physical function using original (Legacy) instruments with Item-Improved and PROMIS IRT-based instruments.
We studied two Legacy (original) Physical Function/Disability instruments (HAQ, PF-10), their item-improved derivatives (Item-Improved HAQ and PF-10), and the IRT-based PROMIS Physical Function 10- (PROMIS PF 10) and 20-item (PROMIS PF 20) instruments. We compared sensitivity to detect 12-month changes in physical function in 451 rheumatoid arthritis (RA) patients and assessed relative responsiveness using P-values, effect sizes (ES), and sample size requirements.
The study sample was 81% female, 87% Caucasian, 65 years of age, had 14 years of education, and had moderate baseline disability. All instruments were sensitive to detecting change (P < 0.05) in physical function over one year. The most responsive instruments in these patients were the Item-Improved HAQ and the PROMIS PF 20. IRT-improved instruments could detect a 1.2% difference with 80% power, while reference instruments could detect only a 2.3% difference (P < 0.01). The best IRT-based instruments required only one-quarter of the sample sizes of the Legacy (PF-10) comparator (95 versus 427). The HAQ outperformed the PF-10 in more impaired populations; the reverse was true in more normal populations. Considering especially the range of severity measured, the PROMIS PF 20 appears the most responsive instrument.
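The link between responsiveness and sample size reported above can be illustrated with a standard normal-approximation formula for detecting a mean change, in which the required sample size scales with the inverse square of the standardized effect size. This is a sketch under that assumption and is not necessarily the calculation the study authors used; the function name and the default two-sided alpha of 0.05 are illustrative choices, while 80% power matches the figure quoted above.

```python
import math
from statistics import NormalDist


def required_n(effect_size, alpha=0.05, power=0.80):
    """Approximate n to detect a standardized mean change (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Halving the detectable effect size roughly quadruples the required n.
    print(required_n(0.50), required_n(0.25))
    # Effect-size ratio implied by a 95 versus 427 sample-size requirement
    print(math.sqrt(427 / 95))
```

Under this approximation, a sample-size ratio of 427 to 95 (about 4.5) corresponds to the more responsive instrument yielding a standardized effect roughly 2.1 times as large for the same underlying change.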
Physical Function scales using item improved or IRT-based items can result in greater responsiveness and precision across a broader range of physical function. This can reduce sample size requirements and thus study costs.
Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated.
A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments were compared with the HAQ: (A) the Patient-Reported Outcomes Measurement Information System (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) the "best" PROMIS 10-item and (C) 20-item static (short) forms; and (D) a simulated PROMIS CAT, which sequentially selected the most informative item.
Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms.
Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length.
Over the last 2 decades, assessment of patient health status has undergone a dramatic paradigm shift, evolving from a predominant reliance on biochemical and physical measurements to an emphasis upon health outcomes based on the patient's personal appreciation of their illness. The Health Assessment Questionnaire (HAQ), published in 1980, was among the first instruments based on patient-centered dimensions. The HAQ was designed to represent a model of patient-oriented outcome assessment and has played a major role in diverse areas such as prediction of successful aging, inversion of the therapeutic pyramid in rheumatoid arthritis (RA), quantification of nonsteroidal antiinflammatory drug gastropathy, development of risk factor models for osteoarthrosis, and examination of mortality risks in RA. The HAQ has established itself as a valuable, effective, and sensitive tool for measurement of health status. It has increased the credibility and use of validated self-report measurement techniques as a quantifiable set of hard data endpoints and has contributed to a new appreciation of outcome assessment. We review the development, content, and dissemination of the HAQ and provide reference sources for its uses, translations, and validations. We discuss contemporary issues regarding outcome assessment instruments relative to the HAQ's identity and utility. These include: (1) the issue of labeling instruments as generic versus disease-specific; (2) floor and ceiling effects in scales such as "disability"; (3) distances between values on scales; and (4) the continuing introduction of new measurement instruments and their potential effects.
To evaluate the validity of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function measures using longitudinal data collected in six chronic health conditions.
Individuals with rheumatoid arthritis (RA), major depressive disorder (MDD), back pain, chronic obstructive pulmonary disease (COPD), chronic heart failure (CHF), and cancer completed the PROMIS Physical Function computerized adaptive test or fixed-length short form at baseline and at the end of clinically relevant follow-up intervals. Anchor items were also administered to assess change in physical function and general health. Linear mixed-effects models and standardized response means were estimated at baseline and follow-up.
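The standardized response mean (SRM) mentioned above is conventionally defined as the mean change score divided by the standard deviation of the change scores. The sketch below uses that conventional definition; the function name and the three pairs of scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of paired change scores divided by their sample SD."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


if __name__ == "__main__":
    # Hypothetical T-scores for three respondents at two time points
    baseline_scores = [40.0, 50.0, 60.0]
    follow_up_scores = [41.0, 52.0, 63.0]
    print(standardized_response_mean(baseline_scores, follow_up_scores))
```

Because the SRM is unit-free, it can be compared across instruments scored on different metrics, with larger absolute values indicating greater sensitivity to change.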
A total of 1,415 individuals participated (COPD n = 121; CHF n = 57; back pain n = 218; MDD n = 196; RA n = 521; cancer n = 302). PROMIS Physical Function scores improved significantly with treatment in patients with CHF and back pain but not in patients with MDD or COPD. Most of the patient subsamples that reported improvement or worsening on the anchors showed a corresponding positive or negative change in PROMIS Physical Function.
This study provides evidence that the PROMIS Physical Function measures are sensitive to change in intervention studies where physical function is expected to change and able to distinguish among different clinical samples. The results inform the estimation of meaningful change, enabling comparative effectiveness research.
Background: The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we summarize the organization and scientific activity of the PROMIS network during its first 2 years. Design: The network consists of 6 primary research sites (PRSs), a statistical coordinating center (SCC), and NIH research scientists. Governed by a steering committee, the network is organized into functional subcommittees and working groups. In the first year, we created an item library and activated 3 interacting protocols: Domain Mapping, Archival Data Analysis, and Qualitative Item Review (QIR). In the second year, we developed and initiated testing of item banks covering 5 broad domains of self-reported health. Results: The domain mapping process is built on the World Health Organization (WHO) framework of physical, mental, and social health. From this framework, pain, fatigue, emotional distress, physical functioning, social role participation, and global health perceptions were selected for the first wave of testing. Item response theory (IRT)-based analysis of 11 large datasets supplemented and informed item-level qualitative review of nearly 7000 items from available PRO measures in the item library. Items were selected for rewriting or creation with further detailed review before the first round of testing in the general population and target patient populations. Conclusions: The NIH PROMIS network derived a consensus-based framework for self-reported health and systematically reviewed available instruments and datasets that address the initial PROMIS domains. Qualitative item research led to the first wave of network testing, which began in the second year.