

Extended report
Responsiveness and minimally important difference for the Patient-Reported Outcomes Measurement Information System (PROMIS) 20-item physical functioning short form in a prospective observational study of rheumatoid arthritis
Ron D Hays (1,2), Karen L Spritzer (1), James F Fries (3), Eswar Krishnan (3)

  1. Department of Medicine, UCLA Division of General Internal Medicine & Health Services Research, Los Angeles, California, USA
  2. RAND, Santa Monica, California, USA
  3. Stanford ARAMIS Program, Stanford University School of Medicine, Palo Alto, California, USA

Correspondence to Professor Ron D Hays, UCLA Department of Medicine, Division of General Internal Medicine & Health Services Research, 911 Broxton Avenue, Los Angeles, CA 90095-1736, USA; drhays{at}


Objective To estimate responsiveness (sensitivity to change) and minimally important difference (MID) for the Patient-Reported Outcomes Measurement Information System (PROMIS) 20-item physical functioning scale (PROMIS PF-20).

Methods The PROMIS PF-20, short form 36 (SF-36) physical functioning scale, and Health Assessment Questionnaire (HAQ) were administered at baseline, and 6 and 12 months later to a sample of 451 persons with rheumatoid arthritis. A retrospective change (anchor) item was administered at the 12-month follow-up. We estimated responsiveness between 12 months and baseline, and between 12 months and 6 months using one-way analysis of variance F-statistics. We estimated the MID for the PROMIS PF-20 using prospective change for people reporting getting ‘a little better’ or ‘a little worse’ on the anchor item.

Results F-statistics for prospective change on the PROMIS PF-20, SF-36 and HAQ by the anchor item over 12 and 6 months (in parentheses) were 16.64 (14.98), 12.20 (7.92) and 10.36 (12.90), respectively. The MID for the PROMIS PF-20 was 2 points (about 0.20 of an SD).

Conclusions The PROMIS PF-20 is more responsive than two widely used (‘legacy’) measures. The MID is a small effect size. The measure can be useful for assessing physical functioning in clinical trials and observational studies.

  • Outcomes Research
  • Health Services Research
  • Rheumatoid Arthritis



Physical functioning is an especially important indicator of health for older individuals and one of the strongest predictors of healthcare utilisation and mortality. A physical functioning item bank was created for the Patient-Reported Outcomes Measurement Information System (PROMIS) project,1 which consists of 124 items assessing mobility (lower extremity), dexterity (upper extremity), axial or central activity (neck and back function), and complex activities that overlap with more than one domain (daily living activities). The items were found to satisfy the item response theory unidimensionality assumption, and item parameters were estimated using a sample of over 21 000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis.2–4 The PROMIS physical functioning bank was shown to have greater precision than existing measures. The PROMIS physical functioning items were recently translated and adapted for use in the Dutch culture.5

Item response theory makes it possible to estimate the underlying score using a subset of the items in the full bank. Subsets of the physical functioning items (short forms) can be chosen to minimise response burden. In a cross-sectional study, a 20-item short form was selected from the ‘best’ PROMIS items3 which yielded more information (precise measurement) than the short form 36 (SF-36) physical functioning scale and Health Assessment Questionnaire (HAQ). But information about responsiveness (sensitivity to change), an important indicator of validity, for the PROMIS 20-item physical functioning measure has not yet been reported. Rheumatoid arthritis is a progressive disease and physical function tends to decline over time.

A responsive measure is sensitive to improvements, deteriorations and stability of health status over time.6,7 This paper evaluates the responsiveness of the PROMIS 20-item physical functioning scale (PROMIS PF-20) in a prospective observational cohort of people with rheumatoid arthritis.


Data sources and measures


A total of 451 patients participating in the Arthritis, Rheumatism and Aging Medical Information Systems (ARAMIS) cohorts during 2000–2002 accepted our invitation to participate in this study. There were no specific inclusion or exclusion criteria. ARAMIS is a multicentre longitudinal observational study in the USA that has been following patients who meet the American College of Rheumatology classification criteria.8,9 These patients were followed over a year using semi-annual surveys. The study was approved by the Stanford University Institutional Review Board (IRB-17334).

An observational study of patients follows them as they receive whatever treatment their healthcare providers implement. Responsiveness can be estimated in this sort of study as long as there are enough subjects who get worse, stay the same, and get better over time.


Physical functioning is a subdomain of physical health, which is in turn a subdomain of general health. The PROMIS definition of physical function is the ability to perform basic and instrumental activities of daily living. The PROMIS physical functioning items assess ability to perform, not whether an activity has actually been performed (box 1). The items assess capability, use the present tense and avoid attribution to disease or other limiting context. The PROMIS item bank assesses the latent trait of physical functioning ability.

Box 1

Content of PROMIS PF-20

  1. Are you able to do chores such as vacuuming or yard work?

  2. Are you able to push open a heavy door?

  3. Are you able to dress yourself, including tying shoelaces and doing buttons?

  4. Are you able to wash your back?

  5. Are you able to dry your back with a towel?

  6. Are you able to sit on the edge of a bed?

  7. Are you able to wash and dry your body?

  8. Are you able to get in and out of a car?

  9. Are you able to squeeze a new tube of toothpaste?

  10. Are you able to hold a plate full of food?

  11. Are you able to run a short distance, such as to catch a bus?

  12. Are you able to shampoo your hair?

  13. Are you able to get on and off the toilet?

  14. Are you able to transfer from a bed to a chair and back?

  15. Does your health now limit you in doing vigorous activities, such as running, lifting heavy objects, participating in strenuous sports?

  16. Does your health now limit you in bending, kneeling or stooping?

  17. Does your health now limit you in lifting or carrying groceries?

  18. Does your health now limit you in doing 2 hours of physical labor?

  19. Does your health now limit you in walking more than a mile?

  20. Does your health now limit you in climbing one flight of stairs?

Study participants were administered 19 of the 20 PROMIS PF-20 items. ‘Are you able to wash your back’ (item 4 in box 1) was not administered because of overlap with other similar items administered in the study. The correlation between scores estimated from the 19 items with the PROMIS PF-20 in the PROMIS wave 1 dataset was 0.998 (n=14 600).

The first 14 items shown in box 1 were administered with five response options: ‘without any difficulty’, ‘with a little difficulty’, ‘with some difficulty’, ‘with much difficulty’ and ‘unable to do’. The last six items were administered using five other response options: ‘not at all’, ‘very little’, ‘somewhat’, ‘quite a bit’ and ‘cannot do’.

In addition to the PROMIS PF-20, widely used self-report measures of physical functioning (‘legacy’ measures) were also administered to provide comparative information. These instruments were the 20-item HAQ10 and the 10-item SF-36 physical functioning scale.11

An ‘anchor’ item was administered on both follow-up surveys: ‘We would like to know about any changes in how you are feeling now compared with how you were feeling 6 months ago. How has your ability to carry out your everyday physical activities such as walking, climbing stairs, carrying groceries, or moving a chair got a lot better, got a little better, stayed the same, got a little worse, or got a lot worse?’


The PROMIS PF-20 and two legacy measures were self-administered at baseline and 6 and 12 months after baseline. As noted above, the anchor item was included in the 6-month and 12-month follow-up surveys. Surveys were administered by mail with three rounds of follow-up which included postcard and telephone reminders and multiple mailings of the survey. The attrition rate over the 1-year course of the study was 13%.

Statistical analysis

The anchor item at the 12-month follow-up assessment was used to categorise study participants into five retrospective ratings of change groups: ‘lot better’, ‘little better’, ‘same’, ‘little worse’ and ‘lot worse’. Because the anchor referred to change over the last 6 months, we estimated change on the PROMIS PF-20 between the 6-month and 12-month post-baseline assessments according to change on the anchor. In addition, we examined change from baseline to the 12-month post-baseline assessment to see if there was consistency in responsiveness over a longer time period. This anchor was the independent variable in analyses of variance in which the PROMIS PF-20, SF-36 physical functioning scale and HAQ were dependent variables.

We computed correlations (product–moment and Spearman) between change on the PROMIS PF-20 and the anchor item. F-statistics from one-way analyses of variance were used as indicators of responsiveness.12,13 In addition, we estimated the minimally important difference on the PROMIS PF-20 by looking at prospective change for the two subgroups that reported on the retrospective anchor item getting ‘a little better’ or ‘a little worse’. Duncan multiple range tests were performed to identify when prospective change on the PROMIS PF-20 differed significantly by retrospectively reported change group. Finally, we conducted a sensitivity analysis for responsiveness by collapsing the ‘a little’ and ‘a lot’ categories so that the anchor item had three categories and computed F-statistics for the three physical functioning scales.
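The use of a one-way analysis of variance F-statistic as a responsiveness index, and the three-category sensitivity analysis, can be illustrated with a minimal sketch. The change scores below are hypothetical and are not study data; the function names `one_way_anova_f` and `collapse_anchor` are introduced here for illustration only and do not represent the software used by the authors.

```python
# Hypothetical prospective change scores (T-score units), grouped by the
# retrospective anchor response. Illustrative values only, not study data.
change_by_anchor = {
    "lot better": [4.0, 6.0, 5.0],
    "little better": [2.0, 3.0, 1.0],
    "same": [0.0, 1.0, -1.0],
    "little worse": [-2.0, -1.0, -3.0],
    "lot worse": [-5.0, -4.0, -6.0],
}


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: squared deviation of each score from
    # its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def collapse_anchor(by_anchor):
    """Collapse the five anchor categories into better / same / worse."""
    return {
        "better": by_anchor["lot better"] + by_anchor["little better"],
        "same": list(by_anchor["same"]),
        "worse": by_anchor["little worse"] + by_anchor["lot worse"],
    }


f_five = one_way_anova_f(list(change_by_anchor.values()))
f_three = one_way_anova_f(list(collapse_anchor(change_by_anchor).values()))
print(f"F (five categories): {f_five:.2f}")
print(f"F (three categories): {f_three:.2f}")
```

A larger F-statistic indicates that mean change on the scale differs more across the anchor groups relative to the variation within groups, which is the sense in which it serves as an indicator of responsiveness.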


Forty-nine per cent of the sample reported an age of 64 years or younger, with 15% being 65–69, and 36% 70 or older; 81% were female; 87% were white; median educational level was 14 years (range 2–18); 6% were current smokers; median body mass index was 26.

Table 1 presents correlations among the PROMIS PF-20, SF-36 and HAQ physical functioning scales at baseline. Also provided are the means, SDs and range of scores. All three scales were strongly associated with one another; the HAQ was somewhat more strongly related to the PROMIS PF-20 than was the SF-36.

Table 1

Correlations among physical functioning scales and descriptive statistics at baseline

On the retrospective rating of change (anchor) item at the 12-month assessment, 21 people reported being ‘a lot better’, 35 ‘a little better’, 252 ‘the same’, 113 ‘a little worse’ and 30 ‘a lot worse’. Product–moment (Spearman) correlations for prospective change with the anchor item were 0.35 (0.33) at 12 months and 0.34 (0.33) at 6 months for the PROMIS PF-20, 0.29 (0.32) at 12 months and 0.22 (0.26) at 6 months for the SF-36 physical functioning scale, and 0.29 (0.25) at 12 months and 0.29 (0.25) at 6 months for the HAQ.
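For readers unfamiliar with the two correlation coefficients, the following sketch shows how a product-moment (Pearson) and a Spearman correlation between prospective change and the anchor could be computed. The data are hypothetical, and coding the anchor from -2 (‘a lot worse’) to +2 (‘a lot better’) is an assumption made here for illustration; the paper does not state the coding used.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: anchor coded -2..+2 (assumed coding), change in T-score units.
anchor = [2, 1, 1, 0, 0, 0, -1, -1, -2, -2]
change = [5.0, 1.5, 3.0, 0.5, -1.0, 1.0, -2.5, 0.0, -4.0, -1.5]

print(f"Pearson:  {pearson(change, anchor):.2f}")
print(f"Spearman: {spearman(change, anchor):.2f}")
```

Because the anchor has only five ordered categories, many observations are tied, which is why the rank step averages ranks within ties.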

Tables 2–4 show prospective change estimates for the PROMIS PF-20, SF-36 physical functioning scale and HAQ, respectively, by the retrospective anchor item for the 12-month and 6-month time intervals. F-statistics for prospective change in the PROMIS PF-20, SF-36 and HAQ physical functioning measures by the retrospective change item over 12 months were 16.64, 12.20 and 10.36, respectively (all p values <0.0001). F-statistics for the 6-month change were 14.98, 7.92 and 12.90, respectively (all p values <0.0001).

Table 2

Change on PROMIS PF-20 by self-reported retrospective rating of change

Table 3

Change on SF-36 physical functioning scale by self-reported retrospective rating of change

Table 4

Change on HAQ by self-reported retrospective rating of change

For the three-category version of the anchor item (collapsing the ‘a little’ and ‘a lot’ response categories), F-statistics for prospective change in the PROMIS PF-20, SF-36 and HAQ physical functioning measures by the retrospective change item over 12 months were 30.71, 21.43 and 15.66, respectively (all p values <0.0001). F-statistics for the 6-month change were 23.54, 12.49, and 13.47, respectively (all p values <0.0001).

The estimates in table 2 show that the change on the PROMIS PF-20 at 12 months for those who were ‘a lot better’ on the anchor was significantly different from those reporting they were ‘the same’, ‘a little worse’, or ‘a lot worse’ on the anchor. In addition, those who reported they were ‘a little worse’ on the anchor differed significantly from those who reported they were ‘the same’ and those who were ‘a lot worse’. Similar results were found for change at 6 months.

A change of about 2 points on a T-score metric (SD=10) is associated with a report of getting ‘a little better’ or ‘a little worse’, but change over 6 months for those reporting that they got ‘a little worse’ was about 1 point. Hence, the estimated minimally important difference for the PROMIS PF-20 appears to be about 0.20 (small effect size) of the baseline SD.
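The step from a 2-point change to an effect size of 0.20 is a division by the SD of the T-score metric. The sketch below makes this explicit using hypothetical change scores (not study data). Averaging the absolute mean change of the two ‘a little’ groups is an assumption made here for illustration, and `mid_estimate` is a hypothetical helper, not the authors' procedure.

```python
from statistics import mean

T_SCORE_SD = 10.0  # SD of the T-score metric, as stated in the text

# Hypothetical prospective change scores for the two 'a little' anchor groups.
little_better = [1.0, 2.0, 3.0, 2.0]
little_worse = [-2.0, -1.0, -3.0, -2.0]


def mid_estimate(better_changes, worse_changes):
    """Average the absolute mean change of the two 'a little' groups."""
    return (abs(mean(better_changes)) + abs(mean(worse_changes))) / 2


mid_points = mid_estimate(little_better, little_worse)
mid_effect_size = mid_points / T_SCORE_SD  # MID expressed in SD units

print(f"MID: {mid_points:.1f} T-score points")
print(f"Effect size: {mid_effect_size:.2f} SD")
```

By Cohen's conventional benchmarks, an effect size of about 0.20 SD is considered small, consistent with the description in the text.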


The American College of Rheumatology and other professional organisations have recommended that functional status in patients with rheumatoid arthritis be assessed at least annually to systematically identify patients not doing well and to benchmark physician performance. The PROMIS project was initiated to improve the precision and validity of health outcome measures. Previous analyses provided support for the greater precision of measurement of the PROMIS physical functioning measures compared with legacy measures.3 This study provides support for the construct validity (responsiveness) of the PROMIS PF-20 compared with the SF-36 physical functioning scale and the HAQ. The PROMIS measures were also designed to minimise response burden. The PROMIS PF-20 is estimated to take about 5 min (using the Hays & Reeve14 rule of thumb of 3–5 items per minute) to administer. We recommend that the PROMIS PF-20 be considered for this assessment and as an end point in studies of rheumatoid arthritis. Standard item parameters can be used to score the PROMIS PF-20 using ‘response pattern scoring’. Raw score to T-score conversion tables are also available.
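As a rough check on the administration-time estimate, applying the rule of thumb of 3–5 items per minute to 20 items gives the range below. The symbol t (administration time) is introduced here only for illustration.

```latex
\[
\frac{20\ \text{items}}{5\ \text{items/min}} = 4\ \text{min}
\;\le\; t \;\le\;
\frac{20\ \text{items}}{3\ \text{items/min}} \approx 6.7\ \text{min}
\]
```

The figure of about 5 min lies within this range.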

Biosimilar drugs for rheumatoid arthritis are expected to enter the market in the next few years. The regulatory pathway for approval of these drugs will involve performance of non-inferiority trials against the existing products. This study suggests that a change in the PROMIS PF-20 of 2 or more points may be a minimally important difference for such trials. Although a change of 2 points does not necessarily warrant changes in therapy, the clinician can be confident that a change of this magnitude is non-trivial.

Future work is needed to evaluate the performance of the PROMIS PF-20 in additional samples and with other external anchors of change. In addition, research is needed to evaluate the extent to which the results reported here generalise to the full PROMIS item bank, computer-adaptive short form testing administration, and other static measures developed from it such as the 10-item PROMIS physical function


We appreciate the feedback received from other PROMIS investigators on this work.




  • Handling editor Tore K Kvien

  • Contributors All authors included on the paper fulfil the criteria of authorship: conception and design, or analysis and interpretation of data; drafting the article or revising it critically for important intellectual content; and final approval of the version to be published.

  • Funding This paper was supported in part by an NIH cooperative agreement (1U54AR057951). RDH was also supported by UCLA/DREW Project EXPORT, NIMHD, (2P20MD000182). The paper's contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

  • Competing interests None.

  • Ethics approval Stanford University Institutional Review Board (IRB-17334).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The data analysed in this paper are available for academic research and education from Stanford University.
