
Benchmarking and the percentile assessment of RA: adding a new dimension to rheumatic disease measurement
F Wolfe, National Data Bank for Rheumatic Diseases; Arthritis Research Centre Foundation; University of Kansas School of Medicine, Wichita, Kansas, USA
H K Choi, Arthritis Unit, Department of Medicine, Massachusetts General Hospital, Boston, USA
Correspondence to: Professor F Wolfe, Arthritis Research Centre Foundation, 1035 N Emporia, Suite 230, Wichita, KS 67214, USA; fwolfe{at}


It is somewhat surprising that the importance of benchmarking1 was not generally realised until the last decade.2-5 Benchmarking confers extraordinary advantages in understanding clinical trials, observational studies, and ordinary clinic patients, because it provides reference values against which observed results can be judged. Benchmarks allow us to assess the status of patients, whereas most clinical trials are concerned only with change. Status tells us how patients are doing today and predicts short and long term outcomes, including costs, work disability, and death. Change describes treatment efficacy, but has little to do with prediction or outcome. In the long run, it is status, not change, that is important.

Benchmarking requires a relevant, generalisable, large, well characterised sample that is representative of either a population based or a clinic based cohort of patients with rheumatoid arthritis (RA). In the former case, milder cases and cases not currently meeting RA criteria may be included. A clinical sample should be composed of cases typical of those seen in clinical practice, and should be representative of clinical practice in general. In addition, the sample must be characterised in enough detail to ensure that relevant RA covariates are captured. For example, disease duration can be important for outcomes such as work disability, and mortality can be directly pegged to it. Finally, to be useful the sample must be large enough that there can be confidence that the estimates are reliable.

Benchmarking also requires accurate and reliable measures. A wide range of such measures relevant to rheumatology is now available to form the basis of benchmarks. In addition to the Health Assessment Questionnaire (HAQ),6 pain scale, and Short Form-36 (SF-36) subscales7,8 used in the current report, other relevant measures include patient global severity, fatigue, swollen and tender joint counts, and laboratory tests such as C reactive protein or erythrocyte sedimentation rate.

The study of Wiles et al9 is one more in a series of superior reports from the Norfolk Arthritis Register, a community based inception cohort study designed by Deborah Symmons and her colleagues in Manchester.10-12 It meets the two criteria suggested above. It also distinguishes population based RA/polyarthritis from clinical RA, as represented by hospital patients and/or those meeting ACR RA criteria at onset. One limitation is the small sample size, which reflects the low incidence of RA in the community and may make the median estimates imprecise. Although the authors do not provide confidence intervals for their data, these can be estimated from similar data from community rheumatologists in the US National Data Bank for Rheumatic Diseases (NDB).5 For 213 patients with RA whose disease duration was 4.75–5.25 years, the median HAQ is 1.0 (bootstrapped 95% CI 0.875 to 1.25). The interval therefore extends at least 0.125 units on either side of the median: 0.125 below and 0.25 above. For the SF-36 physical function scale the corresponding values are 55 (95% CI 40 to 60).
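A bootstrapped confidence interval for a median, like the one quoted above, can be sketched as follows. This uses the percentile bootstrap: resample the patients with replacement, recompute the median each time, and read off the 2.5th and 97.5th percentiles of those medians. The NDB analysis does not state its resampling settings, so the function name `bootstrap_median_ci`, the 2000 resamples, and the fixed seed are illustrative assumptions. The HAQ values below are invented and are not NDB data.

```python
import random
import statistics


def bootstrap_median_ci(values, n_resamples=2000, level=0.95, seed=1):
    """Percentile-bootstrap confidence interval for the median (a sketch).

    Returns (sample median, lower limit, upper limit).
    """
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    n = len(values)
    # Median of each resample drawn with replacement, sorted for quantile lookup.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_resamples)              # about the 2.5th percentile
    hi_idx = int((1.0 - alpha) * n_resamples) - 1  # about the 97.5th percentile
    return statistics.median(values), medians[lo_idx], medians[hi_idx]


# Invented HAQ scores (HAQ is scored in steps of 0.125); not NDB data.
haq = [0.0, 0.25, 0.5, 0.625, 0.875, 1.0, 1.0, 1.125, 1.25, 1.5, 1.75, 2.0, 2.375]
median, lower, upper = bootstrap_median_ci(haq)
print(median, lower, upper)
```

Because HAQ takes only a few discrete values, the bootstrapped limits for the median also fall on those values. This is consistent with the asymmetric interval (0.875 to 1.25) reported above.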

With such excellent data in hand, the authors propose that their five year results be used as a standard against which five year outcomes from other cohorts may be judged. However, this raises the important question of whether benchmarks at specific time intervals are required for patient self reported data. We have shown elsewhere that HAQ scores13 and other status measures14 change little over time. To examine the issue further for the variables studied by Wiles et al,9 we randomly selected 12 949 patients with RA from the NDB in the years 1999–2001. The NDB is a databank of American patients with arthritis obtained from the practices of 244 (for the current report) rheumatologists.5 Using quantile (median) regression, we find that HAQ scores increase by 0.0133 (SE 0.001) units a year. At this rate, a change of 0.125 HAQ units takes about 9.4 years (0.125/0.0133). Put into perspective, the median values (and their implied confidence intervals) presented by Wiles et al might therefore be expected to encompass HAQ values within ±9.4 years, or through a disease duration of 0 to 13.4 years. A similar case for limited change with time can be made for the SF-36. For example, the physical function score (NDB n=8479) decreases by 0.578 (SE 0.072) units a year. For pain (0–10 scale, NDB n=12 870), the change is 0.021 (SE 0.003) units a year, which is clinically meaningless. In addition, there is very little time related change in the other SF-36 variables and in the other clinical measures shown in table 1. This relative stability over time occurs because inflammation and/or pain, rather than disease duration, is the primary driving force for most of these clinical measures. These observations suggest that, for the measures included in table 1, it is not necessary to benchmark at specific time points in the course of RA.
Although time-specific benchmarks of patient self reported data may be more accurate than those that ignore the time element, they have the important limitation that they can only be used at a single time point, thereby substantially limiting their clinical usefulness. However, it is clearly necessary to account for time when outcomes such as work disability, total joint arthroplasty, and mortality are concerned.15-18
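The duration argument above rests on two steps: estimating a yearly slope by median regression, and dividing a confidence-interval half-width by that slope. Both can be sketched in a few lines. The brute-force least-absolute-deviation fit relies on a standard property: some optimal median-regression line passes through two of the data points. This method is practical only for small samples. It is a stand-in for illustration and is not the software used for the NDB analysis; a production analysis would use a linear-programming solver such as statsmodels' `QuantReg`. The names `median_regression` and `years_spanned`, and the toy data, are hypothetical.

```python
from itertools import combinations


def median_regression(x, y):
    """Least-absolute-deviation (median) regression of y on one predictor.

    Illustrative brute force: try every line through two points with distinct
    x values and keep the one with the smallest sum of absolute residuals.
    Returns (intercept, slope). O(n^3), so suitable only for small samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def years_spanned(half_width, slope_per_year):
    """Years over which a median drifting at a constant yearly slope stays
    within a given confidence-interval half-width."""
    return half_width / abs(slope_per_year)


# Toy data (hypothetical): five points on y = 1 + 0.5x plus one outlier.
# The median fit ignores the outlier and recovers intercept 1.0, slope 0.5.
intercept, slope = median_regression([0, 1, 2, 3, 4, 5],
                                     [1.0, 1.5, 2.0, 2.5, 3.0, 10.0])

# The HAQ figures from the text: 0.125 units / 0.0133 units a year.
print(round(years_spanned(0.125, 0.0133), 1))  # 9.4
```

The toy fit shows why a median slope suits skewed, bounded scores such as the HAQ: a few extreme patients pull a least-squares line much more than they pull a median line.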

Table 1

Health status scores of patients with RA in the practices of 244 American rheumatologists, as registered in the US National Data Bank for Rheumatic Diseases (1999–2001)

How should benchmarks be used? The data of table 1 represent a large and generalisable sample of patients with RA, so the table can be applied to RA studies and to individual patients. More detailed charts, with values at intervals of 10 percentile units or less, are also available. We can apply these data to the results of randomised controlled trials (RCTs) of the anti-tumour necrosis factor agents infliximab19 and etanercept.20 Before treatment, HAQ scores were 1.7 and 1.65, corresponding to the 77th and 76th percentiles, respectively. At study closure, these had fallen to the 59th and 55th percentiles, respectively. These percentile data give insight into the effectiveness of these agents that is not apparent from ACR improvement scores or raw HAQ scores. They place the results of patients treated with these agents in the context of all other patients with RA. Percentile benchmarks also allow RCTs to be compared, because they show the severity of disease in patients entering each study.
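Converting a raw score into a percentile benchmark amounts to locating it within a reference distribution, such as the NDB sample behind table 1. The sketch below uses the mid-rank convention: reference values below the score count fully and tied values count half. This convention is an assumption, because the article does not state how its percentiles were computed. The function name and the ten-value reference sample are hypothetical and are not the NDB data.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference):
    """Percentile (0-100) of a score within a reference sample.

    Mid-rank convention (an assumption): values below the score count
    fully, values equal to it count half.
    """
    ref = sorted(reference)
    below = bisect_left(ref, score)               # strictly lower values
    ties = bisect_right(ref, score) - below       # values equal to the score
    return 100.0 * (below + 0.5 * ties) / len(ref)


# Hypothetical reference HAQ distribution (not NDB data).
reference_haq = [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.25, 1.5, 1.75, 2.0]

# Benchmark a patient or trial arm before and after treatment.
before = percentile_rank(1.75, reference_haq)  # 85.0
after = percentile_rank(1.0, reference_haq)    # 50.0
print(before, after)
```

Applied to a full reference table, the same lookup turns pre- and post-treatment HAQ scores into percentile positions like the 77th-to-59th shift described above.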

Percentile benchmarks also have an important role in individual patient care, and can answer the question, "How is my patient doing?" Figure 1 shows the percentile benchmarking system applied to a patient with severe RA. It makes clear the benefit this patient received from his most recent disease modifying antirheumatic drug.

Figure 1

Benchmarking in clinical practice. The patient is a 48 year old attorney who has had rheumatoid arthritis (RA) for 18 years. His RA was nodular, seropositive, and highly erosive, and had required total joint arthroplasty of the elbows, hip, and left knee. Previous treatment included methotrexate, sulfasalazine, hydroxychloroquine, penicillamine, azathioprine, etanercept, and prednisone (7.5 mg ongoing). The figure shows the patient's status, and the change in status after six months' treatment, benchmarked by percentiles.

Thirty five years ago Lansbury described an index that might be considered the forerunner of benchmarking systems in rheumatology.21 That index was based on experience rather than data, and it faded from use as treatments and measures changed. We now have far better data and valid, sensitive measures. It is time to turn from relative change scores, which are difficult to interpret, to benchmark measurements that are meaningful and easy to understand.