Duration of rheumatoid arthritis influences the degree of functional improvement in clinical trials
D Aletaha, M M Ward

Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland, USA

Correspondence to: Dr Daniel Aletaha, Intramural Research Program, NIAMS, National Institutes of Health, 10 Center Drive, Building 10, Room 9S205, Bethesda, MD 20892, USA; aletahad{at}


Background: Functional capacity is an important outcome in rheumatoid arthritis and is generally measured using the Health Assessment Questionnaire disability index (HAQ). Functional limitation incorporates both activity and damage. Because irreversible damage increases over time, the HAQ may be less likely to show improvement in late than in early rheumatoid arthritis.

Objective: To determine the relation between sensitivity to change of the HAQ and duration of rheumatoid arthritis in reports of clinical trials.

Methods: Data were pooled from clinical trials that measured responses of HAQ scores at three or six months. The effect size of the HAQ was calculated and linear regression used to predict the effect size by duration of rheumatoid arthritis at group level. Treatment effect was adjusted for by including the effect sizes of pain scores and of tender joint counts as additional independent variables in separate models. Subgroup analysis employed contemporary regimens (methotrexate, leflunomide, combination therapies, and TNF inhibitors) only.

Results: 36 studies with 64 active treatment arms and 7628 patients (disease duration 2.5 months to 12.2 years) were included. The effect sizes of the HAQ decreased by 0.02 for each additional year of mean disease duration using all trials, and by 0.04/year in the subgroup analysis (p⩽0.01 for both analyses, except for pain adjusted models at three months).

Conclusions: In individual trials, less improvement in the HAQ might be expected in late than in early rheumatoid arthritis. Comparison of changes in HAQ among rheumatoid arthritis trials should take into consideration the disease stage of the treated groups.

  • DMARD, disease modifying antirheumatic drug
  • HAQ, Health Assessment Questionnaire
  • TNF, tumour necrosis factor
  • rheumatoid arthritis
  • sensitivity to change
  • function


Functional capacity is an important determinant of morbidity and a predictor of mortality of patients with rheumatoid arthritis.1–4 Function is part of the core set measures to be used in clinical trials in this disease,5 and the Health Assessment Questionnaire disability index (HAQ)6 is the most commonly used instrument for this purpose.

In rheumatoid arthritis, functional impairment is a composite of disease activity and damage.7–10 While disease activity is responsive to treatment, damage is considered irreversible and accrues over time.11 Accordingly, only the component of the HAQ related to activity will be sensitive to change in clinical trials. This component is likely to decrease over time as the damage component increases. In this pooled analysis of clinical trials, we assessed whether there is an association between average duration of rheumatoid arthritis and sensitivity to change of the HAQ on the group level. Our hypothesis was that the HAQ would be less sensitive to change among trials of patients with more longstanding disease than in trials of patients with early disease.


Literature search and selection criteria

We sought to identify all clinical trials in rheumatoid arthritis of non-experimental disease modifying antirheumatic drugs (DMARDs; that is, methotrexate, sulfasalazine, hydroxychloroquine and chloroquine, oral and parenteral gold compounds, D-penicillamine, azathioprine, ciclosporin A, leflunomide, adalimumab, etanercept, infliximab, and anakinra) and corticosteroids, in which the HAQ was assessed at baseline and after three or six months. We searched PubMed (including all subsets) from 1980 (the year the HAQ was introduced) to December 2004, using “Arthritis, Rheumatoid” as the medical subject heading, along with search limits for publication type (“Clinical trial”), age (“All adult: 19+ years”), and language (“English”) (n = 2209). We also searched the Cochrane Library for that period.

Among these, we first identified studies on DMARDs using the generic drug names as subject terms (n = 873), and then studies using the HAQ employing the subject terms “function”, “disability”, “health assessment questionnaire”, or “HAQ” (n = 543). Reviewing the abstracts of these articles, we excluded all studies that investigated experimental DMARDs or non-steroidal anti-inflammatory drugs, and obtained full length reports of the remainder (n = 104). From these, we excluded 43 studies, mostly because they were multiple reports on the same group of patients (n = 29), were designed as drug withdrawal studies (n = 2), or employed functional measures other than the HAQ (n = 3). Nine studies had no outcome assessment at three months (grace period, 10 to 14 weeks) or six months (20 to 28 weeks). Sixty one studies were eligible for our investigation. On reviewing the reference lists of these articles, we found no additional eligible study.
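The selection flow described above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names and category labels are illustrative and not taken from the original analysis.

```python
# Counts as reported in the text; names and labels are illustrative only.
full_reports = 104

exclusions = {
    "multiple reports on the same group of patients": 29,
    "drug withdrawal design": 2,
    "functional measure other than the HAQ": 3,
    "no outcome assessment at three or six months": 9,
}

# Total number of excluded full length reports.
total_excluded = sum(exclusions.values())

# Studies remaining after exclusion.
eligible = full_reports - total_excluded

print(total_excluded, eligible)
```

The four exclusion reasons sum to the 43 excluded studies, leaving the 61 eligible studies stated in the text.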

Data extraction

Separately for each treatment arm, we collected data on the number of patients, the proportion of patients seropositive for rheumatoid factor, mean patient age, and mean and standard deviation of duration of rheumatoid arthritis at baseline. For all study arms, we extracted means and SD for HAQ scores, pain scores, and tender joint counts at baseline, and their respective mean changes after three or six months, as available. Wherever possible, we used data from the three and six month completers, but the intent-to-treat data were also acceptable. In case of missing data or if the reports presented medians instead of means, authors or sponsoring pharmaceutical companies for studies published in 1994 or later were contacted, mostly by email. If there was no response within six weeks, they were contacted again.

Relation of baseline HAQ scores to duration of rheumatoid arthritis

As a preliminary analysis, we tested the hypothesis that average HAQ scores were associated with duration of rheumatoid arthritis on a group level at baseline. Using all available study arms, we undertook linear regression of HAQ scores (dependent variable) on duration of rheumatoid arthritis (independent variable), and adjusted for mean baseline disease activity. We included mean baseline pain scores and mean baseline tender joint counts as co-variates in separate models. As joint counts were assessed on a variety of scales, we normalised the mean tender joint counts at baseline to a scale from 0 to 100 (that is, per cent of maximum possible score). The number of patients per treatment arm varied substantially between studies. We therefore weighted all analyses by sample size to give greater emphasis to larger studies. We obtained parameter estimates for this relation and calculated their 95% confidence intervals (CI). Probability (p) values of ⩽0.05 were regarded as statistically significant.
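As a rough illustration of two steps in this paragraph, the sketch below normalises a tender joint count to per cent of the maximum possible score and fits a regression line weighted by sample size. It is a simplified sketch, not the SAS analysis used in the study: it has a single predictor (disease duration) and omits the pain or joint count covariate. The function names and the arm-level data are hypothetical, and the HAQ values are generated from a known slope so that the fit can be verified.

```python
def normalise_joint_count(mean_count, max_count):
    """Express a mean tender joint count as per cent of the maximum possible score."""
    return 100.0 * mean_count / max_count


def weighted_linear_fit(x, y, weights):
    """Return (intercept, slope) of the weighted least squares line y = a + b * x."""
    total = sum(weights)
    # Weighted means of predictor and outcome.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    # Weighted cross-product and weighted sum of squares around the means.
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical arm-level data, invented purely for illustration.
duration_years = [0.5, 2.0, 5.0, 8.0, 12.0]
n_patients = [40, 250, 120, 500, 80]
# Synthetic mean HAQ scores built from a known line (1.0 + 0.02 per year).
mean_haq = [1.0 + 0.02 * d for d in duration_years]

intercept, slope = weighted_linear_fit(duration_years, mean_haq, n_patients)
print(round(intercept, 3), round(slope, 3))
print(normalise_joint_count(14, 28))
```

Weighting by the number of patients means that a large arm pulls the fitted line towards its own data point more strongly than a small arm does, which is the stated intent of giving greater emphasis to larger studies.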

Association of changes in HAQ scores with duration of rheumatoid arthritis

In the main analysis, we assessed whether the sensitivity to change of HAQ scores was associated with disease duration. Sensitivity to change of HAQ scores was expressed as an effect size, which we calculated12 as: [mean baseline HAQ – mean follow up HAQ]/SD of baseline HAQ, separately for the three and six month follow up. The resulting measure is unit-free and is therefore comparable across different variables; its values can be interpreted as the number or fraction of baseline standard deviations by which the patient groups improved (if positive) or deteriorated (if negative). Effect sizes of zero indicate that no change occurred on the group level. Again, we used linear regression models to predict effect sizes of the HAQ by baseline disease duration. We adjusted for changes in rheumatoid arthritis activity, as a potential confounder of this relation. For example, larger effect sizes of the HAQ can simply be a consequence of a more powerful therapeutic stimulus, without being related to the duration of the disease. We used effect sizes of pain scores and effect sizes of tender joint scores for this adjustment in separate regression models. Effect sizes were calculated as detailed above for the HAQ.
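The effect size defined above can be written as a small function. This is a minimal sketch; the function name and the example values are hypothetical and do not come from any of the included trials.

```python
def effect_size(mean_baseline, mean_follow_up, sd_baseline):
    """Effect size: (mean baseline - mean follow up) / SD of baseline.

    Positive values indicate improvement (a fall in the score),
    negative values indicate deterioration, and zero indicates no change.
    """
    if sd_baseline <= 0:
        raise ValueError("baseline SD must be positive")
    return (mean_baseline - mean_follow_up) / sd_baseline


# Hypothetical study arms, for illustration only.
improved = effect_size(1.5, 1.0, 0.5)       # group improved by one baseline SD
unchanged = effect_size(1.0, 1.0, 0.5)      # no change at group level
deteriorated = effect_size(1.0, 1.25, 0.5)  # group worsened by half a baseline SD

print(improved, unchanged, deteriorated)
```

The same function applies to pain scores and tender joint counts, because the division by the baseline SD removes the unit of the underlying scale.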

We subjected our hypothesis to a subgroup analysis of treatment arms that only included methotrexate, tumour necrosis factor (TNF) inhibitors, leflunomide, or combination therapies (using one or more of these drugs). Because of the greater effect of these treatments, our ability to detect associations was greater in this analysis. Again, weighted analysis was undertaken on all models to account for differences in precision between smaller and larger studies. For all statistical analyses, we used SAS 9.0 (SAS Inc, Cary, North Carolina, USA).


Literature review

Of the 61 eligible studies, 23 had complete data in the published articles. From the remaining 38 studies we solicited missing data for 33; authors were not contacted for five studies that had been published before 1994. For 13 of the 33 studies (39%), data were provided, giving a total of 36 studies available for analysis.13–48 These studies comprised 82 arms (9242 patients), of which 64 were active treatment arms (7628 patients) and 18 were placebo arms (1614 patients). These 82 arms were used for the cross sectional analysis at baseline. The 64 active treatment arms were used for the longitudinal analyses; one arm was excluded as an outlier from all longitudinal analyses (HAQ effect size of 5.0).39 The remaining 63 arms comprised 31 arms with a three month follow up and 51 arms with a six month follow up, and 3844 and 6475 patients, respectively. The number of patients per study arm ranged from 7 to 530 in the three month follow up group, and from 12 to 696 in the six month follow up group (tables 1 and 2).

Table 1

 Effect sizes in study arms with available three month follow up

Table 2

 Effect sizes in study arms with available six month follow up

Relation of baseline HAQ scores with duration of rheumatoid arthritis

To test whether HAQ scores were associated with duration of rheumatoid arthritis at baseline, we used one model adjusting for pain scores and one adjusting for tender joint counts. For the pain adjusted model, complete data were available for 77 study arms (active treatment and placebo arms, 8445 patients). For the model adjusted for tender joint counts, 68 study arms were available, comprising 7566 patients. There was a statistically significant association between HAQ scores and duration of rheumatoid arthritis in both analyses, before and after adjusting for the mentioned disease activity surrogates. These models indicated that HAQ scores increase by between 0.02 and 0.03 with each additional year of average duration of rheumatoid arthritis (table 3). As expected, pain scores were also a strong determinant of HAQ scores, with increases of 0.1 in HAQ for each 1 cm increase in pain on 10 cm visual analogue scales. The models using pain scores for adjustment were better in explaining the variability in baseline HAQ scores than those using normalised tender joint counts (table 3, R2), which were not associated with HAQ scores in this analysis.

Table 3

 Association of baseline Health Assessment Questionnaire scores with duration of rheumatoid arthritis

Association of changes in HAQ scores with duration of rheumatoid arthritis

We assessed the association between duration of rheumatoid arthritis and HAQ effect sizes using effect sizes of pain or effect sizes of tender joint counts as additional predictors (table 4). Only active treatment arms were included. The models weighted by study size indicated a decrease in HAQ effect sizes of 0.02 and 0.03 for each additional year of average disease duration, respectively, for the three and six month time periods (table 4). All models except for the pain adjusted model at three months were statistically significant (p⩽0.01); the pain adjusted models had a better fit to the data than the ones adjusted for joint counts, and the three month models had a better fit than the six month models (table 4, R2). This may be because responses at three months are more homogeneous among patients than at six months.

Table 4

 Association of effect sizes of the Health Assessment Questionnaire with duration of rheumatoid arthritis (all treatment arms)

In a subgroup analysis we investigated the same models using methotrexate, leflunomide, TNF inhibitors, and combinations of these drugs (tables 1 and 2, identified by asterisks). The fit of these models to the data was similar to the comparable models in the above analysis (table 5; R2). As in the analysis of all drugs, longer disease duration was associated with smaller effect sizes of the HAQ. The magnitude of this association was larger in these trials (−0.04/year; weighted analysis) than in the analysis of all trials. Again, all analyses except for the pain-adjusted models at three months were statistically significant. As can be seen from tables 4 and 5, weighting for size of the study arm increased the power to detect statistically significant associations.

Table 5

 Association of effect sizes of the Health Assessment Questionnaire with duration of rheumatoid arthritis (only study arms including methotrexate, leflunomide, tumour necrosis factor inhibitors, or combination therapies including one or more of these drugs)

We also assessed the association of duration of rheumatoid arthritis with changes in HAQ scores instead of their effect sizes. Although this approach did not correct for data precision (baseline SD of HAQ), these absolute changes in HAQ scores can be more readily interpreted than effect sizes. These models indicated that improvement in mean HAQ scores decreased by 0.01 (all drugs) and 0.02 (subgroup of more contemporary drugs) for each additional year of disease duration. These results were found in both the three and six month analyses.


In this study we showed that the sensitivity to change in the HAQ in clinical trials of rheumatoid arthritis depends on the average disease duration: the longer the average duration of rheumatoid arthritis, the smaller the responsiveness of the HAQ in a group of patients when changes in disease activity occur. This contrast in the responsiveness of the HAQ compared with pure measures of disease activity is in line with the inherent duality of functional measurements, which reflect both disease process (activity) and outcome (damage). Given that damage is irreversible and that it tends to increase over time, a lesser improvement in functional impairment in patients with more longstanding disease was an intuitive finding. We quantified this association between disease duration and HAQ effect sizes after adjusting for changes in disease activity (treatment effect). Our data suggest that each additional year of average duration of rheumatoid arthritis decreases the effect size of the HAQ by 0.02, which corresponded to a decrease in average HAQ improvement of 0.01. These associations were similar at three and six months, as well as for the activity adjustments using pain or tender joint counts. However, when we analysed a subgroup of trials employing major drugs, this association was even stronger, and the estimates were twice as large as the ones resulting from the analyses of all treatment arms. This indicates that this association needs even more consideration in trials of more contemporary DMARDs.
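To make the size of this association concrete, the sketch below applies the reported per-year decreases (0.02 for all trials, 0.04 for the subgroup) to two hypothetical trials whose mean disease durations differ. It assumes that the group-level relation is linear across the range of durations studied and that the two trials are otherwise comparable (for example, in treatment effect on disease activity). The function name and the chosen durations are illustrative, not taken from the study.

```python
def expected_effect_size_gap(duration_a_years, duration_b_years, decrease_per_year):
    """Predicted amount by which the HAQ effect size in trial B falls short of trial A.

    Assumes a linear group-level relation between mean disease duration
    and HAQ effect size, with all other trial characteristics equal.
    """
    return (duration_b_years - duration_a_years) * decrease_per_year


# Hypothetical comparison: mean disease duration of 1 year versus 10 years.
gap_all_trials = expected_effect_size_gap(1.0, 10.0, 0.02)
gap_subgroup = expected_effect_size_gap(1.0, 10.0, 0.04)

print(round(gap_all_trials, 2), round(gap_subgroup, 2))
```

Under these assumptions, a nine year difference in mean disease duration corresponds to an expected difference of roughly 0.18 baseline standard deviations across all trials and roughly 0.36 in the subgroup of contemporary regimens. These are group-level expectations and should not be applied to individual patients.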

Longitudinal cohort studies have shown that HAQ scores increase over time. In one study, (median) HAQ scores increased by 0.24 over 12 years,10 which is consistent with our estimation of the cross sectional relation (0.02/year). Also similar to that single study, disease activity was a strong determinant of HAQ scores, independent of disease duration, that is, also in late disease. One recent report on data of a rheumatoid arthritis drug trial indicated that changes seen in HAQ scores are inversely related to the degree of radiographic damage.9 Our study is supportive of these data, although here the exposure variable was duration of disease rather than radiographic damage. Duration might better reflect other components of functional impairment, such as psychosocial factors, that may accumulate over time.

It should be emphasised that the associations found in this study were quantified for groups of patients, and that these numbers may vary considerably in individual patients. In fact, given that trials of established and late rheumatoid arthritis usually comprise patients with a wide range of disease duration, our analyses of group level data did not favour the identification of significant associations with disease duration. Although part of the effect seen in studies could be attributable to regression to the mean, there is no reason to believe that this is different in early and late disease. The mean (SD) of HAQ effect sizes in the placebo arms of the trials analysed was much smaller than in the active treatment arms, both at three months (0.09 (0.13)) and at six months (0.16 (0.12)). These placebo effect sizes were also not associated with the duration of rheumatoid arthritis (data not shown). Also, our findings apply to short term responsiveness of the HAQ, which differs from the responsiveness of measures of disease activity in many trials (table 1); our findings are not meant to indicate how the HAQ may improve during long term antirheumatic treatment.49 However, a recent study15 showed that patients with early rheumatoid arthritis have the potential to achieve lower HAQ scores than those with more established disease, and that the proportion with HAQ scores of 0 is almost twice as large in early disease.

In our study, we adjusted for disease activity by using only single surrogates, namely pain scores or tender joint counts. Although neither is likely to account for all variability in HAQ effect sizes that is attributable to changes in disease activity, the use of a larger set of activity surrogates would have limited the inclusion of many trials owing to missing data, and would have threatened the validity of our findings. Also, like HAQ scores, pain scores are self reported, which partly accounts for potential bias in reporting that might be present in some patients,50 while joint counts are examiner based, contrasting the different methods of adjustment.

The results of our study indicate that the sensitivity to change of the HAQ has an inverse relation to the duration of rheumatoid arthritis, which is independent of changes in rheumatoid activity but more pronounced if more contemporary DMARDs are used. This can become important if results of trials are compared, especially between studies of established or late rheumatoid arthritis and those of early rheumatoid arthritis. As function is one of the most important outcomes in rheumatoid arthritis, it is crucial to interpret functional response to treatment in clinical trials of this disease in the context of its duration.


This study was supported in part by the Austrian Science Funds (FWF) and the Intramural Research Program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health. We thank all investigators and companies that have provided additional data on their studies, which has allowed us to undertake the analyses to their fullest scope.




  • Published Online First 23 June 2005