Treatment-related improvement in physical function varies with duration of rheumatoid arthritis: a pooled analysis of clinical trial results
D Aletaha (1), V Strand (2), J S Smolen (3), M M Ward (4)

  1. Department of Rheumatology, Internal Medicine III, Medical University of Vienna, Austria
  2. Stanford University, Portola Valley, California, USA
  3. Second Department of Medicine, Hietzing Hospital, Vienna, Austria
  4. National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland, USA

Correspondence to: Dr D Aletaha, Division of Rheumatology, Medical University of Vienna, Waehringer Guertel 18–20, 1090 Vienna, Austria; daniel.aletaha{at}


Background: Physical function in rheumatoid arthritis (RA) has reversible and irreversible components, and is typically assessed by the Health Assessment Questionnaire Disability Index (HAQ). Since irreversible components are expected to increase with longer duration of RA and reduce the ability for improvement in physical function, we analysed responsiveness of HAQ scores in patient populations with differing RA durations in randomised controlled trials (RCTs).

Methods: Data from all RCTs published between 1980 and 2005 that reported changes from baseline in HAQ at 6 and/or 12 months were analysed. Treatments were grouped as “biologics”, “traditional” disease modifying antirheumatic drugs (DMARDs), or “placebo”. We computed effect sizes of HAQ in each trial, and used regression models to contrast the association between these effect sizes and duration of RA among treatment groups.

Results: We identified 42 RCTs with complete data for the statistical models. The models indicate that discrimination of functional improvement between active drug groups and placebo is reduced in patients with a longer duration of RA (p = 0.02 for the change in discrimination over time). The placebo-adjusted HAQ responses decreased on average by 0.37 per year of RA duration.

Conclusion: Responsiveness in HAQ scores is inversely associated with mean disease duration in RA. This impacts assessment of physical function, a key outcome measure in RCTs and practice, and impacts the ability to discriminate active treatment from placebo.

Rheumatoid arthritis (RA) is a chronic inflammatory disease that has the propensity to cause joint destruction and disability. A variety of synthetic and biologic disease modifying antirheumatic drugs (DMARDs) have been shown to help preserve joint integrity and improve physical function [1]. Functional capacity is a major determinant of morbidity and a predictor of mortality of patients with RA [2–6]. Measures of physical function are part of the core set of measures to be used in randomised controlled trials (RCTs) in RA [7], and the Health Assessment Questionnaire Disability Index (HAQ) [8] is the instrument most commonly utilised in RCTs.

In any given patient with RA, impairment in physical function is a composite of reversible and irreversible components. The reversible component is that related to joint pain, stiffness, and swelling due to inflammation, or associated symptoms such as depression, all of which may be affected by treatment. The potentially irreversible component may be due to joint destruction or deformity [9], or can be the consequence of comorbid conditions. The concept that functional impairment in RA comprises two components, one related to disease activity and one related to damage, has been frequently noted in the literature [10–13]. As predicted from this dual nature, functional measures are less responsive to change among patients with longer duration of RA [14], and the degree of irreversibility can be directly quantified [15].

An important implication of the decreased ability to improve physical function among patients with longer disease duration [14] is that discrimination of functional response between two treatments, or between active drug and placebo, might be limited among these patients. The ability to detect improvement in functional ability would be expected to be greatest in early RA, when the majority of impairment is potentially reversible, and less in late RA, when more of the functional impairment may be due to irreversible damage.

We tested this concept that demonstration of differences among treatments in their ability to improve physical function would depend on the duration of RA. We used pooled data from all clinical trials that measured functional impairment using the HAQ, and modelled responsiveness of the HAQ as a function of the duration of RA of patients in the trials.

Methods

Literature search and identification of trials

We sought to identify all clinical trials in RA of “biologics” or “traditional” disease modifying antirheumatic drugs (DMARDs), corticosteroids, and placebo, in which the HAQ was assessed at baseline and at 6 and/or 12 months. We searched PubMed (including all subsets) from 1980 to January 2006, using “arthritis, rheumatoid” as medical subject heading along with search limits for publication type (“clinical trial”), age (“all adult: >19 years”), and language (“English”). The year 1980 marks the introduction of the HAQ; however, it has been used regularly in RCTs only since 1995. We also searched the Cochrane Library for that period, and the related documents of the US Food and Drug Administration.

Among these trials, we first identified studies on traditional or biologic DMARDs, steroids and placebo by using the generic drug names as subject terms, and then studies employing the HAQ by using the subject terms “function” or “disability” or “health assessment questionnaire” or “HAQ”. Reviewing the abstracts of these articles, we excluded studies of experimental DMARDs that did not subsequently receive regulatory approval, and obtained full-length reports for the remainder. From these, we excluded studies that were multiple reports on the same group of patients, were drug withdrawal studies, did not include the HAQ, or had no assessment at either 6 months (grace period: 20–28 weeks) or 12 months (grace period: 46–58 weeks). We accepted studies that had outcomes assessed at only one of these two time points. A total of 62 studies were eligible by these criteria. Review of the reference lists of these articles found no additional eligible studies.
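
The follow-up windows used for eligibility can be expressed as a small lookup. The sketch below is only an illustration of that rule: the function name `classify_followup` is hypothetical, and treating the window bounds as inclusive is an assumption, since the text gives the ranges but not how boundary weeks were handled.

```python
# Grace periods from the text, in weeks. Inclusive bounds are an assumption.
FOLLOWUP_WINDOWS = {
    "6 months": (20, 28),
    "12 months": (46, 58),
}


def classify_followup(assessment_week):
    """Return the nominal time point for an assessment week, or None if
    the week falls outside both grace periods."""
    for label, (first_week, last_week) in FOLLOWUP_WINDOWS.items():
        if first_week <= assessment_week <= last_week:
            return label
    return None


# Hypothetical assessment weeks reported by three trials
for week in (24, 52, 36):
    print(week, classify_followup(week))
```

Under these assumptions, a trial reporting the HAQ only at week 36 would match neither window and would not contribute to either analysis.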

Data extraction

For each treatment arm, demographic and clinical data, including the proportion of patients seropositive for rheumatoid factor, mean patient age, and mean and standard deviation (SD) of duration of RA at baseline, were abstracted by two of the authors (DA, 40 studies; VS, 22 studies). For all arms, means and SDs of HAQ scores at baseline, and the respective mean changes after 6 months and 12 months, were extracted. Wherever possible, data from completer analyses were utilised, but most studies presented intent-to-treat data using last observation carried forward (LOCF) to account for missing data. For studies published in 1994 or later, we contacted study authors or sponsoring pharmaceutical companies when reports presented medians instead of means or did not include all relevant data. If there was no response within 6 weeks, they were contacted again. One study with clearly outlying HAQ effect sizes (⩾5; for calculation see below) was excluded from the analysis [16]. After the requests and this exclusion, 42 studies with complete data for inclusion in the statistical model were available, that is, with information on mean duration of RA and HAQ effect size (see below) at 6 months and/or 12 months [16–57]. Duration of RA served as a surrogate marker of accrued damage, because other, more direct measures of damage, such as radiographic scores or orthopaedic surgery, were based on different methods of evaluation or were not available in most studies, respectively, and could therefore not be used for the purpose of pooling and further modelling. Of the 42 studies with complete data, 37 studies (87 treatment arms) contributed to the 6-month analysis [16–18, 20, 21, 23–39, 41–45, 47–49, 51–57], 19 studies (42 treatment arms) contributed to the 12-month analysis [16, 19, 20, 22, 24, 29, 35–37, 39, 40, 42, 44, 46–50, 53], and 14 studies contributed to both time points [16, 20, 24, 29, 35–37, 39, 42, 44, 47–49, 53].

Responsiveness of HAQ scores

The responsiveness of the HAQ [58] was calculated using the effect size, as: (mean HAQ at 6 months − mean baseline HAQ) / (standard deviation of baseline HAQ). The resulting measure is unit-free, and its values can be interpreted as the number or fraction of baseline standard deviations by which the patient groups improved (if negative) or deteriorated (if positive). Effect sizes of zero indicate that no change occurred on the group level. We also calculated the effect sizes of the HAQ in studies of 12 months of treatment, using the same formula.
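
As a minimal sketch of the effect size calculation described above, the function below applies the formula to one treatment arm. The function name and the example numbers are hypothetical and are not taken from any of the pooled trials.

```python
def haq_effect_size(mean_baseline, mean_followup, sd_baseline):
    """Effect size = (follow-up mean HAQ - baseline mean HAQ) / baseline SD.

    Negative values indicate improvement, positive values deterioration.
    """
    if sd_baseline <= 0:
        raise ValueError("baseline SD must be positive")
    return (mean_followup - mean_baseline) / sd_baseline


# Hypothetical arm: mean HAQ falls from 1.5 to 1.0, baseline SD 0.5
effect = haq_effect_size(mean_baseline=1.5, mean_followup=1.0, sd_baseline=0.5)
print(effect)  # -1.0, i.e. improvement by one baseline standard deviation
```

The same function applies unchanged to 12-month data by passing the 12-month mean as `mean_followup`.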

Statistical analysis

We fitted two statistical models. The first, a generalised linear model, was used to estimate the association of the effect size of the HAQ with the mean duration of RA for each type of treatment. The model parameters were used to estimate the effect size of HAQ scores for treatment groups with increasing duration of RA. The same model included a categorical variable indicating one of three treatment groups: traditional DMARDs (monotherapy or in combination, including steroid trials), biologic DMARDs (monotherapy or combination with traditional DMARDs), and placebo therapies (true placebos or addition of placebo to background therapy).

In the second model, placebo-adjusted effect sizes were calculated for all trials in which biologic DMARDs were directly compared with placebo. Placebo-adjusted effect sizes were calculated as: [(mean HAQ at 6 months − mean baseline HAQ) in the biologic treatment arm − (mean HAQ at 6 months − mean baseline HAQ) in the placebo arm] / (standard deviation of baseline HAQ in the placebo arm). This preserved the original randomisation of these trials and reflected the primary interest of the original studies, namely to compare drug vs placebo. We used linear regression to examine the association between disease duration and placebo-adjusted HAQ effect sizes.
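
The placebo-adjusted effect size can be sketched in the same way: the difference between the two arms' mean changes, scaled by the placebo arm's baseline SD. The function name and the example values are hypothetical.

```python
def placebo_adjusted_effect_size(change_biologic, change_placebo,
                                 sd_baseline_placebo):
    """(mean change in biologic arm - mean change in placebo arm)
    divided by the baseline SD of the placebo arm.

    Each change is follow-up mean HAQ minus baseline mean HAQ, so a
    negative result means the biologic arm improved more than placebo.
    """
    if sd_baseline_placebo <= 0:
        raise ValueError("placebo baseline SD must be positive")
    return (change_biologic - change_placebo) / sd_baseline_placebo


# Hypothetical trial: biologic arm changes by -0.5 HAQ units, placebo by
# -0.25, and the placebo arm has a baseline SD of 0.5
adjusted = placebo_adjusted_effect_size(-0.5, -0.25, 0.5)
print(adjusted)  # -0.5
```

Because both changes come from arms of the same randomised trial, trial-level influences shared by the two arms cancel in the numerator.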

Finally, we performed two sensitivity analyses: first, we repeated the first model using 12-month data instead of 6-month data; second, we analysed only trials that had been performed according to methodologically rigorous protocols. To identify such trials, the presence of five key characteristics of quality was ascertained. These were: a formal RCT design; presence of a true control arm (placebo or placebo plus background therapy); control for type of background therapy (eg, methotrexate only vs “any DMARDs”); use of the trial for regulatory purposes; and prospectively defined use of established outcome measures, such as the response criteria of the American College of Rheumatology, and/or the Disease Activity Score.

In all models, the contribution of each trial arm to the respective model was weighted by the number of patients included in that trial arm.
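
To make the weighting concrete, the sketch below fits a patient-weighted straight line of effect size on mean RA duration separately for each treatment category. Separate weighted lines per category give the same fitted intercepts and slopes as a single weighted linear model with category, duration, and their interaction; this is offered as an illustration under that assumption, not as the authors' software or exact model specification, which the text does not state. All arm-level numbers and names are hypothetical.

```python
from collections import defaultdict


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("need at least two distinct durations")
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(arms):
    """Fit one patient-weighted line per treatment category."""
    grouped = defaultdict(list)
    for arm in arms:
        grouped[arm["group"]].append(arm)
    return {
        group: weighted_line(
            [a["duration_years"] for a in rows],
            [a["effect_size"] for a in rows],
            [a["n_patients"] for a in rows],
        )
        for group, rows in grouped.items()
    }


# Hypothetical trial arms (not data from the pooled trials)
arms = [
    {"group": "biologic", "duration_years": 1, "effect_size": -1.9, "n_patients": 100},
    {"group": "biologic", "duration_years": 5, "effect_size": -1.5, "n_patients": 200},
    {"group": "biologic", "duration_years": 11, "effect_size": -0.9, "n_patients": 150},
    {"group": "placebo", "duration_years": 2, "effect_size": -0.1, "n_patients": 120},
    {"group": "placebo", "duration_years": 8, "effect_size": -0.1, "n_patients": 80},
    {"group": "placebo", "duration_years": 12, "effect_size": -0.1, "n_patients": 90},
]

fits = fit_by_group(arms)
for group, (intercept, slope) in fits.items():
    print(f"{group}: intercept {intercept:.2f}, slope {slope:.2f} per year")
```

In this made-up example the biologic line rises towards zero with duration while the placebo line is flat, so the gap between the two lines narrows as duration increases. A difference in slopes of this kind is what an interaction term between duration and treatment category captures.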

Results

Table 1 gives an overview of the characteristics of trials used in the main analyses and in the two sensitivity analyses. The analysis of 6-month data included 87 trial arms and 10 655 patients. The average duration of RA in the trials using biologic therapies was longer than in those of traditional DMARDs (9.0 vs 4.2 years); likewise, the average HAQ at baseline was higher for biologic than traditional DMARDs (1.5 vs 1.3).

Table 1 Characteristics of trials

Discrimination of functional responses decreases with increasing disease duration

Discrimination according to drug class

Analysis of trial results at 6 months showed that the effect size of HAQ improvement differed significantly among biologic DMARDs, traditional DMARDs, and placebo (p<0.001 across groups). The overall effect of duration on HAQ effect size was marginally statistically significant (p = 0.06), but the association of duration of RA with the HAQ effect size differed depending on the treatment group (p = 0.02 for interaction of duration and treatment category; model F = 7.37; p<0.001; R² = 0.31). The estimated mean responses based on these data in fig 1A indicate that, as expected, effect sizes for placebo treatments were very small (>−0.3). Also, the 95% confidence intervals for the placebo responses of the HAQ were very wide, spanning up to 1.5 effect size units, and the placebo responses were not statistically significantly different from 0. There was no association between the magnitude of the placebo response of the HAQ and the duration of RA.

Figure 1 Responsiveness of Health Assessment Questionnaire Disability Index (HAQ) scores in patient groups with different duration of rheumatoid arthritis (RA). A. 6-month analysis; B. 12-month analysis. Based on a weighted generalised linear model, effect sizes for the HAQ are estimated and compared between placebo therapies (background disease modifying antirheumatic drugs (DMARD) + placebo or placebo alone; red line), traditional DMARDs (green line), and biologic DMARDs (blue line). Discrimination between groups was lower in later RA than in early RA.

In contrast, the effect of traditional DMARDs and biologics on the HAQ was highly significant at all durations of RA (fig 1A). For the biologic DMARDs, effects on the HAQ decreased considerably as the duration of RA increased among trials: estimated effect sizes were approximately −2.0 in very early RA and decreased to >−0.8 in late RA (fig 1A). Importantly and in line with this observation, there was a significant difference in the degree of functional response between biologics and placebo therapies in early RA but not in late RA.

Among all traditional DMARD arms, the HAQ effect size was stable across the durations of RA tested. However, if only methotrexate arms were analysed from the heterogeneous group of traditional DMARDs, the estimated HAQ effect sizes were −1.0 for 1-year duration of RA, and decreased significantly to −0.37 in late RA (13-year duration). For leflunomide arms the respective effect sizes were −1.0 for 1-year duration and −0.68 for 13-year duration (detailed data not shown). These findings suggest differential discrimination of improvement in the HAQ by methotrexate and leflunomide in early RA vs later RA.

Discrimination based on original randomisation

In this model, within-trial placebo-adjusted HAQ effect sizes were calculated for biologics. This model preserves the original randomisation that had been performed in the trial. The adjusted model showed a highly significant association of placebo-adjusted response with duration of RA, indicating that the adjusted effect size of the HAQ in trials of biologics decreased on average by 0.37 per year of RA duration (model F = 30.8; p<0.001; R² = 0.59). The placebo-adjusted model for the group of traditional DMARDs did not explain sufficient variability in the outcome (R² = 0.15).

Sensitivity analyses

Discrimination of functional response at 12 months: confirmation of 6-month results

The analysis of the 12-month data was based on 42 trial arms and 5902 patients, but the model explained a considerably higher proportion of variation in the HAQ effect size than the 6-month model (R² = 0.61), although only four trial arms were true placebos. Overall, the effect sizes for the HAQ at 12 months were smaller than those predicted for 6 months (fig 1B), but there were still significant differences among categories of treatments. The association between HAQ effect size and the duration of RA as a main effect in the model approached statistical significance (p = 0.07). However, the pattern of differences among treatments and the association between the HAQ effect size and duration of RA were similar to those in the analysis of 6-month data (model F = 11.05; p<0.001; R² = 0.61). In this analysis, placebo treatments had no demonstrable effect on the HAQ in early or later RA, while in studies of both biologics and traditional DMARDs, the HAQ effect size was lower among trials of patients with RA of longer duration. The similarity between the 12-month trial results and the 6-month trial results suggests that the lower responsiveness of the HAQ in later RA seen in the 6-month trials was not solely because 6 months was too short a time for patients with RA of longer duration to respond to treatment. Rather, the decrease in responsiveness among patients with longer durations of RA was more likely due to decreased ability of the HAQ to respond, possibly because irreversible functional limitations were more prevalent in these patients.

Analysis using methodologically rigorous trials: confirmation of analysis of all trials

Since trials may differ in the rigour of their design, we analysed a subset of randomised controlled trials that had been used for regulatory purposes. These 37 trial arms comprised 5925 patients. Although the confidence intervals were wider than in the main analysis of the 6-month data, we again found that functional responses to biologic and traditional DMARDs were more difficult to discern from placebo responses in late RA compared to early RA (model F = 4.46; p = 0.004; R² = 0.42). There was also a trend towards greater placebo responses in late RA, which could be related to differences in trial characteristics, as well as early rescue therapy. The differences observed here suggest reduced capability to discriminate between treatments when patient groups with long durations of RA are studied. As before, this was also true for the comparison of functional responses between biologics and traditional DMARDs.

Discussion

The results of this study show that the ability to detect differences in functional improvement between active treatment and placebo, or between treatment groups, in clinical trials of RA varies with the duration of RA. The clear functional benefits of biological therapies seen in early RA can be completely lost in late RA, where it may not even be possible to discriminate these effects from a mere placebo response. Findings were similar for traditional DMARDs, especially when methotrexate was analysed. In general, this supports the concept that physical function has reversible and irreversible components, the latter reducing the responsiveness of any functional measure, such as the HAQ. Consequently, the ability to discriminate functional responses decreases in late RA, as shown in the present study.

Importantly, the responsiveness of measures of RA activity, such as the acute phase response or composite indices, has not been found to differ with increasing duration of RA. Our findings suggest that measures of physical function differ from those of RA activity, ie, they support the notion that function and disease activity should be regarded as separate outcome domains in RA.

Several studies have reported that RA activity was the most significant determinant of HAQ scores at all durations of RA [13, 59]. Even in moderately long-standing or late RA, the irreversible component of the HAQ may not exceed about 30% [14]. Therefore, the reduction of sensitivity to change of the HAQ in relation to duration of RA was not predictable to the extent quantified in the present study.

The analysis based on 6-month trial data predicted a significantly greater effect on function for biologic compared to traditional DMARDs in early, but not in established RA. Traditional DMARDs have a slower onset of action, and thus these differences might simply be due to differences in the rapidity of response. Heterogeneity in onset of action among traditional DMARDs might also obscure their signal at early follow-up times. In the additional 12-month analysis, the slopes of responsiveness across patient groups with increasing RA duration were similar for biologic and traditional DMARDs.

Several issues regarding the design of our study should be recognised. First, it should be emphasised that the associations found in this study were quantified for groups of patients, and that the duration of RA may vary considerably among patients within trials. Second, a limitation of RCT data is that part of the functional effects seen could be attributable to regression to the mean. However, there is no reason to believe that the extent of this phenomenon would be different in early and late RA. In addition, our results were supported by the results of the placebo-adjusted analysis, which maintained the original randomisation of the respective studies. Third, the results of the present study are based on a statistical model that assessed the association of HAQ responsiveness and RA duration and compared its effect in different groups of interventions. In this context, we contrasted biologic DMARDs, traditional DMARDs, and placebo. For the purpose of our study, the distinction into these three classes was a trade-off between removal of within-group heterogeneity of effect sizes and statistical power. Our results show that discrimination, even on the basis of these therapeutic categories, is significantly reduced in patient groups with long-standing RA. Finally, another potential limitation is that we used intent-to-treat (ITT) data if data on completers were not available. In ITT analyses missing data are usually imputed using the last observation carried forward (LOCF) method. LOCF imputation clearly affects 12-month data more than 6-month data, because more patients drop out or are provided rescue therapy during longer trials.

In addition to analysing two different follow-up time points, 6 and 12 months, we also analysed a subgroup of the rigorously performed trials to test whether unblinded trials or trials with less reliable outcomes assessment affected the results. Both results supported the main analyses (according to drug group or based on original randomisation).

Our results indicate that trials of medications in RA will have greater difficulty demonstrating improvement in functional impairment if tested in patients with more long-standing RA than if tested in patients with early RA. Discrimination between active treatments and placebo among patients with more long-standing RA may require longer studies and larger sample sizes to detect a given level of improvement in physical function. This relative decrease in discrimination is likely to be specific to physical function and would not affect other measures of RA activity similarly. However, given that physical function is one component of the American College of Rheumatology (ACR) response criteria, the decreased responsiveness of measures of physical function in later RA may decrease the options by which patients with long-standing RA can meet criteria for response. In addition, our results indicate that comparisons of improvements in physical function between therapeutic interventions and across different RCTs can only be made with careful consideration of the potential for improvement within a specific patient group.
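
The link between smaller treatment differences and larger sample sizes can be illustrated with the textbook normal-approximation formula for comparing two means, n per arm = 2 × ((z for 1 − α/2) + (z for power))² / d², where d is the standardised difference between arms. This formula and the function below are not part of the original analysis. Treating d as comparable to a placebo-adjusted effect size is an assumption that holds only if the outcome SD is close to the baseline SD.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(std_difference, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-arm comparison
    of means (normal approximation, equal arm sizes)."""
    if std_difference == 0:
        raise ValueError("standardised difference must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / std_difference) ** 2)


# Hypothetical standardised differences between active drug and placebo
for d in (1.0, 0.5, 0.25):
    print(d, patients_per_arm(d))
```

Because d enters the formula squared, halving the standardised difference roughly quadruples the number of patients needed per arm.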



  • Funding: This research was supported in part by the Austrian Science Funds, and the Intramural Research Program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health.

  • Competing interests: None declared