Extended report
A mixed treatment comparison of the efficacy of anti-TNF agents in rheumatoid arthritis for methotrexate non-responders demonstrates differences between treatments: a Bayesian approach
Susanne Schmitz¹, Roisin Adams², Cathal D Walsh¹ ², Michael Barry², Oliver FitzGerald³

¹ Department of Statistics, Trinity College Dublin, Dublin, Ireland
² National Centre for Pharmacoeconomics, St James Hospital, Dublin, Ireland
³ St Vincent's University Hospital, Dublin Academic Healthcare, Dublin, Ireland

Correspondence to Susanne Schmitz, Department of Statistics, Trinity College Dublin, Dublin 2, Ireland; schmitzs{at}


Background A number of tumour necrosis factor α (TNFα) antagonists (anti-TNFα) are available to treat rheumatoid arthritis. All of these have demonstrated considerable efficacy in placebo controlled trials, but few head-to-head comparisons exist to date. This work's objective is to estimate the relative efficacy among licensed anti-TNFs in patients who have had an inadequate response to methotrexate (MTX). Different outcome measures are used to highlight the advantages of continuous measures in such analyses.

Methods A systematic review identified randomised controlled trials comparing the efficacy of licensed anti-TNFα agents with placebo at 24 weeks in patients who have had an inadequate response to MTX. Relative efficacy was estimated using Bayesian mixed treatment comparison (MTC) models. Three different outcome measures were used: the relative risk (RR) of achieving an American College of Rheumatology (ACR) 20 response, the RR of achieving an ACR50 response, and the percentage improvement in Health Assessment Questionnaire (HAQ) score.

Results 16 published trials were included in the analysis. All anti-TNFs show considerably improved efficacy over placebo. The MTC results also provide evidence of some differences in efficacy of the TNFα antagonists. Etanercept appears superior to infliximab and golimumab, and certolizumab to infliximab and adalimumab. ACR results indicate improved efficacy of certolizumab over golimumab. On HAQ analysis, adalimumab, certolizumab, etanercept and golimumab appear superior to infliximab, and etanercept shows improved efficacy compared with adalimumab.

Conclusions There are differences in efficacy among the TNFα antagonists. In an MTC, a continuous outcome measure has more power to detect such differences than a binary outcome measure because of its enhanced sensitivity to change.

Introduction


Over the past decade, enhanced understanding of molecular pathogenesis has led to the development of biological agents that target specific parts of the immune system. These innovative treatments have altered the path and face of rheumatoid arthritis (RA) and outcomes for patients and society. Tumour necrosis factor α (TNFα) antagonists (anti-TNFα) are the first of the biological treatment groups used in RA. There are currently five anti-TNF agents licensed for RA in Europe: adalimumab, certolizumab, etanercept, golimumab and infliximab. All of these agents have demonstrated considerable efficacy in placebo controlled randomised controlled trials (RCTs) in patients who have had an inadequate response to conventional disease-modifying anti-rheumatic drugs (DMARDs) such as methotrexate (MTX) or sulfasalazine.

While there is a wealth of RCT evidence available for these agents compared with either placebo or conventional DMARDs, there are currently very few head-to-head RCTs of anti-TNF agents. Despite this, some estimate of relative efficacy is needed to inform choice of agent. In the absence of head-to-head trials of relevant comparators, it is necessary to combine evidence from placebo controlled trials of different treatments and thereby derive an estimate of effect of one treatment against another. This can be broadly termed a mixed treatment comparison (MTC), an extension of meta-analysis. Different methodologies have been described for MTC; one such method uses Bayesian principles. Bayesian meta-analysis provides more flexibility than classical methods to include more data and handle more complex modelling structures.1

A number of papers have performed evidence synthesis using a formal meta-analytical framework.2–11 Five of these studies have used a Bayesian method to combine the evidence4–6 10 11; the remaining studies used a frequentist approach. The outcomes chosen for these analyses have primarily been American College of Rheumatology (ACR) response. The ACR response criteria measure response to treatment based on a relative improvement (%) in a defined set of core variables.12 The ACR20, ACR50 and ACR70 are the most commonly used of the response measures. The ACR20, as the most widely used measure, has attracted the most criticism: primarily for its reduced sensitivity to change and because, in light of the considerable effectiveness of new therapies, a 20% response sets the bar for response too low.13 A further issue with the ACR response measures is the lack of statistical independence between them; for example, ACR50 responders must also be ACR20 responders. This issue was explored in a recent MTC and made little difference to the results.10 The HAQ (Health Assessment Questionnaire) is now reported in the majority of RCTs and observational studies. The HAQ measures physical function in RA, and the extent of functional disability predicts and is associated with work disability, joint replacement surgery and mortality.14 15 Only two indirect comparisons have estimated the comparative efficacy of the anti-TNF agents using HAQ improvement as an outcome.4 7 However, these studies did not include baseline HAQ as a variable.

ACR20 and ACR50 do not provide continuous measures of change (since patients either meet or fail to meet these, regardless of the magnitude of change). For this reason, percentage improvement on the HAQ scale is also examined here. This ‘HAQ multiplier’ is used in economic analyses, which is also a reason to look at it directly.16 17


Methods

We performed a systematic review of the literature and MTC of the efficacy of anti-TNF agents in RA in patients who have had an inadequate response to MTX. The systematic review was carried out independently by two reviewers (RA and SS). The search included published studies up to and including October 2010 in PubMed, Embase and the Cochrane Database. A number of search terms were used (see online supplementary material S1), and the search was restricted to papers published in the English language. Rheumatological inflammatory diseases other than RA, such as ankylosing spondylitis, psoriatic arthritis and connective tissue diseases, were excluded from the search.

The inclusion criteria were: an RCT design; patients with established RA and an inadequate response to MTX; and treatment for at least 24 weeks (where 24-week data were not available, data within 6 weeks either before or after 24 weeks were used). Both monotherapy and combination therapy were included, with an explicit term in the statistical model allowing for the additional effect of MTX. The outcome measures chosen were ACR20, ACR50 and HAQ improvement. The total number of patients, the number of responders achieving ACR response, and the mean improvement and standard deviations (SDs) in the case of the HAQ were extracted. Authors were contacted in cases where the required data were not reported. Where no access to the missing data was provided, the following methodology was applied: in cases where the mean was not reported, the median was used; in the absence of SDs, interquartile ranges (IQRs) were used to estimate SDs using a normal approximation; and, in the remaining cases, the maximum of the SDs reported in the other trials was used. The doses of biological agents included are those included in the RCTs. Demographic data including age, gender, mean disease duration, baseline HAQ score and previous number of DMARDs were recorded.
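The normal approximation used to recover an SD from an IQR can be sketched as follows. For normally distributed data the IQR spans roughly 1.349 SDs, so the SD is approximately IQR/1.349. The function name and the example quartiles are illustrative assumptions and are not taken from any of the included trials.

```python
from statistics import NormalDist


def sd_from_iqr(q1: float, q3: float) -> float:
    """Approximate an SD from the lower and upper quartiles.

    Assumes the underlying data are roughly normally distributed.
    """
    if q3 < q1:
        raise ValueError("q3 must not be smaller than q1")
    std_normal = NormalDist()
    # Width of the IQR of a standard normal, about 1.349 SD units
    iqr_in_sd_units = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
    return (q3 - q1) / iqr_in_sd_units


# Hypothetical quartiles of HAQ improvement in a single study arm
example_sd = sd_from_iqr(0.25, 0.75)
print(f"approximate SD: {example_sd:.3f}")
```

The approximation is only as good as the normality assumption; for strongly skewed HAQ changes it may misstate the SD.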

Statistical analysis

A Bayesian MTC model was fitted for each of the outcome measures. Such a model simultaneously performs indirect comparisons between treatments that are not directly compared and allows estimation of all pair-wise comparisons. A network diagram shows the evidence structure that was available for the analysis (figure 1).
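The idea behind an indirect comparison can be written as a single relation. The notation below is an assumption introduced for illustration: A denotes placebo (the common comparator), B and C denote two anti-TNF agents, and d is a relative effect on an additive scale such as the log RR or the difference in HAQ multiplier.

```latex
% Consistency relation underlying an indirect comparison.
% d_{AB}: effect of B relative to placebo A (estimated from B-vs-placebo trials)
% d_{AC}: effect of C relative to placebo A (estimated from C-vs-placebo trials)
\[
  d_{BC} = d_{AC} - d_{AB}
\]
```

Under this relation, trials of B against placebo and of C against placebo together inform the comparison of C with B, even when no trial compares the two directly.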

Figure 1

Network diagram of evidence. Edges are labelled with the number of studies and the total number of patients included in these studies. Numbers in square brackets refer to Health Assessment Questionnaire evidence where this differs from American College of Rheumatology evidence. Ada, adalimumab; Cert, certolizumab; Eta, etanercept; Gol, golimumab; Inf, infliximab.

A Bayesian approach provides the flexibility of including a wide range of data and allows borrowing of strength across the whole evidence network and therefore makes best use of the available data. Rather than providing a point estimate, the Bayesian framework estimates a probability distribution for each parameter of interest. Summary measures provide a mean estimate and 80% credible intervals to capture the uncertainty surrounding the estimated effects. This approach shifts away from the misunderstood acceptance–rejection dichotomy of arbitrary significance levels (typically α = 0.05) by communicating the actual strength of evidence.18
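As a rough illustration of how such summary measures are obtained from simulation output, the sketch below computes a posterior mean and an equal-tailed 80% credible interval from a list of draws. The draws are simulated from a normal distribution purely as a stand-in for Markov chain Monte Carlo output; the function name and all numbers are hypothetical.

```python
import math
import random
from statistics import mean, quantiles


def summarise_posterior(draws, level=0.80):
    """Return the posterior mean and an equal-tailed credible interval."""
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    tail = (1 - level) / 2
    # quantiles(..., n=1000) returns 999 cut points in steps of 0.1%
    cuts = quantiles(draws, n=1000, method="inclusive")
    lower = cuts[round(tail * 1000) - 1]
    upper = cuts[round((1 - tail) * 1000) - 1]
    return mean(draws), (lower, upper)


# Hypothetical posterior draws of a log RR, transformed to the RR scale
random.seed(1)
log_rr_draws = [random.gauss(0.30, 0.20) for _ in range(5000)]
rr_draws = [math.exp(x) for x in log_rr_draws]

post_mean, (lower, upper) = summarise_posterior(rr_draws)
print(f"mean RR {post_mean:.2f}, 80% CrI {lower:.2f} to {upper:.2f}")
print("interval excludes 1:", lower > 1 or upper < 1)
```

An interval that lies entirely above or below 1 on the RR scale corresponds to what the text describes as evidence of a difference between two treatments.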

The methods used to analyse the binary ACR outcomes are described by Nixon et al, allowing explicit modelling of concurrent treatments, multiple treatment arms and study level covariates.5 The model was adjusted to calculate relative risks (RRs) instead of odds ratios (ORs) based on methods described by Warn et al.19
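The distinction between RRs and ORs matters when response is common, as it is for ACR20. The short sketch below computes both from arm-level counts; the counts are hypothetical and chosen only to show how far the two measures can diverge.

```python
def relative_risk(events_trt, n_trt, events_ctl, n_ctl):
    """Ratio of response probabilities, treatment versus control."""
    return (events_trt / n_trt) / (events_ctl / n_ctl)


def odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Ratio of response odds, treatment versus control."""
    odds_trt = events_trt / (n_trt - events_trt)
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    return odds_trt / odds_ctl


# Hypothetical trial: 60/100 responders on treatment, 20/100 on placebo
rr = relative_risk(60, 100, 20, 100)
or_ = odds_ratio(60, 100, 20, 100)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```

With these counts the RR is 3 while the OR is 6, so reading an OR as if it were an RR would overstate the treatment effect.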

The HAQ score can be treated as continuous. An MTC model was fitted to the difference in HAQ improvement among treatments.

Both models allow the concurrent treatment with MTX in one or several treatment arms. In this way, the effect that is attributable to MTX is separated from the effect of interest—that is, the effect of the anti-TNF drugs. Studies have shown that effect size of an anti-TNF depends on a patient's disease severity.20 To account for this, HAQ improvement is modelled as the percentage improvement to baseline HAQ score, as opposed to the actual improvement. A multiplier (m) to baseline HAQ score represents the percentage improvement and can be easily derived from the data on study arm level; if, for example, a HAQ improvement of 0.46 is recorded for a baseline HAQ score of 2, the multiplier can be calculated as follows:

m = HAQ improvement / baseline HAQ = 0.46 / 2 = 0.23

Since this is a continuous variable, rather than a binary one (as in the ACR20 and ACR50), it makes better use of the complete data.
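The calculation of the multiplier at study arm level can be expressed in a few lines. The function names are assumptions made for illustration; the first call reproduces the worked example from the text, and the second shows how a multiplier translates back into an absolute improvement for a hypothetical baseline score.

```python
def haq_multiplier(improvement: float, baseline: float) -> float:
    """Improvement in HAQ expressed as a proportion of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline HAQ must be positive")
    return improvement / baseline


def expected_improvement(multiplier: float, baseline: float) -> float:
    """Absolute HAQ improvement implied by a multiplier at a given baseline."""
    return multiplier * baseline


# Worked example from the text: improvement of 0.46 from a baseline of 2
m = haq_multiplier(0.46, 2.0)
print(f"m = {m:.2f}")

# Hypothetical patient group with a baseline HAQ of 1.5
print(f"implied improvement: {expected_improvement(m, 1.5):.3f}")
```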

Trial demographics suggest the existence of some heterogeneity between studies. To account for this, the models assume random effects for the anti-TNF efficacy, which allows for some variability among the study estimates. Treatment effects are assumed constant between treatment arms of the same study.
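A generic form of such a random-effects structure, for the continuous HAQ outcome, might be written as below. This is a sketch under stated assumptions and not the model code itself, which is given in online supplementary material S2. The symbols are introduced here for illustration: y_{ik} and s_{ik} are the observed mean multiplier and its standard error in arm k of study i, x_{ik} indicates concurrent MTX, z_{ik} indicates an anti-TNF arm, and t(i) is the anti-TNF agent studied in trial i.

```latex
% Requires the amsmath package.
\begin{align*}
  y_{ik}      &\sim \mathcal{N}\!\left(\theta_{ik},\, s_{ik}^{2}\right)
                 && \text{observed arm-level outcome} \\
  \theta_{ik} &= \mu_{i} + \beta\, x_{ik} + \delta_{i}\, z_{ik}
                 && \text{study baseline, MTX effect, anti-TNF effect} \\
  \delta_{i}  &\sim \mathcal{N}\!\left(d_{t(i)},\, \sigma^{2}\right)
                 && \text{random effect around the pooled effect of agent } t(i)
\end{align*}
```

Here δ_i is shared by all anti-TNF arms of study i, reflecting the assumption that treatment effects are constant between arms of the same study, and σ is the between-trial SD.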

All models were fitted in WinBUGS, a software package using Markov chain Monte Carlo techniques.21 For details on the code, see online supplementary material S2.


Results

Literature review and data description

Sixteen RCTs were selected for the analysis (for details of the selection process, see online supplementary material S3). Five studies met the inclusion criteria for adalimumab, four for infliximab, two for etanercept, two for golimumab, and three for certolizumab. MTX was included in all arms of 11 of the studies22–32; no MTX was given in four of the studies.33–36 One trial contained arms of combination and arms of monotherapy.37 A number of trials were excluded on the basis of these criteria (see online supplementary material S4).

While the demographics of those included across the studies were broadly similar, there was some heterogeneity, in particular in relation to the severity of the disease (baseline mean HAQ score ranged from 1.3 to 1.9), dose of MTX (mean ranging from 13 mg in the certolizumab trials to 18.5 mg in the etanercept trials) (table 1) and trial design. The mean age of the trial cohorts was 52 years, and most were women (80.5%). The mean disease duration was 8.7 years. The disease duration appears lower for the newer anti-TNF agents (certolizumab and golimumab), but this is probably explained by differences in practice such as early referral and difficulty in recruiting anti-TNF-naïve patients into trials. The rate of adverse effects was the same for the placebo and treatment groups across the trials (5%), and a higher proportion of patients withdrew because of lack of response in the placebo groups (19.6%) than in the treatment groups (8%).

Table 1

Mean baseline demographics of randomised controlled trials for anti-tumour necrosis factor agents

All models were extended to a meta-regression to include the covariates duration of disease, number of previous DMARDs and HAQ score at baseline. None of these was found to have a significant impact, and they were therefore not included in the final analysis.

Efficacy of anti-TNF agents using ACR20 and ACR50 in active RA despite MTX

Sixteen trials were included for this analysis (table 1). Two trials did not have follow-up data for ACR50 at week 24, and therefore week 30 and week 18 data were used.28 29 The results for the ACR analysis are given in table 2. The reported parameters are all pair-wise RRs for the anti-TNF agents versus placebo and one another and the between-trial SD (σ). A graphical representation of the pair-wise comparison on the logarithmic scale is shown in figure 2.

Figure 2

Graph of pair-wise log RRs (LRRs) for the ACR20 and ACR50 outcomes and estimated HAQ improvement multiplier of anti-tumour necrosis factor agents against placebo and one another. ACR, American College of Rheumatology; Ada, adalimumab; Cert, certolizumab; Eta, etanercept; Gol, golimumab; HAQ, Health Assessment Questionnaire; Inf, infliximab.

Table 2

Mean estimate with 80% credible intervals for each pair-wise comparison

All anti-TNF agents achieved a significant ACR response over placebo (the credible intervals are higher than, and do not include, 1). The RR for certolizumab achieving ACR20 and ACR50 indicated improved efficacy over adalimumab, infliximab and golimumab. The outcomes also provide evidence of etanercept being superior to infliximab and golimumab. For ACR50, etanercept appeared approximately equal in efficacy to certolizumab (Cert vs Eta, RR 1.03); adalimumab shows improvement over infliximab.

The model detected some heterogeneity across the trials, which was captured in the between-trial variance parameter σ² (σ = 0.26 for ACR20 and 0.24 for ACR50); the heterogeneity is probably attributable to trial characteristics such as MTX dose and trial design.

Efficacy of anti-TNF agents using HAQ improvement in active RA despite MTX

Thirteen trials were included for the HAQ analysis (table 1). For one trial, the mean change in HAQ was unavailable and so the median value was used instead.37 In five of the studies no SDs could be accessed; in one of these,37 the IQR allowed calculation of the SD, and for the other four studies23 29 31 36 the maximum of the other SDs was used. Table 2 lists mean and credible intervals for the HAQ multiplier; a graphical representation can be found in figure 2. Again, all anti-TNF agents show significant improvement over placebo, with etanercept achieving the highest improvement (m = 0.31). All other anti-TNF agents have greater efficacy than infliximab. Certolizumab and etanercept appear superior to adalimumab. Etanercept shows improved efficacy over golimumab.

The between-trial SD in the HAQ model is estimated to be σ = 0.03.


Discussion

This MTC presents efficacy estimates for each anti-TNF agent against each other. A number of indirect comparisons have been performed to date3 4 6 8 11 38 39; this is the first to examine HAQ improvement incorporating baseline disease severity in a Bayesian framework. The value of MTC to evidence-based healthcare evaluations was recently highlighted.40 It was acknowledged that the MTC framework allows inclusion of evidence that may not be possible using classical methods; the inclusion of such evidence could in turn reduce the uncertainty around the estimates of effectiveness.

Consistent with the individual trials, all anti-TNF agents show a significant improvement over placebo across all outcome measures. Etanercept and certolizumab show high efficacy throughout; other indirect comparisons have found similar results.6 41 Improved outcomes with etanercept and certolizumab may relate to reduced immunogenicity compared with the antibody therapies. With certolizumab it may be the pegylated formulation that allows less exposure of the molecule and less opportunity for an immune response. The HAQ model outcomes provide evidence that all other anti-TNF agents show improvement over infliximab. This effect is not found with the ACR outcomes for adalimumab and golimumab. Furthermore, the HAQ model indicates superiority of etanercept over adalimumab. The evidence of certolizumab providing improvement over golimumab, which can be found in the ACR outcomes, is not apparent in the HAQ outcomes; this may be because only one golimumab trial was included for this model, leaving too little power to detect such a difference. We found that the rank order of efficacy for these agents when using HAQ improvement as an outcome measure is etanercept, certolizumab, adalimumab, golimumab and infliximab. Most previous MTC analyses in this area did not detect any differences among anti-TNF agents; either their focus was on a different patient population2 4 5 7–9 11 or, in many cases, anti-TNF agents were treated as one group and hence no relative efficacies between these drugs were estimated.3 4 7 9 10 All analyses adopted the 0.05 significance level, and the majority used ACR criteria only as the outcome measure.

The enhanced significance for the continuous outcome measure underlines the lack of sensitivity to change in binary outcome measures. One of the key problems highlighted previously with the ACR response is its binary nature. It estimates the proportion of patients achieving a certain percentage improvement, and hence provides an adequate measure for a clinical trial. However, it does not distinguish between different levels of response. For instance, the ACR20 response treats a patient with 20% improvement and a patient with 80% improvement identically. In MTCs, where the essence lies in detecting differences, an outcome measure that is sensitive to change may therefore be more appropriate. The HAQ multiplier provides one such measure, but other continuous measures may also be suitable (eg, ACR hybrid, Disease Activity Score (DAS) 28).13 The trials used for this analysis did not report sufficient DAS28 data to test this continuous measure in the same way as the HAQ. Two alternative continuous measures, tender and swollen joint counts, have been analysed but did not provide further insight because of increased variability associated with these data and the reduced number of trials for which data were available (see online supplementary material S5).

In order to support the hypothesis that continuous measures provide more information, a discretised version of the HAQ based on the same data as the HAQ multiplier has been constructed and analysed. Changes detected by the discretised version were a subset of changes detected by the continuous HAQ measure. This illustrates the general point that continuous measures of effect provide greater power to detect change, and that this has an impact on the results of an evidence synthesis. (For details on methods and results, see online supplementary material S6.)
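The loss of information from dichotomising a continuous outcome can be illustrated with a simple calculation. The sketch below is not the analysis in supplementary material S6; it assumes a normally distributed outcome with unit SD in both arms, a true standardised difference `delta`, and equal arm sizes, and compares the expected z statistic of a test on means with that of a test on the proportions exceeding a cutoff. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_continuous(delta: float, n_per_arm: int) -> float:
    """Expected z statistic for a difference in means (unit SD per arm)."""
    return delta / sqrt(2 / n_per_arm)


def z_dichotomised(delta: float, n_per_arm: int, cutoff: float) -> float:
    """Expected z statistic after splitting the outcome at `cutoff`."""
    std_normal = NormalDist()
    # Probability of exceeding the cutoff in each arm
    p_ctl = 1 - std_normal.cdf(cutoff)
    p_trt = 1 - std_normal.cdf(cutoff - delta)
    se = sqrt((p_trt * (1 - p_trt) + p_ctl * (1 - p_ctl)) / n_per_arm)
    return (p_trt - p_ctl) / se


# Hypothetical scenario: standardised difference 0.3, 100 patients per arm
z_cont = z_continuous(0.3, 100)
z_bin = z_dichotomised(0.3, 100, cutoff=0.0)
print(f"continuous z = {z_cont:.2f}, dichotomised z = {z_bin:.2f}")
```

Under these assumptions the dichotomised comparison yields the smaller z statistic for the same underlying difference, which is in line with the finding that the discretised HAQ detected only a subset of the differences detected by the continuous measure.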

Heterogeneity can exist across studies clinically and methodologically. Statistically, the heterogeneity in this analysis is partially represented by σ², the between-trial variance. It only captures heterogeneity between study populations of the same drug; variation between drugs is not captured. It is likely that there is some heterogeneity between trials of different drugs; these trials were conducted over a 10-year time scale (publication dates range from 1999 to 2009). Trial design and the statistical analysis of the data also differ between trials. Statistical methods for handling missing data changed over this time; later certolizumab trials used non-responder imputation for some of the analyses, whereas earlier trials used last observation carried forward. There were also some differences in the manner in which non-responders were handled between the earlier and later trials.

The demography of the patients also differs (table 1). While this study does not include other clinical outcomes, such as radiographic scores, it could be argued that higher radiographic scores at baseline may limit the degree of improvement that can be achieved in HAQ scores. This is likely to apply to the earlier studies, which were conducted when effective treatment strategies were newly developed. This is illustrated by comparing the radiographic damage seen in patients enrolled in infliximab studies with that seen in patients in certolizumab studies. In a meta-regression, the impact of different trial demographics including year of publication was examined and found not to be significant. There has also been some discussion in the literature about the problems of comparing trials of anti-TNF agents.6 41 42


Doses across treatments and treatment arms varied, which has not been accounted for in this analysis. The aim of this paper was to compare the overall efficacy of one anti-TNF against another and therefore all doses were included in the analysis. Generalising the model to include a meta-regression for dose would raise difficulties of comparability; doses are hard to compare across treatments, and also within one treatment when the same dose is given at different frequencies.

This analysis only includes RCT evidence. However, there exists a large body of observational data collected via registries and open label studies. The inclusion of such evidence is possible within this framework, helping to reduce uncertainty further.43

This paper confirms the effectiveness of anti-TNF agents in the treatment of RA. It also provides a further method for comparing the efficacy of anti-TNF agents against one another in the treatment of RA. Using the HAQ score (in the form of a multiplier) to compare efficacy demonstrates how a continuous outcome measure enhances the ability to detect differences between treatments. The HAQ score is commonly used as a method of modelling disease improvement in RA cost-effectiveness models, and this analysis provides a means of combining HAQ data from many trials to establish overall efficacy. This study provides a comparative effectiveness estimate for anti-TNF agents, which is of use when examining these drugs in a cost-effectiveness setting. It may also be of use in the clinical setting, but only in combination with the other decision-making tools that influence choice of medication.


The authors would like to thank Professor Roy Fleischmann and Professor Sir Ravinder Maini for their assistance in providing additional data for the analysis.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

    Files in this Data Supplement:

    • Web Only Data - This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Competing interests OF has received research grants from Abbott Immunology Pharmaceuticals, Pfizer and Merck and is a speaker for UCB Pharma, Pfizer, MSD and Abbott Immunology Pharmaceuticals.

  • Provenance and peer review Not commissioned; externally peer reviewed.
