How to treat patients with rheumatoid arthritis when methotrexate has failed? The use of a multiple propensity score to adjust for confounding by indication in observational studies
  1. Sytske Anne Bergstra1,
  2. Lai-Ling Winchow2,
  3. Elizabeth Murphy3,
  4. Arvind Chopra4,
  5. Karen Salomon-Escoto5,
  6. João Eurico Fonseca6,
  7. Cornelia F Allaart1,
  8. Robert B M Landewé7,8
  1. Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
  2. Department of Rheumatology, University of the Witwatersrand, Johannesburg, South Africa
  3. Department of Rheumatology, University Hospital Wishaw, Wishaw, Scotland
  4. Center for Rheumatic Diseases, Pune, India
  5. University of Massachusetts Medical School, Rheumatology Center, UMass Memorial Medical Center, Worcester, Massachusetts, USA
  6. Serviço de Reumatologia e Doenças Ósseas Metabólicas, Hospital de Santa Maria, CHLN, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Centro Académico de Medicina de Lisboa, Lisboa, Portugal
  7. Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
  8. Zuyderland Medical Center, Heerlen, The Netherlands
Correspondence to Sytske Anne Bergstra, Leiden University Medical Center, Leiden RC 2300, The Netherlands; s.a.bergstra{at}


Objectives To compare consecutive disease modifying antirheumatic drug (DMARD) treatment regimens in daily practice in patients with rheumatoid arthritis (RA) who had failed initial methotrexate, using a multiple propensity score (PS) method to control for the spurious effects of confounding by indication.

Methods Patients with newly diagnosed RA who had failed initial treatment with methotrexate were selected from METEOR, an international, observational registry. Subsequent DMARD treatment regimens were categorised as: (1) conventional synthetic DMARD(s) (csDMARD(s)) only (143 patients), (2) csDMARD(s)+glucocorticoid (278 patients) and (3) biological DMARD (bDMARD)±csDMARD(s) (89 patients). Multiple PSs reflecting the likelihood of treatment with each regimen were estimated per patient using multinomial regression. Treatment response per category (Disease Activity Score (DAS)) was analysed with linear mixed models over a maximum follow-up duration of 6 and 12 months, and results were presented with adjustment for the multiple PS.

Results After 6 months of follow-up, PS-adjusted treatment responses yielded a change in DAS per year (95% CI) of −2.00 (−2.65 to −1.36) if patients received a bDMARD, −0.96 (−1.33 to −0.59) if patients received csDMARD(s)+glucocorticoids and −0.73 (−1.21 to −0.25) if patients received csDMARDs only. After 1 year of follow-up, these changes were −0.91 (−1.23 to −0.60), −0.43 (−0.62 to −0.23) and −0.39 (−0.66 to −0.13), respectively.

Conclusions In this analysis of worldwide daily practice data with adjustment for multiple PS, patients with RA who had failed initial treatment with methotrexate monotherapy had a better DAS response after a subsequent switch to a bDMARD-containing treatment regimen than after a switch to a regimen with csDMARD(s) only, with or without glucocorticoids.

  • rheumatoid arthritis
  • methotrexate
  • disease activity
  • treatment

Key messages

What is already known about this subject?

  • Evidence about the preferred follow-up strategy after methotrexate failure is sparse.

  • There are concerns about the risk of bias when using routinely collected data to answer clinical research questions.

What does this study add?

  • Patients who fail initial treatment with methotrexate monotherapy experience a greater decrease in disease activity and better treatment survival after switching to a biological DMARD (disease modifying antirheumatic drug) than after switching to conventional synthetic DMARD(s) with or without a glucocorticoid.

  • We demonstrate how a multiple propensity score can be used to control for bias when comparing multiple non-randomised treatment groups in routinely collected data.

How might this impact on clinical practice or future developments?

  • The results of this study could impact the choice among treatment strategies in patients who fail initial methotrexate treatment.

Introduction

Methotrexate should be (part of) the initial treatment for patients with rheumatoid arthritis (RA).1 If the desired treatment target is not met, various other treatment options can be considered. These include switching to—or adding—a different conventional synthetic disease modifying antirheumatic drug (csDMARD) and/or a b(iological)DMARD or a glucocorticoid.2 To date, evidence about the preferred follow-up strategy in terms of early as well as sustained response is sparse.

Previous trials with static treatment or with a treat-to-target design aimed at long-term outcomes have shown mixed results. The BeSt study showed no difference in early or sustained clinical response between step-up combination therapy and sequential monotherapy with csDMARDs. Three other studies showed no clear benefit of adding a bDMARD versus escalating to triple csDMARD therapy after 4–6 months.3–5 However, one study suggested that more patients achieved a EULAR good response after 1 year when escalating to bDMARD therapy.6

In trials, random allocation of patients to different treatments provides prognostic similarity, so that differences in outcome can be attributed only to differences in treatment. However, stringent inclusion and exclusion criteria make patients in clinical trials different from patients in daily practice.7 Increasingly, real-world data of patients with RA are routinely collected and captured in large observational databases, and the opportunities to use these data to answer clinical questions have grown.8 Unlike randomised controlled trials, however, observational studies (registries), in which physicians rather than a trial protocol determine treatment, lack the benefit of prognostic similarity. Differences in treatment response may therefore reflect differences in treatment as well as differences in disease severity (confounding by indication), and crude comparisons across treatments do not suffice because such biases may spuriously affect the conclusions.

Classic binomial propensity scores (PS) are rapidly becoming popular in rheumatology to adjust for this bias, but because they compare only two treatment options, they often oversimplify the truth. A multinomial propensity adjustment may, to some extent, allow a better comparison of multiple non-randomised treatment groups, but this technique is still underused and not widely known.9–11

In this study, we compared the 1-year efficacy of several treatment strategies for patients with RA who had failed initial methotrexate. We used data from a worldwide observational database of patients with RA and introduce the ‘multiple PS’ as a method to adjust for confounding by indication and to allow a better comparison of multiple treatment regimens.

Methods

Data selection

Data were extracted from METEOR, an international, observational registry including patients with a rheumatologist's diagnosis of RA. Data in METEOR are anonymised and reflect daily clinical practice; therefore, medical ethics approval was not required. An extensive description of METEOR has been published previously.12

We selected patients with RA who had failed initial treatment with methotrexate monotherapy (ie, without systemic glucocorticoids) and who had a symptom duration <5 years, newly diagnosed RA (defined as a DMARD start within 3 months after diagnosis), an age ≥16 years at the first visit, at least one visit with an available composite disease activity measure (Disease Activity Score (DAS), DAS28, Simplified Disease Activity Index, Clinical Disease Activity Index or RAPID3) and at least two available visits after the start of the second treatment strategy.
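
The selection criteria above amount to a conjunction of per-patient conditions. The sketch below expresses them as a single predicate; the `Patient` record, its field names and the 91-day reading of "within 3 months" are hypothetical illustrations, not the METEOR data model.

```python
from dataclasses import dataclass

# Hypothetical per-patient summary record; field names are illustrative
# assumptions, not METEOR variable names.
@dataclass
class Patient:
    first_strategy: frozenset            # drugs in the initial DMARD strategy
    failed_first_strategy: bool          # initial strategy was changed
    symptom_duration_years: float
    days_diagnosis_to_first_dmard: int
    age_at_first_visit: float
    visits_with_composite_measure: int   # visits with DAS, DAS28, SDAI, CDAI or RAPID3
    visits_after_second_start: int

MTX_ONLY = frozenset({"methotrexate"})
MAX_DAYS_TO_DMARD = 91  # "within 3 months" approximated as 91 days (assumption)

def is_eligible(p: Patient) -> bool:
    """Return True only if the patient meets every selection criterion."""
    return (
        p.first_strategy == MTX_ONLY          # MTX monotherapy, no systemic glucocorticoid
        and p.failed_first_strategy
        and p.symptom_duration_years < 5
        and p.days_diagnosis_to_first_dmard <= MAX_DAYS_TO_DMARD
        and p.age_at_first_visit >= 16
        and p.visits_with_composite_measure >= 1
        and p.visits_after_second_start >= 2
    )

candidate = Patient(MTX_ONLY, True, 1.5, 20, 54, 3, 5)
print(is_eligible(candidate))  # True
```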

All available follow-up visits were selected from start of the second treatment strategy (after methotrexate), until a maximum follow-up duration of 1 year. Follow-up could be shorter if the end of available follow-up was reached or if a treatment failure of the second treatment strategy occurred before 1-year follow-up was achieved.
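
A minimal sketch of this follow-up window, assuming visit dates are available per patient. The function name and the choice to drop visits on or after the failure date are illustrative assumptions; passing `max_days=182` gives an approximate 6-month window for the shorter analysis.

```python
from datetime import date, timedelta

def follow_up_visits(visit_dates, second_start, failure_date=None, max_days=365):
    """Select visits from the start of the second strategy up to max_days later,
    stopping at the failure of the second strategy if that comes first."""
    window_end = second_start + timedelta(days=max_days)
    selected = []
    for d in sorted(visit_dates):
        if d < second_start or d > window_end:
            continue  # before the second strategy or beyond the maximum window
        if failure_date is not None and d >= failure_date:
            break  # assumption: visits from the failure date onward belong to the next strategy
        selected.append(d)
    return selected

start = date(2015, 1, 1)
visits = [date(2014, 12, 1), start, date(2015, 4, 1), date(2015, 9, 1), date(2016, 3, 1)]
print(follow_up_visits(visits, start, failure_date=date(2015, 6, 1)))
```

The end of available follow-up needs no special handling: the window simply closes at the last recorded visit.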

Treatment failure (including the initial failure of methotrexate monotherapy) was defined as a change in treatment strategy (either a change in the type of drug or the addition of a new drug). Stepdown strategies (eg, from combination therapy of methotrexate+prednisone to methotrexate monotherapy) or strategies with changes in medication dose were excluded.
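
Under one reading of this definition (drug sets compared between consecutive strategies, doses ignored, and stepdowns not counted as failures), a change can be classified from the sets of drugs alone. The function below is a hypothetical sketch of that reading, not the authors' code.

```python
def classify_change(previous: frozenset, current: frozenset) -> str:
    """Classify the transition between two consecutive drug combinations."""
    if current == previous:
        return "unchanged"  # same drugs: at most a dose change, not a failure
    if current < previous:
        return "stepdown"   # a drug was stopped and nothing new was started
    return "failure"        # a new drug was added or a drug was switched

mtx = frozenset({"methotrexate"})
print(classify_change(mtx, mtx | {"sulfasalazine"}))  # failure
```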

Treatment groups and outcome measures

The treatment strategies chosen after failure of the initial methotrexate treatment were divided into three categories: (1) one or more csDMARD(s) (excluding bDMARDs and glucocorticoids), (2) combination treatment with csDMARD(s) plus a glucocorticoid and (3) treatment including a bDMARD (with or without csDMARDs). Response to these second treatment strategies was measured over time by DAS.13 14
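
The three-way grouping can be written as a small classifier over a strategy's drug set. The drug lists below are illustrative and incomplete, and treating a bDMARD given together with a glucocorticoid as category 3 is an assumption based on the wording "treatment including a bDMARD".

```python
from typing import Optional

# Illustrative drug lists (assumption; not the registry's coding).
CS_DMARDS = {"methotrexate", "sulfasalazine", "hydroxychloroquine", "leflunomide"}
B_DMARDS = {"adalimumab", "etanercept", "infliximab", "tocilizumab", "abatacept", "rituximab"}
GLUCOCORTICOIDS = {"prednisone", "prednisolone", "methylprednisolone"}

def treatment_category(drugs) -> Optional[int]:
    """Return 1 (csDMARD(s) only), 2 (csDMARD(s)+glucocorticoid),
    3 (bDMARD with or without csDMARDs) or None if none applies."""
    drugs = set(drugs)
    if drugs & B_DMARDS:
        return 3  # any strategy including a bDMARD
    has_cs = bool(drugs & CS_DMARDS)
    has_gc = bool(drugs & GLUCOCORTICOIDS)
    if has_cs and has_gc:
        return 2
    if has_cs:
        return 1
    return None  # eg, glucocorticoid alone: outside the three categories

print(treatment_category({"methotrexate", "prednisone"}))  # 2
```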

Statistical analyses

Multiple imputation

Missing data were imputed using multivariate normal imputation (30 imputed datasets). Analyses were subsequently performed in the imputed datasets.
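
The text does not state how results from the 30 imputed datasets were combined. Rubin's rules are the standard way to pool an estimate across imputations (and what Stata's `mi estimate` applies), so the sketch below assumes them: the pooled estimate is the mean of the per-dataset estimates, and the total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance. The function name is hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations and one SE per estimate")
    q_bar = mean(estimates)                          # pooled point estimate
    within = mean([se ** 2 for se in std_errors])    # average within-imputation variance
    between = variance(estimates)                    # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between           # total variance
    return q_bar, math.sqrt(total)

print(pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```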

Multiple PS

To allow a comparison of multiple non-randomised treatment strategies, spurious effects of confounding by indication can be adjusted for by estimating a multiple PS. This score (between 0 and 1) indicates, per patient, the likelihood of being treated with one of several (more than two) treatment categories, conditional on a selection of pretreatment variables that together reflect, to some extent, the patient’s disease severity and perceived prognosis.15 16 Since treatment category is a nominal variable, the multiple PSs were estimated using multinomial regression analysis, with treatment category as the dependent variable. To identify all available pretreatment variables related to the outcome of the study (DAS), linear regression analyses with DAS as the dependent variable were performed, and variables associated with DAS at p<0.10 were selected for inclusion in the multiple PS.17 18 We also checked whether adding interaction terms further improved the balance of the model. Since three treatment groups were compared, three PSs were estimated per patient.15 Because these three scores sum to 1, only two of them need to be included as covariates in further analyses. After estimating the multiple PSs, we checked by visual inspection of a density plot whether their distributions overlapped, since perfect predictability of treatment category is not allowed.16 19 Patients without a realistic probability of receiving each of the treatment categories were disregarded. We then tested whether balance had been achieved in the distribution of all included variables across the three treatment groups, which is a requirement for a successful propensity model.16 For continuous variables, this was assessed using ANCOVA with treatment group as a fixed factor; for dichotomous variables, logistic regression analysis was used, and for nominal variables, multinomial logistic regression analysis, both with treatment group as an independent variable. These analyses were performed first without adjustment and then after adjustment for two of the three multiple PSs and their interaction. If the analyses were non-significant (p>0.05) after adjustment, balance was considered present. An extended, stepwise description of the multiple PS estimation is provided in online supplementary file 2.
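
To make the estimation steps concrete, the sketch below fits a multinomial logistic model for treatment category, turns the linear predictors into one PS per category (each row sums to 1), drops patients without overlap, and builds the two-scores-plus-interaction covariates used later. It is a dependency-free illustration using plain gradient ascent rather than a statistics package; the toy data are invented, the function names are hypothetical, and the 0.95 cut-off is borrowed from the Results, whereas the authors judged overlap visually from density plots. Category 0 stands for csDMARD(s) only, 1 for csDMARD(s)+glucocorticoid and 2 for bDMARD±csDMARD(s) (an assumed coding).

```python
import math

def softmax(scores):
    """Convert linear predictors to probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def linear_predictors(W, x):
    row = [1.0] + list(x)  # prepend intercept
    return [sum(w * v for w, v in zip(Wk, row)) for Wk in W]

def fit_multinomial(X, y, n_classes, lr=0.1, epochs=2000):
    """Multinomial logistic regression fitted by batch gradient ascent on the
    log-likelihood. Class 0 is the reference (its coefficients stay at 0)."""
    n_coef = len(X[0]) + 1
    W = [[0.0] * n_coef for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_coef for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = softmax(linear_predictors(W, x))
            row = [1.0] + list(x)
            for k in range(1, n_classes):
                err = (1.0 if label == k else 0.0) - p[k]  # observed minus predicted
                for j, v in enumerate(row):
                    grad[k][j] += err * v
        for k in range(1, n_classes):
            for j in range(n_coef):
                W[k][j] += lr * grad[k][j] / len(X)
    return W

def multiple_ps(W, X):
    """One PS per treatment category per patient; each row sums to 1."""
    return [softmax(linear_predictors(W, x)) for x in X]

def overlap_subset(ps, threshold=0.95):
    """Indices of patients whose largest PS does not exceed the threshold."""
    return [i for i, p in enumerate(ps) if max(p) <= threshold]

def ps_covariates(p):
    """Two of the three PSs plus their product; the dropped score is
    redundant because the three scores sum to 1."""
    return (p[1], p[2], p[1] * p[2])

# Toy data: one standardised pretreatment covariate, categories 0, 1, 2.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 1, 0, 1, 2, 1, 2, 2]
W = fit_multinomial(X, y, n_classes=3)
ps = multiple_ps(W, X)
kept = overlap_subset(ps)
covariates = [ps_covariates(ps[i]) for i in kept]
```

Which score is dropped is arbitrary; any two of the three carry the same information.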

Estimating the treatment effect with multiple PS adjustment

Finally, the treatment effect over time was analysed, first with a maximum follow-up duration of 6 months and then with a maximum of 1 year. Only visits at which patients were on the medication of interest were included.

First, the treatment effect was analysed by linear mixed modelling with DAS as dependent variable, without adjusting for the multiple PS. Treatment group, follow-up time and the interaction between treatment group and follow-up time served as independent variables, the latter providing the parameter estimates for changes in DAS over time. Random intercept and random slope were added to each model to account for irregular time intervals between visits, assuming an ‘exchangeable’ covariance matrix.

Subsequently, a final model was estimated by adding two of the three PSs and their interaction to the linear mixed models. If the interaction between treatment group and follow-up time proved statistically significant (p<0.10), models were stratified by medication group and reanalysed.
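
Written out, the adjusted model described above takes roughly the following form. The notation is ours and the coding is an assumption: csDMARD(s) only is the reference group, G2 and G3 are indicators for csDMARD(s)+glucocorticoid and bDMARD±csDMARD(s), t is follow-up time in years for visit j of patient i, and e2 and e3 are the two retained PSs. The unadjusted model is the same with all gamma terms set to zero.

```latex
\begin{aligned}
\mathrm{DAS}_{ij} ={}& \beta_0 + \beta_1 G^{(2)}_i + \beta_2 G^{(3)}_i + \beta_3 t_{ij}
  + \beta_4 G^{(2)}_i t_{ij} + \beta_5 G^{(3)}_i t_{ij} \\
  &+ \gamma_1 e^{(2)}_i + \gamma_2 e^{(3)}_i + \gamma_3 e^{(2)}_i e^{(3)}_i
  + u_{0i} + u_{1i} t_{ij} + \varepsilon_{ij}, \\
(u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}

\text{slope}_{\mathrm{cs}} = \beta_3, \qquad
\text{slope}_{\mathrm{cs+GC}} = \beta_3 + \beta_4, \qquad
\text{slope}_{\mathrm{b}} = \beta_3 + \beta_5
```

With time in years, each slope is a change in DAS per year. In a stratified reanalysis, each group is fitted on its own without the G terms, and the coefficient of t plays the same role.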

As a secondary analysis, we compared time to treatment discontinuation between treatment groups over a maximum follow-up of 1 year, using Cox proportional hazards regression adjusted for the multiple PSs. All analyses were performed using Stata SE 14.
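
Assuming the usual proportional hazards form, the adjusted Cox model can be sketched as below. The notation is ours: G2 and G3 indicate csDMARD(s)+glucocorticoid and bDMARD±csDMARD(s) versus csDMARD(s) only, and e2 and e3 are two of the three PSs. Whether the PS interaction also entered the Cox model is not stated, so it is left out here.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_1 G^{(2)}_i + \beta_2 G^{(3)}_i
  + \gamma_1 e^{(2)}_i + \gamma_2 e^{(3)}_i\right),
\qquad
\mathrm{HR}_{\mathrm{bDMARD\ vs\ csDMARD}} = e^{\beta_2}
```

For example, an HR of 0.38 corresponds to a coefficient of ln 0.38 ≈ −0.97.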

Results

Data of 509 patients from METEOR were included in this analysis (online supplementary figure 1). Included patients had slightly shorter symptom duration and higher disease activity and Health Assessment Questionnaire (HAQ) scores than non-included patients (online supplementary table 1).

Baseline characteristics per treatment group at the start of the second treatment strategy are described in table 1. Patients proceeding to csDMARD(s)+glucocorticoid were more often female, smoked less often, had the longest symptom duration and had a higher DAS than patients in the other two treatment groups. Patients proceeding to bDMARDs had the shortest symptom duration and the most swollen joints. Median (IQR) follow-up duration on the studied treatment was 6.9 (4.1–9.4) months for patients receiving csDMARD(s), 7.8 (5.0–10.2) months for patients receiving csDMARD(s)+glucocorticoid and 9.0 (6.2–10.9) months for patients receiving treatment including a bDMARD. When follow-up on the studied treatment was limited to a maximum of 6 months, median (IQR) follow-up duration was 3.9 (0.92–5.0), 3.6 (2.3–5.0) and 3.7 (2.2–5.1) months, respectively. Furthermore, patients proceeding to csDMARDs had been on methotrexate monotherapy longer before changing treatment (median (IQR) 303 (125–481) days) than patients proceeding to csDMARD(s)+glucocorticoid (156 (68–397) days) or to treatment including a bDMARD (190 (91–411) days). Also, at baseline, the methotrexate dose was lower for patients proceeding to csDMARD(s)+glucocorticoid (median (IQR) 7.5 (7.5–15) mg) than for patients proceeding to csDMARD(s) (15 (10–20) mg) or to treatment including a bDMARD (15 (15–20) mg). The specific medication combinations per treatment group are provided in online supplementary table 2.

Table 1

Baseline characteristics per treatment group, non-imputed data

Since the METEOR registry captures daily practice data, rheumatologists were free to choose their own disease activity measure. Consequently, DAS (based on erythrocyte sedimentation rate (ESR)) was missing in 35% of all visits. However, in only 7% of all visits, no composite disease activity measure was available and in only 3% of all visits, no component of these measures was available.
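
For reference, the original ESR-based DAS combines the Ritchie Articular Index, a 44-joint swollen joint count, ESR and the patient's global health on a 100 mm visual analogue scale. The article imputes missing values by multivariate normal imputation and does not say whether DAS was ever recomputed from recorded components; the sketch below only illustrates the standard formula, and the visit dictionary keys are hypothetical.

```python
import math

def das44_esr(ritchie_index, swollen_44, esr_mm_h, patient_global_mm):
    """Original DAS (ESR-based): Ritchie Articular Index, 44 swollen-joint count,
    ESR in mm/h and patient global health on a 0-100 mm VAS."""
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive for the log term")
    return (0.53938 * math.sqrt(ritchie_index)
            + 0.06465 * swollen_44
            + 0.330 * math.log(esr_mm_h)
            + 0.00722 * patient_global_mm)

REQUIRED = ("ritchie", "sjc44", "esr", "patient_global")  # hypothetical visit keys

def das_from_visit(visit):
    """Return DAS if every component is recorded, otherwise None (left to imputation)."""
    if any(visit.get(key) is None for key in REQUIRED):
        return None
    return das44_esr(visit["ritchie"], visit["sjc44"], visit["esr"], visit["patient_global"])

print(round(das44_esr(9, 10, 20, 50), 2))  # 3.61
```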

Pretreatment variables that were associated with the outcome DAS (p<0.10) were included in the multiple PS: age, gender, weight, symptom duration, rheumatoid factor, anticitrullinated protein antibodies, ESR, patient global assessment on a visual analogue scale, Ritchie Articular Index, swollen joint count, HAQ, smoking and country of residence. In addition, the interaction between symptom duration and country was added to improve the model. Body mass index and presence of erosions were not associated with DAS and were therefore not included. C reactive protein and ESR were both associated with DAS, but because of multicollinearity only ESR was included. The final multiple propensity model had an adjusted R² of 0.34 (95% CI 0.29 to 0.40). Assessment of the overlap of the distributions of the multiple PSs identified five patients with a multiple PS>0.95, who had virtually no probability of receiving treatment with csDMARD(s); they were therefore excluded from further analyses (online supplementary figure 2).

Balance in the distribution of all included variables across the three treatment groups was then assessed. While most variables were unbalanced before adjustment, balance was achieved for all included variables after multiple PS adjustment (p≥0.05), indicating that the multiple PS could be used for further analyses (online supplementary table 3). After multiple PS adjustment, we found statistically significant interactions between treatment group and follow-up time, both in the 6-month analysis (p=0.001) and in the 1-year analysis (p=0.029), indicating significant differences in treatment response between the three treatment groups. The adjusted treatment effect over time, stratified by treatment group, is shown in table 2A. Both after 6 months and after 1 year, patients receiving a bDMARD experienced the largest decrease in DAS per year (6 months: −2.00 (−2.65 to −1.36); 1 year: −0.91 (−1.23 to −0.60)), followed by patients receiving csDMARD(s)+glucocorticoid (6 months: −0.96 (−1.33 to −0.59); 1 year: −0.43 (−0.62 to −0.23)) and patients receiving csDMARD(s) alone (6 months: −0.73 (−1.21 to −0.25); 1 year: −0.39 (−0.66 to −0.13)). When comparing the adjusted (table 2A) and unadjusted (table 2B) treatment effects, the unadjusted model showed slightly larger treatment effects, indicating that the multiple PS (at least partly) adjusted for confounding by indication.

Table 2

Change in DAS over time for each medication group (n=509)*

Cox regression showed that patients receiving treatment including a bDMARD had a lower hazard of discontinuing treatment than patients receiving csDMARD(s) alone (HR (95% CI) 0.38 (0.24 to 0.60)), whereas there was no significant difference between csDMARD treatment with and without a glucocorticoid (HR (95% CI) 0.89 (0.66 to 1.20)) (figure 1). These results again differed slightly between adjusted and unadjusted models (online supplementary table 4).

Figure 1

Probability of stopping treatment over time. The figure is based on unadjusted data. Blue line=csDMARD(s), red line=csDMARD(s)+glucocorticoid, green line=bDMARD±csDMARD(s). bDMARD, biological DMARD; csDMARD, conventional synthetic DMARD; DMARD, disease modifying antirheumatic drug.
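
Figure 1 is described only as based on unadjusted data. Curves of this kind are usually Kaplan-Meier estimates, so the sketch below assumes that: stopping treatment is the event, end of follow-up is censoring, and the plotted quantity is 1 − S(t). The function name and the toy data are illustrative, not from METEOR.

```python
def km_stop_probability(times, stopped):
    """Kaplan-Meier estimate of the probability of having stopped treatment.

    times   -- follow-up on treatment (eg, months) per patient
    stopped -- True if treatment was stopped (event), False if censored
    Returns [(t, 1 - S(t))] at each distinct time with at least one stop.
    """
    data = sorted(zip(times, stopped))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk  # product-limit step
            curve.append((t, 1 - survival))
        at_risk -= events + censored  # censored patients leave the risk set after t
    return curve

# Toy data: five patients, two censored.
print(km_stop_probability([2, 4, 4, 6, 8], [True, False, True, True, False]))
```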

Discussion

In this study of a large observational database capturing daily clinical practice, we compared several treatment strategies in patients with RA who had failed initial treatment with methotrexate. We also illustrated the use of the multiple PS as a method to control for bias when comparing multiple non-randomised treatment groups. After adjustment, patients who switched to a bDMARD had a greater decrease in DAS than patients receiving csDMARD(s) alone or csDMARD(s) combined with a glucocorticoid, with a maximum follow-up of both 6 months and 1 year.

Most randomised trials have not shown important differences in disease activity between adding a bDMARD and escalating to triple csDMARD therapy after 4–6 months of follow-up.4 5 Our results were more in line with a randomised trial that showed superiority of bDMARD treatment over triple csDMARD therapy after 12 months of follow-up.6 Differences between our findings and previous trials may be explained by differences in baseline characteristics: patients in these trials had far higher disease activity at the start of treatment than patients in our registry, and they also differed considerably in symptom duration.

While most previous studies compared only two strategies, we could compare three strategies simultaneously. We also found that combination therapy including a glucocorticoid did not necessarily result in more DAS improvement than csDMARD(s) therapy without glucocorticoids, despite a numerical difference in DAS after 6 months. These two treatment strategies had not previously been formally compared in randomised clinical trials.

One problem inherent to immediately escalating from methotrexate monotherapy to a bDMARD strategy is cost-effectiveness.20 Ideally, a positive treatment response to bDMARDs should be predicted upfront, but this is currently impossible.21 22 However, from a societal perspective and depending on the exact medication costs, escalating to a bDMARD may be cost-effective if it results in rapid clinical improvement, although further research into this topic is needed.23 We found that patients who switched to a bDMARD had a lower hazard of switching treatment again during subsequent follow-up (suggesting a sustained response) than patients who switched to csDMARD therapy, with or without glucocorticoids. With longer follow-up, more patients who switched to a bDMARD continued to have a low DAS, whereas more patients in the other groups failed with a high DAS. This may explain why the estimated decrease in DAS was smaller in the analysis with a maximum follow-up of 1 year than in the analysis with a maximum follow-up of 6 months. Median follow-up on stable treatment was more than 1 month longer in the bDMARD group than in the csDMARD plus glucocorticoid group and more than 2 months longer than in the csDMARD without glucocorticoid group.

One limitation of the current study is that several treatments were clustered into three subgroups, which is necessarily a simplification of the truth. Although there is no consistent evidence of differences in efficacy between, for example, different bDMARDs, and although the doses of many drugs are often fixed, differences in medication strategies within subgroups could still have influenced treatment outcomes.24–26

In addition, when treatment comparisons are made in non-randomised patients, such as in an observational setting, confounding by indication can jeopardise a proper judgement of treatment efficacy. Conventional PS adjustment, which has been applied more frequently in rheumatology, is based on binary choices (treatment A vs treatment B, or treatment vs no treatment). While this may suffice in many conditions, it oversimplifies diseases such as RA, in which a multitude of potential choices must be considered in every scenario. The multiple PS partially overcomes this limitation while still adjusting for confounding by indication. A limitation of these models may be that they are more difficult to interpret than classic PS. We have therefore provided a practical description of all steps involved in estimating a proper multiple PS and in using it to control for bias when analysing more than two treatment strategies prescribed in a non-randomised context. The multiple PS can easily be extended to more than three potential treatments, provided that the underlying assumptions are met. As (large) observational databases with patients from daily practice rather than from clinical trials become increasingly available, such methods are increasingly relevant to reduce potential biases and to enhance the quality of observational studies.

An advantage of the multiple PS over conventional adjustment for individual potential confounders is that it does not require large patient numbers when there are many potential confounders, because these confounders are summarised in a small number of scores. Nevertheless, a minimum number of patients is needed to estimate the score with sufficient confidence. A correctly calculated multiple PS can considerably reduce the risk of bias in observational studies, but it can never completely adjust for bias, since not all confounders are known and/or measured.16 27 Although we aimed to include relevant confounders based on previous literature and expert opinion, residual bias due to unknown confounders must be considered. Furthermore, part of our data were missing, so we used multiple imputation to impute these data. In addition, patients were selected based on various inclusion and exclusion criteria. Although there was little difference between included and non-included patients, we can only speculate whether the selection of patients has influenced our findings.

For the current analysis, we chose to add the multiple PSs as covariates. Alternatively, the multiple PS can be used for matching and stratification. However, with an increasing number of treatment groups, matching and stratification often become unfeasible, since they may result in groups with small numbers of patients. We used p values to assess the balancing properties of the multiple PS; however, since p values depend on sample size, they should always be interpreted with caution.

In conclusion, in this analysis of real-life clinical data, we have shown that, after multiple PS adjustment, patients with RA who had failed initial treatment with methotrexate monotherapy experienced a greater decrease in disease activity after switching to a bDMARD than after switching to csDMARD(s)+glucocorticoid or to csDMARD(s) alone. Furthermore, treatment survival was better in patients receiving a bDMARD. This could have important consequences for clinical practice when choosing among treatment strategies for patients who fail initial methotrexate treatment.



  • Handling editor Josef S Smolen

  • Contributors SAB, CFA and RBML contributed to the design, analysis and interpretation of the data. L-LW, EM, AC, KS-E, JEF and CFA contributed to the acquisition of data. SAB drafted the work. All authors revised the manuscript and read and approved the final version of the document.

  • Competing interests SAB, L-LW, AC, KS-E, CFA, RBML: none declared. EM: received support from AbbVie and UCB to attend meetings and has received support from Roche to audit work on behalf of the Scottish Society for Rheumatology. JEF: received unrestricted research grants or acted as a speaker for AbbVie, Ache, Amgen, BIAL, Biogen, BMS, Janssen, Lilly, MSD, Novartis, Pfizer, Roche, Sanofi, UCB.

  • Patient consent Not required.

  • Ethics approval The METEOR registry contains completely anonymised data that were gathered during daily practice. Treatment, timing of follow-up visits and measurements were not determined by a protocol. Therefore, medical ethics board approval was not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.