Objective To derive and validate decision trees to categorise rheumatoid arthritis (RA) patients 12 weeks after starting etanercept with or without methotrexate into three groups: patients predicted to achieve low disease activity (LDA) at 1 year; patients predicted not to achieve LDA at 1 year and patients who needed additional time on therapy to be categorised.
Methods Data from RA patients enrolled in the TEMPO trial were analysed. Classification and regression trees were used to develop and validate decision tree models with week 12 and earlier assessments that predicted long-term LDA. LDA, defined as disease activity score in 28 joints (DAS28) ≤3.2 or clinical disease activity index ≤10.0, was measured at 52 or 48 weeks. Demographics, laboratory data and clinical data at baseline and to week 12 were analysed as predictors of response.
Results 39% (67/172) of patients receiving etanercept and 60% (115/193) of patients receiving etanercept plus methotrexate achieved LDA at week 52. For patients receiving etanercept, 53% were predicted to have LDA, 39% were predicted not to have LDA and 8% could not be categorised using DAS28 criteria at week 12. For patients receiving etanercept plus methotrexate, 63% were predicted to have LDA, 25% were predicted not to have LDA and 12% could not be categorised.
Conclusion Most (80–90%) patients in TEMPO initiating etanercept with or without methotrexate could be predicted within 12 weeks of starting therapy as likely to have LDA or not at week 52. However, approximately 10–20% of patients needed additional time on therapy to decide whether to continue treatment.
The presentation and disease course are highly variable in patients with rheumatoid arthritis (RA). Unsurprisingly, there is also great variation in the response to both non-biological and biological disease-modifying antirheumatic drugs (DMARD). Given the chronic nature of RA (with the consequent need for long-term treatment), the expense of newer biological DMARD, and the urgency of identifying efficacious treatment to minimise joint damage in individual patients, the ability to predict response to treatment would have substantial clinical and economic impact.
The National Institute for Health and Clinical Excellence and the British Society for Rheumatology recommend discontinuation of antitumour necrosis factor (TNF) therapies after 6 months in the absence of an adequate response.1 2 The American College of Rheumatology (ACR) recommends re-evaluation of patients who have not achieved clinical benefit within 12 weeks of initiating anti-TNF therapy.3 Given the short period of time in which physicians are expected to make treatment decisions, it has become increasingly important to identify features in individual patients that may assist in decisions to continue or discontinue a treatment regimen.
Etanercept is a human TNF receptor-Fc fusion protein that binds to TNF and inhibits its interaction with cell surface TNF receptors. Etanercept is approved for the treatment of moderately to severely active RA. Using data from a pivotal trial of etanercept, the objective of this analysis was to derive and validate a decision tree that was able to categorise patients within 12 weeks after starting etanercept with or without methotrexate into one of three groups: patients predicted to achieve low disease activity (LDA) at 1 year; patients predicted not to achieve LDA at 1 year and patients who were not able to be categorised at 12 weeks and would need additional time on therapy. Additional analyses substituted the new ACR/European League Against Rheumatism (EULAR) remission definition4 for the LDA outcome and categorised patients at 12 weeks as being likely or not to achieve remission at 1 year.
Data from patients enrolled in the Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes (TEMPO)5 were used in this analysis. Patients 18 years or older with active, adult-onset RA were enrolled in TEMPO. Patients in TEMPO received etanercept (25 mg twice a week), methotrexate (7.5 mg escalated to 20 mg oral capsules once a week within 8 weeks if patients had any painful or swollen joints), or both.
Definition of LDA
The primary outcome of this retrospective analysis was LDA (disease activity score in 28 joints (DAS28) ≤3.2)6 at week 52 (or week 48 if the DAS28 measurement was missing at week 52). LDA as a goal is consistent with recent treat-to-target recommendations suggesting that while remission is an optimal goal, LDA is acceptable, especially for RA patients with established disease.7 As a secondary outcome, patients were considered to have LDA if they had a clinical disease activity index (CDAI) of 10 or less at week 52.8 An additional secondary outcome required remission using the ACR/EULAR Boolean definition (tender and swollen joint counts each ≤1, C-reactive protein (CRP) ≤1 mg/dl, and patient global assessment ≤1 on a scale of 0 to 10).4
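The LDA outcome rule above can be written as a short sketch. The function and constant names are hypothetical and are not taken from the study; only the cut-offs (DAS28 ≤3.2, CDAI ≤10) and the fallback from week 52 to week 48 come from the text.

```python
from typing import Optional

DAS28_LDA_CUTOFF = 3.2   # primary outcome threshold stated in the text
CDAI_LDA_CUTOFF = 10.0   # secondary outcome threshold stated in the text


def score_at_one_year(week52: Optional[float], week48: Optional[float]) -> Optional[float]:
    """Return the week 52 score, falling back to week 48 if week 52 is missing."""
    return week52 if week52 is not None else week48


def has_lda(week52: Optional[float], week48: Optional[float], cutoff: float) -> Optional[bool]:
    """True or False for LDA at 1 year; None if neither assessment is available."""
    score = score_at_one_year(week52, week48)
    if score is None:
        return None
    return score <= cutoff
```

For example, `has_lda(None, 4.1, DAS28_LDA_CUTOFF)` uses the week 48 value because week 52 is missing.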
Patient demographics (age, sex, race (white vs non-white)), clinical data (tender joint count, swollen joint count, rheumatoid factor status, DAS28 raw score at baseline and change from baseline score at weeks 4, 8 and 12, health assessment questionnaire disability index (HAQ-DI) score, patient pain, physician global assessment, patient global assessment, CDAI) and laboratory data (erythrocyte sedimentation rate, CRP) at baseline and at each visit to week 12 were included as candidate predictors of LDA at week 52.
Patients were included in this analysis if they had DAS28 assessments at 48 or 52 weeks of therapy and had received etanercept alone or etanercept plus methotrexate. Patients who dropped out early because of unsatisfactory efficacy were considered to be non-responders. Patients who dropped out of the study for safety reasons were excluded from the analysis.
Classification and regression trees (CART) software (Salford Systems, San Diego, California, USA) was used to develop and validate models for identifying week 12 and earlier assessments that would predict LDA at week 52. The CART model relies on statistically optimum recursive splitting of the patients into subgroups based on critical levels of the prognostic variables. In the general implementation of CART, the dataset is split on a predictor variable into the two subgroups that are the most different with respect to the outcome, and subgroups are split further based on the same principle. The percentages of patients with LDA were calculated for each node of the regression tree.
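A single step of recursive splitting can be illustrated with a minimal sketch. It assumes Gini impurity as the splitting criterion, which the text does not specify, and it is not the Salford Systems implementation. The function names and the data values are invented for illustration.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (1 = LDA, 0 = no LDA)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Optional[Tuple[float, float]]:
    """Find the threshold on one predictor that minimises weighted Gini impurity.

    Returns (threshold, weighted_impurity), sending patients with
    value <= threshold to the left child; None if no split is possible.
    """
    n = len(values)
    best: Optional[Tuple[float, float]] = None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Made-up week 12 DAS28 values and 1-year LDA outcomes, for illustration only.
das28_week12 = [2.1, 2.8, 3.0, 3.9, 4.4, 5.2]
lda_week52 = [1, 1, 1, 0, 0, 0]
print(best_split(das28_week12, lda_week52))
```

A full tree would apply `best_split` to every candidate predictor, keep the best one, and then repeat the search within each child subgroup.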
Patients predicted to have LDA at week 52 by the predictor variables were classified as responders and patients predicted to not achieve LDA at week 52 were classified as non-responders. The remaining patients (who had an approximately 40–60% predicted likelihood of achieving LDA) had an unclear likelihood of response and were classified as indeterminate responders needing additional time on treatment.
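The three-way categorisation can be sketched as a function of a node's predicted likelihood of LDA. The exact cut-offs and the treatment of values that fall exactly on a boundary are assumptions, because the text gives only an approximate 40–60% band; the function name is hypothetical.

```python
def categorise_node(p_lda: float, lower: float = 0.40, upper: float = 0.60) -> str:
    """Map a node's predicted likelihood of LDA at week 52 to a category.

    Assumption: values inside [lower, upper] are treated as indeterminate.
    """
    if not 0.0 <= p_lda <= 1.0:
        raise ValueError("p_lda must be a proportion between 0 and 1")
    if p_lda > upper:
        return "responder"
    if p_lda < lower:
        return "non-responder"
    return "indeterminate"
```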
Two analyses were performed: the primary analysis used LDA based on DAS28 at 52 or 48 weeks; secondary analyses used LDA based on CDAI at 52 or 48 weeks and remission at 52 or 48 weeks. A 10-fold cross-validation technique9 10 was used to guard against model overfitting, a potential problem in prediction models in which the model fits the dataset used to derive it but would fit other datasets less well. Misclassification penalties of 3:1 were implemented, placing greater emphasis on correctly classifying patients who were predicted to be non-responders. This procedure optimises the prediction model for patients predicted to be non-responders, as it is these patients for whom a decision to change the treatment regimen at 12 weeks would be likely to be made.
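The two safeguards described above can be sketched as follows. The fold assignment is a generic 10-fold scheme, not the one built into the CART software. The direction of the 3:1 penalty is an assumption: here, labelling a true responder as a non-responder is taken to cost three times as much as the opposite error, which keeps groups labelled as non-responders highly accurate. All names are hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and return k (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# Assumed penalties: cost of each kind of misclassification.
COST_RESPONDER_CALLED_NON_RESPONDER = 3.0
COST_NON_RESPONDER_CALLED_RESPONDER = 1.0


def cheapest_label(n_responders: int, n_non_responders: int) -> str:
    """Label a node with whichever category gives the lower total penalty."""
    cost_if_non_responder = COST_RESPONDER_CALLED_NON_RESPONDER * n_responders
    cost_if_responder = COST_NON_RESPONDER_CALLED_RESPONDER * n_non_responders
    return "non-responder" if cost_if_non_responder < cost_if_responder else "responder"
```

Under these assumed penalties, a node is labelled non-responder only when fewer than one in four of its patients are responders, so `cheapest_label(3, 7)` still returns `"responder"`.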
Finally, we examined the tradeoff between the degree of accuracy of a prediction model that might be minimally acceptable to a clinician and the resulting proportion of patients who could be classified with that amount of accuracy. The best-performing decision tree derived from the combination etanercept plus methotrexate dataset was evaluated using 1000 simulated datasets with bootstrapping techniques. Patients receiving etanercept plus methotrexate were sampled 1000 times with replacement to generate 1000 bootstrap samples of equal size to the original TEMPO etanercept plus methotrexate arm. For each of the 1000 samples, the performance of the decision tree was evaluated iteratively by varying the level of required accuracy across a range from 50% to 100%. Accuracy was defined as the proportion of patients who could be correctly classified by the decision tree. The proportion of the patients in each node of the tree who could be classified with the required amount of accuracy relative to the total sample size was then plotted against the accuracy level for that iteration. The process was then repeated for the remainder of the 1000 datasets. All points were plotted and a Loess smoothing curve was fitted; this figure described the proportion of the population that could be predicted to have LDA at 1 year as a function of the accuracy required for that prediction.
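The bootstrap procedure can be sketched as below. This is one possible reading of the description: accuracy is computed per terminal node, and a node's patients count as classifiable when that node's accuracy meets the required level. The data are invented, the names are hypothetical, and the Loess smoothing step is omitted.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

# (terminal node id, whether the node's prediction was correct for this patient)
Patient = Tuple[str, bool]


def classifiable_fraction(sample: Sequence[Patient], required_accuracy: float) -> float:
    """Share of the sample sitting in nodes whose accuracy meets the requirement."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for node, ok in sample:
        totals[node] += 1
        correct[node] += int(ok)
    kept = sum(n for node, n in totals.items() if correct[node] / n >= required_accuracy)
    return kept / len(sample)


def bootstrap_tradeoff(patients: Sequence[Patient], levels: Sequence[float],
                       n_boot: int = 1000, seed: int = 0) -> List[Tuple[float, float]]:
    """Return (required accuracy, classifiable fraction) points over bootstrap samples."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_boot):
        # Sample with replacement, same size as the original arm.
        sample = rng.choices(patients, k=len(patients))
        for level in levels:
            points.append((level, classifiable_fraction(sample, level)))
    return points


# Invented example: three terminal nodes with different accuracies.
example_patients: List[Patient] = (
    [("A", True)] * 50 + [("A", False)] * 10
    + [("B", True)] * 22 + [("B", False)] * 3
    + [("C", True)] * 8 + [("C", False)] * 7
)
points = bootstrap_tradeoff(example_patients, [0.5, 0.8, 0.9], n_boot=20)
```

Plotting `points` and fitting a smoother would give a curve of the kind described, with the classifiable fraction falling as the required accuracy rises.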
Patient demographics and disease characteristics at baseline by treatment group and LDA status at weeks 52 or 48 as defined by DAS28 are shown in table 1. Thirty-nine per cent of patients receiving etanercept and 60% of patients receiving etanercept plus methotrexate achieved an LDA response at week 52 based on DAS28. Demographic and clinical characteristics at baseline were similar across treatment groups and between responders and non-responders.
High concordance between LDA assessed by DAS28 and LDA assessed by CDAI at week 52 was demonstrated by the κ coefficients: κ=0.64 for patients receiving etanercept; κ=0.78 for patients receiving etanercept plus methotrexate.
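For reference, the κ coefficient for two binary classifications can be computed from a 2×2 agreement table, as in the sketch below. The function name and the counts in the usage note are invented; the study's underlying counts are not given in the text.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two binary ratings (e.g. LDA by DAS28 vs LDA by CDAI)."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    p_observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance from the marginal proportions.
    p_expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

With invented counts of 40, 10, 10 and 40, observed agreement is 0.8, chance agreement is 0.5, and κ is 0.6.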
Patients receiving etanercept plus methotrexate
LDA in patients receiving etanercept plus methotrexate was predicted by DAS28 at week 12 and change in DAS28 from baseline at week 8 (figure 1A). Patients receiving etanercept plus methotrexate were categorised into three groups by 12 weeks: responders (63% of all patients, 81% accuracy); non-responders (25% of all patients, 88% accuracy) and patients with an indeterminate likelihood of response (12% of all patients).
Response to therapy in patients receiving etanercept plus methotrexate combination therapy was predicted by CDAI at week 12 and swollen joint count at week 8 (figure 1B). Patients were categorised at week 12: responders (54% of all patients, 94% accuracy); non-responders (29% of all patients, 80% accuracy) and patients with an indeterminate likelihood of response (17% of all patients).
In summarising the two models presented in figures 1A and 1B, 83–88% of patients could be classified as responders or non-responders by 12 weeks. The accuracy of prediction for these individuals was approximately 85%. For the remaining 12–17% of patients, additional time on therapy would be necessary to determine their treatment response at 1 year.
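The summary figures can be checked with simple arithmetic on the group sizes and accuracies reported for figures 1A and 1B. Weighting each group's accuracy by its share of patients is an assumption about how the overall figure of approximately 85% was obtained; the text does not state the calculation. The function name is hypothetical.

```python
from typing import List, Tuple


def summarise(groups: List[Tuple[float, float]]) -> Tuple[float, float]:
    """groups holds (share of all patients, accuracy) for each classified group.

    Returns (share of patients classified, size-weighted accuracy).
    """
    classified = sum(share for share, _ in groups)
    weighted_accuracy = sum(share * acc for share, acc in groups) / classified
    return classified, weighted_accuracy


# Responder and non-responder groups as reported in the text.
fig_1a = [(0.63, 0.81), (0.25, 0.88)]  # DAS28-based model
fig_1b = [(0.54, 0.94), (0.29, 0.80)]  # CDAI-based model

for name, groups in (("1A", fig_1a), ("1B", fig_1b)):
    classified, accuracy = summarise(groups)
    print(f"figure {name}: {classified:.0%} classified, {accuracy:.1%} weighted accuracy")
```

On this reading, 88% and 83% of patients are classified, with weighted accuracies of about 83% and 89%, which bracket the approximately 85% quoted above.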
The model that predicted remission at 1 year is shown in figure 2. A total of 26% of patients receiving etanercept plus methotrexate achieved remission. The key predictor variables included the tender and swollen joint count (both at week 12), patient pain (at week 12), and CRP (measured at week 4). At 12 weeks, 95% of patients could be classified as responders or non-responders. The accuracy of prediction for patients classified at week 12 as non-responders (58% of all patients) was 98%.
Patients receiving etanercept monotherapy
Response to therapy in patients receiving etanercept monotherapy was predicted by DAS28 at week 12 and tender joint count at week 8 (figure 3A). Patients receiving etanercept monotherapy were categorised into three groups by 12 weeks: responders (53% of all patients, 61% accuracy); non-responders (39% of all patients, 93% accuracy) and patients with an indeterminate likelihood of response (only 8% of all patients).
The second model constructed using CDAI is shown in figure 3B. Response in patients receiving etanercept monotherapy was predicted by HAQ-DI at week 8, change in CDAI at week 4 and CDAI at week 12. Patients were categorised by week 12 as responders (49% of all patients, 81% accuracy), non-responders (27% of all patients, 91% accuracy) and patients with an indeterminate likelihood of response (24% of all patients). Too few patients in the etanercept monotherapy arm achieved remission to warrant deriving a prediction model for this treatment group.
Tradeoff of degree of accuracy and the proportion of classifiable patients in a prediction model
As shown in figure 4, there was a tradeoff between accuracy and the proportion of patients who could be classified with that amount of accuracy. Simulations represented in the cluster of points on the left side of the figure show that only 25% of the patients sampled from the bootstrapped TEMPO study population could be classified with approximately 90% accuracy. Assuming a willingness to tolerate somewhat lesser accuracy of 80–85%, the substantial majority of the study population sampled from TEMPO could be classified by week 12; indeed, approximately 60–65% of the TEMPO patients could be classified with 85% accuracy, and 85–90% of patients could be classified with 80% accuracy.
Patients with RA have a heterogeneous pattern of response to currently available therapy, including TNF inhibitors and other DMARD. Researchers have proposed that predicting which patients will develop aggressive versus mild disease is important in order to tailor therapy that will be promising and predictable.11 Whereas baseline disease activity may assist in the selection of the type of medications used to treat RA, baseline measures are generally inadequate to predict treatment outcome,12 as we have also shown here (table 1). For that reason, we built DAS28 and CDAI-based models using early treatment response (to 12 weeks) to show that approximately 80–90% of patients in TEMPO could be classified by 12 weeks with respect to achieving LDA at 1 year. For the remaining approximately 15% of patients, additional time on therapy would be needed to determine their longer-term treatment response. Substituting the alternative outcome of RA remission for LDA, the proportion of patients able to be classified at week 12 was higher (95%); overall accuracy of the remission-focused model was similar to the prediction models with LDA as the outcome. Model accuracy for predicted non-responders (98%) using the remission outcome was higher than for the LDA outcome.
Results from a study by Verstappen et al13 showed that early response to non-biological DMARD therapy in the first year, rather than the kind of initial treatment given, predicted disease remission in patients with early RA. Similarly, in patients receiving anti-TNF therapy, the likelihood of continuation of treatment was predicted by the response that they had during the first 3 months of treatment. This finding suggested that decision-making regarding the continuation of non-biological and biological DMARD therapy might be considered as early as 3 months,14 which led to our decision to use the response to treatment to 12 weeks as the key predictor variables.
Aletaha et al15 analysed pooled data from several clinical trials of patients with early and established RA.5 16–19 Similar to our results, they found that disease activity after 3 months of DMARD or anti-TNF therapy, but not at baseline, determined the treatment response at 1 year. We have extended those findings to be able to provide a clinically useful, albeit preliminary, decision tree to classify patients as being responders, non-responders, or those for whom more time (beyond 12 weeks) is needed to predict response at 1 year. In contrast to that earlier report, our results incorporate multiple predictors at different time points to improve the validity of prediction and increase the proportion of patients for whom such a prediction can be made by week 12.
Three other studies specifically examined the ability to predict treatment outcomes based on short-term response to treatment with anti-TNF therapies. Gülfe et al14 found that response to treatment as early as 6 weeks predicted the continuation of current therapy at 3 months. Pocock et al20 found that a substantial number of patients who had not achieved response at 3 months of treatment were able to continue treatment and achieve a response at 6 months, supporting a need for longer therapeutic trials before discontinuation in some patients. This result is also supported by a previously published study by Kavanaugh et al,21 which also suggested that some patients who did not respond by 3 months did achieve some response by 6 months. Our results provide guidance on which patients need more time beyond 12 weeks to make a clinical decision, and which patients probably do not. Patients we classified as ‘indeterminate’, comprising only approximately 15% of the TEMPO population, probably would benefit from additional time on therapy. In contrast, we were able to predict treatment response accurately for the remainder of the patients (85%) by 12 weeks; for those predicted to be non-responders at 12 weeks, a decision might be made to switch treatment regimens, and no additional time on therapy would be necessary.
We based our prediction model on clinical and laboratory-based factors. The identification of surrogate markers of response with genetics or proteomics that might assist in predicting treatment response or response to a particular therapy is an attractive possibility but has been elusive. The current lack of genetic markers or biomarkers with the ability to discriminate between patients who will or will not respond to therapy places the emphasis on clinical evaluations and patient-reported measures. We view our LDA response models as providing a useful framework to which more sophisticated biomarker-based predictors might be added.
The CART methodology used in our study has been shown to provide more robust analyses of data containing non-linear features, collinearity and interactions than conventional logistic regression analyses.22 This non-parametric tree-based method of modelling is useful to identify the best predictors of treatment response and has a simple and visual interpretation. The validity and reproducibility of CART decision trees are enhanced by the cross-validation technique we used, which provides an estimate of how well any classification tree performs on similar but non-identical datasets. However, despite efforts to avoid model overfitting, all prediction models, including these, should be revalidated in an independent dataset.
There is a tradeoff between the accuracy of prediction and the proportion of patients whose treatment response can be predicted with that degree of accuracy. As we have shown in a simulated data example in figure 4, and using one of the prediction models we derived, there were patients for whom we could predict treatment response or non-response with 90% or greater accuracy; only about 25% of a population such as the patients enrolled in TEMPO could be predicted with this amount of accuracy. A higher proportion of patients could have their treatment response predicted if lower amounts of accuracy are acceptable (eg, 80–85%). The degree of accuracy that clinicians require to make a treatment change at 12 weeks for those predicted to be non-responders is clinician dependent, but it is reassuring that the prediction accuracy of most of the non-responder groups in our models was approximately 90%, and was even higher (98%) for the remission outcome. Further illustrating the tradeoff between the accuracy of prediction and the proportion of patients able to be classified, an analysis by van der Heijde23 showed that patients who did not achieve a change in their DAS28 of greater than 1.2 units at 12 weeks after starting certolizumab pegol had only a 1% likelihood of achieving LDA at 1 year; however, only 13% of patients in the study could be classified as non-responders using this criterion.
In summary, approximately 80–90% of patients in the TEMPO study who initiated etanercept with or without methotrexate could be classified within 12 weeks of starting therapy as likely to have a good response or not at week 52. Additional time on therapy was needed to determine whether to continue or discontinue etanercept for the remaining 10–20% of patients. These exploratory decision trees are specific to our study population; we expect that prediction models may need to vary based on the RA patient population (early vs established disease), the level of baseline disease activity (high vs moderate/low), biological-naive versus biological-experienced patients, and perhaps even the specific agent or treatment regimen used. Separate prediction models might be necessary for each of these different RA patient populations; this remains to be tested. However, at least for RA patients whose clinical and disease characteristics are similar to patients enrolled in TEMPO, these models may assist the clinician in assessing the likelihood of response to TNF inhibitor therapy. This could aid clinical decision-making at an earlier time point. Additional prediction models built with easily measured clinical and laboratory data will probably provide a useful framework to allow for adding predictors of response, including biomarker data.
The authors thank Edward Mancini and Larry Kovalick of Amgen Inc. and Julia R Gage on behalf of Amgen Inc. for assistance with writing the manuscript.
Funding This study was sponsored by Immunex, a wholly owned subsidiary of Amgen Inc. and by Wyeth, which was acquired by Pfizer Inc. in October 2009. Data were obtained from Wyeth (Pfizer). JRC receives support from the Agency for Healthcare Research and Quality (R01HS018517) and the NIH (AR053351).
Competing interests JRC has received research grants from Amgen, Genentech, Bristol-Myers Squibb, Abbott, Centocor and CORRONA; has consulting arrangements/honoraria from Genentech, UCB, Centocor, Amgen and CORRONA. SY, LC and IN-M have no competing interests to disclose. GSP and BB are compensated employees and shareholders of Amgen Inc. BW is a compensated contractor of Amgen Inc. AK has received research grants from Amgen.
Ethics approval This study was conducted with the approval of the multinational participating centres, obtained from local institutional review boards.
Provenance and peer review Not commissioned; externally peer reviewed.