Revisiting the use of remission criteria for rheumatoid arthritis by excluding patient global assessment: an individual meta-analysis of 5792 patients
  1. Ricardo J O Ferreira1,2,
  2. Paco M J Welsing3,
  3. Johannes W G Jacobs3,
  4. Laure Gossec4,5,
  5. Mwidimi Ndosi6,
  6. Pedro M Machado7,8,9,
  7. Désirée van der Heijde10,
  8. Jose A P Da Silva1,11
  1. 1 Rheumatology, Centro Hospitalar e Universitário de Coimbra EPE, Coimbra, Portugal
  2. 2 Health Sciences Research Unit: Nursing (UICISA: E), Escola Superior de Enfermagem de Coimbra, Coimbra, Portugal
  3. 3 Rheumatology and Clinical Immunology, UMC Utrecht, Utrecht, The Netherlands
  4. 4 Institut Pierre Louis d'Epidémiologie et de Santé Publique, INSERM, Sorbonne Université, Paris, France
  5. 5 Rheumatology, Pitié Salpêtrière Hospital, AP-HP, Paris, France
  6. 6 Faculty of Health and Applied Sciences, University of the West of England Bristol, Bristol, UK
  7. 7 Centre for Rheumatology & Department of Neuromuscular Diseases, University College London, London, UK
  8. 8 Rheumatology, University College London Hospitals NHS Foundation Trust, London, UK
  9. 9 Rheumatology, Northwick Park Hospital, London North West University Healthcare NHS Trust, London, UK
  10. 10 Rheumatology, Leiden University Medical Center, Leiden, Zuid-Holland, The Netherlands
  11. 11 Clínica Universitária de Reumatologia, and i-CBR Coimbra Institute for Clinical and Biological Research, Faculty of Medicine, University of Coimbra, Coimbra, Portugal
  1. Correspondence to Professor Jose A P Da Silva, Rheumatology, Centro Hospitalar e Universitario de Coimbra EPE, 3000-076 Coimbra, Portugal; jdasilva{at}ci.uc.pt

Abstract

Objectives To determine the impact of excluding patient global assessment (PGA) from the American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) Boolean remission criteria, on prediction of radiographic and functional outcome of rheumatoid arthritis (RA).

Methods Meta-analyses using individual patient data from randomised controlled trials testing the efficacy of biological agents on radiographic and functional outcomes at ≥2 years. Remission states were defined by 4 variants of the ACR/EULAR Boolean definition: (i) tender and swollen 28-joint counts (TJC28/SJC28), C reactive protein (CRP, mg/dL) and PGA (0–10=worst) all ≤1 (4V-remission); (ii) the same, except PGA >1 (4V-near-remission); (iii) 3V-remission (i and ii combined; similar to 4V, but without PGA); (iv) non-remission (TJC28 >1 and/or SJC28 >1 and/or CRP >1). The most stringent class achieved at 6 or 12 months was considered. Good radiographic (GRO) and functional outcome (GFO) were defined as no worsening (ie, change in modified total Sharp score (ΔmTSS) ≤0.5 units and ≤0.0 Health Assessment Questionnaire–Disability Index points, respectively, during the second year). The pooled probabilities of GRO and GFO for the different definitions of remission were estimated and compared.

Results Individual patient data (n=5792) from 11 trials were analysed. 4V-remission was achieved by 23% of patients and 4V-near-remission by 19%. The probability of GRO in the 4V-near-remission group was numerically, but non-significantly, lower than that in the 4V-remission (78 vs 81%) and significantly higher than that for non-remission (72%; difference=6%, 95% CI 2% to 10%). Applying 3V-remission could have prevented therapy escalation in 19% of all participants, at the cost of an additional 6.1%, 4.0% and 0.7% of patients having ΔmTSS >0.0, >0.5 and >5 units over 2 years, respectively. The probability of GFO (assessed in 8 trials) in 4V-near-remission (67%, 95% CI 63% to 71%) was significantly lower than in 4V-remission (78%, 74% to 81%) and similar to non-remission (69%, 66% to 72%).

Conclusion 4V-near-remission and 3V-remission have similar validity as the original 4V-remission definition in predicting GRO, despite expected worse prediction of GFO, while potentially reducing the risk of overtreatment. This supports further exploration of 3V-remission as the target for immunosuppressive therapy complemented by patient-oriented targets.

  • rheumatoid arthritis
  • patient perspective
  • outcomes research
  • inflammation
  • disease activity

Key messages

What is already known about this subject?

  • Few previous studies compared the prediction of good structural and functional outcomes between patients who fulfilled all four criteria of the current American College of Rheumatology/European League Against Rheumatism Boolean-based definition of remission (‘4V-remission’) versus those who attained only three (‘3V-remission’), that is, excluding patient global assessment (PGA). No significant differences were found, but the two groups of patients evaluated significantly overlap.

What does this study add?

  • This was the first study comparing these outcomes between patients achieving 4V-remission (23%) and those missing this status due solely to PGA above 1/10 (4V-near-remission) (19%). It is based on individual patient data meta-analysis of 11 recent clinical trials in rheumatoid arthritis (5792 patients).

  • The rate of good radiographic outcome (≤0.5 units progression over the second year) was numerically higher in patients in 4V-remission (81%; 95% CI 74% to 87%) than in those in 4V-near-remission (78%; 95% CI 69% to 86%), but the difference is not statistically significant.

  • In this population, if a ‘treat-to-remission’ strategy had been applied, the 3V-remission definition would have prevented therapy escalation in 19% of all patients, at the cost of an additional 6.1%, 4.0% and 0.7% of patients having a change in modified total Sharp score >0.0, >0.5 and >5 units over 2 years, respectively.

How might this impact on clinical practice or future developments?

  • These results suggest that the use of 3V-remission as the target for immunosuppressive therapy, together with a separate assessment of disease impact on patients’ lives (a dual-target approach), deserves further consideration and research.

Introduction

Disease remission has become the guiding target in the management of rheumatoid arthritis (RA), as it conveys the best possible outcomes.1 Current treatment recommendations advise that remission (or at least low disease activity) should be attained as soon and as consistently as possible, and changes in treatment should be considered when this does not happen.2 3

The most influential and authoritative definition of remission was published in 2011 under the auspices of the American College of Rheumatology (ACR), the European League Against Rheumatism (EULAR) and the Outcome Measures in Rheumatology (OMERACT) groups.4 A Boolean-based definition was endorsed, and requires that scores of tender and swollen 28-joint counts (TJC28 and SJC28), C reactive protein (CRP, mg/dL) and patient global assessment of disease activity (PGA, 0–10 scale) are all ≤1.4

The inclusion of PGA in the definitions of remission in RA was justified because it added predictive value for later good radiographic and functional outcomes while conveying the much-needed patient’s perspective.4

Despite this, the inclusion of PGA remains controversial.5–9 Using the definitions mentioned previously, studies in different clinical practice cohorts10–15 have reported that as many as 10%13 to 38%14 of all patients with RA do not reach remission solely due to a PGA score >1, a state that has become designated as ‘4V-near-remission’.14 16 Moreover, it has been demonstrated that PGA bears little relationship with markers of the disease process, which drives structural damage, rather reflecting pain, fatigue and function.9 17 18 This is especially evident when analyses are restricted to the lower levels of disease activity, in the range where the definition of remission has a decisive impact on whether to maintain or to escalate immunosuppressive treatment. According to this perspective, patients in 4V-near-remission would not benefit from additional immunosuppression, as this cannot be expected to improve their condition or foster remission,9 17 and are exposed by current recommendations to the risk of overtreatment and unjustified side effects.19

These observations have led to the suggestion that the patients’ interest would be better served by the adoption of two separate complementary targets: the first focused on remission of the inflammatory process, guided by an instrument without PGA; the second focused only on patient-reported impact measures.9 16 20 However, this proposal would not be sustainable if, as suggested in the original ACR/EULAR/OMERACT paper, removing PGA from the Boolean-based remission significantly diminishes its ability to predict good radiographic and functional outcome.4 A systematic literature review indicated that, among the individual components included in the definitions of remission, only swollen joints and acute phase reactants are associated with radiographic progression.21 Two other studies, using data from a clinical cohort13 and from clinical trials,22 compared the prediction of good radiographic outcome by ‘4V-remission’ versus ‘3V-remission’ (without PGA) achieved in patients with RA: no significant differences were observed, but the two groups were not mutually exclusive. No study has ever compared the radiographic outcomes between the 4V-remission and 4V-near-remission groups.

The primary aim of this study was to compare 4V-near-remission and 4V-remission regarding their association with radiographic damage progression. Secondarily, we aimed to explore the impact of using 3V-remission instead of 4V-remission in patients with RA, both in terms of prevalence of remission and association with structural damage progression and functional impairment.

Methods

Design and study selection

This was an individual patient data meta-analysis of published randomised controlled trials (RCTs) selected through a systematic literature review. The study protocol was registered in PROSPERO (CRD42017057099)23 and published elsewhere.24

RCTs were included if they tested the efficacy of biological disease-modifying antirheumatic drugs (bDMARDs) on ≥2-year radiographic outcomes in patients fulfilling the 1987 ACR or the 2010 ACR–EULAR criteria for RA.25 26 Information on the processes of identifying and selecting studies, as well as collecting data, is reported in the protocol.24

Risk of bias assessment of individual studies

Studies selected for retrieval were assessed by two independent reviewers (RJOF and MN) for methodological validity prior to inclusion in this review, using the ‘Risk of Bias 2’ tool.27 Any disagreements between the reviewers were resolved through discussion, or with a third reviewer (JAPS). The full protocols of the studies were consulted, and their authors contacted to request missing or additional data for clarification, where required.

Specification of outcomes

Primary outcome

The primary outcome of this study was the percentage of individuals with a good radiographic outcome (GRO) during the second year of the trial (ie, between month 12 and month 24), defined as a change (Δ) ≤0.5 units in the van der Heijde modified total Sharp score (mTSS).28

This ≤0.5 cut-off is preferred29–31 over the one used in the ACR/EULAR pivotal publication (≤0 cut-off) because 0.5 is the optimal cut-off if the average of two readers is used,32 as it allows for the smallest possible difference of 1 unit (out of 448) between the two readers.

Secondary outcomes

Two secondary endpoint cut-offs were used to define good radiographic outcome during the second year of the trial:

  1. ΔmTSS ≤5 units, a higher, frequently used cut-off (sometimes referred to as clinically non-relevant radiographic progression);

  2. ΔmTSS ≤0 units, to allow comparisons with the results obtained in the ACR/EULAR study.4

Also as a secondary outcome, we studied the percentage of individuals with a good functional outcome (GFO) during the second year of the trial (ie, between month 12 and month 24), defined as no worsening, that is, a change (Δ) ≤0.0 units in the Health Assessment Questionnaire–Disability Index (HAQ-DI). This definition was preferred over the one used in the ACR/EULAR pivotal publication (ΔHAQ ≤0.0 and HAQ ≤0.5 at both time points) because the latter is believed to be too strict, representing a better outcome than expected even for the general population.4 33 Despite this consideration, the ACR/EULAR definition of GFO was also tested to allow comparison with the original ACR/EULAR paper.
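The outcome definitions above can be written as simple predicates. The following is a minimal sketch, not the code used in the study; the function and parameter names are illustrative assumptions, and the two-reader averaging only illustrates why a 0.5 cut-off tolerates a 1-unit disagreement between readers.

```python
from statistics import mean


def delta_mtss(reader_a_change: float, reader_b_change: float) -> float:
    """Average the second-year mTSS change scored by two readers (assumed setup)."""
    return mean([reader_a_change, reader_b_change])


def is_gro(delta: float, cutoff: float = 0.5) -> bool:
    """Good radiographic outcome: change in mTSS at or below the cut-off.

    cutoff=0.5 is the primary definition; 0.0 and 5.0 are the secondary ones.
    """
    return delta <= cutoff


def is_gfo(haq_m12: float, haq_m24: float, strict: bool = False) -> bool:
    """Good functional outcome: no worsening of HAQ-DI during the second year.

    strict=True additionally requires HAQ-DI <= 0.5 at both time points,
    which is the definition used in the ACR/EULAR publication.
    """
    no_worsening = (haq_m24 - haq_m12) <= 0.0
    if strict:
        return no_worsening and haq_m12 <= 0.5 and haq_m24 <= 0.5
    return no_worsening
```

With these definitions, a patient scored as unchanged by one reader and as 1 unit worse by the other has an averaged change of 0.5 and still counts as GRO under the primary cut-off, but not under the ≤0 cut-off.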

Comparisons: mutually and non-mutually exclusive definitions of remission

Analyses were based on different definitions of remission states, assessed at two time points, 6 months and 12 months, following the methodology adopted by the ACR/EULAR committee,4 as follows:

1. ACR/EULAR Boolean-based remission,4 also designated in this study as ‘4V-Remission’ (ie, TJC28 ≤1, SJC28 ≤1, CRP ≤1 mg/dL and PGA ≤1/10).

2. ‘4V-near-remission’,11 14 defined as TJC28 ≤1, SJC28 ≤1, CRP ≤1 mg/dL and PGA >1.

3. ‘Non-remission’ defined as TJC28 >1 and/or SJC28 >1 and/or CRP >1 mg/dL, irrespective of PGA value.

The three definitions are mutually exclusive, that is, each patient was categorised in one group only.

4. ‘3V-remission’ defined as TJC28 ≤1, SJC28 ≤1 and CRP ≤1 mg/dL. This is a combination of 4V-remission and 4V-near-remission—patients classified in 4V-remission also meet the 3V-remission criteria (figure 1).

Figure 1

Definitions of remission tested in the study. Legend: CRP, C reactive protein, mg/dL; PGA, patient global assessment, range 0–10=worst; SJC28, swollen 28-joint count, range 0–28; TJC28, tender 28-joint count, range 0–28. Footnote: In general, in no remission states, disease-modifying antirheumatic drug (DMARD) therapy will be intensified, while at remission states, DMARD therapy will be unchanged or tapered. The no remission/4V-near-remission state (hatched) has a risk of overtreatment if DMARD therapy is intensified.

Figure 2

Flowchart with the process of study identification and data access. IPD, individual patient data; RA, rheumatoid arthritis; RCT, randomised controlled trial.

All definitions of remission were considered fulfilled if they were achieved at 6 or 12 months’ follow-up and patients were classified according to the most stringent definition they satisfied (for instance, if a patient was in 4V-near-remission at 6 months and in 4V-remission at 12 months, he/she was classified as in 4V-remission).
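The classification rules above can be sketched as follows. This is an illustrative example only: the function names and the dictionary layout of a visit are assumptions, and both visits are assumed to have complete data.

```python
# Mutually exclusive states, ordered from most to least stringent.
STRINGENCY = ["4V-remission", "4V-near-remission", "non-remission"]


def classify_visit(tjc28: int, sjc28: int, crp_mg_dl: float, pga: float) -> str:
    """Classify a single visit into one of the three mutually exclusive states."""
    three_v = tjc28 <= 1 and sjc28 <= 1 and crp_mg_dl <= 1
    if not three_v:
        return "non-remission"  # irrespective of the PGA value
    return "4V-remission" if pga <= 1 else "4V-near-remission"


def classify_patient(visit_6m: dict, visit_12m: dict) -> str:
    """Keep the most stringent state reached at 6 or 12 months."""
    states = [classify_visit(**visit) for visit in (visit_6m, visit_12m)]
    return min(states, key=STRINGENCY.index)


def in_3v_remission(state: str) -> bool:
    """3V-remission is the union of 4V-remission and 4V-near-remission."""
    return state in ("4V-remission", "4V-near-remission")


# Example from the text: near-remission at 6 months, remission at 12 months.
month_6 = {"tjc28": 0, "sjc28": 1, "crp_mg_dl": 0.4, "pga": 3}
month_12 = {"tjc28": 1, "sjc28": 0, "crp_mg_dl": 0.2, "pga": 1}
print(classify_patient(month_6, month_12))  # 4V-remission
```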

Data analysis and synthesis

Data analysis

All ‘primary’ analyses were performed with SAS software (V.9.3), within the online secure platforms. For each trial, we determined the number of patients with GRO in each definition group (4V-remission, 4V-near-remission, 3V-remission and non-remission). The rates of true positive (TP), that is, remission and GRO; true negative (TN), that is, non-remission and not-GRO; false negative (FN), that is, non-remission and GRO; and false positive (FP), that is, remission and not-GRO, cases were also determined for all definitions. The percentage of patients whose outcome (GRO or not-GRO) was accurately predicted (sum of TP and TN) was also determined for the 4V-remission and 3V-remission definitions. Missing data were not substituted. Similar analyses were performed for the secondary outcomes.
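As an illustration of how the TP, TN, FN and FP categories and the accurately predicted percentage relate, the sketch below counts them from paired observations. The record layout and the toy data are hypothetical and are not taken from the trials.

```python
from collections import Counter


def confusion_counts(records):
    """Count TP/FP/FN/TN from (in_remission, good_outcome) pairs.

    Remission plays the role of the 'test' and the good outcome that of the
    'condition', following the definitions given in the text.
    """
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for in_remission, good_outcome in records:
        if in_remission:
            counts["TP" if good_outcome else "FP"] += 1
        else:
            counts["FN" if good_outcome else "TN"] += 1
    return counts


def accuracy(counts) -> float:
    """Proportion accurately predicted: (TP + TN) / all patients."""
    total = counts["TP"] + counts["FP"] + counts["FN"] + counts["TN"]
    return (counts["TP"] + counts["TN"]) / total


# Hypothetical toy data: 10 patients.
toy = ([(True, True)] * 3 + [(True, False)] * 1
       + [(False, True)] * 4 + [(False, False)] * 2)
toy_counts = confusion_counts(toy)
print(dict(toy_counts), accuracy(toy_counts))
```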

Meta-analysis

Frequency of remission status and outcomes

The frequency/proportion of each remission state observed in each of the trials was meta-analysed, irrespective of the treatment arm. The same procedure was used to determine the pooled prevalence of GRO and GFO according to remission status.

Primary analysis

Likelihood of achieving GRO for 4V-near-remission compared with 4V-remission and with non-remission

Starting from our hypothesis that PGA might lead to a false-negative rating of remission when using the 4V-remission definition, we aimed to analyse the value of the 3V-remission definition, which excludes PGA. Direct comparison of 4V-remission and 3V-remission, however, is not possible, given the overlap between the two states (see figure 1). Therefore, for each trial, we determined the difference in the proportion/chance (∆ proportion) of GRO (∆mTSS ≤0.5) between 4V-near-remission and 4V-remission, which are mutually exclusive states, and then pooled these differences with the random-effects model to obtain an overall estimate of the difference (with 95% CI). We made the same comparison between the 4V-near-remission and non-remission states. The risk ratio or relative risk (RR, 95% CI) for GRO between these groups was also calculated.
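The per-trial quantities described above can be sketched as follows. This is a textbook-style illustration, not the study's SAS code: the function names, the Wald-type standard errors and the example counts are all assumptions.

```python
import math


def risk_difference(events_a: int, n_a: int, events_b: int, n_b: int):
    """Difference in proportions (group A minus group B) with its standard error."""
    p_a, p_b = events_a / n_a, events_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return p_a - p_b, se


def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Risk ratio (A relative to B) with a 95% CI computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


# Hypothetical single trial: GRO in 78/100 near-remission vs 81/100 remission patients.
diff, se = risk_difference(78, 100, 81, 100)
rr, low, high = risk_ratio(78, 100, 81, 100)
print(f"difference={diff:.3f} (SE {se:.3f}); RR={rr:.2f} ({low:.2f} to {high:.2f})")
```

Each trial contributes one such estimate and standard error, and these are the inputs to the random-effects pooling step.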

Secondary analyses

The likelihood of achieving each of the secondary outcomes for 4V-near-remission compared with 4V-remission and with non-remission was assessed using similar methods for the different definitions.

Sensitivity analyses

Different sensitivity analyses were performed regarding radiographic progression. The first was to explore the likelihood of GRO between remission states after excluding the seemingly outlier trials.

The second was a multivariate analysis. Multivariate logistic regressions were performed in each trial to explain GRO (dependent variable) using the mutually exclusive remission states as independent variables, adjusted for important covariates at baseline: gender, age, disease duration (except for three trials due to >50% of missing data in this covariate), rheumatoid factor status, level of radiographic damage and treatment arm. The OR obtained in each trial and its 95% CI and SE were meta-analysed to obtain the pooled OR of GRO comparing different mutually exclusive remission states. However, we hypothesise that this covariate adjustment may constitute an overcorrection because patients in remission are ‘naturally’ different from patients not in remission regarding these prognostic factors. For this reason, these sensitivity analyses are presented cautiously and only in online supplemental material.

The third was to clarify the value of PGA as a predictor of radiographic damage progression, selecting only the patients in 4V-near-remission (in 8 of the 11 trials, 796 patients, due to restrictions in accessing the data). We used Poisson regression models with 2-year mTSS as the dependent variable and PGA as the independent variable. To assess the specific, independent impact of PGA, we corrected for SJC28, TJC28 and CRP, determined as the mean of the observations at 6 and 12 months, by also introducing them as independent variables, together with baseline mTSS. To allow the combined analysis of the different variables, we standardised their values using z-scores. A meta-analysis was then performed to obtain pooled rate ratios (RR with 95% CI) per variable.
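The standardisation step can be illustrated with a short sketch; the Poisson regression itself is not reproduced here. Using the sample SD is an assumption, as the text does not state which SD was used, and the example values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and SD 1 (sample SD, an assumption)."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


# Hypothetical PGA values (mean of the 6-month and 12-month observations).
pga_values = [1.0, 2.0, 3.0]
print(z_scores(pga_values))  # [-1.0, 0.0, 1.0]
```

After standardisation, a rate ratio is read as the change in the expected progression per 1 SD increase in the predictor, which makes coefficients for PGA, joint counts and CRP comparable.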

The last was to explore the proportion of patients in 3V-remission (8 trials; 1937 patients) who had radiographic damage progression >0.5 units and those who had radiographic progression >5 units during year 2, according to PGA score ≤1 versus >1 at 6 and 12 months.

Likelihood of reaching good radiographic and functional outcomes with 4V-remission compared with 3V-remission

If the null hypothesis of this study (the chance of GRO in the 4V-near-remission group is similar to that in the 4V-remission group) is not rejected, the current 4V-remission and the proposed 3V-remission can be compared in terms of their positive (LR+) and negative likelihood ratios (LR−) of GRO per remission group. The TP, TN, FN and FP values were used to synthesise these measures. Similar procedures were performed regarding GFO.
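For reference, the likelihood ratios follow from the TP, FP, FN and TN counts through the standard diagnostic-accuracy formulas. The sketch below is illustrative only; the function name and the counts are hypothetical.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int):
    """Positive and negative likelihood ratios from a 2x2 table.

    Here 'test positive' means being in remission and 'condition present'
    means having the good outcome, as in the definitions of TP/FP/FN/TN.
    """
    sensitivity = tp / (tp + fn)   # remission among those with a good outcome
    specificity = tn / (tn + fp)   # non-remission among those without it
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical counts for one remission definition.
lr_pos, lr_neg = likelihood_ratios(tp=30, fp=10, fn=40, tn=20)
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.2f}")
```

An LR+ above 1 means that remission is more common among patients with a good outcome than among those without it, and an LR− below 1 means that non-remission is less common among patients with a good outcome.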

All meta-analyses were performed with the OpenMeta[Analyst] software,34 using the DerSimonian-Laird random-effects method35 and the arcsine-transformed proportion.36 STATA software (V.14) was used only to determine OR adjusted to covariates (sensitivity analyses). The I² statistic of Higgins and Thompson was calculated to quantify heterogeneity.37
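To make the pooling step concrete, the following is a minimal sketch of the DerSimonian-Laird estimator together with the I² statistic. It is not the OpenMeta[Analyst] implementation: it pools generic per-trial effects (for example, risk differences) on their own scale, omits the arcsine transformation used for proportions, and all names and inputs are assumptions.

```python
import math


def dersimonian_laird(effects, variances, z: float = 1.96):
    """Random-effects pooled estimate (DerSimonian-Laird) with I^2 in percent."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-trial variance tau^2.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each trial by the inverse of (within + between) variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - z * se,
        "ci_high": pooled + z * se,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical example: two trials with equal precision but different effects.
result = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(result)
```

When the trials agree closely, tau² collapses to zero and the estimate coincides with the fixed-effect (inverse-variance) result; larger disagreement inflates tau², widens the CI and raises I².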

Results

Studies and participants

From a total of 27 identified studies, we were granted access to 17 through secure online platforms, but only 11 trials reported radiographic damage progression during the second year, thus allowing inclusion in the final analyses. Reasons for the non-inclusion of 16 out of the 27 trials initially identified are described in figure 2 and online supplemental table S1. The critical appraisal results for each of the 11 RCTs are summarised in online supplemental figure S1 (low risk of bias in all items assessed for all the trials). We had access to data from 100% of the randomised patients in 9 out of the 11 trials and from 93% of patients in the remaining two, resulting in a total sample of 8114 patients. Most trials tested anti-TNFα therapies (n=9), and included patients with insufficient response to methotrexate (n=7) and with established disease (>2 years) (n=9) (online supplemental table S2). The mean (SD) DAS28CRP3v ranged from 4.7 (0.9) to 5.3 (0.8) at baseline. The van der Heijde mTSS was used as the scoring method of radiographic damage progression in 10 of the trials. The remaining trial used the Genant method. The mean (SD) mTSS at baseline ranged from 5.9 (14.5) to 69.0 (55.8) (online supplemental table S2).

Altogether, 2322 patients (29%) were excluded from the final analyses (online supplemental table S3). The main reason for exclusion was the lack of data on radiographic outcome (71% of all cases). Those excluded from these analyses were older (1.3 years on average), reported higher PGA and HAQ, and had more active disease according to physician’s global assessment. Regarding disease status at 6 or 12 months, 305 of the excluded patients had no data and the remaining 2017 had lower rates of 4V-remission and higher rates of non-remission, compared with those included.

Frequency of remission status, radiographic and functional outcomes

A total of 5792 (71%) patients had information on both the remission definition and on the primary outcome (radiographic progression) (table 1). Pooled meta-analytic frequency (95% CI) of 4V-remission at 6 or 12 months was 23.0% (18.0% to 28.0%), while for 4V-near-remission, it was 18.9% (15.4% to 22.1%), considering all treatment arms together (table 1).

Table 1

Frequency of remission and good radiographic outcome in the included studies

Good radiographic outcome was observed in 74.1% (66.2% to 82.0%) of all patients using the primary cut-off (∆mTSS ≤0.5) and in 94.6% (92.9% to 96.4%) using ∆mTSS ≤5 (table 1). Good functional outcome, which could only be assessed in eight RCTs (3904 patients), was observed in 70.6% (66.7% to 73.5%) of all patients using the elected cut-off (∆HAQ-DI ≤0.0), and in 31.1% (24.9% to 37.2%) using ∆HAQ-DI ≤0.0 and HAQ-DI ≤0.5 (table 1).

Likelihood of reaching good radiographic outcome for patients in 4V-near-remission compared with patients in 4V-remission and with patients in non-remission

Overall, the proportion of GRO for the primary score (∆mTSS ≤0.5) was high (71.8% to 81.1%) for the three mutually exclusive remission categories (table 2). The proportion of patients with GRO did not differ significantly between those in 4V-near-remission and 4V-remission: −2.9% (95% CI −7.3% to +1.5%). Patients in 4V-near-remission had a significantly higher chance of achieving GRO compared with patients in non-remission (+6.2%; 95% CI 2.3% to 10.1%). Results for these comparisons are shown in table 2 and figure 3. Similar observations were made for GRO defined as ∆mTSS ≤5 (table 2). None of the differences was statistically significant when ∆mTSS ≤0 was used (table 2).

Figure 3

Meta-analyses of risk ratio of obtaining good radiographic outcome (∆mTSS ≤0.5 units); 4V-near-remission vs 4V-remission and vs non-remission. Legend: 4V-remission=SJC28, TJC28, CRP (mg/dL) and PGA (0–10), all ≤1; 4V-near-remission=SJC28, TJC28 and CRP (mg/dL) ≤1 and PGA (0–10) >1; non-remission=SJC28 >1 and/or TJC28 >1 and/or CRP (mg/dL) >1, irrespective of PGA value; at 6 or 12 months of follow-up in all cases. CRP, C reactive protein; ∆mTSS, change in the modified total Sharp score during the second year of follow-up; GRO, good radiographic outcome; PGA, patient global assessment; SJC28/TJC28, swollen/tender 28-joint counts.

Table 2

Pooled outcomes* and measures of association between remission categories and good radiographic and good functional outcomes, during the second year of follow-up

We performed a sensitivity analysis by excluding the three apparent outliers in figure 3 (the DE019, GO-FURTHER and TEMPO trials) which confirmed no significant difference in the meta-analytic RRs (∆mTSS ≤0.5) between 4V-remission and 4V-near-remission (RR 0.99; 95% CI 0.95 to 1.03).

Likelihood of reaching good functional outcome for patients in 4V-near-remission compared with patients in 4V-remission and with patients in non-remission

Overall, the proportion of GFO for the elected outcome (∆HAQ-DI ≤0.0) was high (68.8% to 77.6%) for the three mutually exclusive remission categories (table 2). The proportion of patients with GFO was significantly lower in 4V-near-remission than in 4V-remission: −11.0% (95% CI −16.3% to −5.7%). Patients in 4V-near-remission had a similar chance of achieving GFO compared with patients in non-remission (−2.2%; 95% CI −6.8% to +2.4%). The difference between 4V-near-remission and 4V-remission was more striking for the GFO defined as ΔHAQ-DI ≤0 and HAQ-DI ≤0.5: −39.6% (95% CI −48.4% to −30.9%). For this definition, the difference between 4V-near-remission and non-remission was non-significant (+1.7%; 95% CI −7.4% to +10.8%).

Comparison of the 4V-remission and the proposed 3V-remission regarding prediction accuracy for radiographic and functional outcome

Having shown that the difference in the probability of GRO between 4V-remission and 4V-near-remission was neither statistically significant nor clinically relevant,38 we proceeded to evaluate the difference between the 4V-remission and 3V-remission (the latter combining the 4V-near-remission and 4V-remission) groups (table 3). The results indicated that the likelihood ratio of having GRO (ΔmTSS ≤0.5) was higher for patients in 4V-remission versus 4V-non-remission (LR+=1.36; 1.15 to 1.61) than for patients in 3V-remission versus 3V-non-remission (LR+=1.26; 1.13 to 1.41), although there was a large overlap in 95% CIs. Conversely, the likelihood of having GRO in the absence of remission was significantly smaller for the 3V-remission (LR−=0.86; 0.79 to 0.94) and non-significant for the 4V-remission (LR−=0.92; 0.81 to 1.04) versus their counterparts (table 3).

Table 3

Meta-analyses of good outcomes likelihood ratios for the 4V-remission and 3V-remission status

The same comparisons were made regarding functional outcomes (table 3). The likelihood ratio of having GFO (ΔHAQ≤0.0) was significantly higher for patients in 4V-remission compared with in 4V-non-remission (LR+=1.34; 1.16 to 1.54), while it was not significantly different between patients in 3V-remission versus 3V-non-remission (LR+=1.08; 0.99 to 1.17). Contrariwise, the likelihood of having GFO in the absence of remission was not significantly different from that for either the 3V-remission (LR−=0.94; 0.88 to 1.02) or the 4V-remission (LR−=0.90; 0.79 to 1.02) versus their comparator groups (table 3).

The proportion of patients whose prediction of GRO was accurate (=TP+TN) was, overall, quite low for both definitions of remission (≤53%). It was, however, higher for the 3V-remission definition than for the 4V-remission definition: 6.5%, 10.6% and 17.2% higher at ΔmTSS ≤0.0, ≤0.5 and ≤5, respectively (see figure 4). As expected, the improved accuracy of the 3V-remission is a result of a substantially lower percentage of FN, that is, patients without remission who do not have radiographic progression, at the cost of a much smaller increase in the percentage of FP, that is, the patients with remission who do have progression.

Figure 4

Pooled meta-analytic prediction accuracy of 4V-remission and 3V-remission status for the good radiographic and functional outcomes. Footnote: The sum of the meta-analytic percentages of TP, FN, FP and TN is slightly less than 100% due to error estimation when multi-category (k>2) prevalence is estimated.35 All meta-analyses used double arcsine transformation as the preferred method to correct this situation.35 The panels from A to F include 5792 analysed patients (11 randomised controlled trials (RCTs)), E and F include 3904 (8 RCTs), and G and H 5262 analysed patients (11 RCTs). Legend: 4V-remission=SJC28, TJC28, CRP (mg/dL) and PGA (0–10), all ≤1; 3V-remission=SJC28, TJC28 and CRP (mg/dL) ≤1; ΔHAQ, change in Health Assessment Questionnaire score; ∆mTSS, change in the modified total Sharp score from 12 months to 24 months; CRP, C reactive protein; FN, false negative; FP, false positive; PGA, patient global assessment; SJC28, swollen 28-joint count; TJC28, tender 28-joint count; TN, true negative; TP, true positive; accurately predicted=TP+TN. Between brackets is the pooled 95% CI.

Regarding the elected definition of GFO, the proportion accurately predicted with the 3V definition (50.3%; 46.0 to 54.6) was significantly higher than with the 4V definition (43.8%; 40.9 to 46.6). The percentage accurately predicted was much higher for the alternative definition of GFO, the statistically significant difference being favourable for the 4V definition.

Figure 5 presents a ‘clinical eye’s’ summary of good/bad radiographic outcomes observed according to the current and the proposed (3V) Boolean-based definitions of remission (95% CI and I² statistics are presented in online supplemental table S4). Overall, 73.3% (95% CI 63.9% to 81.8%) of the patients in non-4V-remission still had GRO (ΔmTSS≤0.5), and the same was observed for 71.8% (95% CI 62.1% to 80.5%) of those in non-3V-remission. The percentages of GRO increase to 81.1% (95% CI 74.4% to 86.9%) and 79.6% (95% CI 72.2% to 86.1%) among those in 4V-remission and 3V-remission, respectively. None of these differences were statistically significant.

Figure 5

Reclassification of remission status and respective radiographic outcomes (n=5792). Percentages were calculated through meta-analyses. Footnote: Excluding PGA from the definition of remission (3V-remission) almost doubled the percentage of patients in remission but showed only a slight increase in the rate of bad outcome when compared with 4V-remission. The radiographic outcome in the group of patients who had no overt signs of inflammation but who presented with high PGA (4V-near-remission) was also not statistically different from that of patients in 4V-remission. Legend: 4V-remission=SJC28, TJC28, CRP (mg/dL) and PGA (0–10), all ≤1; 4V-near-remission=SJC28, TJC28, CRP (mg/dL) ≤1 and PGA (0–10) >1; non-remission=SJC28 >1 and/or TJC28 >1 and/or CRP (mg/dL) >1, irrespective of PGA value; 3V-remission=SJC28, TJC28, CRP (mg/dL) ≤1. All definitions as observed at 6 or 12 months. Note: CIs and I² statistics of pooled radiographic outcomes can be found in online supplemental table S4. ∆mTSS, change in the modified total Sharp score during the second year of follow-up; CRP, C reactive protein; PGA, patient global assessment; SJC28/TJC28, swollen/tender 28-joint counts.

The overall proportion of patients achieving 3V-remission was almost double that of patients reaching 4V-remission (41.9% vs 23.0%).

Sensitivity analyses

Adjustment to co-factors

The models adjusted for co-factors for the same comparisons showed even smaller differences between 4V-near-remission and 4V-remission categories regarding the prediction of good radiographic outcomes (online supplemental tables S5 and S6).

Exploration of radiographic damage in 4V-near-remission

Within the subgroup of patients in 4V-near-remission, PGA (at 6 and 12 months) is not a statistically significant predictor of radiographic progression over 2 years (RR 1.05 per SD unit increase, 95% CI 0.93 to 1.16); similarly, non-significant results were obtained for SJC28 and TJC28 (both 0 vs 1 in this subgroup): RR 1.09; 95% CI 0.90 to 1.27, and RR 0.86; 95% CI 0.68 to 1.04, respectively. Only CRP was a (borderline) statistically significant predictor of radiological progression (RR 1.06, 95% CI 1.00 to 1.12).

Radiographic damage progression according to PGA

In the subgroup of patients reaching 3V-remission, a ∆mTSS >5 units was observed in 2.3% (95% CI 1.0% to 4.3%) of patients scoring PGA >1 and in 1.3% (0.6% to 2.3%) of those with PGA ≤1. The corresponding values for ∆mTSS >0.5 units were 18.4% (13.8% to 23.5%) and 15.2% (9.9% to 21.4%), respectively (online supplemental table S7).

Discussion

This is the first study assessing the prevalence of 4V-near-remission in RCTs and the first comparing radiographic damage progression between patients in 4V-near-remission and in 4V-remission. The pooled rate of 4V-near-remission was almost the same as that of 4V-remission (19% vs 23%). These mutually exclusive groups did not differ significantly in terms of subsequent radiographic damage accrual. Patients in 4V-near-remission had a significantly better radiographic outcome than those in non-remission.

These observations legitimised the next step in our analyses: to explore the implications of choosing between the 3V and the 4V definitions of remission. The odds of good structural outcome were slightly higher for the 4V-remission, but without statistical or, in our view, clinical significance. The 3V-remission showed a better performance in terms of true estimations of significant damage (ie, sum of TP and TN estimations). If a ‘treat-to-remission’ strategy had been applied in this population, the 3V-remission definition would have prevented therapy escalation in 19% of all participants when compared with the 4V-remission. This would occur at the cost of an excess of 6.1% of patients having a ΔmTSS >0.0, 4.0% having a ΔmTSS >0.5 and 0.7% having a ΔmTSS >5 units. These trade-offs may be differently valued by different observers. Our proposal to use the 3V-remission definition is also rooted in solid clinical common sense: a (major) part of patients who fail remission solely because of PGA is not expected to benefit from additional immunosuppressive therapy, as PGA does not reflect disease activity in these patients. However, clinical judgement is needed to decide in individual patients whether the PGA level >1 indicates residual disease activity that might be successfully treated with more intensive RA treatment, or reflects another cause, for which more intensive RA treatment would be unnecessary and potentially harmful. Guiding definitions and recommendations should always be aligned with good clinical wisdom.
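As a rough check on the 19% figure, the pooled proportions reported in the results can be subtracted directly. This assumes the pooled percentages are additive, which holds only approximately because each was estimated through a separate meta-analysis.

```latex
\[
\underbrace{41.9\%}_{\text{3V-remission}} - \underbrace{23.0\%}_{\text{4V-remission}}
  = 18.9\% \approx 19\% \quad (\text{4V-near-remission})
\]
```

These are the patients who would be targeted for therapy escalation under the 4V definition but not under the 3V definition.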

The data also emphasise that all remission concepts have a relatively poor predictive value regarding radiographic damage, as shown by low LRs (although better in 4V-remission) and predictive accuracies below 53% (better in 3V-remission). This reflects the fact that 73% of patients in non-4V-remission had good radiographic outcomes and 19% of those in 4V-remission still presented radiographic progression (∆mTSS >0.5).
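For readers less familiar with these measures, the sketch below shows how predictive accuracy and likelihood ratios follow from a 2×2 table of remission status against radiographic outcome. It is an illustration under stated assumptions, not the authors' analysis: the class and function names are hypothetical, the counts in the example are invented, and treating "in remission" as the positive test for a good outcome is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # in remission, good radiographic outcome
    fp: int  # in remission, bad radiographic outcome
    fn: int  # not in remission, good radiographic outcome
    tn: int  # not in remission, bad radiographic outcome


def diagnostic_summary(t: TwoByTwo) -> dict[str, float]:
    """Return predictive accuracy and likelihood ratios for one 2x2 table.

    Raises ZeroDivisionError if a margin is empty or specificity is 0 or 1.
    """
    n = t.tp + t.fp + t.fn + t.tn
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "accuracy": (t.tp + t.tn) / n,  # share of TP and TN estimations
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Invented counts, for illustration only (not data from the trials).
example = diagnostic_summary(TwoByTwo(tp=40, fp=10, fn=20, tn=30))
```

When many patients outside remission still have a good outcome (a large FN cell), sensitivity and accuracy fall, which is the pattern described above for non-4V-remission.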

4V-remission was associated with significantly higher rates of GFO (77.6%) compared with 4V-near-remission (66.9%); this latter rate is similar to that observed in non-remission (68.8%). The differences were more marked in favour of 4V-remission if the definition of GFO adopted by the ACR/EULAR committee was used (4V-remission=60.5%, 4V-near-remission=22.5%, non-remission=21.2%). Positive likelihood ratios also favoured 4V-remission, while negative LRs did not reach significance in favour of 4V-near-remission. The predictive accuracy of 3V-remission for the elected functional outcome was numerically better than for 4V-remission, nearly reaching statistical significance.

The results regarding functional outcome demand a critical appraisal. Overall, PGA and HAQ-DI are correlated to the level r=0.5 to 0.7. In higher disease activity states, both PGA and HAQ-DI predominantly reflect disease activity. In remission, they are expected to remain correlated, even if one assumes (as we do) that neither of them substantially reflects inflammation at this stage, because they are essentially determined by similar subjective factors and comorbidities.9 14 17 39 It follows that, irrespective of disease activity, PGA is bound to predict HAQ-DI, and this obviously questions the use of HAQ-DI to assess the use of PGA, especially in a definition of remission, if it is intended to guide decisions on immunosuppressive therapy. The current results confirm this interpretation: How else could we coherently explain that, also in our study, 4V-remission is associated with significantly higher prevalence of GFO than 4V-near-remission if these two conditions share similar levels of SJC28, TJC28 and CRP (all ≤1) and similar levels of radiographic progression? The only difference is PGA.

The robustness of this work is supported by (1) the use of individual patient data, allowing uniform analysis procedures, (2) the availability of data collected under stringent RCT conditions, (3) the inclusion of over 5700 patients and (4) the use of both crude and adjusted statistical analyses. This study also has potential limitations and biases. The definition of remission was based only on two independent time-points (6 or 12 months) and used to predict radiographic progression over the following year. Although this was also the methodology used by the ACR/EULAR group,4 it is recognised that alternative ways exist to quantify sustained remission, which might be useful both in understanding the construct of remission and investigating its relationship with structural damage accrual.4 Good outcome was assessed only within the second year after randomisation. Although this is the efficacy endpoint used in most trials, longer follow-up assessment could provide different results.40 When 3V-remission is agreed to be an acceptable endpoint for evaluating disease-modifying treatment in RA, the ability of the 3V-remission definition to detect differences between (effective) treatments, that is, its responsiveness, should be established and compared with that of 4V-remission and other established trial endpoints in RA. Patients with missing data, excluded from the analysis, had higher PGA and HAQ-DI scores and more active disease at 6 and 12 months, but they were not significantly different with regard to other factors recognised as relevant for radiographic outcome. The exclusion of these patients might have changed the relationship between disease activity status and the outcomes under consideration in an unknown direction. It should be noted that we did not analyse within-trial arms and used the data of clinical trials as in observational studies, therefore discarding the effects of randomisation.
As patients fulfilled inclusion criteria for RCTs, generalisability of our results is limited to patients with high disease activity starting treatment. In 7 out of the 11 RCTs, joint assessments were performed by independent assessors, and the 4 other studies did not use an independent joint assessor. We do not know whether this may have affected the (interpretation of the) results of our study in any way. Finally, some changes to the published protocol for this study need to be disclosed, namely the use of ΔmTSS ≤0.5 units as the primary outcome instead of the ≤0 cut-off, for the reasons outlined in the methods section.

The most relevant implications of this study for clinical practice and research relate to the most appropriate definition of remission and its use as the guiding target for therapy. Our results demonstrate that patients in 4V-near-remission do not differ significantly from those in 4V-remission in terms of radiographic damage accrual, while they can be clearly separated from those in non-remission. This supports the aggregation of the first two groups, that is, the proposed 3V-remission definition. Contrary to ACR/EULAR,4 but in line with previous and current evidence,13 21 22 41 our results demonstrated that the 3V-remission definition does not significantly diminish the ability to predict structural damage, while it may significantly reduce the risk of overtreatment, but this should be validated in clinical settings.19 20 The implications of these observations should be further tested in the remission definitions based on composite indices Simplified Disease Activity Index and Clinical Disease Activity Index, as also endorsed by ACR/EULAR.

The ACR/EULAR committee also addressed the 3V-definition and reached the opposite conclusion.4 This may be explained by differences in methodology and reasoning. First, ACR/EULAR tested one single and very strict cut-off to define good radiographic outcome (ΔmTSS ≤0), which is, in our view, excessively stringent, as it does not even allow for a difference of one unit in change score in the total of 448 joints assessed by the two radiograph assessors, which is averaged to 0.5. Both cut-offs are well below the smallest detectable change within one subject: 2–3 units according to an OMERACT expert panel.38 However, in our study, the ΔmTSS ≤0 was the one with more favourable results for the 4V compared with the 3V-remission in terms of GRO prediction, predictive accuracy and rate of FN, but not in LR, for which the ΔmTSS ≤0.5 was more favourable. While considering these issues, one should take into account that ΔmTSS=1 has been estimated to justify a decrease of the HAQ score of only 0.01.42 Second, the ACR/EULAR committee limited their analysis to 4V versus 3V, which significantly overlap, thus ‘diluting’ the characteristics of a unique group of patients: 4V-near-remission. Also, the number of patients analysed by ACR/EULAR was much lower. Furthermore, the decision of the ACR/EULAR committee was, seemingly, strongly influenced by the much better prediction of good functional and ‘overall’ good outcomes for the 4V-remission versus the 3V-remission. This position was recently reaffirmed.22 The reasons why we disagree with this approach are presented above. Furthermore, the ACR/EULAR study analysed primarily the methotrexate-alone treatment groups of 3 trials, while we included all arms in each of 11 trials.
This may explain why our likelihood ratios of GRO between 4V-remission and non-remission are much lower than those of the ACR/EULAR study, given that inhibition of radiographic damage by bDMARDs has been demonstrated even in the absence of remission, thus reducing the predictive accuracy of disease activity for radiographic damage.43–45 However, we performed a sensitivity analysis, using data from patients in the monotherapy bDMARD arms (in nine RCTs), which showed that bDMARDs indeed reduce structural damage, and result in GRO in the majority, but not universally. Altogether, 28% of all patients exposed to bDMARD monotherapy presented ΔmTSS ≥0.5 (11% to 57% in the individual trials; data not shown). In summary, we believe that our approach is valid and provides a better representation of current clinical practice. However, it will not fit contexts where access to bDMARDs is severely limited. Finally, the selection of tools by the ACR/EULAR committee was “based (…) on the need to include patient-reported outcomes”, among other factors.4 PGA was selected because it is associated with better prediction of the combination of radiographic and functional outcome.4 While this is valid in the overall spectrum of disease activity, this argument is no longer true when the disease process is under control (SJC28, TJC28 and CRP ≤1) as demonstrated in this study and elsewhere.17 It has been proposed to raise the cut-off value of PGA,22 46 47 but this is at best a partial solution: we previously found that among 4381 international patients in 3V-remission, 63% scored PGA >1, but still 44% scored it >2, 32% >3 and 0.6% scored PGA as high as 10.17 In addition, PGA at low disease activity states is essentially determined by subjective factors and comorbidities,9 17 18 in contrast to, for example, swollen joint counts and CRP.
The current study shows that PGA has no significant relationship with radiographic damage progression, both by comparing the 4V and 3V remission groups and by analysing the relationship between the two parameters within the specific group of patients in 4V-near-remission. These observations support our view to leave it out of the treatment target definition used to control inflammation (biological remission).

It has been recognised that treating to target often leaves room for improvement.48 For patients with active disease, there is little doubt that controlling the disease is the most important means to improve the patient’s condition, both at short and long term. Once low disease activity or remission is achieved, a persistently high disease impact should become the guiding target: after a diligent search for remaining (undetected) disease activity, it needs to be analysed and understood so as to choose the best adjunctive intervention, such as analgesia, rehabilitation or anti-depressive therapy, among other pharmacological and non-pharmacological therapies.49 PGA score is not appropriate for this purpose, and more analytic instruments, such as the Patient Reported Outcome Measurement Information System (PROMIS),50 the RA Impact of Disease (RAID) score51 52 or the RA Flare Questionnaire,53 are required.

Overall, these results support the proposal that the 3V definition of remission in parallel with a separate evaluation of the patient’s perspective, that is, the dual target strategy, deserves consideration. The first target aims to control inflammation (biological remission) and the other one to control disease impact (symptom remission), guided by clinically informative PROMs.9 16 20 Pursuing and achieving the first is an important contribution, but no guarantee that the second will be fulfilled. Further research is warranted to validate the concept of dual target, specifically regarding the adjuvant interventions required to achieve effective control of the disease impact endured by patients in biological remission, that is, interventions designed to bring patients from 4V-near-remission into full remission. Improving symptoms and signs of RA, both short and long term, is the major goal of treatment and deserves to be highlighted by an independent treatment target.

Acknowledgments

We would like to acknowledge the invaluable support provided by Jos van der Velden (SAS Portugal), who assisted us with the use of SAS software and access to the SAS Clinical Trial Data Transparency Portal. We also acknowledge the support from Adam LaMana (SAS International) and from the personnel of the ‘data sharing’ teams from Pfizer, AbbVie, Roche, UCB and YODA. We also would like to acknowledge the support of Eduardo Santos (Coimbra, Portugal) in performing the meta-analyses.

Footnotes

  • RJOF and PMJW are joint first authors.

  • Handling editor Josef S Smolen

  • Twitter @FerreiraRJO, @ndosi, @pedrommcmachado

  • Contributors All authors designed the study and protocol, which was first drafted by RJOF and JAPS. RJOF and PMJW performed the data analyses. RJOF and JAPS wrote the initial draft of the manuscript, which was critically revised and refined by all authors. All authors formally approved the final manuscript.

  • Funding This manuscript is based on research using data from data contributors AbbVie, Pfizer and UCB that have been made available through Vivli, Inc. This study was also supported by CSDR (ClinicalStudyDataRequest), which has an agreement with Roche Inc. (Project no. 1808). Data were also obtained from the Yale University Open Data Access Project (YODA Project no. 2017-1451), which has an agreement with Janssen Research & Development, LLC. PMM is supported by the National Institute for Health Research (NIHR) University College London Hospitals (UCLH) Biomedical Research Centre (BRC). RJOF was supported by a grant from ARCo – Associação de Reumatologia de Coimbra, a non-profit association of health professionals.

  • Competing interests RJOF reports a research grant from AbbVie and speaker fees from Sanofi Genzyme, Amgen, MSD and UCB Pharma. JWGJ reports a research grant from Roche. LG reports a research grant from Lilly, Mylan, Pfizer and Sandoz, and speaker fees from AbbVie, Amgen, Biogen, Celgene, Janssen, Lilly, MSD, Novartis, Pfizer, Sandoz, Sanofi-Aventis and UCB Pharma. MN reports a research grant from Bristol Myers Squibb, and speaker fees from Janssen and Pfizer. PMM reports speaker fees from AbbVie, Celgene, Janssen, Lilly, MSD, BMS, Novartis, Pfizer, Roche and UCB Pharma. DvdH is Director of Imaging Rheumatology bv and reports speaker fees from AbbVie, Amgen, Astellas, AstraZeneca, BMS, Boehringer Ingelheim, Celgene, Cyxone, Daiichi, Eisai, Eli-Lilly, Galapagos, Gilead, Glaxo-Smith-Kline, Janssen, Merck, Novartis, Pfizer, Regeneron, Roche, Sanofi, Takeda and UCB Pharma. JAPS reports a research grant from Pfizer and AbbVie, and speaker fees from Pfizer, AbbVie, Roche, Lilly and Novartis.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Ethics approval Ethical approval for this study was granted by the Centro Hospitalar e Universitário de Coimbra Ethics Committee (CHUC-047–17).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available.
