Definition of rheumatoid arthritis flare based on SDAI and CDAI
  1. Victoria Konzett (1)
  2. Andreas Kerschbaumer (1)
  3. Josef S Smolen (1)
  4. Eirik Klami Kristianslund (2)
  5. Sella A Provan (2)
  6. Tore K Kvien (2, 3)
  7. Daniel Aletaha (1)

  (1) Department of Medicine III, Division of Rheumatology, Medical University of Vienna, Vienna, Austria
  (2) Center for Treatment of Rheumatic and Musculoskeletal Diseases (REMEDY), Diakonhjemmet Hospital, Oslo, Norway
  (3) Institute of Clinical Medicine, University of Oslo, Oslo, Norway

  Correspondence to Professor Daniel Aletaha, Department of Medicine III, Division of Rheumatology, Medical University of Vienna, Vienna, Austria; daniel.aletaha{at}meduniwien.ac.at

Abstract

Objective To develop and validate definitions for disease flares in rheumatoid arthritis (RA) based on the quantitative Simplified and Clinical Disease Activity Indices (SDAI, CDAI).

Methods We analysed RA treatment courses from the Norwegian disease-modifying antirheumatic drug registry (NOR-DMARD) and the Vienna RA cohort. In a receiver operating curve analysis, we determined flare definitions for absolute changes in SDAI and CDAI based on a semiquantitative patient anchor. NOR-DMARD was sampled into an 80%-training cohort for cut point derivation and a 20%-test cohort for internal validation. The definitions were then externally validated in the independent Vienna RA cohort and tested regarding their performance on longitudinal, content, face, and construct validity.

Results We analysed 4256 treatment courses from NOR-DMARD and 2557 from the Vienna RA cohort. The preliminary definitions for absolute changes in SDAI and CDAI for flare are an increase of 4.7 and 4.5, respectively. The definitions performed well in the test and external validation cohorts, and showed clinical face and construct validity, as flares significantly impact both functional (∆Health Assessment Questionnaire flare vs no-flare +0.43; p<0.001) and structural (∆modified Sharp Score 43% higher after flare; p<0.001) disease outcomes, and reflect consistent worsening across all disease core sets, both patient reported and objective.

Conclusion We here provide novel definitions for flare in RA based on SDAI and CDAI, validated in two large independent real-world cohorts. In times of highly effective medications for RA, and consideration of their tapering, these definitions will be useful for guiding decision making in clinical practice and designing clinical trials.

  • Arthritis, Rheumatoid
  • Outcome and Process Assessment, Health Care
  • Epidemiology
  • Recurrence
  • Therapeutics

Data availability statement

Data are available upon reasonable request. Please contact sella.provan@diakonsyk.no (NOR-DMARD) and victoria.konzett@meduniwien.ac.at (Vienna RA cohort).

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • While measures of improvement and state of disease activity are well established in rheumatoid arthritis (RA), distinct definitions for worsening (‘flare’) are lacking to date. For novel treatment strategies that involve therapy reduction in patients who have reached their treatment target, a standardised definition of flare is needed.

WHAT THIS STUDY ADDS

  • In two large, independent real-world cohorts, we estimated and validated quantitative definitions for flare based on the Simplified and Clinical Disease Activity Indices (SDAI, CDAI). We performed internal and external validations, including content, construct, longitudinal, and face validity analyses, to support future application of these criteria in clinical practice and studies.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • With highly effective medications available, treatment tapering is an increasingly important strategic element in RA care. Available definitions of flare will allow for a standardised identification of flare in routine patient care, and the use of consistent and reliable criteria for endpoints in tapering or withdrawal studies.

In chronic inflammatory conditions like rheumatoid arthritis (RA), biologic and targeted synthetic disease-modifying antirheumatic drugs (DMARDs) have tremendously advanced the potential of controlling the immune-mediated inflammatory disease process that drives systemic and local inflammation responsible for joint destruction and functional disability.1 2 The fact that a high proportion of patients reaches treatment goals like low disease activity or remission (LDA, REM) with these compounds has led to new considerations about treatment tapering or withdrawal.3 With these strategic developments, the prevention of a relevant worsening of disease activity, typically referred to as a ‘flare’ of disease, has become a new goal in contemporary RA management.4

While response to treatment and the achievement of treatment targets are consistently evaluated with quantitative scores and indices that allow for objective and reliable outcome measurement guiding clinical management,5 6 comparable measures for a relevant worsening of disease activity are lacking to date. This results in very heterogeneous concepts of ‘worsening’, impeding the recognition of lasting treatment effects in both clinical trials as well as routine care.7 8

We therefore here propose definitions of RA disease activity flares based on the quantitative scales of two well-established composite measures of disease activity: the Simplified and Clinical Disease Activity Indices (SDAI, CDAI).5 6 We derive and validate cut points for absolute increases in SDAI and CDAI that correspond to a flare for patients with an initial response to DMARDs in two large independent real-world RA cohorts.

Methods

Patient cohorts

The Norwegian disease-modifying antirheumatic drug registry (NOR-DMARD) contains longitudinal data on treatment courses of patients with inflammatory conditions like RA. The study was initiated in 2000 and recruitment is still ongoing, now with inclusion of patients from four different centres.9 The Vienna RA cohort is an independent observational registry of outpatients visiting the Vienna General Hospital between 1996 and 2021. In addition to demographic and clinical variables, plain radiographs were collected regularly over the years and retrospectively quantified.

All data collection was approved by the respective local ethics committees (EC) (no. 2011/1339 and 2017/2041 for NOR-DMARD; no. 2002/2014 and 1448/2019 for the Vienna RA cohort). All patients had consented to the data collection and anonymised data analysis.

Patient-reported anchors for flare

Norwegian disease-modifying antirheumatic drug registry

In NOR-DMARD, patient perception of response to DMARDs is compared with baseline (ie, start of a new treatment regimen) using a 5-point Likert scale (much worse, worse, unchanged, better, much better) at multiple timepoints (months 3, 6 and 12, and every 12 months thereafter). Flare was defined as a patient’s 6-month rating of response that was at least two categories worse than the rating at month 3; patients who at month 3 already reported any degree of disease worsening were not included in the analyses, as we considered primary worsening after treatment initiation not compatible with the conceptual framework of a flare.
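The anchor rule described above can be written down in a few lines of Python. This is only an illustrative sketch: the integer coding of the Likert categories (-2 to +2) and the function name `anchor_flare` are assumptions made here and are not taken from the registry.

```python
from typing import Optional

# Hypothetical integer coding of the 5-point Likert scale (an assumption
# for illustration; the registry's own coding is not described here).
LIKERT = {
    "much worse": -2,
    "worse": -1,
    "unchanged": 0,
    "better": 1,
    "much better": 2,
}


def anchor_flare(month3: str, month6: str) -> Optional[bool]:
    """Return True/False for flare, or None if the course is excluded."""
    m3, m6 = LIKERT[month3], LIKERT[month6]
    if m3 < 0:
        # Any worsening already reported at month 3: not part of the analysis.
        return None
    # Flare: the 6-month rating is at least two categories worse than month 3.
    return (m3 - m6) >= 2
```

Under this coding, a course rated ‘better’ at month 3 and ‘worse’ at month 6 counts as a flare, whereas ‘better’ followed by ‘unchanged’ does not.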

Vienna RA cohort

A comparable patient anchor is recorded in the Vienna RA cohort, where patients are asked to score their perception of changes in disease activity on a 5-point Likert scale (from much worse to much improved); however, change is here referenced to the last clinical visit only (and not to baseline as in NOR-DMARD). In this cohort, we considered a worsening in follow-up visits after an initial improvement as flare.

Derivation of flare definitions

We considered the individual treatment courses as main units of our analysis. Analyses were conducted using R (V4.2, Vienna, Austria).10 In a diagnostic testing approach using receiver operating characteristic (ROC) curve analyses, we estimated the performance of a series of potential cut points for absolute and relative changes in SDAI and CDAI with respect to the patient flare anchor.11 The area under the receiver operating curve (AUC) was calculated, and cut points that would provide at least 80% specificity for a flare identified.12
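The cut point search can be illustrated with a short Python sketch (the analyses themselves were run in R). It assumes that a course tests positive when its change in score is at least the cut point, and it picks the lowest candidate that reaches the required specificity; the function names and this selection rule are illustrative assumptions, not the published code.

```python
from typing import Sequence, Tuple


def sens_spec(
    changes: Sequence[float], flare: Sequence[bool], cut: float
) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'change >= cut' vs the anchor.

    Both flare and non-flare courses must be present in the data.
    """
    tp = sum(1 for c, f in zip(changes, flare) if f and c >= cut)
    fn = sum(1 for c, f in zip(changes, flare) if f and c < cut)
    tn = sum(1 for c, f in zip(changes, flare) if not f and c < cut)
    fp = sum(1 for c, f in zip(changes, flare) if not f and c >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def lowest_cut_with_specificity(
    changes: Sequence[float], flare: Sequence[bool], min_spec: float = 0.80
) -> float:
    """Smallest observed change that reaches min_spec when used as cut point.

    Specificity never decreases as the cut point rises, so the first
    candidate that reaches min_spec also keeps sensitivity as high as possible.
    """
    for cut in sorted(set(changes)):
        _, spec = sens_spec(changes, flare, cut)
        if spec >= min_spec:
            return cut
    raise ValueError("no candidate cut point reaches the requested specificity")
```

With real data, `changes` would hold the absolute change in SDAI or CDAI per treatment course and `flare` the patient anchor.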

Internal and external validation strategies

For internal validation, we used bootstrapping with 1000 resamples to estimate 95%-confidence intervals (CI) for the AUC as well as for sensitivity and specificity of different cut points for SDAI and CDAI in the NOR-DMARD training cohort.13 In addition, we assessed model performance in a 20%-random sample of the original NOR-DMARD dataset that was separated as test set before starting the analyses. We then performed external validation analyses in the Vienna RA cohort using the described patient anchor.
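A minimal Python sketch of the two internal validation steps follows. It assumes a percentile bootstrap and a simple random split by treatment course; the article does not state which bootstrap interval was used, so both choices, like the function names, are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(changes: Sequence[float], flare: Sequence[bool]) -> float:
    """AUC as the probability that a flare course shows a larger change
    than a non-flare course (ties count one half)."""
    pos = [c for c, f in zip(changes, flare) if f]
    neg = [c for c, f in zip(changes, flare) if not f]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    changes: Sequence[float],
    flare: Sequence[bool],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile 95% interval from resampling courses with replacement."""
    rng = random.Random(seed)
    n = len(changes)
    estimates: List[float] = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [flare[i] for i in idx]
        if all(labels) or not any(labels):
            continue  # one class missing, AUC undefined: draw again
        estimates.append(auc([changes[i] for i in idx], labels))
    estimates.sort()
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper


def split_train_test(
    n_courses: int, test_fraction: float = 0.20, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Random split of course indices into training and test sets."""
    idx = list(range(n_courses))
    random.Random(seed).shuffle(idx)
    n_test = round(n_courses * test_fraction)
    return idx[n_test:], idx[:n_test]
```

The pairwise AUC is quadratic in the number of courses, which is acceptable for a sketch but slow for registry-sized data; a rank-based implementation would be preferable there.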

Thereafter, the novel definitions were tested in both cohorts, regarding their performance on longitudinal, face, content, construct, and criterion validity. A detailed summary and description of all patient subcohorts used for the respective validation analyses is provided in online supplemental table S1.

Reliability at different timepoints

Flares may occur at any point over the course of a chronic condition, and a definition of flare should be independent of when the flare event occurs. We therefore investigated whether the novel definitions performed equally well for flare detection at later (ie, the 6- and the 12-month) timepoints in the NOR-DMARD cohort.

Face validity

Clinically, a flare of RA would be expected to be followed by an adaptation of treatment. Hence, face validity was assessed in the Vienna RA cohort by comparing treatment changes in patients with and without identified flares.

Content validity

For content validity and internal consistency evaluation of the novel definitions, we explored mean changes in important disease core sets—swollen joint count of 28 joints (SJC28), tender joint count of 28 joints (TJC28), patient and evaluator global assessment of disease activity (PGA, EGA), C-reactive protein (CRP), as well as pain and fatigue scores—in a Vienna RA subcohort of conventional synthetic DMARD (csDMARD) responders within 6 months, where we compared flare visits and no flare visits between the 6-month and 2-year timepoints.

Agreement

SDAI and CDAI scores have been shown to agree well with each other.6 In both cohorts, we performed Kappa statistics to confirm consistency of flare identification between SDAI and CDAI.
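For two binary classifications of the same visits, agreement beyond chance can be computed as sketched below. The sketch assumes Cohen's kappa (the text above says only ‘Kappa statistics’), and the function name is illustrative.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary ratings of the same visits.

    Undefined (division by zero) if chance agreement equals 1, ie, if
    both ratings put every visit in the same single category.
    """
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    p_a = sum(a) / n  # share of visits flagged as flare by rating a
    p_b = sum(b) / n  # share of visits flagged as flare by rating b
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```

Here `a` and `b` would be the per-visit flare classifications from the SDAI-based and CDAI-based definitions, respectively.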

Construct/criterion validity

Final validation analyses in the Vienna RA cohort then allowed the assessment of functional and structural consequences of flare or no flare, defined again using the novel flare definitions. These outcomes represent important variables of disease control,14 15 and would be expected to be influenced by flares.

The Health Assessment Questionnaire (HAQ) was used as outcome measurement for functional disability.16 We analysed mean changes in total HAQ within the first 2 years from treatment start, as well as proportions of patients showing minimal clinically important worsening in HAQ scores (HAQ +0.15),17 again in csDMARD responders within 6 months from the Vienna RA cohort. HAQ trajectories during flares were compared with HAQ trajectories recorded in routine follow-up visits from patients who did not flare after adequately responding to treatment.

Structural damage was examined in radiograph follow-up data from the Vienna RA cohort, where regular follow-ups of radiographic damage progression were scored using the van der Heijde modified Total Sharp Score (mTSS).18 The impact of flares on radiographic progression (∆mTSS) was assessed using descriptive statistics and multilevel mixed modelling approaches for longitudinal data analysis, and visualised with cumulative probability plots.

The regular radiographic follow-up examinations performed in the Vienna RA cohort further allowed us to perform a target trial emulation for causal effect estimation of a distinct clinical flare visit on radiographic progression (∆mTSS), and thereby to better address potential confounders.

Detailed descriptions of the statistical methodology applied in the construct/criterion validity analyses are provided in the supplementary methods section (online supplemental methods S1-3).

Results

Patient and treatment characteristics

We analysed 4256 treatment courses of 2837 patients registered in NOR-DMARD between 2002 and 2022. Patients had a median disease duration of 4.2 years, 41% were methotrexate naïve at baseline, 74% were biologic DMARD (bDMARD) naïve. The main analysis was performed in the training dataset that contained a random sample of 80% of the records from the original dataset (n=3385). For the majority of flares recorded between the 3- and 6-month timepoint, the criteria for flare were achieved through a 1-point clinical improvement at month 3 (‘better’), followed by a 1-point worsening at month 6 (‘worse’), both referenced against baseline. A descriptive summary of patient anchor distribution in NOR-DMARD is provided in the supplementary results section (online supplemental figure S2).

The validation cohort (Vienna RA cohort) represented 2557 treatment courses of 1999 RA outpatients visiting the clinic between 1996 and 2021; csDMARD courses represented 72% and biologic/targeted synthetic DMARD (btsDMARD) courses 28% of treatment starts. Baseline characteristics as well as 3- and 6-month data from both cohorts are summarised in table 1.

Table 1

Baseline, 3- and 6-month data from the NOR-DMARD main analysis cohort and the Vienna RA validation cohort

Main analysis and cut point validation

ROC curve analyses for absolute changes in SDAI and CDAI to define ‘flare’ by patient anchor showed an AUC of 0.83 (95%-CI 0.80 to 0.86) and 0.83 (95%-CI 0.80 to 0.85), respectively (figure 1A, SDAI; figure 1B, CDAI). The cut points achieving at least 80% specificity in 1000 bootstrap resamples for the flare criteria were 4.7 worsening of SDAI and 4.5 worsening of CDAI. On application of the derived cut points, 21% of the 6-month clinical visits classified as flares in the main analysis cohort.

Figure 1

Receiver operating characteristic (ROC) curves for absolute changes in SDAI (A) and CDAI (B); 95%-confidence intervals for the area under the receiver operating curve (AUC) were estimated using bootstrapping with 1000 resamples. CDAI, Clinical Disease Activity Index; SDAI, Simplified Disease Activity Index. (data from NOR-DMARD)

In the 20% random sample records in the test cohort (n=871), model performance was comparable to the training cohort (online supplemental table S2); the novel cut points of 4.7 for SDAI and 4.5 for CDAI identified 21% and 22% flares, respectively.
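Applying the cut points to two visits of the same patient can be sketched as follows. The index formulas are the standard ones (CDAI as the sum of SJC28, TJC28, PGA and EGA in cm; SDAI additionally adding CRP in mg/dL). The `Visit` class, the function names, the choice of reference visit and the inclusive boundary (an increase of exactly 4.7 or 4.5 counting as flare) are assumptions made for illustration.

```python
from dataclasses import dataclass

SDAI_FLARE_INCREASE = 4.7  # cut point reported in this article
CDAI_FLARE_INCREASE = 4.5  # cut point reported in this article


@dataclass
class Visit:
    sjc28: int        # swollen joint count, 28 joints
    tjc28: int        # tender joint count, 28 joints
    pga_cm: float     # patient global assessment, 0-10 cm
    ega_cm: float     # evaluator global assessment, 0-10 cm
    crp_mg_dl: float  # C-reactive protein, mg/dL

    @property
    def cdai(self) -> float:
        return self.sjc28 + self.tjc28 + self.pga_cm + self.ega_cm

    @property
    def sdai(self) -> float:
        return self.cdai + self.crp_mg_dl


def sdai_flare(reference: Visit, current: Visit) -> bool:
    """True if SDAI rose by at least the cut point since the reference visit."""
    return current.sdai - reference.sdai >= SDAI_FLARE_INCREASE


def cdai_flare(reference: Visit, current: Visit) -> bool:
    """True if CDAI rose by at least the cut point since the reference visit."""
    return current.cdai - reference.cdai >= CDAI_FLARE_INCREASE
```

Which earlier visit serves as `reference` (for example, the visit at which the treatment target was first reached) is left to the caller in this sketch.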

The first external validation analysis was performed in a subcohort of 99 patients from the Vienna RA cohort, using a comparable patient anchor (as described in the methods section). We analysed 6-month follow-up visits of 106 treatment courses. Specificity, sensitivity and accuracy for detection of a patient-reported flare (again described in the methods section) with the novel SDAI and CDAI flare definitions was 0.94, 0.36, and 0.88 for both. In the overall Vienna RA registry, we recorded flare rates of 17.6% and 17.5% for SDAI and CDAI, in a median of 13 follow-up visits in 1999 RA outpatients.

Reliability at different timepoints

For testing the novel definitions’ performance for flare detection between the 6- and 12-month timepoint, we analysed 4801 treatment courses from NOR-DMARD, with flare rates of 22% and 21% recorded during this time period for SDAI and CDAI, respectively. Both cut points performed equally well in this validation analysis, with a specificity of approximately 0.81, a sensitivity of 0.60 and an accuracy of 0.80 achieved for both the SDAI and CDAI definition (a detailed summary is provided in the supplementary results section; online supplemental table S2).

Analysis of agreement

Agreement was high between the SDAI and CDAI definition in both cohorts (κ=0.92 in NOR-DMARD and κ=0.90 in the Vienna RA cohort), and comparable to values calculated in prior analyses of agreement between SDAI and CDAI.6

Face validity of flare definitions

The specificity of flare, defined with the novel cut points, for a subsequent treatment change was 0.85 in this validation analysis, both for the SDAI and CDAI definition. Sensitivity was low as expected (around 0.30), as the reasons for changes of therapy in chronic conditions are manifold, and not limited to disease flares alone. The estimated odds ratio (OR) for treatment changes following flares, vs no flares, was 2.43 (95%-CI 2.30 to 2.58; p<0.001; 46.0% of flare visits followed by subsequent treatment changes, vs 25.5% of no flare visits).

Content validity

Internal consistency of the flare definitions across important disease core sets was assessed in 329 csDMARD responders from the Vienna RA cohort. Median (IQR) age of patients in this cohort was 55 (23) years, median (IQR) disease duration at baseline 1.5 (25.2) months and median (IQR) SDAI at baseline 19.2 (12). Figure 2 visualises the distribution of mean changes across the disease components, being both SDAI and CDAI core sets,5 6 as well as additional relevant disease outcomes (pain and fatigue scores).19 20 Summary statistics and visualisations with boxplots are provided in the supplementary results section (online supplemental table S3; figure S3).

Figure 2

Spider plots visualise changes in disease core sets (swollen joint count of 28 joints, SJC28, tender joint count of 28 joints, TJC28, patient global assessment of disease activity, PGA, evaluator global assessment of disease activity, EGA, C-reactive protein, CRP, pain score, fatigue score), in flare vs no flare follow-up visits (n=1150 for SDAI; n=1252 for CDAI); in 329 treatment responders (SDAI low disease activity or remission, within 6 months after csDMARD start in SDAI moderate or high disease activity); 6-month to 2-year follow-up visits were classified using the novel flare definitions (∆SDAI +4.7, ∆CDAI +4.5); EGA, PGA, pain, and fatigue are given in cm; CRP in mg/dL; thin borders mark 95%-confidence intervals. CDAI, Clinical Disease Activity Index; csDMARD, conventional synthetic disease-modifying antirheumatic drug; SDAI, Simplified Disease Activity Index. (data from the Vienna RA cohort)

Construct/criterion validity

Functional disability

HAQ response trajectories were analysed in a Vienna RA subcohort of 272 csDMARD responders within 6 months, in which at least one flare after reaching LDA was recorded in 53% of the patients. The mean (SD) time to first LDA was 3.6 (1.6) months, the mean (SD) time to first flare 8.2 (5.6) months in this cohort. The mean (SD) HAQ scores at baseline and first LDA were 0.98 (0.68) and 0.50 (0.58), respectively. The mean (SD) change in HAQ was significantly stronger at flare vs no flare, with a mean (SD) HAQ of 0.84 (0.70) recorded at flare, vs 0.41 (0.57) at the 1-year routine control visits (∆HAQ 0.43; 95%-CI 0.27 to 0.60; p<0.001), and HAQ stayed high in the flare group until the last follow-up in this 2-year period (mean last HAQ 0.76, SD 0.70). The proportion of patients showing a minimal clinically important worsening in HAQ score (∆HAQ +0.15)17 after being on target was significantly higher in the flare group compared with 1-year routine control visits in the no flare group (46.2% for LDA to flare, vs 12.8% for LDA to routine follow-up; p<0.001). HAQ response trajectories are visualised in figure 3.

Figure 3

HAQ trajectories after csDMARD start; assessed in 272 treatment responders (SDAI low disease activity, LDA, or remission, within 6 months after csDMARD start/baseline, BL, in SDAI moderate or high disease activity); stratified by flare vs no flare in the 6-month to 2-year follow-up visits; when no flare was recorded, the mean HAQ after 1 year of follow-up (as being the approximate mean time of first flare in the event cohort) was used for between-group comparison; stratification by flare is shown for the visit before flare, prior, the flare visit, event, and two follow-up visits, 1st FU and 2nd FU; point estimates mark mean values in both groups, error bars indicate 95%-CIs; for HAQ scores, last observation was carried forward when treatment was changed. CDAI, Clinical Disease Activity Index; csDMARD, conventional synthetic disease-modifying antirheumatic drug; FU, follow-up; HAQ, Health Assessment Questionnaire; SDAI, Simplified Disease Activity Index. (data from the Vienna RA cohort)

Structural damage

For assessing the impact of flares on structural damage, changes in the mTSS (∆mTSS) were analysed in 1765 follow-up radiographs from 763 patients in the Vienna RA cohort. The median (range) number of radiographs assessed per patient was 3 (2–8), the median (IQR) time between two consecutive radiographs 14 (6.5) months. Mean (SD) mTSS of the respective baseline radiograph was 30.9 (43.4), and mean (SD) radiographic progression to the consecutive radiograph 4.1 (6.4). 16.9% (SD 22.8%) of clinical visits between two consecutive radiographs were flare visits, on average. This proportion of flare visits for each patient was significantly associated with radiographic progression. The estimated ∆mTSS was higher in cycles with frequent flare compared with rare flare (4.8±7.7 vs 3.6±5.8; with a mean difference of 1.13; 95%-CI 0.51 to 1.75; p<0.001). This was consistent with estimated coefficients when applying more complex modelling approaches (β (∆mTSS) 1.00; 95%-CI 0.43 to 1.56; p<0.001; in a random intercept model adjusted for age, sex, time and the baseline mTSS). When comparing changes in mTSS≥5, 34.3% of patients progressed beyond that threshold in the flare group, compared with 27.4% of patients without flares (p=0.002). Results are presented as cumulative probability plot (figure 4), as well as in the supplementary results section (online supplemental table S4).

Figure 4

Change in van der Heijde modified Sharp Score, ∆mTSS, between two consecutive radiographs; stratified by the proportion of flares between consecutive radiographs (rare flare with <17% flare visits; frequent flare with ≥17%; threshold was chosen based on the mean proportion of flares between radiographs of 16.9%); no. of follow-up radiographs=1765, no. of patients=763. (data from the Vienna RA cohort)

For target trial emulation, 393 clinical visits of 233 patients from the Vienna RA cohort were enrolled by virtue of fulfilling all inclusion and exclusion criteria (baseline radiograph available within 100 days before, and follow-up radiograph within 30–300 days after, the key visit; for details see online supplemental method S3). On average (SD), baseline radiographs were taken 2.0 (0.9) months before the key visit and follow-up radiographs 7.3 (2.3) months after it. The mean (SD) baseline mTSS was 25.5 (35.5), and the mean (SD) radiographic progression (∆mTSS) to follow-up was 3.02 (4.85); 18% of the clinical visits in this study cohort were flare visits. These flares predicted radiographic progression within 300 days from the event, with an incidence rate ratio of 1.43 (95%-CI 1.04 to 1.96; p<0.001) when adjusting for age, sex, and baseline mTSS, indicating an expected 43% higher ∆mTSS after flare vs no flare events. Detailed model results are provided in the supplementary results section (online supplemental table S5).
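The step from the incidence rate ratio (IRR) to the quoted percentage is simple arithmetic, shown here as a worked equation. It assumes the usual reading of an IRR as a multiplicative factor on the expected outcome; the model itself is specified only in the supplement.

```latex
\[
\text{relative difference in } \Delta\text{mTSS}
  = (\mathrm{IRR} - 1) \times 100\%
  = (1.43 - 1) \times 100\%
  = 43\%
\]
```

Applying the same conversion to the confidence limits (1.04 and 1.96) gives a range from 4% to 96% higher ∆mTSS.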

Discussion

The definition of flare continues to be an unmet need in clinical practice and RA research: with more and more patients reaching clinical targets of REM or LDA following contemporary treatment strategies (Treat-to-Target, T2T),21–26 treatment reduction has become part of RA management algorithms.3 In this phase of management, it is important to make consistent decisions about when a potential worsening must be considered as flare, quite similar to the consistent definition of the target when using T2T. In addition, disease activity may worsen at any time during treatment, and the concept of flare should not be limited to the initial treatment phase or to withdrawal/tapering studies only.

A proper and homogeneous definition of flare allows studies to be performed that aim at identifying the risk of (and predictors of) a flare, so that in the best case a flare can be prevented. In analogy to T2T, which requires the use of composite scales that include objective measures of the target organ (ie, joint counts), we aimed to use the same concept as a basis for a scale on which flare can be defined. The American College of Rheumatology (ACR)/EULAR remission definition proposes to use the SDAI and CDAI scales for determining remission on quantitative scales,27 28 which we also adopted as the basis for a flare definition: we identified a flare as an increase in disease activity of 4.7 on the scale of the SDAI and an increase of 4.5 on the scale of the CDAI. The difference of 0.2 evident from the analyses plausibly reflects the contribution of CRP to the SDAI in the flare situation. The high agreement identified for the SDAI- and CDAI-based definitions makes them interchangeable. This is important to note, as the CDAI may be attractive for evaluation on the spot in clinical practice and can also be applied in all clinical trials, since it is independent of direct interference of certain agents, such as interleukin 6 inhibitors, with CRP production.

The identification of the optimal cut point of the SDAI/CDAI was based on a diagnostic testing procedure that would map the best cut point for SDAI or CDAI worsening to the concept of disease flares. Since no measurable ground truth is available for this concept, we used a patient anchor as gold standard for the analysis. Importantly, while using pure patient reporting as definition of flare may not be ideal (see concept of T2T), our aim was to use a patient-based anchor, but to then map it to objective scales that are also proposed for T2T. Definitions of flares that are solely based on patient reports have been proposed previously,29–33 but have not yet reached use in clinical trials or as monitoring tools. In fact, various studies underlined the importance of swollen joint counts, which are associated with longer-term structural progression; similarly, this is true for acute phase reactants.34–36

Validation was a very important aspect of our study. We chose a multifaceted methodological approach to validate the cut points in two large real-world RA cohorts. For internal validation, we decided to use resampling and split-sample strategies, and varied the variables of the derivation analysis to check for robustness of the findings (eg, by testing the cut points in clinical visits between the 6- and 12-month timepoint). This showed high internal consistency in the NOR-DMARD cohort. For validation, we used another, independent, large, real-world cohort, with patient characteristics differing from NOR-DMARD (eg, years of patient enrolment, disease duration, csDMARD/bDMARD use). There, the new flare definitions were scrutinised for their performance in various ways, but importantly also by testing them against another patient anchor with a different wording than in NOR-DMARD. Particularly, the analysis on construct/criterion validity was complex and almost a study on its own: the multilevel modelling for longitudinal data analyses and the target trial emulation both confirmed the utility of the new definition for use in clinical practice and clinical trials.

Limitations of our study are mainly those typical for any criteria development activity. While derivation of the definition was purely data driven, as in many analyses, the authors made assumptions that need to be discussed. First, the external anchor, often referred to as the ‘gold standard’, could possibly be defined in many different ways, and there is no ground truth for a definition of flare. The gold standard applied here was deliberately stringent, requiring substantial worsening (at least 2 points on a 5-point Likert scale). In addition, the criteria for mapping SDAI/CDAI cut points also used a high (80%) specificity preference. While it is reasonable to argue that the definition should be specific in order to prevent unnecessary reintroduction of therapy, the purpose of a flare definition must be to identify those who are truly flaring. The case for specificity is further supported by the undulating nature of RA disease activity, and the fact that patient perception of improvement or worsening may be influenced by various factors. The 80% specificity cut point was therefore used to provide this stringency, similar to previous studies,12 and strengthened through internal validation using bootstrapping. The opposite, a highly sensitive definition, would serve neither the patient, by creating anxiety, nor the physician, by potentially escalating treatment without substantial biological reason.

Of importance, we did not investigate whether the concept of flare maps well on relative changes of SDAI or CDAI, as relative changes are less intuitive in clinical practice and require a reference to some previous measurement; for example, the relative ACR response criteria37 have never reached utility in clinical practice. In addition, relative measures better homogenise observed changes if the baseline is highly variable, which in the case of flare (eg, in patients who had reached their treatment target before) is less relevant. Overall, the flare definition was internally and externally validated by modifying (a) the gold standard, (b) the assumptions, (c) the models and their variables and (d) the cohorts and subgroups.

One important aspect that was not part of this study is the use of imaging, particularly ultrasound, in the context of defining a flare. Ultrasound is an excellent and sensitive tool to identify flare in joints before they become clinically apparent.38–41 While we could not analyse this given the absence of ultrasound data in our cohorts, we strongly call, based on the arguments about sensitivity and specificity above, for future studies to investigate ultrasound as a screening tool for predicting the (specific) clinical flare.

Flare definitions, similar to response definitions, will depend on the context and will be influenced by various factors; our approach using real-world cohorts with mixed populations was to derive single pragmatic (and usable) cut points for potential application as endpoints in clinical practice and studies. Of note, the two cohorts from Europe including predominantly Caucasian patients may not be representative of patients in other parts of the world, which can be considered as limitation of this study.

Finally, the interpretation of having a flare or not needs to consider context. All existing RA disease activity measures are influenced not only by the inflammatory disease activity but may be affected by comorbidities and other external factors. For example, an infection could lead to an increase in SDAI above the flare cut point solely due to an increase in CRP, or an acute depressive episode could increase the patient global measurement in both SDAI and CDAI. However, the use of such integrative tools will best balance the effect of individual measures that may not be consistent with other aspects of disease activity.

In summary, we here propose and validate definitions of RA disease activity flares based on absolute changes in two well-established disease activity scores, the SDAI and the CDAI. Using various statistical methods, we estimated and validated the definitions in two large real-world RA cohorts, where they performed well both in internal and external validation analyses. The new definitions of RA flare will serve as objective measures supporting clinical decisions for intervening rapidly and consistently in patients who need treatment escalation, but also allow consistent investigation of predictors of flare in trials and observational studies.

Data availability statement

Data are available upon reasonable request. Please contact sella.provan@diakonsyk.no (NOR-DMARD) and victoria.konzett@meduniwien.ac.at (Vienna RA cohort).

Ethics statements

Patient consent for publication

Ethics approval

Data collection, analysis and result publication were approved by the regional ethics committee of South-Eastern Norway (no. 2011/1339 and 2017/2041) and of the Medical University of Vienna (no. 2002/2014 and 1448/2019) for both datasets used in this study. Participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors thank GM Supp and T Deimel for scoring radiographs in the Vienna RA cohort, and J Sexton for NOR-DMARD data management and preparation.

References

Footnotes

  • Handling editor Dimitrios T Boumpas

  • Twitter @victoriakonzett, @DanielAletaha

  • Contributors Planning and conception of the study: DA, VK, JSS and TKK. Data management and preparation: TKK, EKK, SAP, DA, AK and VK. Statistical analyses: VK, DA, AK and JSS. Figure development: VK, DA and AK. Interpretation of results: DA, VK, AK, JSS, EKK, TKK and SAP. Manuscript draft and preparation: DA, VK, JSS, AK, EKK, TKK and SAP. All authors had access to both (VK and DA) or one (NOR-DMARD: EKK, SAP and TKK; Vienna RA cohort: AK and JSS) datasets. All authors controlled the decision to publish and accept full responsibility for the finished work of the study. VK and DA are the guarantors of the study. All authors gave their final approval of the final document version to be published.

  • Funding The Centre for Treatment of Rheumatic and Musculoskeletal Diseases (REMEDY) is funded as a Centre for Clinical Treatment Research by the Research Council of Norway (project 328657). The authors have not declared other grants for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests VK: None declared. AK: received honoraria (speaker’s bureau, consultancy) from AbbVie, Amgen, BMS, Eli Lilly, Galapagos, Gilead, Janssen, Merck Sharp and Dohme, Novartis, UCB and Pfizer. JSS: received grants from AbbVie, AstraZeneca, Galapagos, Eli Lilly, Novartis, Roche and honoraria from AbbVie, Amgen, AstraZeneca, Astro, BMS, Chugai, Janssen, Eli Lilly, MSD, Novartis-Sandoz, Pfizer, R-Pharma, Roche, Samsung, UCB, Celltrion, Gilead-Galapagos and Sanofi. EKK: None declared. SAP: received grants from Boehringer Ingelheim and honoraria (consultancy) from Boehringer Ingelheim and Novartis. TKK: received grants from AbbVie, BMS, Galapagos, Novartis, Pfizer and UCB, and honoraria (speaker’s bureau, consultancy) from AbbVie, Galapagos, Gilead, Janssen, Novartis, Pfizer, Sandoz, UCB, Amgen, Celltrion, Egis, Evapharma, Ewopharma, Grünenthal, Hikma, Janssen, Oktal, Sandoz and Sanofi. DA: received grants from AbbVie, Amgen, Galapagos, Eli Lilly and Sanofi, and honoraria (speaker’s bureau, consultancy) from AbbVie, Amgen, Galapagos, Eli Lilly, Janssen, Merck, Novartis, Pfizer and Sandoz. JSS and DA are members of the ARD Editorial Board.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.