The natural course of axial spondyloarthritis (axSpA) includes periods of flares and remission.1 Flares are an important attribute of disease activity, and assessment of flares is useful in clinical practice and in clinical trials to better understand disease status and treatment efficacy. In the context of clinical trials, the assessment of flares is necessary in two situations. In ‘flare-design trials’, trial treatment is introduced only once a flare has occurred as a consequence of interrupting the ongoing or previous treatment (eg, in axSpA if non-steroidal anti-inflammatory drugs (NSAIDs) have been stopped).2 In tapering or discontinuation trials, the treatment (eg, tumour necrosis factor inhibitors (TNFis)) is tapered (usually progressively) or discontinued in patients in a stable disease activity state, and the outcome measure is (time to) flare.3,4
Thus the concept of flare, or disease activity worsening, needs to be well established in axSpA. This is particularly important since an increasing number of studies can be expected to address drug discontinuation in patients who are in remission or low disease activity on treatment. Criteria to define ‘flare’ may help to harmonise the design of new clinical trials and may lead to better assessment of axSpA and its fluctuations. However, to date, a broadly accepted definition of ‘flare’ in axSpA is lacking. Indeed, a brief check of flare definitions used in published trials indicates important heterogeneity.
The Assessment of Spondyloarthritis (ASAS) group is an international, independent group of experts in spondyloarthritis (SpA) with a methodological focus, which has developed and validated most of the criteria and outcome measures currently used in SpA clinical trials.5–7 The ASAS group has decided to explore the definition of ‘flare’ in axSpA. Ongoing work on flares in rheumatoid arthritis (RA) is exploring differences in the perception of flares by physicians and patients, with the objective of developing a specific outcome measure, that is, a new questionnaire, to assess flares in RA.8,9 There are previously published studies on the perception of flare by the patient in SpA.10–12 However, in the present project, it was decided not to explore the patients’ perspective per se, but rather to focus on the definition of ‘flare’ based on validated outcomes already widely used to assess disease activity in axSpA, as has recently been done in a French study.13
The aim of this project was to develop a consensus definition of ‘flare’ (or worsening) in axSpA, based on validated composite indices, to be used in clinical trial designs and designs of longitudinal studies.
Material and methods
This project had two main steps to collect data: a systematic literature review (SLR) and a case-vignette exercise. This was followed by a consensus step.
Systematic literature review
First, to gain an overview of flares, studies specifically focusing on flares in patients with axSpA, with any or no intervention, were searched for in Medline PubMed and Embase in May 2014. The key words were derived from ‘ankylosing spondylitis’ and ‘flare, exacerbation, relapse, recurrence, clinical reactivation’.
A second SLR was performed to collect all the definitions of ‘flare’ used in randomised controlled trials (RCTs) of NSAIDs or TNFi in patients with axSpA, up to May 2014. The search was based on two previous systematic reviews and updated in Medline PubMed, Embase and Cochrane for articles published in English, German, French or Spanish. Unpublished RCTs from main rheumatology congress abstracts for 2012–2014 and ongoing trials from the website http://www.clinicaltrials.gov were also analysed. The key words used were derived from ‘ankylosing spondylitis’ and ‘clinical trials’. The search strategy and the full key words are shown in online supplementary table S1.
One investigator (AP) selected all the studies referring to the concept of flare in adult patients with axSpA.
General data regarding study characteristics and specific flare data were collected. The outcome of interest was the definition used for ‘flare’. Where available, information was collected on the instrument used, the cut-off level, whether flare was measured by a combination of several instruments or by a single instrument only, and whether flare was conceptualised as a relative change, an absolute change or an absolute value (status).
Analysis was descriptive and included the instrument used to define ‘flare’, use of one instrument or of a combination, cut-off used to determine flare, use of a relative or absolute change or use of an absolute value.
To assess ASAS members’ opinions on what constitutes a flare in axSpA, a case-vignette exercise was conducted. Vignettes are brief written case histories of a fictitious patient based on a realistic clinical situation accompanied by one or more questions that explore what a physician would think if presented with the actual patient.14
Development of the case-vignettes
The case-vignettes were designed by three authors (LG, AP and MD) based on only one scenario. Full information is given in online supplementary table S3. It was decided to use the case of a 32-year-old man with a well-established diagnosis of axSpA in order to avoid diagnostic discussions. In the scenario, the patient had visits at two successive time points, and a description of the patient's status at both time points was given using results of scores. It was decided that flare would be defined as a change in status between the two time points, that is, a flare is an absolute change between two values: the observed value of the outcome at the time of the flare minus the referral value (previous status before the flare). The scores used here were: (a) patient-reported pain numerical rating scores (pain due to axSpA, range 0–10); (b) Bath Ankylosing Spondylitis Disease Activity Index (BASDAI15 range 0–10); (c) C-reactive protein (CRP) as a continuous result (in milligram per litre), coupled with change in BASDAI; and (d) the Ankylosing Spondylitis Disease Activity Score—CRP16 (ASDAS-CRP) as a global score. For illustrative purposes, the elements of the ASDAS-CRP were shown for each ASDAS result: the ASDAS includes back pain, duration of morning stiffness, patient global assessment, peripheral pain/swelling and CRP.16–18
The patient's initial status (referral value of the outcome) varied from no symptoms to moderate/high disease activity (eg, pain level of 6/10), thus excluding very high initial values, since it was considered that definitions of ‘flares’ are only relevant for patients initially not in high/very high disease activity. Many possible steps of worsening in the patient's disease activity status were constructed; in the end, 140 vignettes were designed (see table 1 and online supplementary table S3). An example of a vignette for BASDAI is the following: ‘A 32-year-old man with a well-established diagnosis of axSpA consults you at two successive time points. In comparison with the previous visit and according to the following data, and all other things being equal (physical examination, CRP and NSAID intake), do you consider this patient is flaring at the second visit? Yes or No. Please give an answer (yes or no) even if you are unsure’.
Initial (first visit) BASDAI (0–10): 2; final (second visit) BASDAI (0–10): 4; Flare: Yes/No.
Initially, variations in CRP alone, as well as in NSAID intake (ie, 65 additional vignettes), were also constructed but were not retained for the final definitions since the group considered that isolated variations in acute phase reactants or in NSAID intake, without changes in any other parameters, were unlikely to reflect a flare. These results are therefore not presented here.
The timeframe between the two visits was deliberately left unspecified, to allow better external validity of the definition.
Distribution of the vignettes
All 159 ASAS experts were asked to assess a sample of 46 vignettes between July and December 2014; each sample was intentionally constructed to include vignettes for each outcome and a distribution of changes in status. The ASAS experts were asked to indicate for each vignette whether the patient was considered to be flaring (yes/no).
For each outcome separately, the vignettes were analysed per stratum of change in outcome, that is, for an absolute change of outcome of at least X (thus all vignettes with a BASDAI increase of at least three points were analysed together, then all vignettes with an increase of at least four points and so on). The absolute change in each outcome was then coupled to the value of the variable at the time before the flare (referral value) and the value observed at the time of flare (eg, change in pain of at least 2 points and pain value at time of flare of at least 4 points on a 0–10 scale).
Using the outcome values as the test, and the ‘flare-judgement’ by the rheumatologist as the ‘gold-standard’, sensitivity and specificity could be calculated for each of the outcomes and receiver-operating characteristic (ROC) curves were constructed. Areas under the ROC curve were calculated and optimal cut-off values for defining a ‘flare’ were established. The corresponding sensitivities, specificities, positive predictive values (PPV) and negative predictive values (NPV) were then calculated. For example, the sensitivity is the proportion with a BASDAI change ≥X calculated among those considered in flare by the physician. The specificity is the proportion with a BASDAI change <X calculated among those considered not in flare by the physician. The PPV is the proportion with a flare calculated among those who have a BASDAI change ≥X, and the NPV is the proportion with no flare among those who have a BASDAI change <X.
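The four performance measures described above can be illustrated with a minimal sketch. It assumes that each answered vignette is reduced to a pair (absolute change in the outcome, physician's yes/no judgement); the function name `performance_at_cutoff`, the `Performance` tuple and the example responses are hypothetical and are not study data.

```python
from typing import NamedTuple, Sequence


class Performance(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def performance_at_cutoff(
    changes: Sequence[float], judged_flare: Sequence[bool], cutoff: float
) -> Performance:
    """Score the rule 'change >= cutoff' against the physicians' yes/no answers."""
    tp = fp = tn = fn = 0
    for change, flare in zip(changes, judged_flare):
        test_positive = change >= cutoff
        if test_positive and flare:
            tp += 1  # change >= X and judged as flare
        elif test_positive:
            fp += 1  # change >= X but judged as no flare
        elif flare:
            fn += 1  # change < X but judged as flare
        else:
            tn += 1  # change < X and judged as no flare

    def ratio(num: int, den: int) -> float:
        # An empty denominator leaves the measure undefined.
        return num / den if den else float("nan")

    return Performance(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


# Hypothetical responses (NOT study data): one entry per answered vignette.
example_changes = [1, 1, 2, 2, 3, 3, 4, 4]
example_judged = [False, False, False, True, True, True, True, True]

for x in (2, 3):
    print(x, performance_at_cutoff(example_changes, example_judged, x))
```

Each candidate cut-off X yields one (1 − specificity, sensitivity) pair, that is, one point on the ROC curve. How the optimal point was chosen is not detailed here, so the sketch stops at the per-cut-off measures.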
Results were presented to the ASAS experts during a plenary workshop in January 2015 and consensus on a preliminary set of draft definitions was reached.
Results
SLR of definitions used for flare in axSpA studies
A total of 1013 articles initially screened resulted in 38 studies using some definition of ‘flare’ in axSpA (see online supplementary table S2). There were 23 RCTs proposing definitions of ‘flares’, assessing either NSAIDs (N=16) or TNFi (N=7): 19 of them concerned flares between screening and baseline, and 4 concerned flares after drug discontinuation. Of these RCTs, 11 (65%) were published over the last 2 years or were ongoing studies found in clinicaltrials.gov. Additionally, there were 15 studies referring specifically to flares: 8 were trials, 3 were qualitative studies and 4 had another study design.
The 38 studies used 27 different definitions of ‘flare’ (table 2). The frequency of flares using these definitions was not always reported but when reported, ranged from 7% to 91% (see online supplementary table S2). The two most frequent definitions used were: absolute BASDAI ≥4/10 with absolute physician assessment ≥4/10 used in six studies, and increase in pain ≥30% with absolute pain ≥4/10 used in six studies.
Overall, all 38 (100%) studies with ‘flare’ definitions used patient-reported outcomes of which 17 (45%) used BASDAI (table 2). BASDAI was used to define flares, either alone (N=7, 41% of 17 studies), or in combination with other instruments (N=10, 59% of 17 studies). Of note, in the literature a flare defined by BASDAI was generally based on a change of at least 1 or 2 points on a 0–10 scale.
Pain was used in 14 (37%) articles to define ‘flares’, either alone (N=10), or in combination with other instruments (N=4).
ASDAS was used only once to define ‘flare’ using a cut-off of 2.1 (absolute value).
Five studies (13%) used elements of physical assessment and four (10%) used acute phase reactants to define ‘flares’ (table 2).
Vignette exercise and final consensus
Of the 159 ASAS members, 121 (76%) completed the exercise (some of them partly), yielding a total of 4999 responses to analyse. The analyses and the consensus process led to 12 preliminary definitions of flare; the performances of these different definitions are shown in table 3 and ROC curves are presented as online supplementary figure S1. Further information is given below.
The prevalence of the event ‘flare’ was 63.1% (387 of 613 answers) in the pain vignettes. The ROC curve allowed the selection of two cut-offs for pain variations (on a 0–10 pain scale), with best sensitivity/specificity trade-offs: increase in pain ≥2 points and increase in pain ≥3 points. For these two cut-offs, performances were calculated for different referral (first visit) pain values and observed (second visit) pain values.
The resulting figures (not shown) indicated the following. (a) Considering a pain change ≥2 points, more than 70% of the doctors will consider there is a flare if the referral level of pain is ≤4. (b) Considering a pain change ≥2 points, more than 60% of the doctors will consider there is a flare if the final value is ≥4. (c) Considering a pain change ≥3 points, more than 80% of the doctors will consider there is a flare if the referral level of pain is ≤4. (d) Considering a pain change ≥3 points, more than 80% of the doctors will consider there is a flare if the final value is ≥5.
Based on these results, and as the referral value defines the context of the study whereas the observed value at the time of the flare defines the flare, it was proposed to keep two preliminary definitions based on pain: (a) an increase in pain of ≥2 and an observed value at the time of the flare of ≥4; (b) an increase in pain of ≥3. The performances of these cut-off values are given in table 3. Additional discussions during the consensus process led us to propose the following combined definition: if the observed value is ≥4, a ‘flare’ is defined as an increase in pain ≥2 points, otherwise, flare is defined as an increase in pain ≥3 points (table 3).
The prevalence of the event ‘flare’ was 68.1% (421 of 618 answers) in the BASDAI vignettes. The ROC curve allowed the selection of two cut-offs for BASDAI (on a 0–10 scale): increase in BASDAI ≥2 points and increase in BASDAI ≥3 points. For these two cut-offs, the performances were again calculated for different referral and observed values. (a) Considering a BASDAI change ≥2, more than 80% of the doctors will consider there is a flare if the referral BASDAI is ≤4. (b) Considering a BASDAI change ≥2, more than 60% (or 70%) of the doctors will consider there is a flare if the observed value is ≥4 (or 5). (c) Considering a BASDAI change ≥3, more than 80% of the doctors will consider there is a flare if the referral BASDAI is ≤4. (d) Considering a BASDAI change ≥3, more than 70% of the doctors will consider there is a flare if the observed value is ≥4 or 5.
Thus the selected preliminary cut-offs for BASDAI are based on an increase of at least two or at least three points, with or without an observed value of at least four (table 3). An additional (combined) definition was derived during the consensus process as follows: if the observed value of BASDAI is ≥4, ‘flare’ is defined as an increase in BASDAI ≥2 points; otherwise, ‘flare’ is defined as an increase in BASDAI ≥3 points (table 3).
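The combined definitions proposed for pain and for BASDAI share the same shape, which can be written as a small decision rule. The following is a sketch only: the function name `is_flare_combined` and its parameter names are hypothetical, and the defaults simply restate the thresholds given in the text (an observed value of ≥4, increases of ≥2 and ≥3 points on a 0–10 scale).

```python
def is_flare_combined(
    referral: float,
    observed: float,
    observed_floor: float = 4.0,
    small_step: float = 2.0,
    large_step: float = 3.0,
) -> bool:
    """Combined preliminary rule for a 0-10 outcome (pain or BASDAI).

    If the observed value is >= observed_floor, an increase of >= small_step
    counts as a flare; otherwise an increase of >= large_step is required.
    """
    # Rounding guards against floating-point noise in decimal scores.
    increase = round(observed - referral, 2)
    required = small_step if observed >= observed_floor else large_step
    return increase >= required


# The BASDAI vignette quoted earlier: first visit 2, second visit 4.
print(is_flare_combined(referral=2, observed=4))  # increase 2, observed >= 4
# Same increase, but the observed value stays below 4, so 3 points are required.
print(is_flare_combined(referral=0, observed=2))
```

Under this rule the quoted vignette (2 then 4) is classified as a flare, whereas a rise from 0 to 2 is not, because the observed value remains below 4.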
In the BASDAI+CRP vignettes overall, the prevalence of ‘flare’ was 77.6% (662 of 852 answers). Not unexpectedly, the analyses suggested a greater role of CRP in defining a flare when the change in BASDAI was ≥2 points than when the change in BASDAI was ≥3 points. In addition, in patients in whom there was no increase in CRP, more flares were defined by the physician if the referral value of CRP was abnormal (data not shown). The final decision was not to propose the association of a change in BASDAI and a change in CRP as a preliminary definition for flare, but rather to focus on the ASDAS, which aggregates this information into one score.
The prevalence of the event ‘flare’ was 51.4% (591 of 1150 answers) in the ASDAS-CRP vignettes. The ROC curve allowed the selection of three cut-offs for ASDAS-CRP changes: increase in ASDAS-CRP ≥0.6, 0.9 or 1.1. For these three cut-offs, the performances were calculated for different referral and observed values. (a) In contrast to pain and BASDAI, there was no effect of the referral value on the performance of the changes in ASDAS-CRP to define a ‘flare’. (b) Regarding the observed values of ASDAS-CRP at the time of flare, there was also no clear effect of this observed ASDAS value on the performance of the cut-offs. Of note, however, only a few vignettes addressed this issue. Based on expert opinion only, an additional preliminary definition of ‘flare’ based on change in ASDAS associated with an observed value (at the time of flare) of ≥1.3 (ie, not being in inactive disease18) was added (table 3).
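The ASDAS-CRP candidates differ from the pain and BASDAI rules in that they rest on the change alone, with the observed value of ≥1.3 as an optional extra condition. A minimal sketch follows; the names `asdas_flare`, `ASDAS_CUTOFFS` and `INACTIVE_DISEASE_LIMIT` are hypothetical. The text here does not state which change cut-off is paired with the 1.3 condition, so that condition is left as an optional argument and no particular pairing is assumed.

```python
from typing import Optional

ASDAS_CUTOFFS = (0.6, 0.9, 1.1)  # candidate increases named in the text
INACTIVE_DISEASE_LIMIT = 1.3     # observed value at or above this is not inactive disease


def asdas_flare(
    referral: float,
    observed: float,
    cutoff: float,
    min_observed: Optional[float] = None,
) -> bool:
    """Flare if ASDAS-CRP rose by >= cutoff and, optionally, observed >= min_observed."""
    # ASDAS values carry decimals; 2.0 - 1.1 evaluates to 0.8999999999999999
    # in binary floating point, so the difference is rounded before comparing.
    increase = round(observed - referral, 2)
    if increase < cutoff:
        return False
    return min_observed is None or observed >= min_observed


# Hypothetical patient: ASDAS-CRP 1.1 at the first visit, 2.0 at the second.
for c in ASDAS_CUTOFFS:
    print(c, asdas_flare(1.1, 2.0, c))

# A rise from 0.5 to 1.2 meets the 0.6 cut-off but not the optional 1.3 condition.
print(asdas_flare(0.5, 1.2, 0.6, min_observed=INACTIVE_DISEASE_LIMIT))
```

The rounding step is a design choice of this sketch rather than part of any definition: without it, a patient whose score rises by exactly 0.9 could be misclassified at the 0.9 cut-off.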
Discussion
This consensus process, instigated by the ASAS group, has led to 12 preliminary definitions of ‘flare’ in axSpA, based on widely used indices. Further steps will allow the assessment of these preliminary definitions on real patient data in order to select the most relevant definition(s). This work is important in the context of clinical trial design, for example, for designing tapering trials, to better define ‘flares’ in future clinical studies.
The initial objective of this initiative was to establish a single definition of ‘flare’ in axSpA. However, a discrepancy was found between the definitions of ‘flare’ used in the literature and the results of the ‘case-vignettes’ (in particular, the thresholds to define a ‘flare’ in the ‘case-vignettes’ were higher than the thresholds found in the literature). This led ASAS to decide that it was too early to propose a single definition of ‘flare’. However, based on the results of both the systematic literature review and the vignette exercise, we are able to focus future studies on 12 potential definitions of ‘flare’.
The strengths of this study include an extensive literature review, an extensive vignette process and a strong consensus process, within a well-recognised group of experts in axSpA. A weakness of this study is the limitation of the scenario which does not allow discussions of flares in different subgroups (eg, men vs women; or patients with extra-articular manifestations vs those without). However, the objective of this study was to obtain one simple and uniform definition for ‘flare’ to be used mainly in clinical trials and studies rather than multiple definitions to be applied in different contexts. Vignette exercises have limitations too, since they only reflect a part of all potential information collected in a real patient/physician consultation; in this case, the vignettes were by nature artificial since patients were considered to show variation in only one outcome, all other things being equal, which is not usually the case in clinical practice. However, vignette exercises are well-recognised ways of obtaining input from many participants.19 ,20
The outcomes chosen in the present initiative can be discussed. BASDAI and pain were selected because these were the two most frequently used instruments in the literature to define ‘flares’ in axSpA. The ASDAS score was selected because this is a recent instrument validated in axSpA.16,18 As the ASDAS-CRP is the instrument of choice proposed by ASAS, only ASDAS-CRP (not ASDAS based on the erythrocyte sedimentation rate) was used. CRP was selected because a number of studies used this instrument to assess flares in axSpA. However, the interpretation of CRP variations alone (ie, in the absence of concomitant changes in symptoms) was difficult, giving rise to discussions, for example, in case of concomitant infections. Finally, NSAID intake was initially explored to be used in a ‘flare’ definition, since it may reflect a worsening of the disease, but the interpretation of isolated changes in NSAID intake was very complex.21
In this vignette exercise, initial levels of symptoms were low to moderate/high since pain could, for example, start at 6/10. In clinical studies, however, most patients will start at low levels, for example, remission.
This study does not explore the patient's perspective on flares. Ongoing work in RA has shown that patients and physicians have different perspectives on flares in that disease.9,22 In axSpA also, it appears patients and physicians may value disease activity differently.10–12 However, the objective here was not to develop a new score focusing on flares, but rather to define an optimal cut-off value corresponding to a flare or a disease worsening, and applicable to widely used and well-validated outcome measures reflecting disease status in axSpA.
It is arguable whether a ‘flare’ can be defined solely as a worsening of disease activity. In the present study we assumed a ‘flare’ would indeed be best defined as disease worsening.
Of note, we did not give any indication, in the vignette exercise, to the ASAS experts of what they should consider to be a flare (eg, worsening necessitating a treatment change), which may have increased the variability in our results.
For the outcomes used in the present study, cut-off values to define improvement have already been defined.23 However, it is known that minimal clinically important differences are not of the same magnitude when defining an improvement and a worsening. In this regard, this innovative initiative is very much in keeping with the ASAS objectives that aim to provide data-driven approaches to SpA measurement and measure interpretation.
This study focused on the definition of a clinically relevant change in a specific outcome measure reflecting a worsening/deterioration/flare of the disease (ie, minimal clinically important deterioration, MCID), keeping in mind that previously reported studies have proposed definitions of a clinically relevant change reflecting an improvement of the disease (ie, minimal clinically important improvement, MCII). It has been shown in different diseases and for different outcome measures that, for a specific outcome measure of a specific disease, the MCID is usually lower than the MCII.24
For example in RA, a change of at least 1.2 in the Disease Activity Score DAS28-ESR is usually considered an MCII and a change of at least 0.6 an MCID.25 In the field of axSpA, an absolute change in BASDAI of at least 2 points or a relative change of at least 50% have been proposed as an MCII.26
Concerning ASDAS-CRP, changes of at least 1.1 and 2.0 have been proposed to define a clinically important improvement (which is in the current context similar to the MCII) and a major improvement, respectively.18 If we accept the concept that for a specific outcome measure the MCID is at a lower level than the MCII, in our study, the data provided by the SLR might be more relevant than the data from the case-vignette study. The discrepancies observed in our study between the SLR and the case-vignette study might be explained by the fact that the participants in the study (all experts in SpA) were aware of the proposed MCII and unconsciously applied these cut-offs when evaluating a specific scenario.
When discussing flares, the referral status (ie, the patient's status at the time before flare) was arbitrarily defined as a favourable (low activity) status. Indeed, it does not seem rational to define ‘flares’ for patients who are already in high disease activity. The referral status can be inactive disease, remission or PASS (Patient Acceptable Symptom State).18 ,27 The present study does not define the referral status precisely, in order to allow for better generalisability.
The durability of the status of flare was not explored in the present vignette exercise, but ASAS members felt that a ‘flare necessitating treatment intensification’ might be defined as a flare observed on two occasions at least 2 weeks apart, or at two or more consecutive visits. This remains to be further explored.
In conclusion, the preliminary definitions of ‘flare’ given in the present work will now need to be validated on real patient data.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.