
Extended report
Validation of OMERACT preliminary rheumatoid arthritis flare domains in the NOR-DMARD study
  1. Elisabeth Lie1,
  2. Thasia G Woodworth2,
  3. Robin Christensen3,
  4. Tore K Kvien1,
  5. Vivien Bykerk4,
  6. Daniel E Furst2,
  7. Clifton O Bingham III5,
  8. Ernest H Choy6,
  9. the OMERACT RA Flare Working Group
  1. Department of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway
  2. Division of Rheumatology, UCLA, Los Angeles, California, USA
  3. Department of Rheumatology, MSU, The Parker Institute, Copenhagen University Hospital, Frederiksberg, Copenhagen, Denmark
  4. Department of Rheumatology, Hospital for Special Surgery, New York, New York, USA
  5. Division of Rheumatology, Johns Hopkins University, Baltimore, Maryland, USA
  6. Section of Rheumatology, Institute of Infection and Immunity, Cardiff University School of Medicine, Cardiff, UK
  1. Correspondence to Dr Elisabeth Lie, Department of Rheumatology, Diakonhjemmet Hospital, P.O. Box 23 Vinderen, Oslo 0319, Norway; elisabeth_lie{at}


Objective Domains identified as a result of qualitative research and Delphi exercises to assess rheumatoid arthritis (RA) flare include pain, function, swollen and tender joints, patient and physician global, laboratory measures, participation, stiffness, self-management and fatigue. Here we examine aspects of construct and content validity of these domains in a longitudinal observational study.

Methods A total of 1195 patients with RA treated with non-biological disease-modifying antirheumatic drugs (DMARDs) or biologics were eligible for the analyses. Working definitions of ‘flare’ included patient-reported worsening between 3 and 6 months (primary) and treatment change at 6 months (DMARDs and/or systemic corticosteroids) (secondary). Available outcome measures were mapped to the flare domains. Changes between 3 and 6 months were compared between patients with and without ‘flare’. Convergent and divergent construct validity and content validity were assessed by correlation analyses and logistic regression analysis, respectively.

Results Applying the flare working definition based on patient-reported worsening, standardised mean differences (SMDs) were >0.5 for the majority of outcomes. The largest SMDs were observed for Pain visual analogue scale (1.30), SF-36 Bodily pain (1.24), Patient global (1.20) and morning stiffness intensity (1.17). The flare working definition based on treatment change yielded lower SMDs (<0.5 for most variables). Consistently stronger intradomain than corresponding interdomain correlations supported convergent and divergent validity of the domains.

Conclusions Probing a flare definition via outcome measures, the identified flare domains discriminated well between patients with and without worsening. Interdomain and intradomain correlation and logistic regression analyses provide further support for construct and content validity of the identified flare domains.

  • Rheumatoid Arthritis
  • Outcomes research
  • Patient perspective



Among both clinicians and patients with rheumatoid arthritis (RA), episodes of worsening disease activity beyond day-to-day variation, often referred to as 'flares', are a recognised feature of the disease.1 Qualitative research with patients across several countries has revealed considerable heterogeneity in the signs and symptoms that may constitute a flare.2 ,3 There is currently no established definition or standardised method for measuring flares. It has been acknowledged that such a definition and instrument are needed, both to guide decisions in clinical practice and to serve as an outcome in randomised clinical trials (RCTs) and longitudinal observational studies (LOS), for example, in trials in which the protocol involves tapering and/or discontinuation of study medication.1 ,4 ,5 The latter is becoming increasingly important as early application of effective antirheumatic therapies, and particularly targeted therapies including biologics, brings the potential to achieve true disease modification.6

Since 2006, the Outcome Measures in Rheumatology (OMERACT) RA Flare Working Group has worked to develop a data-driven, patient-centred, consensus-based definition of flare in RA for use in RCTs, LOS and clinical practice.1 ,4 ,5 Through a process involving dedicated patient focus groups in five countries and three rounds of Delphi exercises with both healthcare professionals (HCPs) and patients with RA participating, a set of 14 candidate domains for flare was identified: pain, tender joints, swollen joints, physical function, patient global assessment, physician global assessment, laboratory measures, fatigue, stiffness, participation, self-management, systemic features, sleep and emotional distress.5 ,7 The latter six domains are not part of the core set for assessment of RA,8 ,9 while fatigue was identified by OMERACT patients as important to assess and has been included in the most recent recommendations on reporting of disease activity in clinical trials.10 ,11 In the final Delphi exercise, with both HCPs and patients participating, more than 70% overall agreement was reached on essential or important areas to define a flare for the domains pain, physical function, swollen and tender joints, participation, stiffness, patient global assessment and self-management.7 The key components of the OMERACT filter are the demonstration of truth, discrimination and feasibility through a data-driven approach.12 Our objective was to assess aspects of the construct and content validity of the potential RA flare domains identified through the Delphi process conducted by the OMERACT RA Flare Group, through analysis of patient-reported outcomes (PROs) and clinical data from an ongoing LOS of RA patients.


The NOR-DMARD study

Since December 2000, adult patients (>18 years of age) with inflammatory joint diseases starting treatment with traditional non-biological disease-modifying antirheumatic drugs (DMARDs) and/or biologics in five Norwegian rheumatology departments have been included in the NOR-DMARD study, a LOS with follow-up assessments at 3, 6 and 12 months and then yearly.13 The five centres together cover more than 1.5 million inhabitants, about 30% of the Norwegian population. Follow-up has been based on DMARD courses, and a switch or addition of DMARDs would lead to a new follow-up course. Data collection in all patients includes diagnosis, demographics, medication, comorbidities, adverse events, employment, use of healthcare, 28-swollen joint counts and 28-tender joint counts (28-SJC and 28-TJC, respectively), erythrocyte sedimentation rate (ESR) in mm/first hour, C-reactive protein (CRP) in mg/L, 100 mm visual analogue scales (VAS) for physician's and patient's global assessment of disease activity as well as for joint pain and fatigue, the Medical Outcomes Study 36-item Short-Form Health Survey (SF-36),14 the Modified Health Assessment Questionnaire (MHAQ),15 a question about acceptable state at each visit, a transition question about change in disease activity since start of current DMARD treatment (see next paragraph),16 and, from 2006 onwards, the EQ-5D (EuroQoL),17 the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI)18 and the Bath Ankylosing Spondylitis Functional Index19 questionnaires, completed by all patients. The disease activity score based on 28-joint counts (DAS28) was calculated with ESR. The study was conducted with approval by the South-Eastern Norway Regional Ethics Committee and the Data Inspectorate. All patients gave written informed consent before participation.
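For readers who want to reproduce the composite score, the commonly published four-variable DAS28-ESR formula can be sketched as below. This is a minimal sketch, assuming the standard weights (0.56, 0.28, 0.70, 0.014) and that the patient's global assessment VAS (0 to 100 mm) is used as the global health term; the helper name is hypothetical, and the text does not state which implementation NOR-DMARD used.

```python
import math

def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, patient_global_mm: float) -> float:
    """Four-variable DAS28 with ESR (hypothetical helper, standard published weights).

    tjc28 / sjc28: tender and swollen joint counts (0-28)
    esr_mm_h: erythrocyte sedimentation rate in mm/first hour (must be > 0)
    patient_global_mm: patient global assessment on a 0-100 mm VAS (assumed GH term)
    """
    if not (0 <= tjc28 <= 28 and 0 <= sjc28 <= 28):
        raise ValueError("joint counts must lie between 0 and 28")
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive (the formula takes its logarithm)")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * patient_global_mm)

# Example: 4 tender joints, 4 swollen joints, ESR 25 mm/h, patient global 50 mm
print(round(das28_esr(4, 4, 25, 50), 2))  # 4.63
```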


For the current analyses, we selected patients with a clinical diagnosis of RA who were included in the NOR-DMARD study up to the end of 2009. The diagnosis was made by the treating rheumatologist based on clinical judgement. Since patients starting a new DMARD treatment can be assumed to have relatively high disease activity levels, we chose to study changes in flare variables in the interval between 3 and 6 months after initiation of treatment. Further, we wanted to exclude patients who were primary non-responders to treatment and limited our analysis to patients who experienced improvement at month 3: these were the patients answering ‘Much improved’ or ‘Improved’ to the question ‘Since the start of treatment in this follow-up study, has your rheumatic disease improved, remained unchanged or deteriorated’ (ie, a transition question related to baseline, with the five response categories ‘much improved’, ‘improved’, ‘unchanged’, ‘worse’, ‘much worse’).16
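The selection step amounts to a filter on the five-category transition question. A minimal sketch, assuming each patient is a record with hypothetical keys `diagnosis` and `transition_3m`; these field names are illustrative and are not those of the NOR-DMARD database.

```python
# Five response categories of the transition question, ordered best to worst.
TRANSITION_SCALE = ("much improved", "improved", "unchanged", "worse", "much worse")
IMPROVED = {"much improved", "improved"}

def select_improved_at_3_months(patients):
    """Keep RA patients who reported improvement versus baseline at month 3.

    `patients` is an iterable of dicts with hypothetical keys
    'diagnosis' and 'transition_3m'.
    """
    selected = []
    for p in patients:
        answer = p["transition_3m"].strip().lower()
        if answer not in TRANSITION_SCALE:
            raise ValueError(f"unexpected transition answer: {p['transition_3m']!r}")
        # Primary non-responders ('unchanged' or worse at 3 months) are excluded.
        if p["diagnosis"] == "RA" and answer in IMPROVED:
            selected.append(p)
    return selected
```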

Flare variables

To study changes in the proposed flare domains, we introduced three working definitions of flare to serve as 'anchors'. The primary working definition of flare was based on patient-reported worsening, defined as reporting 'Worse' or 'Much worse' since start of current DMARD treatment at 6 months, as opposed to 'Improved' or 'Much improved' at 3 months. In a sensitivity analysis, we also included patients who were 'Unchanged' at 6 months in the definition of flare. Second, we included the OMERACT Flare Group working definition based on treatment change, defined as either change in DMARD treatment due to lack of efficacy or any increase in or institution of systemic corticosteroids (oral, intramuscular or intravenous) at 6 months (±1 month).1 ,4 A large proportion of included patients had a routine dose increase of methotrexate monotherapy within the first few months, and for subcutaneously injected biologics the data on dosing interval were unfortunately not detailed enough. Consequently, an increase in DMARD dose or a reduced dosing interval was not counted as a change in DMARD treatment for this purpose. Third, a stricter definition, based on the occurrence of both patient-reported worsening and treatment change as defined above, was used. We assessed how each working definition of flare performed by analysing DAS28 states and changes in patients with and without flare at 6 months by the various definitions.
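The three working definitions (plus the sensitivity variant) can be written as simple predicates. This is a minimal sketch, assuming every analysed patient already answered 'Improved' or 'Much improved' at 3 months (the selection criterion) and that the 6-month visit is summarised by three hypothetical fields; it is an illustration, not the study's analysis code.

```python
from dataclasses import dataclass

@dataclass
class Visit6m:
    transition_6m: str                         # answer to the transition question at 6 months
    dmard_change_for_inefficacy: bool          # switch/addition of DMARD due to lack of efficacy
    corticosteroid_started_or_increased: bool  # oral, intramuscular or intravenous

WORSE = {"worse", "much worse"}

def patient_reported_flare(v: Visit6m, include_unchanged: bool = False) -> bool:
    """Primary definition; include_unchanged=True gives the sensitivity analysis."""
    answers = (WORSE | {"unchanged"}) if include_unchanged else WORSE
    return v.transition_6m.strip().lower() in answers

def treatment_change_flare(v: Visit6m) -> bool:
    """Secondary definition; dose increases and shorter dosing intervals are not counted."""
    return v.dmard_change_for_inefficacy or v.corticosteroid_started_or_increased

def combined_flare(v: Visit6m) -> bool:
    """Stricter definition: both patient-reported worsening and treatment change."""
    return patient_reported_flare(v) and treatment_change_flare(v)
```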

The NOR-DMARD study included one or more variables to represent, or serve as surrogates for, all the potential flare domains listed above except self-management, sleep and systemic features: Pain (VAS and SF-36 Bodily pain), physical function (MHAQ score and SF-36 Physical functioning), 28-SJC, 28-TJC, participation (SF-36 Social functioning, SF-36 Role limitations physical and SF-36 Role limitations emotional), stiffness (intensity and duration of morning stiffness—questions 5 and 6 from BASDAI, respectively), patient global assessment (VAS), fatigue (VAS and SF-36 Vitality), physician global assessment (VAS), laboratory measures (ESR and CRP) and emotional distress (SF-36 Mental health). As BASDAI data were collected from 2006, stiffness data were available for 39% of patients included in the analyses.
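The mapping of available variables to flare domains is essentially a lookup table. The sketch below restates the list above as a Python dictionary (labels abbreviated; not an official data dictionary) and checks the counts used elsewhere in the text: 11 covered domains, 18 variables, and 3 of the 14 candidate domains without a surrogate.

```python
# Flare domain -> NOR-DMARD variables used as surrogates (labels abbreviated).
FLARE_DOMAIN_VARIABLES = {
    "pain": ["Pain VAS", "SF-36 Bodily pain"],
    "physical function": ["MHAQ", "SF-36 Physical functioning"],
    "swollen joints": ["28-SJC"],
    "tender joints": ["28-TJC"],
    "participation": ["SF-36 Social functioning",
                      "SF-36 Role limitations physical",
                      "SF-36 Role limitations emotional"],
    "stiffness": ["Morning stiffness intensity (BASDAI Q5)",
                  "Morning stiffness duration (BASDAI Q6)"],
    "patient global": ["Patient global VAS"],
    "fatigue": ["Fatigue VAS", "SF-36 Vitality"],
    "physician global": ["Physician global VAS"],
    "laboratory measures": ["ESR", "CRP"],
    "emotional distress": ["SF-36 Mental health"],
}
# Candidate domains with no variable available in the study.
NOT_COVERED = ["self-management", "sleep", "systemic features"]

n_domains = len(FLARE_DOMAIN_VARIABLES)
n_variables = sum(len(v) for v in FLARE_DOMAIN_VARIABLES.values())
print(n_domains, n_variables, n_domains + len(NOT_COVERED))  # 11 18 14
```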

Statistical analyses for assessment of validity

To assess discriminative construct validity, 3–6-month changes in the various flare domain variables were compared between patients with and without flare by the various working definitions of flare, with calculation of standardised mean differences (SMDs) with 95% CIs. These analyses were performed for both the primary and the two secondary working definitions of flare. For the primary working definition, we also calculated the SMD for DAS28 to serve as a reference. To assess convergent and divergent construct validity of variables representing various domains, we performed Spearman correlation analysis between variables representing the same domain (eg, Pain VAS and SF-36 Bodily pain) to assess intradomain correlations, and subsequently calculated the mean Spearman correlation coefficients between variables representing different domains to assess interdomain correlations. We postulated that strong correlations between variables representing the same domain would support convergent validity and that weak(er) correlations between variables representing different domains would support divergent validity. Content validity was assessed by stepwise forward logistic regression analysis with flare as the dependent variable and variables representing the various flare domains as covariates. The change in Nagelkerke R2 upon addition of variables was studied, and Hosmer–Lemeshow goodness-of-fit tests were performed. This analysis was only performed for the primary working definition of flare, based on patient-reported worsening. The sequence for addition of variables was guided by the results from the final Delphi exercise held by the OMERACT RA Flare Group (the variable for the domain with the highest agreement added first),7 and in cases in which more than one variable per domain was available, the variable yielding the highest SMD between patients with and without flare was selected.
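The article does not state which SMD or CI formulas were used. A minimal sketch, assuming Cohen's d with a pooled SD and the usual large-sample standard error, could look like this (hypothetical helper name). For SF-36 scales a decrease means worsening, so the sign of an SMD depends on the scale direction and on which group is subtracted; only magnitudes are comparable across variables.

```python
import math
from statistics import mean, variance

def smd_with_ci(flare, no_flare, z=1.96):
    """Standardised mean difference (Cohen's d, pooled SD) with an approximate 95% CI.

    flare / no_flare: 3-6-month changes in one variable for each group
    (each group needs at least two observations).
    """
    n1, n2 = len(flare), len(no_flare)
    pooled_sd = math.sqrt(((n1 - 1) * variance(flare) + (n2 - 1) * variance(no_flare))
                          / (n1 + n2 - 2))
    d = (mean(flare) - mean(no_flare)) / pooled_sd
    # Large-sample standard error of d
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se

d, lo, hi = smd_with_ci([3, 4, 5], [1, 2, 3])
print(round(d, 2), round(lo, 2), round(hi, 2))  # 2.0 0.04 3.96
```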
The analyses were first performed in all patients (with the stiffness variable added last due to missing data) and then in patients with available stiffness data.
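The ordering rule for the stepwise model (domains in Delphi agreement order, one variable per domain chosen by the largest SMD, stiffness moved to the end because of missing data) can be sketched as follows. The helper name is hypothetical and the domain order in the example is illustrative; Pain VAS (1.30), SF-36 Bodily pain (1.24), Patient global (1.20) and morning stiffness intensity (1.17) are SMDs reported in this article, while the duration value is made up.

```python
def entry_sequence(domain_order, candidate_smds, add_last=("stiffness",)):
    """Return covariates in the order they enter the forward logistic model.

    domain_order: domains sorted by Delphi agreement (highest first)
    candidate_smds: domain -> {variable name: SMD between flare groups}
    add_last: domains moved to the end (e.g. because of missing data)
    """
    ordered = ([d for d in domain_order if d not in add_last]
               + [d for d in domain_order if d in add_last])
    # Within each domain, keep the variable with the largest absolute SMD.
    return [max(candidate_smds[d], key=lambda v: abs(candidate_smds[d][v]))
            for d in ordered]

smds = {
    "pain": {"Pain VAS": 1.30, "SF-36 Bodily pain": 1.24},
    "stiffness": {"Stiffness intensity": 1.17, "Stiffness duration": 0.90},  # 0.90 is illustrative
    "patient global": {"Patient global VAS": 1.20},
}
print(entry_sequence(["pain", "stiffness", "patient global"], smds))
# ['Pain VAS', 'Patient global VAS', 'Stiffness intensity']
```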

Statistical analyses were performed using IBM SPSS Statistics, V.19.0 and Microsoft Excel 2007.


The inclusion criteria were fulfilled by 1457 patients, and 1195 of these had available 6-month follow-up data and could be included in the analyses. The patient characteristics are shown in table 1. Age, gender and the percentage of rheumatoid factor positivity were quite typical for an RA population and the majority of patients were treated with methotrexate monotherapy (table 1).

Table 1

Patient characteristics

Working definitions of flare

Applying the primary working definition of flare (patient-reported worsening), there were 79 flares at 6 months. With application of the secondary definition based on treatment change, 162 flares were identified; among these patients, 63 had a change in DMARD treatment due to inadequate efficacy and 110 had an increase in or initiation of systemic corticosteroids (11 patients had both). The combined working definition of flare, that is, the overlap between patient-reported worsening and treatment change, yielded 35 flares. Including cases that were 'Unchanged' at 6 months in the definition of patient-reported flare (sensitivity analysis) yielded 221 flares (the overlap with treatment change was only 71 cases).
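The flare counts above fit together by simple inclusion–exclusion. The short check below restates numbers taken from this paragraph; subtracting the 35 combined flares from the 79 patient-reported flares also gives the 44 patients with worsening but no treatment change discussed later.

```python
# Treatment-change flares: DMARD change for inefficacy OR systemic corticosteroid
# increase/initiation; patients with both are counted once.
dmard_change, corticosteroid, both = 63, 110, 11
treatment_change_flares = dmard_change + corticosteroid - both

# Patient-reported flares, and those that also had a treatment change.
patient_reported_flares, combined_flares = 79, 35
worse_without_treatment_change = patient_reported_flares - combined_flares

print(treatment_change_flares, worse_without_treatment_change)  # 162 44
```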

DAS28 over 6 months

All three working definitions of flare identified patients with worsening disease activity at 6 months as demonstrated by changes in DAS28 over the period (table 2). Patients who flared by the various definitions had, on average, a worsening of DAS28 between 3 and 6 months, while there was little change in patients who did not flare. The mean worsening in DAS28 was more pronounced for the flares defined as patient-reported worsening than for the flares defined by treatment change, and the proportion who had a worsening of DAS28>0.6 was 75% vs 52% (table 2).
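The proportion with a DAS28 worsening above 0.6 is a threshold count on the 3–6-month change. A minimal sketch with a hypothetical helper and made-up DAS28 values, for illustration only:

```python
def proportion_worsened(das28_pairs, threshold=0.6):
    """Share of patients whose DAS28 rose by more than `threshold` from month 3 to month 6.

    das28_pairs: list of (das28_month3, das28_month6) tuples.
    """
    if not das28_pairs:
        raise ValueError("no patients supplied")
    worsened = sum(1 for m3, m6 in das28_pairs if m6 - m3 > threshold)
    return worsened / len(das28_pairs)

# Made-up values: changes of +1.0, +0.5, +0.7 and -0.2
print(proportion_worsened([(3.0, 4.0), (3.0, 3.5), (2.5, 3.2), (4.0, 3.8)]))  # 0.5
```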

Table 2

DAS28 states and changes in patients with and without flare at 6 months according to the three working definitions of flare

Discriminative validity

The 3–6-month changes in variables representing 11 different potential flare domains, in patients with and without flare defined as patient-reported worsening at 6 months, are presented in table 3 (p<0.001 for all comparisons). The SMDs were large (above 0.5) for the majority of variables (table 3). The largest SMDs were observed for Pain VAS (1.30), SF-36 Bodily pain (1.24), Patient global (1.20) and Intensity of morning stiffness (1.17) (table 3 and figure 1A). By comparison, the SMD for DAS28 was 1.26. For the secondary working definition of flare based on treatment change, the SMDs were substantially smaller and <0.5 for all variables except Physician global (0.58) and Patient global (0.53) (figure 1B). For the combined working definition of flare (patient-reported worsening and treatment change; 35 flares), SMDs were again larger, the largest being for Intensity of morning stiffness (2.18), followed by SF-36 Bodily pain (1.44) and Patient global (1.25) (figure 1C). Results from the sensitivity analysis are shown in online supplementary figure S1. The SMDs were, as expected, lower than for the primary working definition of flare (figure 1A), but higher than for the definition based on treatment change (figure 1B), and above 0.5 for 11 of 18 variables (see online supplementary figure S1).

Table 3

Three-to-six-month changes in flare domain variables (grouped according to domains)

Figure 1

SMDs of changes in patients with and without flare by various definitions: (A) patient-reported worsening, (B) treatment change and (C) patient-reported worsening and treatment change. The red lines represent the 0.5 level above which SMDs are considered large. SMD, standardised mean difference; VAS, visual analogue scale; SF-36, Short Form-36; MHAQ, Modified Health Assessment Questionnaire; SJC, swollen joint count; TJC, tender joint count; ESR, erythrocyte sedimentation rate; CRP, C-reactive protein.

Convergent and divergent validity

The intradomain and interdomain correlations are displayed in table 4. The interdomain correlations (ie, mean Spearman correlation coefficients between the variable(s) representing a certain domain and variables representing all other domains) were generally weak, with a small mean coefficient (<0.3) for all domains except patient global, pain and stiffness. For domains represented by more than one variable (eg, Fatigue VAS and SF-36 Vitality for fatigue), intradomain correlations were consistently stronger than interdomain correlations (table 4).
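The intradomain and interdomain summaries can be sketched as below, with Spearman's rho computed as Pearson's correlation on average ranks. This is a minimal sketch under assumptions the article does not state: absolute coefficients are averaged (so that SF-36 scales, where higher means better, do not cancel out VAS scales, where higher means worse), and complete data are assumed for every variable. All names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

def domain_correlations(changes, domains):
    """Return {domain: (mean intradomain |rho| or None, mean interdomain |rho|)}.

    changes: variable name -> list of 3-6-month changes (same patients, same order)
    domains: domain name -> list of variable names (at least two domains)
    """
    summary = {}
    for domain, own_vars in domains.items():
        intra = [abs(spearman(changes[a], changes[b])) for a, b in combinations(own_vars, 2)]
        other_vars = [v for d, vs in domains.items() if d != domain for v in vs]
        inter = [abs(spearman(changes[a], changes[b])) for a in own_vars for b in other_vars]
        summary[domain] = (mean(intra) if intra else None, mean(inter))
    return summary

toy_changes = {"Pain VAS": [1, 2, 3, 4], "SF-36 Bodily pain": [4, 3, 2, 1], "ESR": [2, 1, 4, 3]}
toy_domains = {"pain": ["Pain VAS", "SF-36 Bodily pain"], "laboratory": ["ESR"]}
print(domain_correlations(toy_changes, toy_domains))
```

A domain measured by a single variable (such as the laboratory example here) has no intradomain coefficient, which is why the comparison in the text is restricted to domains represented by more than one variable.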

Table 4

Intradomain and interdomain correlations to address convergent and divergent construct validity

Content validity

In the logistic regression analysis, the Nagelkerke R2 increased gradually from 0.28 for the model with only one independent variable (Pain VAS) to 0.52 (52% of the total variation explained) for the model with variables representing the domains pain, function, swollen joints, participation, tender joints, patient global, physician global and emotional distress (Hosmer–Lemeshow goodness-of-fit test p=0.51) (see online supplementary table S1). With addition of the stiffness variable, the Nagelkerke R2 was 0.48 (Hosmer–Lemeshow p=0.69). The R2 increased by 0.06 when adding function and by 0.05 when adding swollen joints, and increased less when adding additional variables (see online supplementary table S1). In the analysis performed only in patients with available stiffness data, the model Nagelkerke R2 increased from 0.35 with only pain included to 0.48 when variables representing all the 11 domains listed above were included (see online supplementary table S2).
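Nagelkerke's R2 rescales the Cox–Snell pseudo-R2 so that its maximum is 1. It is computed from the log-likelihoods of the intercept-only and fitted logistic models, so the "variation explained" reading is an approximation rather than a literal proportion of variance. SPSS reports the statistic in its logistic regression model summary; the sketch below (hypothetical helper names) only shows the arithmetic, given log-likelihoods from any fitting routine.

```python
import math

def bernoulli_null_loglik(y):
    """Log-likelihood of the intercept-only logistic model (predicts the base rate)."""
    n, k = len(y), sum(y)
    if k in (0, n):
        raise ValueError("outcome must contain both flares and non-flares")
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 divided by its maximum attainable value."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell

ll0 = bernoulli_null_loglik([1, 0, 0, 1])
print(round(nagelkerke_r2(ll0, ll0 / 2, 4), 3))  # 0.667
```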


The current post-hoc analyses were conducted to examine whether the previously identified candidate flare domains fulfil the OMERACT filter of Truth, Discrimination and Feasibility.12 Although the outcome measures included in the NOR-DMARD study were not specifically designed for assessing these domains, useful surrogates were available for most of the domains, so that content and construct validity and discriminative ability could be addressed. We found that all domains tested worsened considerably more in patients defined as 'in flare' than in those not in flare, with the possible exceptions of the domains laboratory measures and emotional distress. Furthermore, results from interdomain and intradomain correlation and logistic regression analyses gave further support to the construct and content validity of the identified flare domains. Although positive results can be regarded as supportive in this regard, negative results should be interpreted with some caution, as they may reflect the performance of the instrument rather than of the domain, as well as potential problems with the operationalisation of flare (addressed below).

The current study has several potential limitations that merit further discussion. First, the working definitions of flare used as 'anchors' are inherently arbitrary and limited by which data were available in the study. For the primary definition (patient-reported worsening), a transition question about change since the previous visit would have been preferable to one about change since baseline (which required us to compare 6-month with 3-month responses), as it is more intuitive and less prone to recall bias. However, the definition that we used seemed relatively robust, given the size of the SMDs observed. By its construction, this working definition may have selected a subgroup of more 'severe flares'. The lower SMDs observed with a modified version of this flare definition in the post-hoc sensitivity analysis support this, but the majority of the SMDs were still above 0.5 with the modified definition, supporting the validity of the domains. Using patient-reported worsening as the anchor definition will probably also favour PROs over joint counts, acute phase reactants and the physician global.

The definition based on treatment change, that is, change in DMARD treatment due to inadequate efficacy or any increase in systemic corticosteroids, also identified patients who worsened, but performed more poorly than expected. This is possibly due to the inclusion of some patients with primary non-response to treatment, even though we tried to limit this through the selection criteria for patients included in the analysis. Further, as many as 44 patients reported worsening without a treatment change. A possible explanation is that these data were accumulated over a 9-year period from 2000 and that the treatment approach might have been less aggressive at the beginning of the period than it is now, after implementation of the 'treat-to-target' principle. Our definition of flare based on treatment change did not include intra-articular corticosteroid injections, since the data collected on these were not detailed enough.

Furthermore, the shortest time interval that we could study was 3 months, and episodes of flare occurring between visits might have been missed. Of the domains included in the core domain set for RA flare, self-management was the only domain that could not be addressed through our study. In this setting, the domain of self-management refers to actions of patients in response to flares, including both pharmacological and non-pharmacological interventions. These self-management strategies may vary from patient to patient and from flare to flare and are dependent upon many additional contextual factors2; however, there was no instrument or surrogate included within the outcome measures available within our study. In addition, for those domains covered, the method of assessment might not have been ideal in reflecting a given domain. Specifically, this may concern participation and fatigue. In the main analyses as well as for the flare definition based on both patient-reported worsening and treatment change, we found that Intensity of morning stiffness seemed to be more discriminative than Duration of morning stiffness. The CIs were, however, rather wide, since the stiffness data were only collected from 2006 onwards. More data on the value of measuring stiffness intensity versus duration as well as wording regarding ‘morning stiffness’ versus merely ‘stiffness’ and generalised stiffness versus localised joint stiffness are needed to determine how stiffness should best be measured in the context of RA flare.

In the Delphi exercises leading up to this work, there was notable discordance between patients and HCPs for some domains: a high proportion of HCPs considered laboratory results important whereas fewer than half of the patients did, and domains such as self-management and fatigue were more strongly favoured by patients. Many HCPs, and physicians in particular, might argue that a flare definition could be based solely on an increase in synovitis, but the extensive involvement of patients in the work of the RA Flare Working Group has demonstrated that the symptomatology of flare is more complex and heterogeneous.2 Thus the patient and HCP perceptions of flare can be considered complementary.20 Several different DAS28-based criteria for flare have been used in studies over the past few years, and these were recently validated by van der Maas et al.21 These criteria are generally based on reversal of more established criteria for improvement and low disease activity state/remission, which might not be an ideal way to measure episodes of worsening. Further, it is currently unknown whether the DAS28-based criteria adequately capture flare episodes as described by patients.

An RA flare core domain set consisting of pain, physical function, swollen and tender joints, patient and physician global assessments of disease activity, fatigue, laboratory measures, stiffness, participation and self-management was recently endorsed by OMERACT.22 The next phase in the effort to develop a flare measure involves testing existing or new items/instruments for each of the domains considered important, so that these items can eventually be combined into a flare index. The current results contribute to this process.

In conclusion, the OMERACT provisional flare domains were discriminative for patients with and without flare by various working definitions. The current analyses also supported the construct and content validity of the domain set. Furthermore, the validity of the domains not included in the American College of Rheumatology (ACR) core set but identified through the Delphi process, that is, fatigue, stiffness and participation, was supported, but there is still the possibility that these domains are redundant. Further validation of domains and items is needed and prospective collection of specific flare data in several RCTs and LOS is currently underway.


The authors thank the patients for participating in this study and the local rheumatology staff for data collection.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Handling editor Gerd R Burmester

  • Contributors Study design: EL, TGW, RC and EHC; Data acquisition: TKK; Data analysis: EL, RC and EHC; Manuscript preparation: EL, TGW, RC, TKK, VB, DEF, COB III and EHC; all authors have approved the final manuscript.

  • Funding There was no specific funding for the current work. The NOR-DMARD study has received unrestricted grant support from Abbott, Amgen, Wyeth/Pfizer, Aventis, MSD, Schering-Plough/Centocor, Bristol-Myers Squibb, UCB, Roche and the Norwegian Directorate for Health and Social Affairs. RC: The Musculoskeletal Statistics Unit, The Parker Institute is supported by grants from the Oak Foundation.

  • Competing interests TGW, RC, VB, DEF, COB and EHC are members of the OMERACT RA Flare Steering Committee. EL was the OMERACT 11 fellow of the OMERACT RA Flare Working Group. EL has received honoraria as speaker and/or consultant from Roche, Pfizer, Bristol-Myers Squibb and Abbott (not related to flare in RA). TKK has received honoraria as speaker and/or consultant from Abbott, Bristol-Myers Squibb, MSD/Schering-Plough, Pfizer/Wyeth, Roche and UCB (not related to flare in RA). VB has received consultancy honoraria from Amgen, Pfizer, UCB, Bristol-Myers Squibb and Genentech/Roche (not related to flare in RA). DEF has received grant/research support, consultant honoraria, speaker's bureau and/or other honoraria from AbbVie, Actelion, Amgen, Bristol-Myers Squibb, Biogen Idec, Centocor, Gilead, GSK, Janssen, NIH, Novartis, Pfizer, Roche/Genentech, UCB (not related to flare in RA). The remaining authors do not declare any financial relationships that are relevant to flare in RA.

  • Ethics approval South-Eastern Norway Regional Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.
