Extended report
Identifying core domains to assess flare in rheumatoid arthritis: an OMERACT international patient and provider combined Delphi consensus
  1. Susan J Bartlett1,2,
  2. Sarah Hewlett3,
  3. Clifton O Bingham III2,
  4. Thasia G Woodworth4,
  5. Rieke Alten5,
  6. Christoph Pohl5,
  7. Ernest H Choy6,
  8. Tessa Sanderson3,
  9. Annelies Boonen7,
  10. Vivian Bykerk8,
  11. Amye L Leong9,
  12. Vibeke Strand10,
  13. Daniel E Furst11,
  14. Robin Christensen12,
  15. The OMERACT RA Flare Working Group
  1. 1Department of Medicine, McGill University, Montreal, Canada
  2. 2Department of Rheumatology, Johns Hopkins, Baltimore, Maryland, USA
  3. 3Faculty of Health and Life Sciences, University of the West of England, Bristol, UK
  4. 4Leading Edge Clinical Research, Stuart, Florida, USA
  5. 5Department of Internal Medicine, Rheumatology, Schlosspark-Klinik, Teaching Hospital Charite, Berlin, Germany
  6. 6Department of Rheumatology, Cardiff University, Cardiff, UK
  7. 7Department of Internal Medicine, Division of Rheumatology, Maastricht University Medical Center, Maastricht, The Netherlands
  8. 8Hospital for Special Surgery, Weill Cornell Medical School, New York, New York, USA
  9. 9Healthy Motivation, Santa Barbara, California, USA
  10. 10Division of Immunology and Rheumatology, Stanford University, Palo Alto, California, USA
  11. 11Department of Rheumatology, University of California at Los Angeles, Los Angeles, California, USA
  12. 12Copenhagen University Hospital at Frederiksberg, The Parker Institute, Copenhagen, Denmark
  1. Correspondence to Dr Susan J Bartlett, Department of Medicine, McGill University, Royal Victoria Hospital, Ross 4.31, Montreal, Quebec H3A 1A1, Canada; susan.bartlett{at}


Abstract

Objective For rheumatoid arthritis (RA), there is no consensus on how to define and assess flare. Variability in flare definitions impairs understanding of findings across studies and limits ability to pool results. The OMERACT RA Flare Group sought to identify domains to define RA flares from patient and healthcare professional (HCP) perspectives.

Methods Flare was described as a worsening of disease activity of sufficient intensity and duration to consider a change in therapy. International patients and HCPs participated in separate and combined rounds of Delphi exercises to rate candidate flare domains previously generated in patient focus groups. Core domains were defined as those with ≥70% ratings of being ‘essential’ according to the third/final Delphi exercise.

Results The final Delphi included 125 RA patients from 10 countries and 108 HCPs from 23 countries who rated 14 domains. Patients had a mean (±SD) age of 56±12 years and disease duration of 18±12 years. HCPs included physicians from clinical practice/research and industry, allied health providers and researchers with 17±11 years' experience. Core domains comprised: pain (93%), function (89%), swollen joints (84%), tender joints (81%), participation (81%), stiffness (79%), patient global assessment (76%) and self-management (75%). Fatigue (68%), which did not reach group consensus, will receive additional consideration.

Conclusions As part of the process to develop a measure for RA flare, patients and HCPs agreed on eight core domains. Next steps include identifying items to assess domains and conducting studies to validate and refine a new measure.

Introduction

Rheumatoid arthritis (RA) is a chronic, disabling inflammatory condition affecting 1% of the population.1 Flares are common and reflect episodes of increased disease activity beyond normal day-to-day variation.2–4 Importantly, flare may mean different things to patients and healthcare professionals (HCPs) who often have different perspectives5,6 and perceptions.7–9 No standardised criteria exist to define or assess flares in RA.2,4 Across clinical trials, flare definitions include a mix of clinical and patient-reported outcomes (PROs), often using arbitrary thresholds.3 Variability in definitions impairs interpretation of findings and limits ability to pool results to address questions of efficacy and safety.2–4,6 Even the use of thresholds for common disease activity indices (eg, disease activity score (DAS), clinical disease activity index (CDAI) or inverse American College of Rheumatology (ACR) or European League Against Rheumatism (EULAR) response criteria) is problematic, as these indices often do not adequately capture change across the disease activity spectrum or in the direction of worsening.2,10,11 Current approaches to assess worsening or flare emphasise clinical features with limited incorporation of patient-centred outcomes, including, at most, global assessments, pain and physical function.3,5,12 Recent recommendations emphasise including the patient perspective using a full range of patient-valued domains in outcomes research.5,13–15

Thus, a standardised RA flare definition is needed for research and clinical care that combines both clinician and patient perspectives.2–4,6,12 An optimal definition would represent clinically meaningful events, have pathophysiological relevance and facilitate development of a measure to reliably classify patients across clinicians and settings. In trials, a standardised definition could guide decisions about remission and loss of efficacy. In clinics, it could facilitate clinician–patient communication, promote ‘tight control’, and signal the adequacy of treatment. In research, a standardised definition could enhance understanding of potential mechanisms, critical thresholds, impact of treatment, relationship with co-morbidities and other factors (eg, self-management) on outcomes. Similar efforts to define flares are underway for lupus,16,17 ankylosing spondylitis,7,9,18,19 juvenile arthritis17 and gout.20,21

An initial goal of the Outcome Measures in Rheumatology (OMERACT) RA Flare Group was to achieve consensus among an international group of clinicians and RA patients on key features that characterise RA flares.2 We have previously reported on parallel activities in patients and HCPs conducted over 3 years (online supplementary figure S1), integrating patients at all stages.2–6 In this study, we present results of consensus Delphis with patients and HCPs on key domains. Ethical/scientific approval was obtained at a lead institution (08/H0702/67) and subsequently by site investigators.


Methods

Identifying flare characteristics

While flare may be used to describe any increase in disease activity, we sought consensus on a definition representing worsening of signs and symptoms of sufficient intensity and duration to consider a change in therapy.2–4,6 We began with focus groups of RA patients (n=67) across five countries (UK, Germany, USA, Canada, Australia) with qualitative analysis to identify relevant features.5 Investigators including patient research partners identified 48 descriptors representing 14 domains. Next, Delphis were conducted with patients and HCPs.6

Delphi processes

The Delphi technique facilitates consensus with geographically dispersed participants through sequential questionnaire rounds. Participants rate the importance of items; those achieving >70% consensus are taken into the next round until agreement is reached.6,22
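
The carry-forward rule described above can be sketched in a few lines. This is a minimal illustration and not the software used in the study: the function name `carry_forward`, the item labels and the ratings are all hypothetical, and the strict >70% cut-off is taken from the description of the technique.

```python
from collections import Counter


def carry_forward(ratings, threshold=0.70):
    """Return items whose share of 'essential' ratings exceeds the threshold.

    ratings maps an item name to the list of ratings it received.
    """
    kept = {}
    for item, votes in ratings.items():
        if not votes:
            continue  # no responses, so nothing to evaluate
        share = Counter(votes)["essential"] / len(votes)
        if share > threshold:
            kept[item] = share
    return kept


# Hypothetical ratings from ten participants (not study data)
example = {
    "pain": ["essential"] * 9 + ["important"],
    "sleep": ["essential"] * 5 + ["important"] * 3 + ["not important"] * 2,
}
print(carry_forward(example))
```

In this made-up round, only the first item clears the cut-off and would be presented again in the next questionnaire.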

Patient Delphi

The patient Delphi was conducted from March to May 2010. Focus group participants, OMERACT RA patients and other individuals with RA were contacted by letter, e-mail or by investigators. Sociodemographic and disease information was requested, followed by three tasks. First, respondents were asked to ‘think about flare as the point where your arthritis is so active that you need to ask either for a review, or a change or increase in medication’. Items representing the domains identified in focus groups were presented. Two versions were used that presented items and domains in different orders. Patients rated items as ‘essential for deciding you are in flare (you can't imagine being in a flare without it)’, ‘important … (you could still be in a flare without it)’ or ‘not important … (it doesn't help you decide)’. Second, patients ranked up to six items ‘that are absolutely essential for you to decide that you are in flare’. Finally, participants rated 11 early warning signs of flare (previously generated by patients)5 and ranked up to six in order of importance (French participants did not complete this section).

HCP Delphis 1 and 2

HCP Delphis were conducted concurrently with the patient Delphi. In HCP Delphi 1, OMERACT participants, rheumatologists and other professionals completed an online survey. The stated goal was ‘to seek additional guidance to refine our list (generated from patient focus groups) … and identify additional potential domains that may be important in defining flare from the patient and investigator perspectives’. Sociodemographic and occupational information was requested, followed by three tasks. Domains (rather than items) from the patient focus groups5 were presented along with clinical indicators including tender joint count (TJC), swollen joint count (SJC), laboratory results (labs) and patient and physician global assessments (PGA and MDGA, respectively). First, HCPs rated domains as ‘essential for a definition of flare for clinical trials (can't imagine a definition… without it)’, ‘important for a definition of flare for clinical trials (a patient could still be in a flare without it)’ or ‘not important for developing a definition of flare for clinical trials (it doesn't help you decide)’, with space for additional suggestions. Second, HCPs rated RA indices (DAS, CDAI, ACR) as ‘essential’, ‘important’ or ‘not important’ flare indicators. Finally, potential anchors (withdrawal, worsening, flare as an explicit term in adverse events, disease-modifying antirheumatic drug (DMARD) change, DMARD titration, re-treatment) were explored for use in secondary data analysis. For HCP Delphi 2, domains with low importance in HCP Delphi 1 were dropped. Participants were told that five domains had reached consensus for the preliminary core set: SJC (88%), pain (83%), PGA (82%), TJC (76%) and physical function (72%), and the remaining domains were presented for reconsideration. HCPs ranked their top six domains in order of importance and indicated which ‘should be included in a central core of domains’ with space for comments.

Preliminary analysis

The percentages of patients selecting items in the patient Delphi and of HCPs selecting domains in HCP Delphi 2 were considered together. Domains with ≥90% agreement (essential or important) represented the preliminary core set. These and other high-scoring domains were reconsidered in Delphi 3, using patients' words5 as exemplars to clarify domain headings.

Combined patient and HCP Delphi

The final Delphi was conducted with patients and HCPs from October to December 2010. A one-page summary (9.3 grade reading level) presented the domains and proportion of patients and HCPs rating each as essential to defining flare. The stated goal was ‘to work as a single group … and together decide which of these issues are ESSENTIAL features of flare.’


Previous patient participants were invited by letter, e-mail or in clinic to complete the online survey or a paper version, with additional patients from previous OMERACT meetings, participating clinic sites and patient organisations. Previous HCP participants and colleagues were invited to complete the survey online.

HCPs were asked to ‘…Think about a flare where RA is so active that the patient needs to ask either for a visit or call to the doctor or a change or increase in medication.’ RA patients were asked to ‘Think back to a time when you were in flare—what were the KEY signs and symptoms of your RA that made you decide you were in a flare?’ Respondents rated the domains as essential or not essential to the definition. Two survey versions presented the domains in different orders.

Statistical analysis

We decided a priori that the primary analyses would be based on the combined data; domains with >70% consensus would represent the core set.6 Missing responses were coded as ‘not essential’. A Z test was used to test the null hypothesis that equal proportions of patients and HCPs rated a domain as essential. Within groups, empirical cumulative distributions were calculated to obtain a weighted mean for use in sensitivity analyses.
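
A between-group comparison of this kind can be sketched as a two-proportion Z test. The version below is only one plausible reading: it assumes a two-sided test with a pooled standard error, which the text does not specify, and the function name `two_proportion_z` is hypothetical. The counts (95 of 125 patients and 65 of 108 HCPs) are illustrative values chosen to approximate the fatigue percentages reported in the Results; they are not the study's raw data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided Z test for equal proportions, using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts only (assumed, reconstructed from rounded percentages)
z, p = two_proportion_z(95, 125, 65, 108)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the counts are reconstructed from rounded percentages, the resulting p value should be expected to be close to, but not identical with, any value reported for the study.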


Results

Separate patient and HCP Delphis

A total of 148 patients completed patient Delphi 1 (online supplementary table S1). Six domains were selected by ≥90% as essential/important to defining flare (pain, physical function, fatigue, stiffness, systemic features and participation); 28 items representing eight domains were similarly classified by 70–89% of patients and moved forward to Delphi 3 (online supplementary table S2). Early warning signs were identified by 77 (62%) participants (online supplementary table S3). One hundred HCPs completed Delphi 1 (online supplementary appendix 1) and 77 completed Delphi 2. In HCP Delphi 1, pain, TJC, SJC, physical function and MDGA were rated by ≥90% as essential/important and were taken to Delphi 3 (online supplementary appendix 2). Labs, fatigue, stiffness and sleep were carried forward to Delphi 2. Several items within domains were combined and also carried forward: PGA (five), participation (four), self-management (two), systemic features (two) and emotional distress (two). The lowest scoring items (imaging, intimacy, cognitive function) were dropped. Thus, 14 domains were included in Delphi 3.

Combined patient and HCP Delphi 3

A total of 233 respondents (108 HCPs and 125 patients; table 1) from 24 countries completed Delphi 3. Patients were mostly women, well educated and had established RA (range 2–55 years); 8.3% were diagnosed ≤3 years. HCPs were mostly physician scientists, but also allied health providers, epidemiologists and industry researchers. Most (>82%) had participated in the earlier Delphis. Core domains identified included: pain (93%), physical function (89%), SJC (84%), TJC (81%), participation (81%), stiffness (79%), PGA (76%) and self-management (75%). Figure 1 presents the pooled proportions and relative strength of agreement; bubble size reflects the relative precision of each estimate.

Figure 1

Pooled proportions and relative strength of agreement for domains to define rheumatoid arthritis flare among patients and healthcare professionals who care for them. Bubble size reflects the relative precision of each estimate. Domains reflecting the 1993 American College of Rheumatology preliminary core set for disease activity measures for rheumatoid arthritis clinical trials are shown in blue.23

Table 1

Selected characteristics of rheumatoid arthritis patients and healthcare professionals

Patients' and HCPs' results were compared (table 2). There were no statistically significant differences (p>0.05) between groups for pain, function, TJC and stiffness domains. Patients were significantly more likely to classify participation and self-management as essential, whereas HCPs were more likely to rate SJC and PGA as essential. Fatigue was classified by 68% as an essential/important domain (76% patients, 60% HCPs; p=0.011). Significant discrepancies were evident between groups on remaining domains. Patients were more likely than HCPs to rate systemic features, sleep and emotional distress as essential, while more HCPs rated labs and MDGA as essential.

Table 2

Proportion of patients with rheumatoid arthritis and healthcare professionals rating candidate domains as ‘essential’ or ‘important’ to defining flare in rheumatoid arthritis

We conducted exploratory analyses to examine homogeneity of effects within domains by group. Among patients, age, education, duration and language were significant effect modifiers. Older patients (>60 years) were significantly more likely to rate MDGA (62% vs 25%; p<0.001) and labs (59% vs 38%; p<0.02) as essential to defining flares. Persons with high school education or less were more likely to rate emotional distress (73% vs 50%; p=0.01), MDGA (54% vs 34%; p=0.04), TJC (95% vs 80%; p=0.02) and labs (61% vs 39%; p<0.02) as essential, with a trend evident in stiffness (90% vs 76%; p=0.06). Patients with longer disease duration (>10 years) were more likely to rate fatigue (82% vs 66%; p=0.05) as essential, with a trend in systemic features (75% vs 58%; p=0.06). Native English speakers were less likely to rate emotional distress (51% vs 71%; p=0.04) and more likely to rate fatigue (83% vs 61%; p=0.01) as essential.

Among HCPs, sex, age, experience, previous Delphi and OMERACT participation and native English speaking were significant effect modifiers. Women HCPs were significantly more likely than men to rank self-management (74% vs 52%; p=0.02) as essential. Older HCPs (>50 years) were significantly more likely to rank fatigue (71% vs 52%; p=0.05) as essential; conversely, there was a trend for less experienced HCPs (≤15 years) to rank sleep as essential (42% vs 25%; p=0.07). Previous Delphi experience (yes/not sure or no) was associated with a significantly greater likelihood of endorsing fatigue as an essential domain (65% vs 70% vs 35%; p=0.04). Native English speakers and OMERACT 10 participants were less likely to rate labs (65% vs 86%; 68% vs 85% respectively, p<0.05) and more likely to rate function (93% vs 80%; p=0.04) as essential.

Because group sizes were unbalanced, weighted analyses were performed. There was no change in the pooled proportion for the core domains pain, physical function, SJC and TJC, or for fatigue. Scores changed by 1% in seven domains (participation, stiffness, PGA, MDGA, labs, systemic features and emotional distress), while self-management and sleep decreased by 2%.
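
To see why unbalanced groups can shift a combined estimate, it may help to compare a pooled proportion, in which the larger group counts for more, with an equally weighted mean of the two group proportions. The sketch below is a simplified stand-in and does not reproduce the study's weighting, which was derived from empirical cumulative distributions; the function names are hypothetical and the inputs are the rounded fatigue percentages and group sizes reported above.

```python
def pooled_proportion(p1, n1, p2, n2):
    """Proportion across both groups combined; larger groups carry more weight."""
    return (p1 * n1 + p2 * n2) / (n1 + n2)


def equal_weight_proportion(p1, p2):
    """Mean of the two group proportions, ignoring group size."""
    return (p1 + p2) / 2


# Rounded fatigue percentages: 76% of 125 patients, 60% of 108 HCPs
pooled = pooled_proportion(0.76, 125, 0.60, 108)
balanced = equal_weight_proportion(0.76, 0.60)
print(f"pooled = {pooled:.3f}, equally weighted = {balanced:.3f}")
```

With group sizes this similar, the two estimates differ by less than one percentage point, which is in line with the small shifts described for the weighted analyses.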


Discussion

We used a rigorous, iterative mixed-methods approach to identify consensus among an international group of RA patients and HCPs on eight essential domains describing RA flare. There was close agreement among all on pain, physical function, stiffness and TJC. Patients were more likely than HCPs to rate participation and self-management as essential; conversely, HCPs rated SJC and PGA more highly than patients.

Consensus was not reached for several domains. Fatigue just missed our predefined threshold.2,4,6 Fatigue was highly endorsed by patients overall, especially those with longer disease duration. However, only 60% of HCPs rated it as essential to describing flare. Older HCPs and participants in earlier Delphis also rated fatigue more highly. Greater disagreement was evident in other areas. Although MDGA and labs were commonly endorsed by HCPs, they did not achieve group consensus; notably these are two of seven ACR Core Set Preliminary Criteria for Disease Activity.23 Conversely, patients rated PROs (systemic features, sleep, emotional distress) more highly than HCPs, although these also failed to reach the threshold for group consensus for core domains. In subgroup analyses, HCPs and older patients and those with less education were more likely to endorse the importance of traditional clinical variables (MDGA, labs, TJC) as essential indicators of flares.

There have been ongoing calls to include additional domains in the RA core set for disease activity. In 2007, OMERACT attendees recommended that fatigue should be viewed as a core measure and be included in all trials.24 ACR and EULAR have also endorsed fatigue as an important domain for inclusion in clinical trials.25 However, fatigue did not quite reach overall consensus, and therefore requires further exploration as to its ability to reflect flare. EULAR investigators recently developed a patient-derived composite response index – RA impact of disease. Gossec et al queried 96 patients across 10 countries to identify domains (pain, disability, fatigue, emotional well-being, sleep, coping and physical well-being); preference weights were established subsequently in 505 patients.26 Ratings were similar across countries, patients and disease activity levels. These results suggest that while there is overlap between domains used to assess RA impact and disease flare, there are also important differences that indicate the need for specific research toward understanding and measuring RA flares.

The lack of agreement between patients and HCPs in several domains highlighted in figure 1 is significant and underscores enduring differences in perspectives of both groups. Despite growing consensus on the importance of understanding disease impact through the patient's perspective,12,15,27 little advancement has been made. The 1993 ACR Preliminary Core Set includes four clinical indicators and three PROs.23 Calls to expand this core set began soon after its release. At OMERACT 6 in 2002, patients and providers both indicated that the patient perspective needed to move beyond the three traditional PROs.28 Our results suggest that RA continues to be viewed by many HCPs through the fairly narrow lens of traditional clinical indicators, and that the ACR Core Set fails to address multiple domains that patients have identified as important to them.29 Indeed, the current clinical perspective has shifted remarkably little in 20 years. In part, this underscores the challenge of capturing the multidimensional nature of disease activity, including flares, as well as a paucity of measures available to capture several domains. For example, self-management may prove especially challenging to measure as critical elements may differ among patients and over time. Indeed, the PGA may represent an aggregation of patient concerns. Continued reliance on traditional clinical indices restricts the focus to a few of the domains that patients have identified as important to them. The 2011 ACR/EULAR Definitions of Remission in RA Clinical Trials include PGA as the only PRO.30 To improve the lives of RA patients, we must move beyond traditional indicators and incorporate dimensions that patients identify as essential to them into clinical definitions.

This work provides preliminary face, construct and content validity toward a universal definition of RA flare. This is an important first step toward establishing ‘truthfulness’ when developing a comprehensive measure of flare incorporating patient and HCP perspectives for use in research and clinical settings. Study strengths include the large, multi-national sample of RA patients and providers to facilitate a global perspective. Our qualitative work with patients across five countries to inform domain selection reflects necessary first steps when developing measures with strong psychometric properties. While the representativeness of our convenience samples is unclear, we purposefully sampled across the spectrum of relevant attributes. Most patients had longstanding disease and reported low-moderate disease activity when completing Delphis; validation of these domains in early RA is currently underway. The impact of language, disease activity and other effect modifiers in patients warrants further study. Little is known about response shift in RA PROs.

Our results operationalise a flare definition in RA where a change in therapy may be indicated and complement those of Berthelot et al31 who are developing a PRO querying any disease activity change over the past 3 months; similarities and differences in methods and results are described elsewhere.12 More work is required across the flare spectrum in this era that targets tight control and remission as therapeutic goals. The identification of early warning signs, experienced by two-thirds of patients, merits further study. Flare reflects a dynamic state, and optimal methods to describe episodes are needed. Standardised definitions and measures can aid in identification of meaningful thresholds for clinical evaluation and evaluation of key outcomes such as joint damage, disability and quality of life.

Currently, no RA disease activity measure exists that assesses the eight individual core domains. The next steps to creating a flare measure are to identify existing validated questions or develop new ones to query domains. Rigorous testing across patients with all levels of disease activity, severity, disease duration, in diverse settings and in comparison with existing measures will be required for validation. Until then, it may be possible to identify a combination of validated clinical markers and items in PROs to potentially assess relevant flare domains.


Acknowledgements

The authors are indebted to patient and healthcare provider participants and James May. The authors also wish to thank Bruno Fautrel, Lyn March and Marita Cross for collecting data from French and Australian participants; David Dowe of iDENK; and research personnel at each site where data were collected.


  • Funding The following provided financial, logistical and/or technical support for this project: OMERACT, Amgen, Bristol Myers Squibb, Centocor, iDENK, Pfizer, Roche, UCB and XOMA. SB and COB funded through unrestricted grants from The Ira T. Fine Discovery Fund, and the Johns Hopkins Arthritis Center Discovery Fund. RC reports that Musculoskeletal Statistics Unit, The Parker Institute is supported by unrestricted grants from the Oak Foundation.

  • Competing interests None.

  • Ethics approval Barking and Havering NHS Research Ethics Committee ref 08/H0702/67.

  • Provenance and peer review Not commissioned; externally peer reviewed.
