Objective: To use an evidence-based and consensus-based approach to elaborate recommendations on how to report disease activity in clinical trials of patients with rheumatoid arthritis (RA) endorsed by the European League Against Rheumatism (EULAR) and the American College of Rheumatology (ACR).
Methods: After an initial expert meeting, during which relevant research questions were identified, a systematic literature search was performed using Medline, Embase and the Cochrane Library as sources. To ensure literature retrieved was comprehensive, we emphasised search algorithms that were sensitive rather than specific. The results of the literature search were discussed by the expert panel, modified and expanded, and were used as the basis for the elaboration of the recommendation in the consensus process. Finally, an independent ACR panel approved these items with some minor modifications.
Results: The following pieces of evidence were obtained from the literature search: (1) timing and the sustaining of a response are relevant to achieving better outcomes; (2) composite disease activity indices have been used to define low disease activity and remission and these definitions have been validated, as have the American Rheumatism Association (ARA) remission criteria. The “patient’s acceptable symptom state” (PASS) is not yet well validated; (3) evidence was obtained to identify those measures, scales and patient-reported instruments for which there is a documented association with relevant outcomes; (4) baseline disease activity is associated with disease activity levels at the end of follow-up; and (5) there was not sufficient evidence relating to the added benefit of MRI or ultrasound over clinical assessments. Most data stemmed from observational studies rather than clinical trials, and the literature review was supplemented by input from experts. The results served as the basis for the elaboration of the seven recommendations by the experts.
Conclusions: The approach based on scientific evidence from the literature as well as on expert input provided sufficient information to derive recommendations on reporting disease activity in RA clinical trials. The methodology, results and conclusions of this project were endorsed by EULAR and the ACR.
Evaluation of disease activity in rheumatoid arthritis (RA) is more complex than in many other diseases. RA has many facets, and different domains may all potentially reflect disease activity. Over a decade ago, core sets were established to harmonise the reporting of disease activity in clinical trials of RA, and many different composite indices and sets of response criteria based on these core sets have been introduced to the scientific community.1–3 While designs of clinical trials mostly adhere to the presentation of the core set variables and some composite activity scores and response criteria, the way these are presented varies considerably. Consequently, making comparisons between trials (for example by virtue of effect sizes) is difficult. Given the plethora of instruments and approaches and the heterogeneity of trial reporting, it was the aim of a specific task force to develop recommendations on how to report disease activity status and response in clinical trials of patients with RA.
The European League Against Rheumatism (EULAR) has adopted a systematic approach to develop recommendations4 that was followed in the current project. These include the following steps: (1) constitution of an expert panel in the respective area of research; (2) definition of relevant research questions by the panel; (3) a systematic literature search addressing these research questions; (4) presentation of results from the literature search to the expert panel; and (5) consensus finding on recommendations based on the results from the literature search and experts.
This publication deals with the methodology of the project (steps 1 to 4), while the final recommendations (step 5) are presented in a simultaneous publication in the current issues of the Annals of the Rheumatic Diseases5 and Arthritis & Rheumatism (Arthritis Care & Research).6
1. Expert panel
According to the EULAR standardised operating procedures, three individuals have predefined roles in the panel. The convenor (DA) is an expert in the respective field of research; the epidemiologist (RL) has fundamental expertise in the methodological aspects of the project; and the research fellow (TK) is responsible for the literature search.
Experts were identified based on their knowledge in the area of outcomes research, as documented by respective publications. In this project, experts from several European and North American countries, as well as patients with RA were invited to attend the initial meeting in November 2006. Only one patient was able to attend (PR).
Given the collaborative effort of EULAR and the American College of Rheumatology (ACR) on a joint set of recommendations, an additional role of the official ACR representative was defined (DF) as the liaison to a respective panel of the ACR (see below).
A total of 19 additional experts agreed to participate and joined the panel, 15 from Europe (MB, SB, BC, MD, PE, JG-R, TCK, EM-M, MM-C, PvR, JSS, TS, DvdH, RvV, AZ) and 4 from North America (EC, JS, MW, GW).
2. Generation of relevant questions for the literature search
The process of elaborating the recommendations started with a selection process to identify relevant questions for a systematic literature search. The questions were generated during the first meeting. To this end, the panel was divided into 3 breakout groups and each group was asked to contribute 10 questions relating to each of the 3 following general topics regarding disease activity evaluation in clinical trials of RA: content (“Which elements reflect disease activity?”), analytical framework (“How should data on these elements be analysed in clinical trials?”) and contextual framework (“What are relevant variables/conditions that modify the results observed for given elements?”).
In a subsequent selection process using a modified Delphi technique,7 the number of questions was reduced. In the first round each expert was asked to select the 15 most important questions from the list. Questions with ⩾80% positive votes were immediately accepted, and those with ⩽20% positive votes deleted from the list. Questions receiving more than 20% but less than 80% of the votes entered the next Delphi round, in which the process was repeated. After the second round, 7 questions had been selected, and 12 had been eliminated. After the third Delphi round, which yielded a final set of 11 questions, the process was stopped. In a plenary session the wording of the questions was discussed and, if applicable, modified to meet the requirements of a systematic literature search. As can be seen in table 1, questions from all three general topics were represented in the list of final questions.
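The voting rule used in each Delphi round can be sketched as follows. This is a minimal illustration, not the panel's actual tooling: the function name `delphi_round`, the question labels and the vote counts are hypothetical, and only the 80% and 20% thresholds are taken from the text.

```python
from typing import Dict, List, Tuple


def delphi_round(votes: Dict[str, int], n_experts: int) -> Tuple[List[str], List[str], List[str]]:
    """Split questions into accepted, deleted and carried-over lists.

    votes maps a question label to its number of positive votes.
    Integer arithmetic is used so that exactly 80% or exactly 20%
    is classified without floating-point rounding issues.
    """
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    accepted, deleted, carried = [], [], []
    for question, positive in votes.items():
        if positive * 100 >= 80 * n_experts:    # >= 80% positive votes: accepted
            accepted.append(question)
        elif positive * 100 <= 20 * n_experts:  # <= 20% positive votes: deleted
            deleted.append(question)
        else:                                   # in between: goes to the next round
            carried.append(question)
    return accepted, deleted, carried


# Hypothetical vote counts from a panel of 20 experts (illustrative only).
example_votes = {"Q-A": 18, "Q-B": 16, "Q-C": 9, "Q-D": 4, "Q-E": 1}
accepted, deleted, carried = delphi_round(example_votes, n_experts=20)
print("accepted:", accepted)
print("deleted:", deleted)
print("next round:", carried)
```

Questions in the carried-over list would be put to the vote again in the following round, until the panel decides to stop.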
3. Literature search
A systematic search of the literature published up to January 2007 was performed using the Cochrane Library, Medline and Embase. For the Medline search, Medical Subject Headings (MeSH) terms were also used to obtain more specific results. Retrieved items were screened for eligibility based on title, abstract and/or full content. Reference lists of the included studies were also examined for further relevant publications.
Evidence was categorised according to the study design using a traditional hierarchy.4 The questions posed were answered with the best available evidence.
4. Presentation of results
The results of the literature search were distributed to the panel before the second meeting in May 2007 giving members the opportunity to comment on the results and to provide additional references that might have been missed by the literature search. Only for a few questions (3, 4 and 9) were the majority of studies identified by the targeted literature search. For most questions the majority of studies were identified during the search on other questions or by the experts. Finally, additional references and suggestions made by the participants of a meeting of experts selected by the ACR in August 2007 were also incorporated in the manuscript. Below, we will present the summary of the results of this process for each of the questions.
Question 1: What literature is available on time as an element of state or response?
The first part of this question addressed whether the duration of an observed response is important when a given outcome is assessed in a clinical trial in patients with RA, or in other words: is a sustained response a better result than a response achieved at a single time point? At least with respect to radiographic outcomes this has been shown to be the case, since patients with a sustained clinical response showed less radiographic progression than those who reached remission at a single time point or flared after achieving remission.8–10 Likewise, it was reported that radiographic progression was lower among those patients with longer time in remission,11 a fact which is well in line with the observation that time-averaged disease activity (eg, the area under the disease activity over time curve) significantly correlates with radiographic progression. This has been shown for disease activity measured by the 28-joint Disease Activity Score (DAS28), the Simplified Disease Activity Index (SDAI) and the Clinical Disease Activity Index (CDAI), as well as by individual disease activity measures,12–17 although this relationship turned out to be less relevant in patients treated with tumour necrosis factor (TNF) inhibitors (dissociation theory).15 18 Generally, the degree of radiographic progression in patients in remission also depends on the criteria used to define remission. For example, patients in remission by the modified American Rheumatism Association (ARA) criteria may continue to progress radiographically.9 10 This might indicate that synovial inflammation is not adequately detected by clinical assessment or that some degree of clinical synovitis might exist in the context of a composite index. The benefit of achieving sustained responses is further supported by the observation that radiographic progression is still increased in patients with fluctuation of disease activity as measured by DAS or DAS28.12
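To make the notion of time-averaged disease activity concrete, the sketch below computes the area under a disease activity over time curve with the trapezoidal rule and divides it by the observation period. The trapezoidal rule is an assumption made here for illustration (the cited studies may have used other approaches), and the function name, visit schedule and scores are hypothetical.

```python
from typing import Sequence


def time_averaged_activity(times: Sequence[float], scores: Sequence[float]) -> float:
    """Area under the score-over-time curve (trapezoidal rule) divided by follow-up length."""
    if len(times) != len(scores) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Trapezoid between two consecutive visits.
        area += dt * (scores[i] + scores[i - 1]) / 2.0
    return area / (times[-1] - times[0])


# Hypothetical visits at months 0, 3, 6 and 12 (illustrative scores only).
months = [0, 3, 6, 12]
sustained = [5.0, 3.0, 2.0, 2.0]    # response reached and maintained
fluctuating = [5.0, 2.0, 4.0, 2.0]  # same endpoint value, with a flare at month 6

print(time_averaged_activity(months, sustained))
print(time_averaged_activity(months, fluctuating))
```

Both hypothetical patients have the same score at the last visit, but the fluctuating course yields the higher time-averaged value, which is the property that a single endpoint assessment cannot capture.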
The second part of this question focused on time to response, which seems to be important although little data are available to support specific recommendations on assessment. Most data on this issue suggest that early responses are indicative of later therapeutic successes, at least regarding changes in radiographic scores, and vice versa.19–21
Question 2: Which measures of disease activity are validated for defining remission/low disease activity state/patient’s acceptable symptom state (PASS)?
When the literature was searched for evidence of the validity of a measure or analytical approach, this often did not include face validity or content validity, but rather construct validity, ie, how a measure or approach cross-sectionally or longitudinally relates to other known “constructs” of the disease. In this sense, the following measures of disease activity have been investigated for defining remission or low disease activity: DAS, DAS28, CDAI, SDAI, preliminary ARA criteria for remission.11 22–27 By contrast, there is no literature validating the remission definition of the US Food and Drug Administration (FDA) or measures of the concept of the PASS.28
Although general validation studies have been performed for the above-mentioned measures, there are considerable differences between the individual instruments. For example, DAS remission has been reported to be more conservative than DAS28 remission.29–31 It has been argued that this discrepancy is due to the presence of disease activity in joints not captured by the 28-joint count.31 More generally, different proportions of patients are classified as being in remission depending on the definition used. For example, the SDAI remission criteria have been reported to be more stringent than both the DAS-based and DAS28-based definitions, and the ACR remission criteria.11 32 33 In particular, it has been shown that a proportion of patients in DAS remission or DAS28 remission had significant residual swollen joint counts.11 30 Therefore, the published cut-points for remission for the DAS and DAS28, although corresponding to the fulfilment of ARA criteria for clinical remission, still remain controversial.11 22 29 34 A likely reason for residual disease activity with these remission measures is that the cut-offs have been determined by a statistical approach (receiver operating characteristic (ROC) analysis) rather than by an approach aiming at highest specificity. Moreover, the point has been made that a modification of the ACR remission criteria and the Minimal Disease Activity criteria (MDA) were not fulfilled by the majority of a community population over 50 years of age.35
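The difference in stringency between remission definitions can be illustrated with a small calculation. The formulas and cut-offs below are the commonly published ones for the DAS28 (ESR version), SDAI and CDAI; they are not given in this text and are stated here as assumptions, and the patient values are hypothetical.

```python
import math

# Commonly published remission cut-offs (assumed here, not taken from this text).
DAS28_REMISSION = 2.6  # remission if DAS28 < 2.6
SDAI_REMISSION = 3.3   # remission if SDAI <= 3.3
CDAI_REMISSION = 2.8   # remission if CDAI <= 2.8


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, global_health_mm: float) -> float:
    """DAS28 with ESR; patient global health on a 0-100 mm visual analogue scale."""
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive")
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * global_health_mm)


def cdai(tjc28: int, sjc28: int, patient_global_cm: float, evaluator_global_cm: float) -> float:
    """CDAI: plain sum of joint counts and both global assessments (0-10 cm scales)."""
    return tjc28 + sjc28 + patient_global_cm + evaluator_global_cm


def sdai(tjc28: int, sjc28: int, patient_global_cm: float,
         evaluator_global_cm: float, crp_mg_dl: float) -> float:
    """SDAI: CDAI plus C-reactive protein in mg/dl."""
    return cdai(tjc28, sjc28, patient_global_cm, evaluator_global_cm) + crp_mg_dl


# Hypothetical patient: no tender joints, 4 swollen joints, low ESR and CRP.
d = das28_esr(tjc28=0, sjc28=4, esr_mm_h=5, global_health_mm=10)
s = sdai(tjc28=0, sjc28=4, patient_global_cm=1.0, evaluator_global_cm=1.0, crp_mg_dl=0.3)
c = cdai(tjc28=0, sjc28=4, patient_global_cm=1.0, evaluator_global_cm=1.0)

print("DAS28 remission:", d < DAS28_REMISSION)
print("SDAI remission:", s <= SDAI_REMISSION)
print("CDAI remission:", c <= CDAI_REMISSION)
```

Because the DAS28 takes the square root of the joint counts while the SDAI and CDAI add them linearly, this hypothetical patient with four swollen joints falls below the DAS28 cut-off but not below the SDAI or CDAI cut-offs.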
Question 3: What are the single variables and composite indices that have demonstrated (a) predictive validity for radiographic progression, (b) correlation with functional impairment, (c) correlation with patient global assessment of disease activity, or (d) correlation with physician/evaluator global assessment of disease activity?
This complex question defined four different standards to compare the measures to. The search for this question was separated into two parts, one on the radiographic outcome (a) and one on the clinical outcomes (b–d). Evidence was considered supportive of a specific measure if that measure showed a correlation with the respective outcome, and refuting if such a correlation was absent. While for radiographic progression the outcome was the change in radiographic score, for the clinical outcomes (functional impairment, patient and doctor global assessments) it was the absolute value.
As shown qualitatively in table 2, the strongest and least controversial evidence was found for composite disease activity indices (DAS, DAS28, SDAI), as well as for measures of pain, for swollen joint counts and tender joint counts. Evidence on the predictive validity of erythrocyte sedimentation rate (ESR) was more controversial since it included studies supporting and refuting its correlation with the respective outcomes (table 2).
Question 4: What is the added value in terms of predictive validity or discriminant capacity with respect to radiographic progression of ultrasound or magnetic resonance imaging for the evaluation of synovitis in comparison with the value of clinical joint count or other variables?
To date, the added predictive validity of MRI synovitis score over clinical joint counts with regard to radiographic progression is controversial. Four studies were retrieved addressing this question.36–39 In two studies the MRI hypertrophy score or MRI synovitis were predictive of radiographic progression,36 38 while in another study this association was absent.37 Interestingly, in all three studies swollen joint count (SJC) at baseline was not associated with joint destruction at a later time point as assessed by x rays. In an additional cohort study MRI synovitis and clinical joint counts were predictive of radiographic progression, but a comparison between MRI and clinical assessment was not made.39
Studies comparing clinical variables with ultrasound (US) with regard to their association with radiographic progression or their discrimination between two treatment groups could not be found. However, the association of US measures at baseline (synovial thickness and vascularity) with radiographic progression has been established in patients treated with disease-modifying antirheumatic drugs (DMARDs), but was absent in patients on anti-TNF therapy,40 an observation which is in line with the association known from clinical measures,15 18 and probably consistent with the dissociation theory.
Based on reports of patients who satisfied remission criteria, but still experienced radiographic progression,9 10 Brown et al investigated whether joint imaging by MRI and US could improve the accuracy of remission measurement in RA.41 Although these authors did not address the question of radiographic progression itself (see above), they found that asymptomatic patients without clinical signs of joint inflammation still had active synovitis detectable by MRI or US.41 This added sensitivity is supported by another study showing that the MRI synovitis score over time discriminated treatment groups better than C-reactive protein (CRP) or DAS28 over time.42
Question 5: What is the value of including function in a composite index to assess disease activity?
By systematic literature search and expert consultation, specific studies addressing this question could not be identified. The ACR representatives felt that this question focused on conceptual issues of what elements defined disease activity, that European views of this differed from North American views and that the question was not addressable with research. Although ACR response criteria include a functional measure while other composite indices do not (DAS28, SDAI, CDAI), studies comparing these two groups of measures are not relevant to this question, since the impact of function has not been analysed separately. A study design appropriately addressing this issue will need to compare ACR criteria with and without the functional measure, or composite measures that include measures of function with those that do not. This issue has subsequently been moved to the research agenda for future studies.5 6
Question 6: What is the evidence that fatigue and other patient-reported health status measures reflect disease activity?
For this question, we searched for evidence of a correlation of patient-reported outcomes (PRO), ie, measures that are subjectively determined by the patient, with measures of disease activity. For the latter, any composite measure of disease activity was accepted as an external standard. The PROs that showed such a correlation are tabulated in table 3.
In addition to general correlational validity, evidence was found that the correlation between PROs (such as HAQ scores) and disease activity measures, as well as the sensitivity to change of PROs, decreased with longer disease duration.43–49 Additionally, the patient’s perception of disease activity might differ with similar disease activity depending on the time-point in the disease course,50 and worsening of PROs over time can partly be explained by the effect of aging.51 For some PROs responsiveness was better and placebo treatment response was lower compared to objective disease activity measures (such as DAS28, SJC, tender joint count (TJC), evaluator global assessment (EGA) or ACR response).49 52–62 By contrast, it has been argued that the association of some PROs with disease activity may be secondary to their link with pain and depression.63
Question 7: What is the consequence of having different levels of disease activity as eligibility criteria in terms of responsiveness and statistical power?/Question 8: To what extent does baseline activity influence response and the measures used?
Question 7 and question 8 are conceptually linked, and were therefore addressed during the same literature search.
Several pieces of evidence support a positive association of baseline disease activity with disease activity at the endpoint of a trial. This has been shown for achievement of remission as well as for reaching ACR response using traditional DMARDs,30 64–66 but not necessarily for patients treated with TNF inhibitors.67 The influence of baseline disease activity on change of PRO is controversial.44 68 By contrast, patient-reported measures, such as the HAQ, partly reflect disease activity. In this regard, data are also controversial, with studies showing that patients achieving remission have lower baseline HAQ scores than those who do not,65 69 and studies reporting a lack of such association.70 By contrast, relative change appears to be similar irrespective of the baseline activity. Thus, baseline disease activity is a major factor contributing to reaching certain disease activity states at endpoint. Different levels of disease activity at trial start will, therefore, influence disease activity at endpoint.
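The interplay between a similar relative change and a state-based endpoint can be shown with simple arithmetic. The sketch below is purely illustrative: the 40% relative improvement, the baseline values and the state threshold of 2.6 (borrowed from the commonly published DAS28 remission cut-off) are assumptions and not results from the reviewed studies.

```python
STATE_THRESHOLD = 2.6  # hypothetical target state: score below this value


def endpoint_score(baseline: float, relative_improvement: float) -> float:
    """Score at endpoint after a given relative improvement (0.40 means 40%)."""
    if not 0.0 <= relative_improvement <= 1.0:
        raise ValueError("relative_improvement must lie between 0 and 1")
    return baseline * (1.0 - relative_improvement)


def reaches_state(baseline: float, relative_improvement: float) -> bool:
    """True if the endpoint score falls below the target state threshold."""
    return endpoint_score(baseline, relative_improvement) < STATE_THRESHOLD


# Three hypothetical patients with identical relative improvement (40%).
for baseline in (4.0, 5.0, 6.5):
    end = endpoint_score(baseline, 0.40)
    print(f"baseline {baseline:.1f} -> endpoint {end:.2f}, "
          f"target state reached: {reaches_state(baseline, 0.40)}")
```

With the same relative change, only the patient starting from the lowest baseline reaches the target state, which is why eligibility criteria based on baseline activity affect state-based endpoints more than change-based ones.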
Question 9: What is the evidence that structural damage or duration of RA have an impact on the (discriminant) validity of the disease activity measure?
One study was found in the literature that exactly addressed this question. This study showed that the discriminatory capacity of ACR response rates between active treatment and placebo groups is not affected by disease duration or Steinbrocker functional class.71 By looking only at responsiveness of disease activity, some additional studies showed that longer disease duration (or higher Steinbrocker functional class) is associated with a reduced likelihood of achieving good outcomes (ie, ACR response, remission, or responsiveness of core set measures),64 71 72 while other studies have reported that this is not the case.20 44 67
Several other studies have addressed related issues. It has been shown that the responsiveness of functional measures is reduced in patients with long duration of RA.49 67 However, this observation could be explained by the higher levels of joint damage that may decrease physical function, rather than by longer disease duration.
Question 10: What is the added (predictive or discriminant) validity of a status measure vs a response measure?/ Question 11: What literature supports the use of dichotomous response measures vs continuous change measures as outcomes in trials?
These two questions addressed related analytical concepts and were the most difficult to subject to a systematic literature search. We decided to use a highly sensitive approach, with “treatment outcome” as a Medical Subject Heading (MeSH) term (MH) and “patient selection” as a MeSH Major Topic (MAJR) without any further specifications.
In general, RA literature could be found supporting the concept that in clinical trial analyses continuous measures have a better sensitivity to change than dichotomous measures.53 72–74 Also, relative changes in continuous scores correlated much better with the decisions of doctors on patient vignettes than dichotomous responses.73
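Why a dichotomous measure can be less sensitive than a continuous one may be seen in a toy example. The sketch below is a deliberate simplification: it treats response as a single percentage improvement with a 20% threshold, whereas real ACR response criteria combine several core set measures. The function name and all data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def responder_rate(improvements: Sequence[float], threshold: float = 20.0) -> float:
    """Share of patients whose percentage improvement meets the threshold."""
    if not improvements:
        raise ValueError("improvements must not be empty")
    return sum(1 for x in improvements if x >= threshold) / len(improvements)


# Hypothetical percentage improvements in two treatment arms (illustrative only).
arm_a = [25, 30, 22, 60, 70]
arm_b = [21, 22, 23, 24, 25]

# Dichotomous view: every patient in both arms counts as a responder.
print("responder rates:", responder_rate(arm_a), responder_rate(arm_b))

# Continuous view: the mean improvement differs clearly between the arms.
print("mean improvements:", mean(arm_a), mean(arm_b))
```

In this constructed case the responder rates are identical although the mean improvements differ, because dichotomisation discards the size of the change once the threshold is passed.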
5. Consensus on final recommendations
During the second meeting of the group the results of the systematic literature research, which now also included references provided by the experts, were presented to the expert panel. The individual questions and results were addressed in detail. Every item was intensively discussed based on the evidence in the literature. During the discussion on the individual questions, thematic items were extracted. Not every addressed question resulted in a respective item, while other questions provided more than one theme each. These themes were grouped into three categories.
The first group of themes comprised items for the bullet recommendations, which are the major part of the final recommendation paper.5 6 After the collection of the initial recommendations, the items were ordered logically and reworded if needed. The second group were items to be considered for the research agenda, which is presented in addition to the recommendations in the final paper. The research agenda comprises items that are deemed important by the experts, but for which there is too little literature available or for which the literature is controversial. The final group were items that were thought to be important and should be mentioned or discussed in the final manuscript, but did not necessarily constitute points of recommendation.
Recommendations on trial reporting have been published previously, most prominently when the core sets of disease activity measures were defined in the early 1990s.1–3 Here, we followed the EULAR standardised operating procedures to expand beyond the question on which items should be included (content questions), to elaborate additional aspects that are pertinent to disease activity reporting, such as analytical and contextual items.
The 11 items formulated at the initial meeting were complex questions, and the low yield of studies identified by the direct literature search emphasises this issue. In fact, many pieces of evidence were obtained during the search for other questions. In addition, many world experts in the field of RA outcomes research were in the room when the results of the search were presented, and the contribution of relevant studies based on their expertise and knowledge of the literature was an important supplement to the systematic search. The fact that the experts providing this information represented a broad clinical and scientific expertise, different countries and different continents, in conjunction with the original literature search, suggests that the literature finally compiled was complete. Moreover, based on the large number of involved experts, it is also likely that the provided literature does not represent only the opinions of specific experts. The complete available literature was then evaluated for the elaboration of the recommendations. In the present study, only one patient was part of the panel. Ideally, a larger number of patients is recommended in order to include the opinions of patients appropriately, probably resulting in additional questions.
The questions initially posed did not imply specific recommendations, but were tools to derive the systematic literature search on relevant topics in the area of RA disease activity research. The recommendations developed during a subsequent meeting were the product of the literature results obtained and the discussion of the results during the meeting, where themes were extracted.
A major advantage supporting the generalisability of the recommendations is that the panel was made up as heterogeneously as possible with regard to different scientific views and regional preferences for particular outcomes. The North American representative of the panel was particularly important to bring forward issues that are relevant for the ACR, and to achieve a balance in the wording of the recommendations so that they would finally be endorsed by both the EULAR and the ACR. During the additional meeting with a group of experts identified by the ACR, which took place specifically for the purpose of evaluating the recommendations by the EULAR/ACR panel, only minor changes to the set of recommendations were made, supporting the successful work of the ACR representative and the North American members of the task force during the prior meetings in Europe.
In conclusion, the process involving three rounds of expert meetings, the combination of evidence from the literature and expert knowledge, and joint publication of the recommendations by EULAR and ACR, led to an exciting final recommendations manuscript. The EULAR standardised operating procedures, with a data driven and a consensus driven part, have proven to be efficient in generating recommendations, such as in the present manuscript. The yield of the process may be improved in future by also considering re-analysis of databases in addition to a literature review, especially in the field of outcomes research, where systematic searches are more difficult.
The recommendations proposed in this project should be used as guidance when investigators report disease activity outcomes of clinical trials of rheumatoid arthritis. It will be up to the journal editors, reviewers and the scientific community to judge whether a clinical trial report has the sufficient, relevant, and standardised outcomes recommended.
Funding: This project was fully funded by EULAR and the ACR.
Competing interests: None.
The views presented in this presentation do not necessarily reflect those of the Food and Drug Administration or the National Institutes of Health.