
Extended report
Clinically important changes in individual and composite measures of rheumatoid arthritis activity: thresholds applicable in clinical trials
  1. Michael M Ward,
  2. Lori C Guthrie,
  3. Maria I Alba
  1. Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland, USA
  1. Correspondence to Dr Michael M Ward, Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, NIAMS/NIH, Building 10 CRC, Room 4–1339, 10 Center Drive, Bethesda, MD 20892, USA; wardm1{at}mail.nih.gov

Abstract

Objective Thresholds of minimal clinically important improvement (MCII) are needed to plan and interpret clinical trials. We estimated MCIIs for the rheumatoid arthritis (RA) activity measures of patient global assessment, pain score, Health Assessment Questionnaire Disability Index (HAQ), Disease Activity Score-28 (DAS28), Simplified Disease Activity Index (SDAI), and Clinical Disease Activity Index (CDAI).

Methods In this prospective longitudinal study, we studied 250 patients who had active RA. Disease activity measures were collected before and either 1 month (for patients treated with prednisone) or 4 months (for patients treated with disease modifying medications or biologics) after treatment escalation. Patient judgments of improvement in arthritis status were related to prospectively assessed changes in the measures. MCIIs were changes that had a specificity of 0.80 for improvement based on receiver operating characteristic curve analysis. We used bootstrapping to provide estimates with predictive validity.

Results At baseline, the mean (±SD) DAS28-ESR (erythrocyte sedimentation rate) was 6.16±1.2 and mean SDAI was 38.6±14.8. Improvement in overall arthritis status was reported by 167 patients (66.8%). Patients were consistent in their ratings of improvement versus no change or worsening, with receiver operating characteristic curve areas ≥0.74. MCIIs with a specificity for improvement of 0.80 were: patient global assessment −18, pain score −20, HAQ −0.375, DAS28-ESR −1.2, DAS28-CRP (C-reactive protein) −1.0, SDAI −13, and CDAI −12.

Conclusions MCIIs for individual core set measures were larger than previous estimates. Reporting the proportion of patients who meet these MCII thresholds can improve the interpretation of clinical trials in RA.

  • Rheumatoid Arthritis
  • Disease Activity
  • Outcomes research


Interpretation of the results of clinical trials requires an assessment of the statistical significance of treatment differences and also of their clinical importance. Clinically important changes are those recognised as meaningful improvements (or deteriorations), most often judged by patients experiencing the change.1 To aid the interpretation of clinical trials, these judgments must be related to corresponding changes in measures of disease activity used as endpoints in trials.2 This translation is critical, because underestimation or overestimation of the thresholds for clinically important changes can have major consequences. If thresholds are set too high, meaningful improvements may be overlooked. If thresholds are too low, treatments that result in trivial improvements may be mistakenly considered beneficial. Clinically important changes are also important components in sample size calculations for clinical trials.

Few studies have attempted to define clinically important changes in rheumatoid arthritis (RA) activity measures, and they have reached little consensus. Early studies suggested that a difference of 10 points on a 100-point patient global assessment scale, 6 points on a 100-point pain scale, and 0.20–0.22 units on the Health Assessment Questionnaire (HAQ) Disability Index could be considered clinically important.3 ,4 These data were based on studies of 40–57 patients who rated their status relative to each other after brief conversations. These group-level between-patient contrasts differ importantly from the within-patient longitudinal contrasts that are most directly applicable to clinical trials.2 ,3 In two short-term non-steroidal anti-inflammatory drug trials, changes in the patient global assessment of 14 points, pain score of 15 points, and modified HAQ of 0.24 units were judged as important, while slightly larger changes were identified in a registry study.5–7 In contrast, an observational study of patients in routine care reported thresholds of 11.9 points for the pain score and 0.09 units for the HAQ.8 Despite the growing use of composite RA activity measures, only one study has examined clinically important changes in the Disease Activity Score 28 (DAS28), Simplified Disease Activity Index (SDAI), or Clinical Disease Activity Index (CDAI) from the patient's perspective.9 Testing of clinically important improvements was highlighted as an important next step in the development of RA clinical trials.10

Our aim was to identify thresholds for minimal clinically important improvement (MCII) in the RA core set measures of patient global assessment, pain score and HAQ, and for the DAS28, SDAI and CDAI. We examined longitudinal within-patient changes from the patient's perspective, analysed at the individual level.2 We studied patients with active RA who were having treatment escalation to provide results applicable to clinical trials.

Methods

Participants

We enrolled patients who were receiving ongoing care in our clinics. Participants were required to be aged 18 years or older, have a clinical diagnosis of RA and fulfil the 1987 American College of Rheumatology classification criteria,11 and have active RA based on physician judgment and at least six tender joints. Patients were enrolled only if their antirheumatic treatment was escalated for active RA at the baseline visit. This escalation could be an increased dose of their current disease modifying anti-rheumatic drug (DMARD), initiation of prednisone, or initiation of a new DMARD or biologic (henceforth, the dose-escalation, prednisone, and DMARD groups, respectively). Decisions regarding the specific medication changes were left to the treating rheumatologist and were not determined by this study. The study was approved by the institutional review board, and all patients provided written informed consent.

Study procedures

Patients completed two study visits. Because responses to prednisone were anticipated to occur sooner than responses to the other treatments, the follow-up visit for patients in the prednisone group was at 1 month, while for all others the follow-up visit was at 4 months. Interim visits were not performed. We chose 4 months to allow time for clinical responses while limiting the potential for poorer recall with a longer interval.

At both visits, patients had complete joint counts (66 swollen, 68 tender) performed by the same rheumatologist, and the erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) level were measured. The examining rheumatologist provided a global assessment of arthritis activity. Patients completed questionnaires including a global assessment by Visual Analogue Scale (possible range 0–100, with anchors of ‘very well’ and ‘very poor’), a pain score by Visual Analogue Scale (possible range 0–100, with anchors of ‘no pain’ and ‘severe pain’), and the HAQ (possible range 0–3, with higher scores indicating more difficulty).12 We used the relevant measures to compute the DAS28-ESR, DAS28-CRP, SDAI and CDAI.13–15
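For readers reimplementing the composite scores, the sketch below uses the commonly published DAS28-ESR, DAS28-CRP, SDAI and CDAI formulas; the exact variants used in this study are those in refs 13–15. It assumes that the 28-joint tender and swollen counts are taken from the full 68/66-joint examination, that both global assessments are rescaled from 0–100 to 0–10 for the SDAI and CDAI, and that CRP is in mg/L for the DAS28-CRP but in mg/dL for the SDAI. Function names are illustrative.

```python
import math

# Commonly published formulas (assumed); the study's exact variants are in its refs 13-15.
# tjc28/sjc28: 28-joint tender/swollen counts, assumed extracted from the 68/66 counts.


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, pt_global_0_100: float) -> float:
    """DAS28-ESR; ESR in mm/h (must be > 0), patient global on a 0-100 scale."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * pt_global_0_100)


def das28_crp(tjc28: int, sjc28: int, crp_mg_l: float, pt_global_0_100: float) -> float:
    """DAS28-CRP; CRP in mg/L, patient global on a 0-100 scale."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.36 * math.log(crp_mg_l + 1) + 0.014 * pt_global_0_100 + 0.96)


def cdai(tjc28: int, sjc28: int, pt_global_0_10: float, md_global_0_10: float) -> float:
    """CDAI: a simple sum, with both global assessments on a 0-10 scale."""
    return tjc28 + sjc28 + pt_global_0_10 + md_global_0_10


def sdai(tjc28: int, sjc28: int, pt_global_0_10: float, md_global_0_10: float,
         crp_mg_dl: float) -> float:
    """SDAI: CDAI plus CRP in mg/dL (a different CRP unit from the DAS28-CRP)."""
    return cdai(tjc28, sjc28, pt_global_0_10, md_global_0_10) + crp_mg_dl


# Hypothetical patient: 10 tender, 8 swollen, ESR 40 mm/h, patient global 60/100.
print(round(das28_esr(10, 8, 40, 60), 2))  # 5.99
```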

At the follow-up visit, patients completed a transition question on whether they judged their arthritis overall to be improved, unchanged, or worsened since the baseline visit, and to rate the importance of any change on a 7-point scale (from ‘hardly important at all’ to ‘extremely important’).16 Identical questions were asked about changes in pain and ability to do things.

Statistical analysis

We first examined changes in RA activity measures to establish their sensitivity to change and the validity of the judgment process. To provide valid MCIIs, measures must be sensitive to change. If a measure is not sensitive to change, patients may experience important improvements in disease activity, yet these improvements would correspond to only small changes in the measure, resulting in small changes being mistakenly labelled as important. RA activity measures have been reported to be sensitive to change,17–23 but because this can vary with the intervention,24 degree of RA activity,25 or duration of RA,26 ,27 it is important to verify it in each sample. We measured sensitivity to change using standardised response means (SRM), with SRMs ≤−0.50 considered acceptable.28–30 We computed CIs for SRMs using 2000 bootstrapped samples. To assess the validity of the judgment process, we tested if changes in RA measures differed among patients who reported improvement, no change, or worsening, using one-way analysis of variance.
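A minimal Python sketch of the two checks described above follows. It assumes that change is coded as follow-up minus baseline, so improvement is negative. The SRM is the mean change divided by the SD of the changes. The text does not state which bootstrap interval was used for the SRM, so the simple percentile interval here is an assumption. The ANOVA helper returns only the F statistic; `scipy.stats.f_oneway` returns the same statistic together with its p value.

```python
import random
import statistics


def srm(changes):
    """Standardised response mean: mean change divided by the SD of the changes."""
    return statistics.mean(changes) / statistics.stdev(changes)


def bootstrap_srm_ci(changes, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the SRM (the interval type is an assumption)."""
    rng = random.Random(seed)
    n = len(changes)
    # Resample patients with replacement and recompute the SRM each time.
    boots = sorted(srm(rng.choices(changes, k=n)) for _ in range(n_boot))
    return boots[int(alpha / 2 * n_boot)], boots[int((1 - alpha / 2) * n_boot) - 1]


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA, e.g. improved / unchanged / worsened groups."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Under the acceptability rule in the text, a measure passes when its SRM is at or below −0.50.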

Next, we determined whether patients were similar in their judgments of what degree of improvement constituted an important change. If patients were not similar, criteria for groups of patients could not be established. We used receiver operating characteristic (ROC) curves, with responses to the transition question (improved or not) as the outcome and change in the RA activity measure as the independent variable, and computed 95% CIs for the area under the ROC curve.31 ,32 Values of 1.0 for the area under the ROC curve indicate perfect separation between those reporting improvement and those not reporting improvement, while values of 0.5 indicate no discrimination. We considered judgments among patients to be similar if the lower 95% confidence limit of the ROC area exceeded 0.5. To provide results with predictive validity, we computed ROC curves on 2000 bootstrapped samples, using non-parametric resampling and the bias-corrected and accelerated method to compute CIs.33
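The ROC area for a single measure equals the Mann–Whitney probability that a patient reporting improvement had a larger decrease than a patient who did not, with ties counting one half. The sketch below computes it from ranks and adds a bias-corrected and accelerated (BCa) bootstrap interval in pure Python; the study itself used SAS. Two details are assumptions of this sketch, not of the text: resamples that lack either group are redrawn, and the jackknife step requires at least two patients in each group.

```python
import random
from statistics import NormalDist


def roc_auc(changes, improved):
    """ROC area via the Mann-Whitney statistic with mid-ranks for ties.

    A more negative change counts as more improvement, so scores are -change.
    """
    scores = [-c for c in changes]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):  # a tied block gets its average 1-based rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1 = sum(1 for y in improved if y)
    n0 = len(improved) - n1
    r1 = sum(r for r, y in zip(ranks, improved) if y)
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def _auc_of(pairs):
    return roc_auc([c for c, _ in pairs], [y for _, y in pairs])


def bca_auc_ci(changes, improved, n_boot=2000, alpha=0.05, seed=1):
    """Bias-corrected and accelerated (BCa) bootstrap CI for the ROC area."""
    data = list(zip(changes, improved))
    n, nd = len(data), NormalDist()
    rng = random.Random(seed)
    theta = _auc_of(data)
    boots = []
    while len(boots) < n_boot:
        sample = rng.choices(data, k=n)
        if 0 < sum(1 for _, y in sample if y) < n:  # redraw if a group is missing
            boots.append(_auc_of(sample))
    boots.sort()
    # Bias correction: share of bootstrap AUCs below the full-sample estimate.
    p = sum(b < theta for b in boots) / n_boot
    z0 = nd.inv_cdf(min(max(p, 0.5 / n_boot), 1 - 0.5 / n_boot))
    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [_auc_of(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = sum(jack) / n
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = sum((jbar - j) ** 3 for j in jack) / den if den else 0.0

    def quantile(q):
        z = nd.inv_cdf(q)
        adj = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        return boots[min(int(adj * n_boot), n_boot - 1)]

    return theta, quantile(alpha / 2), quantile(1 - alpha / 2)
```

Judgments would then be treated as similar when the returned lower limit exceeds 0.5.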

We also used the ROC curves to determine MCIIs. Following previous studies,7 ,34 we used the change score at a specificity for improvement of 0.80. At this threshold, 80% of patients who did not report improvement had changes that fell short of the MCII, so a change of this size was uncommon among patients who did not perceive improvement. We computed two alternative criteria for the MCII, based on either the Youden Index or the minimal distance to the upper left corner [0, 1] of the ROC curve plot.35 These latter methods do not ensure a minimum specificity and may identify thresholds with differing sensitivities and specificities across measures; they are therefore less appealing for determining MCIIs.
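The three MCII criteria can be sketched from the ROC operating points as shown below. Here a patient is assumed to meet a threshold t when the change is ≤ t. The specificity criterion is assumed to pick the least stringent threshold whose specificity is still at least 0.80. In the study these criteria were applied within each bootstrapped sample, which this single-sample sketch omits.

```python
def roc_points(changes, improved):
    """(threshold, sensitivity, specificity) for every observed change score.

    A patient is assumed to 'meet' threshold t when change <= t (MCIIs are negative).
    """
    pos = [c for c, y in zip(changes, improved) if y]
    neg = [c for c, y in zip(changes, improved) if not y]
    return [(t,
             sum(c <= t for c in pos) / len(pos),  # sensitivity
             sum(c > t for c in neg) / len(neg))   # specificity
            for t in sorted(set(changes))]


def mcii_at_specificity(changes, improved, target=0.80):
    """Least stringent threshold whose specificity is still >= target (or None)."""
    ok = [p for p in roc_points(changes, improved) if p[2] >= target]
    return max(ok, key=lambda p: p[0], default=None)


def mcii_youden(changes, improved):
    """Threshold maximising sensitivity + specificity - 1."""
    return max(roc_points(changes, improved), key=lambda p: p[1] + p[2] - 1)


def mcii_closest_to_corner(changes, improved):
    """Threshold whose ROC point lies closest to the corner where sens = spec = 1."""
    return min(roc_points(changes, improved),
               key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)
```

The specificity rule fixes one axis in advance. The other two rules trade sensitivity against specificity freely, which is why their operating points can differ from one measure to the next.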

For all measures except the HAQ, the MCII was the mean of the threshold estimates across the 2000 bootstrapped samples. Because the HAQ is an ordinal measure with increments of 0.125, we used the median of its bootstrapped estimates as its MCII.
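Given one threshold per bootstrapped sample, the summary step might look like the sketch below. Using `statistics.median_low` for the HAQ is an assumption of this sketch. With an even number of resamples (such as 2000), an ordinary median can average two neighbouring HAQ values and fall off the 0.125 grid, whereas `median_low` always returns an observed value.

```python
import statistics


def summarise_mcii(boot_thresholds, ordinal=False):
    """Point estimate of the MCII from per-resample thresholds.

    Continuous measures: the mean. Ordinal measures such as the HAQ (0.125 steps):
    median_low, which always returns one of the observed values.
    """
    if ordinal:
        return statistics.median_low(boot_thresholds)
    return statistics.mean(boot_thresholds)


# Hypothetical bootstrapped HAQ thresholds (10 resamples for brevity).
haq_boot = [-0.5] * 3 + [-0.375] * 5 + [-0.25] * 2
print(summarise_mcii(haq_boot))                # -0.3875, not an attainable HAQ change
print(summarise_mcii(haq_boot, ordinal=True))  # -0.375
```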

A sample of 250 patients, 225 of whom reported improvement and 25 of whom did not, was estimated to have an ROC curve area significantly greater than 0.5 (ie, a 95% CI of 0.53 to 0.67, with a two-tailed type 1 error of 0.05).36 Samples more balanced with respect to the proportions reporting improvement would have even more statistical power. In exploratory analyses, we examined patient subgroups.
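Reference 36 gives the method actually used for the sample-size estimate. As an illustrative assumption only, the sketch below uses the Hanley–McNeil standard error of an ROC area to compare the width of a 95% CI for an anticipated area of 0.60 when 250 patients are split 225/25 versus 125/125. Patients reporting improvement are treated as the positive group. The sketch is not claimed to reproduce the figures above.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an ROC area (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def auc_wald_ci(auc, n_pos, n_neg, alpha=0.05):
    """Normal-approximation CI for an anticipated ROC area."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return auc - z * se, auc + z * se


# Same total of 250 patients, anticipated ROC area of 0.60 (hypothetical).
for n_pos, n_neg in [(225, 25), (125, 125)]:
    lo, hi = auc_wald_ci(0.60, n_pos, n_neg)
    print(n_pos, n_neg, round(lo, 3), round(hi, 3))
```

Running it gives the narrower interval for the balanced split. This agrees with the remark that samples more balanced in the proportions reporting improvement carry more statistical power.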

We used SAS programs V.9.3 (SAS Institute, Cary, North Carolina, USA) for analysis.

Results

Patient characteristics

We enrolled 262 patients, of whom 250 completed the study. Eight patients were lost to follow-up, two withdrew, one died of a stroke before the second visit, and one did not complete questionnaires at the second visit. Most patients were middle-aged women (table 1). The median duration of RA was 6.4 years; 60 patients (24%) had RA for less than 2 years. At enrolment, 87 patients (35%) were being treated with methotrexate (median dose 15 mg/week), and 88 patients (35%) were being treated with prednisone (median dose 5 mg/day). Seventy-one patients (28%) were DMARD-naive.

Table 1

Patient characteristics at study entry (N=250)*

Patients had active RA, with mean DAS28-ESR of 6.16 (table 2). All but three patients had a DAS28-ESR >3.2, and 203 patients (81%) had a DAS28-ESR >5.1. All but one patient had an SDAI >11 and a CDAI >10; 197 patients (79%) had an SDAI >26, and 214 patients (86%) had a CDAI >22. One hundred and four patients (41.6%) were in the dose-escalation group, 56 patients (22.4%) were in the prednisone initiation group, and 90 patients (36%) were in the DMARD initiation group. In the latter group, 60 patients started methotrexate, 3 started leflunomide, 20 started tumour necrosis factor-α inhibitors, and 7 started other biologics.

Table 2

Changes in rheumatoid arthritis activity measures during the study

Changes in RA activity

Mean RA activity improved substantially during the study (table 2). Each measure was sensitive to change, with SRMs for the patient-reported measures of −0.65 to −0.69, and for the composite measures of −0.95 to −0.98. Overall, 167 patients (66.8%) reported that their global arthritis status had improved, 59 (23.6%) reported no change, and 24 (9.6%) reported worsening. Frequencies of subjective changes in pain (65.2% improved; 23.2% no change; 11.6% worsened) and functional ability (60.4% improved; 30.8% no change; 8.8% worsened) were similar. Ninety-two percent of patients who reported improvement in their global arthritis status rated the improvement as at least moderately important, and 68% rated it as either very important or extremely important. Those who reported improvement had significantly larger changes in each measure than those who reported no change or worsening (table 2).

Similarity of judgments among patients

The mean ROC curve area ranged from 0.74 for the patient global assessment to 0.79 for the HAQ and DAS28-CRP (table 3). The lower confidence limit was greater than 0.5 for each measure, indicating that patients were sufficiently similar in their judgments of improvement that MCIIs for the group could be estimated. ROC curves are presented in figure 1.

Table 3

Minimal clinically important improvement estimates for individual and composite rheumatoid arthritis activity measures, using a specificity of 0.80 as the criterion

Figure 1

Receiver operating characteristic curves for rheumatoid arthritis activity measures. HAQ, Health Assessment Questionnaire Disability Index; DAS28-ESR, Disease Activity Score 28 Erythrocyte Sedimentation Rate; DAS28-CRP, Disease Activity Score 28 C-reactive Protein; SDAI, Simplified Disease Activity Index; CDAI, Clinical Disease Activity Index.

MCII estimates

With the criterion of a specificity of 0.80, an improvement of 18 points on the patient global assessment (rounded estimates are conventionally used in clinical applications) and 20 points on the pain score were the thresholds for MCII (table 3). Similarly, an improvement in the HAQ by 0.375 was the MCII estimate. The MCII for the DAS28-ESR was 1.2 and for the DAS28-CRP was 1.0, while estimates for the SDAI and CDAI were 13 and 12 points, respectively. The CIs show the range of variability in estimates of MCII among the 2000 bootstrapped samples. Sensitivities ranged from 0.58 to 0.64, indicating that by requiring high specificity, a sizable proportion of patients with subjective improvement did not meet the MCII thresholds. The proportion of patients misclassified (ie, either reported subjective improvement and did not meet the MCII, or met the MCII and did not report subjective improvement) ranged from 28.8% for the HAQ to 34.9% for the DAS28-ESR.
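The misclassification figures follow from sensitivity, specificity and the group sizes. They combine two groups: patients who reported improvement but fell short of the MCII, and patients who met the MCII without reporting improvement. The sketch uses the group sizes from the Results (167 improved; 59 + 24 = 83 not improved) and a hypothetical sensitivity of 0.60 within the reported range. Because the tabulated MCIIs are bootstrap averages, this arithmetic will not reproduce the per-measure percentages exactly.

```python
def misclassified_fraction(sens, spec, n_improved, n_not_improved):
    """Share of patients whose MCII status disagrees with their own judgment."""
    missed = (1 - sens) * n_improved           # reported improvement, below the MCII
    unperceived = (1 - spec) * n_not_improved  # met the MCII, reported no improvement
    return (missed + unperceived) / (n_improved + n_not_improved)


# 167 improved vs 83 not improved (from the Results); sensitivity 0.60 is hypothetical.
print(round(misclassified_fraction(0.60, 0.80, 167, 83), 3))  # 0.334
```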

MCII estimates based on the Youden Index and the shortest distance were generally smaller than those based on the 0.80 specificity criterion, with lower specificities but higher sensitivities (see online supplemental data). Results for patient subgroups are presented in the online supplemental file.

Discussion

Assessing whether a patient has had a meaningful response to treatment is an activity that occurs in almost all clinical encounters. However, these judgments by patients are implicit and personal. To be applied in clinical trials, these judgments must be made explicit and mapped to corresponding changes in disease activity measures. They must also be aggregated among many patients so that estimates are generalisable. This translation requires attention to several issues: assessment of a study cohort that mirrors the patients to whom the MCIIs would be applied; confirmation of sensitivity to change; testing validity and establishing consistency of patients’ judgments; and ensuring predictive validity using resampling methods. We used this approach to derive MCIIs for RA activity measures.

The cohort comprised predominantly middle-aged women with seropositive erosive RA and therefore matched the target population of many clinical studies. Patients had active RA, comparable to patients in recent trials.37 ,38 Most patients had substantial responses to treatment escalation, and the sensitivity to change of clinical measures was comparable to that in previous reports.18–23 Importantly, 33–40% of patients did not judge themselves as improved in global status, pain, or function, and this variation was needed to identify MCIIs. Most patients rated their improvement as very important or extremely important, which may reflect the types of interventions we studied, but may also indicate that patients were unlikely to endorse improvement unless it was substantial.

MCII estimates for patient global assessment, pain, and HAQ (−18, −20, and −0.375, respectively) were larger than those of several previous studies. Wells et al4 and Redelmeier and Lorig3 used between-patient comparisons to derive MCII estimates of −10, −6, and −0.22 for these three measures. These MCIIs relied on patients judging themselves against other patients, rather than judging their current status relative to their previous status. Between-patient comparisons may be affected by misinterpretation, nondisclosure and optimistic bias. Given the differences in the judgment process, it is not surprising that these MCIIs differ. Because clinical assessment is concerned with whether individual patients have improved, MCIIs based on within-patient changes are likely more relevant. Even within the study by Redelmeier and Lorig, group mean differences poorly separated improved from unimproved patients, suggesting that generalising these MCIIs may not be valid.3 Tubach et al6 reported MCIIs for patient global assessment and pain of −14 and −15, respectively, based on changes after 4 weeks of treatment with non-steroidal anti-inflammatory drugs. The specificity of these MCIIs was not reported. Pope et al studied a stable cohort in which measures changed little.8 The low MCII estimates of 11.9 for pain score and 0.09 for HAQ likely reflect misapplication of the method in a setting with low sensitivity to change.

Our MCII estimates were similar to those of Kvamme et al,7 who reported MCIIs of −20, −19, and −0.25 for the patient global assessment, pain and HAQ, respectively, in an RA registry. These investigators also used the 0.80 specificity criterion, which may have contributed to the similarity with our results. This approach to defining MCIIs is appealing because it establishes criteria that are highly specific and consistent across measures. Confidence that important changes have been identified favours setting the specificity high. MCIIs based on the Youden Index and the shortest distance to perfect discrimination were smaller than those based on the 0.80 specificity definition, although with lower specificity and higher sensitivity. The proportions misclassified by each approach were similar, indicating that no approach had a major net advantage in accuracy. In a recent survey, a majority of rheumatology researchers endorsed a 20-point improvement in patient global assessment (60%) and pain (53%) as the optimal MCII, consistent with our results, while 20–27% favoured a 15-point improvement.39 Similarly, a panel of pain researchers suggested that a 10-point improvement in pain represented little change, while a 20-point improvement represented meaningful improvement, associated, for example, with reduced analgesic use.40 Analysis of a recent etanercept trial in psoriatic arthritis suggested an MCII for the HAQ of 0.35.41

Although composite RA activity measures have become established outcome measures, few studies have examined their MCIIs. In a registry study of patients starting a new DMARD, MCIIs for the DAS28, SDAI, and CDAI were estimated as 1.20, 10.95 and 10.76, respectively.9 These estimates were similar to our MCIIs of 1.0 (DAS28-CRP) to 1.2 (DAS28-ESR), 13, and 12. The MCII for the DAS28 based on our patients’ judgments was very similar to the estimate of 1.2 proposed on purely statistical considerations (ie, twice the measurement error).42 Other studies attempted to estimate MCIIs for the DAS28, SDAI and CDAI indirectly through comparison with American College of Rheumatology response criteria, but response criteria have a different derivation, and those MCIIs were not based on patients’ perspectives.15 ,21 ,43

Our study has several limitations. We could not examine differences between the prednisone and DMARD initiation groups, because these contrasts would be driven by differences in sensitivity to change.9 We could not examine if the timing of the second assessment influenced the results because this was collinear with the treatment group. We assessed responses over 1–4 months, but previous studies suggested that MCII estimates are unrelated to the assessment interval.44 ,45 Also, we did not study worsening. Thresholds for important improvement and worsening often differ, and these MCII estimates should not be considered thresholds for important worsening.45–48 We did not examine associations of the MCII with baseline RA activity, because these are affected by ceiling and floor effects.49 Because MCIIs reflect the composition of the sample, estimates are most representative of groups with similar disease characteristics and range of RA activity, and should not be generalised to groups with low disease activity.

Along with knowing the proportion of patients in remission or low disease activity at the end of a trial, knowing the proportion whose improvement surpassed the MCII provides useful information on treatment response that complements response criteria such as the ACR20.10 MCIIs might also be considered as thresholds for early escape in clinical trials. Our results should also be applicable to patients with active RA in observational studies. In clinical practice, a patient whose improvement exceeds the MCII but who reports no subjective improvement should prompt the clinician to investigate the reasons for this mismatch, including depression or other situational factors.
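Reporting the proportion of patients who reach the MCII in each trial arm only needs the rounded thresholds reported in the Results and the per-patient changes. In the sketch, the dictionary keys and function name are hypothetical. Change is taken as follow-up minus baseline, and a change exactly equal to the threshold is assumed to count as meeting it.

```python
# Rounded MCII thresholds at a specificity of 0.80, as reported in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
MCII = {
    "patient_global": -18,
    "pain": -20,
    "haq": -0.375,
    "das28_esr": -1.2,
    "das28_crp": -1.0,
    "sdai": -13,
    "cdai": -12,
}


def proportion_meeting_mcii(changes, measure):
    """Share of patients whose change (follow-up minus baseline) reaches the MCII.

    Changes are assumed already rounded to the measure's reporting precision,
    so floating-point noise cannot push a value such as -1.2 past the cut-off.
    """
    threshold = MCII[measure]
    return sum(c <= threshold for c in changes) / len(changes)


# Hypothetical pain-score changes in one trial arm.
print(proportion_meeting_mcii([-25, -10, -20, 0, -32, -5], "pain"))  # 0.5
```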

Acknowledgments

We thank Abhijit Dasgupta PhD, for statistical advice.

References


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


Footnotes

  • Handling editor Tore K Kvien

  • Contributors MMW conceived and designed the study. MMW, LCG, and MIA collected the data, and MMW did the analysis. MMW drafted the manuscript and all authors provided critical review and approval of the final version.

  • Funding This study was supported by the Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, and NIH R01-AR45177.

  • Competing interests None.

  • Ethics approval Approved by the NIDDK/NIAMS Institutional Review Board, U.S. National Institutes of Health.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Data will be publicly available at the conclusion of the study.
