Objectives: To evaluate different methods of reporting response to treatment or disease status for their ability to discriminate between active therapy and placebo, or to reflect structural progression or patient satisfaction with treatment using an exploratory analysis of the Abatacept in Inadequate Responders to Methotrexate (AIM) trial.
Methods: 424 active (abatacept ∼10 mg/kg) and 214 placebo-treated patients with rheumatoid arthritis (RA) were evaluated. Methods of reporting included: (1) response (American College of Rheumatology (ACR) criteria) versus state (disease activity score in 28 joints (DAS28) criteria); (2) stringency (ACR20 vs 50 vs 70; moderate disease activity state (MDAS; DAS28 <5.1) vs low disease activity state (LDAS; DAS28 ⩽3.2) vs DAS28-defined remission (DAS28 <2.6)); (3) time to onset (time to first ACR50/LDAS) and (4) sustainability of ACR50/LDAS for consecutive visits. Methods were assessed according to: (1) discriminatory capacity (number of patients needed to study (NNS)); (2) structural progression (Genant-modified Sharp score) and (3) patient satisfaction with treatment. Positive likelihood ratios (LR) evaluated the ability of the above methods to reflect structural damage and patient satisfaction.
Results: MDAS and ACR20 had the highest discriminatory capacity (NNS 49 and 69). Sustained LDAS best reflected no radiographic progression (positive LR ⩾2). More stringent criteria (at least ACR50/LDAS), faster onset (⩽3 months) and sustainability (>3 visits) of ACR50/LDAS best reflected patient satisfaction (positive LR >10).
Conclusions: The optimal method for reporting a measure of disease activity may differ depending on the outcome of interest. Time to onset and sustainability can be important factors when evaluating treatment response and disease status in patients with RA.
The current gold standard composite assessment used in clinical trials of patients with rheumatoid arthritis (RA) is the American College of Rheumatology (ACR) criteria, evaluated at study endpoint. Increasingly, the disease activity score in 28 joints (DAS28) is also used.1 Different methods can be used to report these composite indices: (1) response to treatment (eg, ACR criteria) versus disease state (DAS28 criteria); (2) “stringent” versus “less stringent” assessment (eg, ACR70 versus ACR20); (3) time to onset of a successful response (eg, time to achievement of ACR50) or (4) sustainability of a response (eg, maintenance of an ACR50 over a given period of time).
To evaluate the performance of these methods, several different perspectives may be considered: those of the clinical trial investigator, the rheumatologist and the patient. For example, the trial investigator seeks to reduce the costs and risks associated with trial design by determining the discriminatory capacity of specific reporting methods. Primarily, this may be achieved by reducing the number of patients needed to study (NNS) to discriminate between active drug and placebo. In order to prevent irreversible loss of physical function, one of the rheumatologist’s primary aims is to inhibit structural damage, as assessed by scores including the Sharp score2 and its modifications.3 Finally, patients are concerned with the impact of their condition on daily life, including dimensions such as quality of care. All three groups are concerned with efficacy, safety and the sustainability of response to treatment or disease activity status.
The performance of different methods of reporting ACR and DAS28-based criteria according to the different viewpoints or perspectives described above has not previously been studied in a single patient cohort. To address this, we used data from the phase III, Abatacept in Inadequate Responders to Methotrexate (AIM) trial in patients with RA. The efficacy and safety results from this trial have been reported elsewhere, using prespecified primary and secondary endpoints.4 The objective of the present exploratory analysis was to evaluate different methods of reporting ACR and DAS28-based criteria for their ability to discriminate between active and inactive drugs, to reflect the absence of structural damage progression or to reflect whether patients are satisfied with their treatment. Whereas a range of additional methods is also currently used to assess clinical efficacy,4 5 exhaustive assessment of all of these measures is beyond the scope of this publication and we have focused on the most commonly used composite indices in clinical trials. To simplify the outputs of this analysis further, presentation of data relating to onset and sustainability has been limited to ACR50 and low disease activity state (LDAS; DAS28 ⩽3.2).
The analyses reported here were exploratory assessments of a global, phase III, 1-year, multinational, randomised, double-blind, placebo-controlled study of abatacept compared with placebo (2 : 1) in combination with methotrexate in patients with active RA and an inadequate response to methotrexate (clinical trials registration number NCT00048568). The detailed study design of this trial has been reported previously.6
For these exploratory analyses, abatacept was considered the “active” drug and placebo the “inactive” drug. The sample size was based on primary efficacy analyses.6 The analyses presented here are considered exploratory, as they were not prespecified and the sample size may not be appropriate for statistical testing. Because of the nature of this analysis, the imputation of missing data was not appropriate; all analyses are based on patients with data available at the visit of interest (“as-observed”).
The following composite indices were assessed on each visit day before study drug administration for a duration of 6 months, at week 2, week 4 and every 4 weeks thereafter: response to treatment, ACR criteria7 8 (ACR20, 50 and 70) and status of disease, DAS28 criteria (moderate disease activity state (MDAS; DAS28 <5.1), LDAS (DAS28 ⩽3.2) and DAS28-defined remission (DAS28 <2.6)).
Methods of reporting ACR and DAS28 criteria
Response to treatment versus status of disease
To determine the performance of measures that assess response to treatment versus those that assess disease status, the proportion of patients achieving an ACR response or DAS28 status was compared at 6 months.
“Stringent” versus “less stringent” methods
For the ACR criteria, ACR20 was regarded as “less stringent”, ACR50 as “intermediate” and ACR70 as “stringent”. For DAS28, MDAS was considered as “less stringent”, LDAS was considered “intermediate” and DAS28-defined remission was considered “stringent”. Assessments were performed at the end of a 6-month study period.
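As an illustration of the DAS28 stringency levels described above, the thresholds can be expressed as a small classifier. This is a minimal sketch, not code from the AIM analysis: the function name `das28_states` is hypothetical, and it assumes the three states are nested, as the stated cut-offs imply (a patient in remission also meets LDAS and MDAS).

```python
def das28_states(das28):
    """Return which DAS28-defined states a single score satisfies.

    Thresholds are those quoted in the text; the states are treated as
    nested, so one score can satisfy several of them at once.
    """
    return {
        "MDAS": das28 < 5.1,       # moderate disease activity state ("less stringent")
        "LDAS": das28 <= 3.2,      # low disease activity state ("intermediate")
        "remission": das28 < 2.6,  # DAS28-defined remission ("stringent")
    }


if __name__ == "__main__":
    # Hypothetical scores, chosen to sit on or near each threshold.
    for score in (5.1, 3.2, 2.5):
        print(score, das28_states(score))
```

Note that the LDAS cut-off is inclusive (⩽3.2) whereas the MDAS and remission cut-offs are strict, which matters for scores that fall exactly on a boundary.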
Based on the results obtained in the above analyses (comparing response versus status and “more stringent” versus “less stringent” methods), results for onset and durability were only reported for ACR50 and LDAS, as these measures were of comparable stringency and numbers of responders were high enough to enable meaningful interpretation of the results.
Onset of action
To determine the importance of onset of action, the proportion of patients achieving a first ACR50 response or LDAS within 1 month, or within the first 2, 3, 4, 5 or 6 months of the evaluation period was assessed.
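The onset calculation described above can be sketched as follows. This is an illustrative simplification rather than the analysis code: the function names are hypothetical, each patient is represented by one response flag per monthly assessment (the trial also included a week 2 visit), missing assessments are coded as `None` and treated as no response, and all listed patients form the denominator.

```python
def first_response_visit(responses):
    """Return the 1-based index of the first assessment with a response, or None."""
    for index, responded in enumerate(responses, start=1):
        if responded:  # None (missing) and False both count as no response
            return index
    return None


def proportion_with_onset_by(patients, cutoff):
    """Share of patients whose first response occurs at or before `cutoff`."""
    if not patients:
        raise ValueError("at least one patient is required")
    hits = 0
    for responses in patients:
        first = first_response_visit(responses)
        if first is not None and first <= cutoff:
            hits += 1
    return hits / len(patients)


if __name__ == "__main__":
    # Hypothetical ACR50 flags for four patients over three monthly assessments.
    patients = [
        [False, True, True],
        [False, False, False],
        [True, False, False],
        [None, None, True],
    ]
    for month in (1, 2, 3):
        print(month, proportion_with_onset_by(patients, month))
```

Because only the first response counts, a patient who responds early and later loses the response is still classified by the early onset.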
Sustainability of response or status of disease
To determine the importance of sustainability of a response/status, the proportion of patients experiencing ACR50 or LDAS for at least one, two, three, four, five or six consecutive visits over 6 months of the evaluation period was calculated. Equal weighting was applied for each visit.
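One way to operationalise sustainability is to take, for each patient, the longest run of consecutive visits at which the criterion was met. The sketch below assumes this reading; the function names are hypothetical, a missing visit (`None`) is assumed to break a run, and every visit carries equal weight.

```python
def longest_run(responses):
    """Length of the longest streak of consecutive visits with a response."""
    best = current = 0
    for responded in responses:
        # A visit without a response (False or missing) resets the streak.
        current = current + 1 if responded else 0
        best = max(best, current)
    return best


def proportion_sustained(patients, min_visits):
    """Share of patients with a response at `min_visits` or more consecutive visits."""
    if not patients:
        raise ValueError("at least one patient is required")
    sustained = sum(1 for responses in patients if longest_run(responses) >= min_visits)
    return sustained / len(patients)


if __name__ == "__main__":
    # Hypothetical LDAS flags for three patients over six visits.
    patients = [
        [True, True, False, True, True, True],
        [False, False, True, False, False, False],
        [True, True, True, True, True, True],
    ]
    for k in range(1, 7):
        print(k, proportion_sustained(patients, k))
```

By construction the proportion cannot increase as the required number of consecutive visits grows.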
Assessment of the different methods of reporting ACR response and DAS28-derived criteria
Discriminatory capacity as assessed by NNS
Discriminatory capacity was calculated based on the number of patients required per treatment arm to perform a two-arm 1 : 1 randomised study comparing active treatment with placebo, based on a difference similar to that observed in the AIM study. The number of patients required was calculated with the appropriate basic testing procedure (with α = 0.05 (two-tailed), β = 0.20, χ² test for binary variables and Student’s t test for continuous variables). The lowest NNS indicates the greatest discriminatory capacity.
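For a binary outcome, a per-arm sample size of this kind can be approximated as sketched below. The exact procedure used in the analysis is not specified beyond the test and error rates, so the formula here is an assumption: the standard normal-approximation sample size for comparing two proportions without continuity correction. The function name `n_per_arm` and the response rates in the example are hypothetical and are not AIM results.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_active, p_placebo, alpha=0.05, power=0.80):
    """Approximate patients per arm for a 1:1 two-proportion comparison.

    Uses the usual normal approximation (two-sided alpha, no continuity
    correction); this is an assumed stand-in for the paper's procedure.
    """
    if p_active == p_placebo:
        raise ValueError("the two response rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    p_bar = (p_active + p_placebo) / 2             # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_active * (1 - p_active) + p_placebo * (1 - p_placebo))
    ) ** 2
    return ceil(numerator / (p_active - p_placebo) ** 2)


if __name__ == "__main__":
    # Hypothetical response rates: 60% on active treatment, 40% on placebo.
    print(n_per_arm(0.60, 0.40))
```

The required number falls as the difference between arms widens, which is why a reporting method that separates active treatment from placebo more clearly yields a lower NNS.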
Structural damage and patient satisfaction
Structural damage progression in the hands and feet was assessed as radiographic changes from baseline to year 1, using Genant-modified Sharp scores.3 9 The maximum possible normalised total score (TS) was 290. Patients were dichotomised as progressors (change in TS ⩾0) or non-progressors (change in TS <0).
Patient satisfaction with treatment was assessed at month 6 or at early termination10 using the following question on a five-point scale: “how would you rate your satisfaction with the treatment you received?”: excellent, 1; very good, 2; good, 3; fair, 4; or poor, 5. Responses were dichotomised as follows: 1, 2, 3 (favourable) versus 4, 5 (not favourable). A sensitivity analysis was performed using a different cut-off for dichotomisation (1, 2 (favourable) vs 3, 4, 5 (not favourable)); results were not affected by cut-off choice (data not shown).
To evaluate the relevance of the different reporting techniques for ability to reflect the inhibition of structural damage progression and patient satisfaction, positive likelihood ratios (LR)11 were calculated. Available abatacept and placebo data for patients in the AIM trial were pooled at each time point (as-observed analysis). An LR determines the likelihood of a given clinical finding in a patient with a studied disorder compared with that in a patient without the studied disorder.12 The LR, which combines information on sensitivity and specificity, is used to select the best and most appropriate diagnostic tests. LR may range from 0 to infinity. An LR greater than 1 indicates an increased probability that the target disorder is present and an LR less than 1 indicates a decreased probability that the target disorder is present. Likelihood ratios of 2, 5 and 10 increase the probability of the studied disorder by approximately 15%, 30% and 45%, respectively.12 13 Based on the literature, a positive LR greater than 2 may be considered of relevant prognostic value and results will, therefore, be presented using this cut-off.14 In our study, the use of LR was transposed to express performance of reporting techniques in reflecting structural damage or patient satisfaction. Higher values are indicative of better techniques. Although an infinite positive LR generally indicates that a technique has good prognostic value, results should be interpreted with caution when the proportion of patients achieving success is low.
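To make the transposed use of the positive LR concrete, the sketch below computes it from a 2 × 2 table using the standard definition, sensitivity divided by (1 − specificity). Here the “test” is meeting a reporting criterion (eg, sustained LDAS) and the “disorder” is the outcome of interest (eg, patient satisfaction). The function name and the counts are hypothetical and are not taken from the AIM data.

```python
from math import inf


def positive_lr(met_with_outcome, met_without_outcome,
                unmet_with_outcome, unmet_without_outcome):
    """Positive likelihood ratio: sensitivity / (1 - specificity).

    "met" means the reporting criterion was achieved; "with outcome" means
    the outcome of interest (eg, satisfaction) was present.
    """
    with_outcome = met_with_outcome + unmet_with_outcome
    without_outcome = met_without_outcome + unmet_without_outcome
    if with_outcome == 0 or without_outcome == 0:
        raise ValueError("both outcome groups must contain patients")
    if met_with_outcome + met_without_outcome == 0:
        raise ValueError("no patient met the criterion; the LR is undefined")
    sensitivity = met_with_outcome / with_outcome
    false_positive_rate = met_without_outcome / without_outcome
    if false_positive_rate == 0:
        # Criterion never met in patients without the outcome: infinite LR.
        return inf
    return sensitivity / false_positive_rate


if __name__ == "__main__":
    # Hypothetical counts: 40 of 100 satisfied patients met the criterion,
    # compared with 10 of 100 dissatisfied patients.
    print(positive_lr(40, 10, 60, 90))
```

The infinite case arises whenever no patient without the outcome meets the criterion, which can happen by chance when few patients meet it at all; this is why such values call for cautious interpretation.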
In the 1-year AIM study, 433 and 219 patients were randomly assigned and treated with abatacept or placebo, respectively, on background methotrexate. In total, 385 (88.9%) and 162 (74.0%) patients in the abatacept and placebo groups, respectively, completed 6 months of treatment and 375 (86.6%) and 158 (72.1%) patients were ongoing at 1 year. Baseline demographics and clinical characteristics, including disease activity, were comparable between groups and are described elsewhere.6 Patients from one site were excluded from efficacy analyses due to compliance issues; 424 and 214 patients from the abatacept and placebo groups, respectively, were included in the primary efficacy analyses and were considered for this exploratory analysis.
Treatment response and disease activity assessments
Overall, at month 6, 512 (80%) patients in the pooled abatacept and placebo population reported satisfaction with treatment as at least “good”. At year 1, 97 (15%) patients in the pooled abatacept and placebo population demonstrated no radiographic damage progression from baseline (as defined by a change in the TS <0).
Response to treatment versus status of disease
The ability of ACR (response to treatment) versus DAS28 (disease status) either to detect a treatment effect (NNS) or to reflect inhibition of structural damage progression or patient satisfaction was generally comparable, when comparing methods of reporting with similar levels of stringency (table 1).
“Stringent” versus “less stringent” methods of reporting
Table 1 presents the overall percentage of responders, NNS and positive LR values for “less stringent” (ACR20 and MDAS), “intermediate” (LDAS or ACR50) and “stringent” (ACR70 or remission) criteria each evaluated with respect to the discriminatory capacity (at month 6), radiographic progression (at year 1) and patient satisfaction (at month 6).
Judged by the NNS, discriminatory capacity tended to be greater with “less stringent” than with “stringent” criteria (table 1). Notably, however, the NNS for ACR20 and ACR50 were similar.
No difference was observed between “less stringent”, “intermediate” and “stringent” methods of reporting ACR and DAS28-based criteria in their ability to reflect inhibition of structural progression (table 1) and all methods of reporting had a positive LR of less than 2.
For patient satisfaction, positive LR were higher with the “stringent” and “intermediate” compared with “less stringent” criteria, suggesting that more stringent criteria better reflected patient satisfaction with treatment (table 1).
Similar analyses were also performed using the OMERACT definition of DAS28 minimal disease activity4 and DAS28 European League Against Rheumatism (EULAR) good/moderate responders1 (supplemental table 1, available online only). The results for DAS28 minimal disease activity were comparable to those observed for MDAS and the results observed for EULAR good/moderate responders were similar to those observed for LDAS.
Onset of action
Table 2 presents the overall percentage success rate, NNS and positive LR values for the onset of an ACR50 response compared with achievement of an LDAS according to discriminatory capacity or their ability to reflect inhibition of radiographic progression or patient satisfaction. Analysis of the impact of onset on ACR20 and ACR70 and MDAS and remission is presented in supplemental table 2 (available online only).
For discriminatory capacity, the NNS was lowest when the first ACR50 response occurred any time during the 6-month period compared with occurrence in the first month; a similar trend was observed for LDAS (table 2).
For structural damage progression at year 1, positive LR were less than 2.05 for ACR50 and LDAS (table 2) criteria and were comparable regardless of when the first response was achieved.
For patient satisfaction with treatment, positive LR for time to onset of ACR50 was approximately 4 or more for all onset time points explored and there was a trend towards an increased positive LR with earlier onset (table 2). An earlier onset of action was an important factor in the ability of LDAS to reflect patient satisfaction (table 2). The positive LR of greater than 5 for onset of ACR50 or LDAS in the first 1 or 2 months suggests not only prognostic, but also strong diagnostic evidence (table 2).14
Further analyses were performed on the subgroup of patients who achieved an ACR20 or LDAS, examining the impact of whether achievement of this response/status within the first 3 months reflected radiographic progression or patient satisfaction: positive LR were less than 2 for both ACR20 and LDAS (supplemental table 3, available online only).
Sustainability of response or disease status
Table 3 presents the overall percentage success rate, NNS and positive LR values for sustainability of ACR50 versus LDAS according to discriminatory capacity, inhibition of radiographic progression and patient satisfaction. As expected, the overall response rate for ACR50 and LDAS progressively decreased with increasing numbers of required consecutive visits.
For both ACR50 and LDAS, NNS was substantially higher for six consecutive visits compared with one or more consecutive visits (table 3).
For inhibition of structural progression, positive LR was less than 2 for all ACR50 methods of reporting, regardless of how long the response was maintained (table 3). For LDAS, there was a slight trend towards better reflection of inhibition of structural damage (TS <0) with increasing sustainability of response. Sustainability for three or more consecutive visits demonstrated positive LR of 2 or greater (table 3).
Sustainability was an important factor in the ability to reflect patient treatment satisfaction for both ACR50 and LDAS (table 3). Positive LR increased progressively with the sustainability of both ACR50 and LDAS.
Analysis of the impact of sustainability on ACR20 and ACR70 and MDAS and remission is presented in supplemental table 4 (available online only).
This exploratory analysis of data from the AIM trial strongly supports the concept that the performance of different methods of reporting ACR or DAS28-based criteria is dependent on the desired outcome (eg, a better discriminatory capacity, inhibition of radiographic progression or patient satisfaction with treatment). This is the first study to evaluate the performance of some of these methods in relation to outcomes deemed to reflect the perspectives of the clinical trialist, rheumatologist and patient.
From the perspective of researchers designing clinical trials to test the efficacy and safety of new compounds (eg, phase II trials), our data suggest that the “less stringent” ACR20 and MDAS criteria assessed at the end of the trial achieved the highest discriminatory capacity, allowing detection of a treatment effect using fewer patients. When onset of action and sustainability were considered, an increase was observed in the NNS required to detect a treatment effect, suggesting that when designing clinical trials, it may not be beneficial to take these aspects into account for ACR and DAS28-based criteria.
For the criteria examined in this analysis, sustainability of good disease status (LDAS) for 3 months or more during the first 6 months of the study was the only method of reporting that reflected the absence of radiographic progression at year 1 (using the positive LR cut-off of 2). Previously reported data support this finding. In a longitudinal study including patients who were followed for up to 9 years, fluctuations in disease activity (compared with sustained LDAS or high DAS) were predictive of more severe radiographic progression.15 All other techniques of reporting assessed here were poor predictors of the inhibition of radiographic progression.
The methods that best reflected patient satisfaction with treatment were “stringent” reporting techniques for both response to treatment (ACR70) and status of disease (DAS28-defined remission). A faster onset (within the first 3 months) and the sustainability of a response/status of disease were important factors in the ability to reflect patient satisfaction. These results would be expected on the basis that a relatively early and sustainable improvement in terms of symptoms, pain, disability and fatigue is likely to be a primary concern for the patient in terms of treatment outcome and quality of life.16 17
These findings highlight the importance of the onset and sustainability of a treatment response or disease activity status and support recent EULAR/ACR recommendations that propose that the reporting of clinical trials should include both the time to onset and the sustainability of the primary outcome.18 19
Interpretation of our findings should be made in the context of the study limitations. Results were obtained in an exploratory analysis of a single clinical trial evaluating a single compound (abatacept); before proposing firm recommendations, similar analyses should be conducted on data from different trials evaluating alternative compounds. Moreover, similar assessments using other criteria, such as LDAS by DAS28 (using the erythrocyte sedimentation rate) or by the simplified disease activity index or the clinical disease activity index need to be evaluated. For most of the analyses, the observed values of positive LR were not conclusive as the values were below 10 or even 5, the thresholds often used to reflect a “relevant” value for diagnostic purposes.14 However, a clear, accepted definition of relevant positive LR thresholds does not exist. For example, in the case of evaluation of the risk of toxic events while taking non-steroidal anti-inflammatory drugs, a positive LR value of 1.4 has been considered unacceptable.20 Conversely, other studies, including one evaluating the ability to predict persistent (erosive) arthritis, have demonstrated that positive LR values of more than 2 may be considered a relevant prognostic value,21 22 whereas more than 10 is considered a diagnostic value.12
Finally, the relatively short duration of follow-up presented here could impact the results. Observations were limited to this time period because the number of patients achieving the more stringent criteria (eg, ACR70 or remission), or onset or sustainability at later time points, was too low to make valid comparisons. However, as most clinicians would expect to observe a treatment effect within 6 months of therapy initiation, this time frame is probably an acceptable period of assessment.
Considering these limitations, our analyses demonstrate that the optimal method of assessment can depend on the outcome of interest, and that onset and sustainability of success may be important factors to consider when assessing the efficacy of therapies for patients with RA. A potential “optimal” technique could be life-table analysis, in which the event is defined by the time taken to reach an acceptable sustained status. Future studies are required to confirm and extend the findings presented here using other RA patient databases, longer study durations and similar analyses in other disease areas.
Competing interests: Declared. CL is an employee of Axial; NS and MLB are employees of Bristol-Myers Squibb and own stocks and options; DA has received consultancies and honoraria from Bristol-Myers Squibb; GW has received consultancies, speaking fees and honoraria from Bristol-Myers Squibb; MD has received consultancies, speaking fees and honoraria from Bristol-Myers Squibb, Abbott, Wyeth, UCB and Roche; PvR has nothing to disclose; MS has received consultancies from Abbott, Amgen, Bristol-Myers Squibb, Wyeth-Ayerst and UCB, speaking fees from Abbott, Amgen and Wyeth-Ayerst and grants from Abbott, Amgen, Bristol-Myers Squibb, UCB, Centocor, Roche, Genentech and Targeted Genetics; JSS has received honoraria and a research grant from Bristol-Myers Squibb.
Funding: This study was supported in part by an unrestricted research grant-in-aid from Bristol-Myers Squibb. Editorial assistance was provided by Medicus International and funded by Bristol-Myers Squibb. Under direction of the authors, editorial assistance was provided at the first draft stage and during subsequent revision of the manuscript.
Ethics approval: Ethics approval was obtained.
▸ Additional supplemental tables are published online only at http://ard.bmj.com/content/vol68/issue4
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.