Objectives To assess the discriminatory capacity of various outcome measures and response criteria in patients with peripheral spondyloarthritis (pSpA).
Methods Data originated from two randomised controlled trials, ABILITY-2 and Tnf Inhibition in PEripheral SpondyloArthritis (TIPES). Continuous outcome measures included patient's global assessment (PGA)/physician's global assessment of disease (PhGA), C-reactive protein (CRP), tender joint counts (TJC)/swollen joint counts (SJC), Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), and the Ankylosing Spondylitis Disease Activity Score (ASDAS). Dichotomous response criteria included Peripheral SpondyloArthritis Response Criteria (PSpARC), American College of Rheumatology (ACR), ASDAS and BASDAI response criteria. The capacity to discriminate between adalimumab and placebo groups was assessed by standardised mean differences (SMD) for continuous variables, and Pearson's χ2 for dichotomous response criteria.
Results Within each trial, the composite indices for axial SpA assessment, ASDAS-CRP (SMD: −0.63 and −0.89 in ABILITY-2 and the TIPES trial, respectively) and BASDAI (SMD: −0.50 and −0.73), and the single-item measures PGA (SMD: −0.47 and −1.12) and PhGA (SMD: −0.64 and −0.87) performed better than other single-item measures, such as CRP (SMD: −0.18 and −0.53), SJC or TJC. In general, the PSpARC and ACR response criteria discriminated better than ASDAS and BASDAI response criteria.
Conclusions The axial SpA-specific ASDAS-CRP and BASDAI, but also PGA and PhGA, demonstrated good discriminatory ability in patients with pSpA. The pSpA-specific PSpARC response criteria and the rheumatoid arthritis-specific ACR response criteria also discriminated well. To fully capture typical pSpA manifestations, it may be worth developing new pSpA-specific indices with better performance and face validity.
Trial registration numbers ABILITY-2: NCT01064856; TIPES: EUDRACT 2008-006885-27.
- Outcomes research
- Disease Activity
- Patient perspective
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Spondyloarthritis (SpA) is a spectrum of diseases with several subtypes with overlapping clinical, radiographic and genetic characteristics.1 Recently, the Assessment in SpondyloArthritis international Society (ASAS) has developed classification criteria for SpA, based on the predominant clinical manifestation, as either axial SpA (axSpA), presenting with chronic back pain, or peripheral SpA (pSpA), presenting with arthritis, enthesitis or dactylitis.1 ,2 Several tumour necrosis factor inhibitors (TNFi) have been approved for axSpA (both non-radiographic axSpA (nr-axSpA) and ankylosing spondylitis (AS)), as well as for psoriatic arthritis (PsA), although not for pSpA.3–7 Two randomised controlled trials (RCTs) have been performed to assess the efficacy of TNFi in pSpA.8 ,9 Adalimumab (ADA) was effective in both trials, which had different primary endpoints because no composite measures or response criteria had been previously validated in patients with pSpA. In the ABILITY-2 trial, a new composite outcome measure, the Peripheral SpondyloArthritis Response Criteria (PSpARC)40, was developed as the primary endpoint.9 In the Tnf Inhibition in PEripheral SpondyloArthritis (TIPES) trial, the improvement in patient's global assessment of disease (PGA) was chosen as the primary endpoint.8
Therefore, there exists a need to identify discriminant outcome measures for pSpA. Several efficacy variables were used in the two RCTs, including measures developed specifically for AS, such as the Ankylosing Spondylitis Disease Activity Score (ASDAS) and the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI); and measures developed specifically for rheumatoid arthritis (RA, also applied in PsA), including the American College of Rheumatology (ACR)20/50/70. However, the performance of all these outcome measures in pSpA is unknown, and we hoped to determine which measures best reflect disease activity and clinical response, because the success of future therapeutic clinical trials depends not only on a well-defined patient population, but also on the availability of valid outcome measures and response criteria. Taking into account the Outcome Measures in Rheumatology Clinical Trials (OMERACT) guidance,10 ,11 we compared the discriminative properties of outcome measures in pSpA. The aim of this study was to assess the sensitivity to change and discriminatory aspects of outcome measures and response criteria in pSpA.
The analysis included data from two double-blind placebo (PBO)-controlled RCTs, ABILITY-2 and TIPES,8 ,9 which evaluated the efficacy and safety of ADA in patients with active pSpA. ABILITY-2 (NCT01064856) included patients fulfilling the ASAS pSpA criteria2 who did not have PsA or AS. The primary endpoint was PSpARC40 at week 12, defined as ≥40% improvement from baseline in PGA and patient global pain, and ≥40% improvement in any of the following: (1) swollen joint count (SJC) and tender joint count (TJC); (2) enthesitis count; or (3) dactylitis count.
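As an illustration only, the PSpARC40 definition above could be evaluated per patient roughly as follows. This is a minimal sketch, not the trial's analysis code: the dictionary keys, the function names and the handling of a zero baseline count (treated here as no measurable improvement) are assumptions made for the example.

```python
def pct_improvement(baseline, week12):
    """Percentage improvement from baseline, where lower scores are better."""
    if baseline <= 0:
        # Assumption: with a zero baseline there is nothing to improve.
        return 0.0
    return 100.0 * (baseline - week12) / baseline


def psparc_response(baseline, week12, threshold=40.0):
    """Return True if the patient meets the PSpARC response at `threshold` percent."""
    def improved(key):
        return pct_improvement(baseline[key], week12[key]) >= threshold

    # Both patient-reported items must improve by at least the threshold.
    patient_reported = improved("pga") and improved("pain")
    # At least one peripheral domain must improve by at least the threshold.
    peripheral = (
        (improved("sjc") and improved("tjc"))
        or improved("enthesitis")
        or improved("dactylitis")
    )
    return patient_reported and peripheral


# Hypothetical patient: PGA and pain improve by 50%, enthesitis count by 50%.
base = {"pga": 60, "pain": 50, "sjc": 5, "tjc": 8, "enthesitis": 2, "dactylitis": 0}
wk12 = {"pga": 30, "pain": 25, "sjc": 4, "tjc": 7, "enthesitis": 1, "dactylitis": 0}
print(psparc_response(base, wk12))
```

Passing `threshold=50.0` or `threshold=70.0` gives the corresponding PSpARC50 and PSpARC70 checks under the same assumptions.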
The TIPES trial (EUDRACT 2008-006885-27) included patients with pSpA, fulfilling the European Spondyloarthropathy Study Group (ESSG) criteria and/or the Amor criteria,12 ,13 without PsA or AS. The primary endpoint was the change in PGA at week 12.
Outcome measures and response criteria
The performance of the following outcome measures in assessing disease activity and treatment response after 12 weeks was evaluated: PGA, patient global pain, physician's global assessment of disease activity (PhGA), ASDAS-CRP, BASDAI, SJC, TJC and C-reactive protein (CRP).
The patient global pain in the past week, PGA and PhGA of current disease activity were recorded on a 0–100 mm visual analogue scale (VAS). The ASDAS-CRP, originally developed for AS, includes questions pertaining to axial and peripheral symptoms, PGA and CRP.14 Disease activity was classified as follows: <1.3, inactive disease; 1.3 to <2.1, moderate disease activity; 2.1 to ≤3.5, high disease activity; and >3.5, very high disease activity.15 The BASDAI, also developed for AS, consists of questions mainly for axial and peripheral complaints, measured on a 0–10 cm VAS.
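The ASDAS cut-offs listed above translate directly into a small classifier. The sketch below is illustrative; the function name and the returned labels are chosen for this example and are not taken from the trials.

```python
def asdas_state(score):
    """Map an ASDAS-CRP score to a disease activity state using the cut-offs above."""
    if score < 1.3:
        return "inactive"
    if score < 2.1:
        return "moderate"
    if score <= 3.5:
        return "high"
    return "very high"


for value in (1.0, 1.3, 2.1, 3.5, 3.6):
    print(value, asdas_state(value))
```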
Clinical response criteria assessed were PSpARC40/50/70, ASDAS-major improvement (ASDAS-MI, change in ASDAS≥2.0), ASDAS-clinically important improvement (ASDAS-CII, change in ASDAS≥1.1), ASDAS-inactive disease (ASDAS-ID, ASDAS<1.3),15 BASDAI50 (improvement of ≥50% in BASDAI score), BASDAI ≥2 units (improvement of ≥2 units)16 and ACR20/50/70.
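For orientation, the ASDAS and BASDAI response definitions above can be written as simple threshold checks. This is a sketch under the assumption that "change" and "improvement" both mean a decrease from baseline to week 12; the function names and dictionary keys are hypothetical.

```python
def asdas_responses(baseline, week12):
    """ASDAS response flags; improvement is taken as a decrease in score."""
    decrease = baseline - week12
    return {
        "ASDAS-CII": decrease >= 1.1,  # clinically important improvement
        "ASDAS-MI": decrease >= 2.0,   # major improvement
        "ASDAS-ID": week12 < 1.3,      # inactive disease at week 12
    }


def basdai_responses(baseline, week12):
    """BASDAI response flags on the 0-10 scale."""
    decrease = baseline - week12
    return {
        # Guard against division by zero when the baseline score is 0.
        "BASDAI50": baseline > 0 and decrease / baseline >= 0.5,
        "BASDAI>=2 units": decrease >= 2.0,
    }


print(asdas_responses(3.6, 2.0))
print(basdai_responses(6.0, 2.5))
```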
Since the TIPES trial did not capture patient global pain, enthesitis or dactylitis counts, a modified PSpARC40 was determined, which included ≥40% improvements from baseline in PGA, patient global pain and SJC66 and TJC68. Patient global pain was calculated as the mean of BASDAI components #3 (joint pain/swelling) and #4 (enthesitis) because these showed the highest correlation to patient global pain in the ABILITY-2 trial (Spearman's coefficient=0.6). Modified ACR20/50/70 criteria were derived for the TIPES study, where patient global pain was calculated as above.
Analyses were performed for all outcome measures, for the two RCTs in the ‘as observed’ population, without imputation for rare missing data. The RCTs were separately analysed because this allowed results from one trial to be used to confirm results from the other and also because there were differences in the outcome measures collected and the inclusion criteria.
First, we evaluated whether levels assessed by these measures could discriminate between two states of disease activity. Since no gold standard is available to define disease activity states in pSpA, patients (in each trial) were assigned to (arbitrarily determined) states of low disease activity and high disease activity based on the PhGA and PGA at baseline of treatment (<40 vs ≥60 mm, excluding patients with values in between, in order to increase separation). In subgroups with low disease activity and high disease activity, the standardised mean difference (SMD) was calculated as the difference between the group means divided by the pooled SD. The SMD has no units, which facilitates comparison across disease measures. An SMD with higher absolute value indicates better discriminatory ability.
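A minimal sketch of the SMD calculation described above is given below. The text specifies only "the difference between the group means divided by the pooled SD"; the particular pooled-SD formula used here (weighting the two sample variances by their degrees of freedom) is the conventional one and is an assumption, as are the function names and the example values.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled SD from two samples, weighting sample variances by degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    return sqrt(
        ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    )


def smd(group_a, group_b):
    """Standardised mean difference: difference in means over the pooled SD."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


# Hypothetical scores in a high and a low disease activity subgroup.
high_activity = [2, 4, 6]
low_activity = [1, 3, 5]
print(smd(high_activity, low_activity))  # 0.5
```

Because the result is unitless, the same function can be applied to measures on different scales (for example a 0–100 mm VAS and a joint count) and the absolute values compared.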
For the continuous outcome measures, the sensitivity to change of each outcome measure for detecting improvement from baseline to week 12 was determined by comparing the adjusted standardised means of change from baseline to week 12 for all measures for both treatment groups separately. Adjusted standardised mean changes from baseline to week 12 were obtained for each continuous outcome measure (dependent), stratified for treatment (ADA and PBO) and adjusted for baseline values of the corresponding outcome measure (covariate), using analysis of covariance. The following formula was used for standardising: ‘disease activity outcome change (week 12 − baseline) divided by SD of that outcome at baseline’. These standardised mean changes reflect sensitivity to change of an outcome measure within a treatment group (ADA or PBO).
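The standardising step can be sketched as follows. Note that this example computes only the unadjusted standardised mean change (change divided by the baseline SD) and deliberately omits the baseline adjustment by analysis of covariance described above; the function name and data are hypothetical.

```python
from statistics import mean, stdev


def standardised_mean_change(baseline, week12):
    """Mean of (week 12 - baseline) changes, each divided by the baseline SD.

    Unadjusted: the ANCOVA adjustment for baseline values is not applied here.
    """
    sd_baseline = stdev(baseline)
    changes = [(w - b) / sd_baseline for b, w in zip(baseline, week12)]
    return mean(changes)


# Hypothetical PGA values (mm) for three patients in one treatment group.
pga_baseline = [40, 60, 80]
pga_week12 = [20, 40, 60]
print(standardised_mean_change(pga_baseline, pga_week12))  # -1.0
```

A negative value indicates improvement for measures where lower scores mean less disease activity.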
We then evaluated whether responses assessed by these measures could discriminate between ADA treatment and PBO treatment by determining SMDs, which reflect the capacity of continuous outcome measures to discriminate between change under ADA and change under PBO. Furthermore, the t-score and the Guyatt's effect size (ES) were determined for discriminatory capacity and sensitivity to change, respectively. Guyatt's ES is the mean change in the treatment group divided by the SD of the PBO group, and relates the magnitude of the effect (the ‘signal’) to the magnitude of the non-specific change (the ‘noise’). Guyatt's ES of 0.2, 0.5 and 0.8 represent small, medium and large effect sizes, respectively.17 A higher t-score indicates a better discriminatory ability within the same trial. The discriminatory ability of the dichotomous response criteria was determined by Pearson's χ2 or Fisher's exact test (if n<5). With a constant number of observations per outcome measure, a higher χ2 indicates better discriminatory ability. Analyses were performed using SPSS V.20 (SPSS, Chicago, Illinois, USA).
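The two statistics described above can be illustrated with a short sketch. Two assumptions are made here: the "SD of the PBO group" is taken to be the SD of the change scores in the PBO group, and Pearson's χ2 is computed for a 2×2 table without continuity correction. Fisher's exact test is not implemented. The analyses in this study were run in SPSS; the function names and numbers below are hypothetical.

```python
from statistics import mean, stdev


def guyatt_es(treatment_changes, placebo_changes):
    """Mean change under treatment divided by the SD of change under placebo."""
    return mean(treatment_changes) / stdev(placebo_changes)


def pearson_chi2_2x2(resp_a, nonresp_a, resp_b, nonresp_b):
    """Pearson's chi-square for a 2x2 table (no continuity correction).

    Assumes every row and column total is non-zero.
    """
    n = resp_a + nonresp_a + resp_b + nonresp_b
    cross = resp_a * nonresp_b - nonresp_a * resp_b
    denom = (
        (resp_a + nonresp_a)
        * (resp_b + nonresp_b)
        * (resp_a + resp_b)
        * (nonresp_a + nonresp_b)
    )
    return n * cross ** 2 / denom


# Hypothetical change scores and responder counts.
print(guyatt_es([-30, -20, -10], [-10, 0, 10]))  # -2.0
print(round(pearson_chi2_2x2(30, 20, 15, 35), 2))  # 9.09
```

With the same number of patients per criterion, a larger χ2 from `pearson_chi2_2x2` corresponds to a larger separation between responder rates in the two arms.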
Demographics and disease characteristics
In total, 205 patients were included: 165 from ABILITY-2 and 40 patients from the TIPES trial. The primary results have been reported previously.8 ,9 Patients with pSpA were more often female, and the mean age was around 40 years. Half of them were human leukocyte antigen (HLA)-B27 positive, and the mean number of swollen joints was 5–6 (see online supplementary table S1). Of note, a high proportion had a history of enthesitis, and in ABILITY-2, dactylitis was not common. About half of them were using disease-modifying antirheumatic drugs, most often methotrexate or sulfasalazine, in equal proportions. Overall, the outcome measures were similar at baseline between the ADA and PBO groups in both studies.
Discrimination between disease activity states
In the ABILITY-2 subgroups with low disease activity versus high disease activity at baseline based on PhGA, the SMD was highest for ASDAS-CRP (1.16), followed by BASDAI (1.13), patient global pain (1.03) and BASDAI#3 (1.00) (table 1). In the TIPES trial, the SMD was highest for BASDAI #1 (2.66), PGA (2.01), ASDAS-CRP (1.84) and BASDAI (1.75). In the ABILITY-2 subgroups with low disease activity versus high disease activity based on PGA, the SMD was highest for patient global pain (4.46), followed by ASDAS-CRP (2.08), BASDAI#4 (1.62) and BASDAI (1.61) (data not shown). In the TIPES trial, the SMD was highest for BASDAI (2.22), followed by ASDAS-CRP (2.18), patient global pain (1.89) and PhGA (1.71).
Sensitivity to change of continuous outcome measures
All measures showed an improvement after 12 weeks of treatment in both trials, with greater improvements observed in patients on ADA versus PBO treatment (figure 1). Across both studies, the outcome measures with greatest sensitivity in detecting change from baseline after 12 weeks of treatment were PhGA, PGA, patient global pain, ASDAS-CRP, BASDAI and SJC. The ranking of the adjusted standardised means of change from baseline of the measures differed slightly between the trials. In ABILITY-2, the largest changes from baseline were observed in PhGA, patient global pain, PGA and ASDAS-CRP. In the TIPES study, the largest changes from baseline were observed in ASDAS-CRP, PGA, PhGA and patient global pain.
Discriminatory aspects of continuous outcome measures
In ABILITY-2, PhGA had the highest SMD (−0.64) (as well as Guyatt's ES and t-scores, data not shown), followed by ASDAS-CRP (−0.63), patient global pain (−0.50), BASDAI (−0.50), TJC78 (−0.50) and PGA (−0.47) (table 2). In the TIPES trial, PGA had the highest SMD (−1.12) (and Guyatt's ES and t-scores, data not shown), followed by patient global pain (−0.93), ASDAS-CRP (−0.89), PhGA (−0.87) and some single-item components of BASDAI and BASDAI itself (−0.73). SJC discriminated well in the TIPES trial, but not in ABILITY-2. Since the discriminatory performance of outcome measures had not been previously investigated in the pSpA population, the effect of high or low disease activity at baseline on performance was also examined. The measures performed similarly, independently of level of disease activity at baseline (data not shown). However, in ABILITY-2, but not TIPES, the discriminatory ability of ASDAS-CRP, BASDAI, PGA, PhGA and TJC78 was enhanced in the subgroups with higher disease activity at baseline defined by BASDAI, compared with the subgroup with lower disease activity.
Discriminatory ability of categorical clinical response criteria
Among the response criteria used in both studies, ACR20/50 and PSpARC40/50 performed comparatively well in differentiating between ADA and PBO treatment with χ2 (in ABILITY-2 and the TIPES trial, respectively) for ACR20 of 16.05 and 11.79; for ACR50, 13.66 and 8.58; for PSpARC40, 8.18 and 8.58; and for PSpARC50, 13.46 and 7.13 (table 3). PSpARC70 showed significant discriminatory ability in ABILITY-2, although not in the TIPES trial (χ2 13.49 and 3.26, respectively). Among the AS-specific measures used in ABILITY-2 and the TIPES study, respectively, ASDAS-ID (χ2 7.17 and 10.13), ASDAS-CII (χ2 9.44 and 6.91) and BASDAI50 (χ2 10.90 and 7.13) showed significant discriminatory ability across trials. As expected, the TIPES trial showed less discrimination than ABILITY-2.
In conducting trials, as well as in monitoring patients in clinical practice, there is a need to define the optimal measures for disease activity and clinical response. This is acknowledged as an important research focus for pSpA, given the growing awareness and diagnosis of this disease and the need for new therapies. In our assessment of the performance and hierarchy of outcomes in two independent RCTs, we have found that among the status measures evaluated for disease activity, ASDAS-CRP, BASDAI, PGA, patient global pain and PhGA consistently had both the highest sensitivity to change from baseline and the highest level of discriminatory ability. Concerning the response criteria (to be used in RCTs), in both trials ACR20 and PSpARC50/70 performed best in terms of discrimination. Previously, a similar analysis in AS18 determined that ASDAS-CRP performed extremely well compared with other outcome measures with respect to sensitivity to change and discrimination.
The two RCTs used different inclusion criteria: ABILITY-2 used the ASAS criteria, whereas TIPES used the ESSG/Amor criteria, resulting in slight differences in the patient populations. Also, the trials partly assessed different outcome measures: PSpARC, ACR and patient global pain were assessed in ABILITY-2 but not TIPES. Therefore, we analysed them independently. Importantly, this allowed the validation of findings from one population in the other, thus adding robustness. As there is no gold standard for high disease activity or low disease activity in pSpA, we artificially constructed states of low disease activity and high disease activity based on two external constructs: PhGA and PGA. Regardless of the external construct used, the AS-specific indices ASDAS-CRP and BASDAI showed the best performance in both trials. The relatively better performance of the ASDAS and BASDAI indices, which were originally developed for AS, may be because both measures include aspects of peripheral joints (the presence of peripheral swelling; BASDAI also includes a question regarding enthesitis).
Interestingly, PGA and PhGA, which are non-disease-specific measures, performed as well as ASDAS-CRP and BASDAI (axSpA-specific measures). This finding was consistent across all analyses, supported by the fact that these four outcome measures showed the best discrimination between treatment groups. Notably, these non-disease-specific measures were used in the first trials with TNFi in SpA,19 ,20 when disease-specific outcome measures had not yet been developed. Possibly a similar cycle of outcome measure development may be required for pSpA. Our data strongly suggest that the perceptions of patients as well as physicians about disease activity are thus far not captured by existing disease activity indices and that the use of axSpA-specific disease measures inherently lacks face validity. The sensitivity and discriminatory ability of almost all these measures were increased in subgroups of patients with higher disease activity at baseline in the ABILITY-2 trial.
Among the response criteria evaluated, the PSpARC and the ACR criteria performed relatively well in both trials. The performance of the ACR20/50, which are used in RA and PsA, in the pSpA population, may be attributed to the overlapping arthritic symptoms between RA and pSpA. Although the ACR response criteria appeared to perform better than the PSpARC, unlike the PSpARC, these do not capture all manifestations of pSpA symptoms, which may limit their usefulness for patients with pSpA. In addition, the TIPES trial did not collect data on patient global pain, enthesitis and dactylitis, and therefore, the PSpARC was calculated using the PGA, patient global pain (calculated as mean of BASDAI questions 3 and 4), SJC and TJC. The performance of the axSpA-specific response criteria, ASDAS-MI and ASDAS-CII, and BASDAI50, was worse than ACR20 and PSpARC50/70. This is most likely because the cut-off levels for ASDAS-MI/CII and BASDAI50 were obtained and tested in populations with axSpA, and not in populations with pSpA. Our analyses suggest that certain disease activity states are specific to the population in which they have been validated. In other words, it is questionable whether ASDAS/BASDAI response criteria should be used in populations with pSpA, despite their very acceptable psychometric characteristics in this regard.
To our knowledge, this is the first study specifically looking into outcome measures in pSpA. A limitation of this study is that due to the different outcomes assessed in the trials, patient global pain, PSpARC40/50/70 and ACR20/50/70 were retroactively derived for TIPES. The differences in performance of some measures between the two studies may have been influenced by the smaller sample size of the TIPES trial. The strengths of this study include the reasonably high number of patients, the controlled prospective design, the availability of a PBO group for comparison and the inclusion of many existing outcome measures and response criteria.
In conclusion, the continuous composite outcome measures ASDAS-CRP and BASDAI, as well as the single-item measures, PGA and PhGA, performed consistently well in both pSpA trials, and better than other single-item measures such as CRP and TJC, in detecting change from baseline, and in discriminating between active and PBO treatment. To fully capture typical pSpA manifestations such as enthesitis and dactylitis, it may be worthwhile to develop new composite measures, specific for pSpA, as the performance of PGA and PhGA in this analysis suggests that important parts of the patient's and physician's perceptions of disease activity are not yet captured by the current constructs. Regarding the response criteria, our results suggest the use of the disease-specific PSpARC and non-specific ACR criteria in future clinical trials because they represent multiple facets of pSpA disease (face validity), include patient's and physician's assessments (face validity), and performed well in both RCTs (discrimination) in comparison to other response criteria evaluated.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
- Data supplement 1 - Online table
Handling editor Tore K Kvien
MCT and SR serve as joint first authors.
Contributors MCT and SR contributed equally. In accordance with ICMJE authorship criteria, all authors are responsible for the development of this manuscript, and have reviewed and approved the final version.
Funding AbbVie funded the ABILITY-2 study (NCT01064856), contributed to its design and was involved in the collection, analysis and interpretation of the data, and in the writing, review and approval of the publication. AbbVie provided adalimumab and matching placebo for the TIPES study (EUDRACT 2008-006885-27). Statistical support was provided by Yinglin Xia, PhD, and medical writing support was provided by Naina Barretto, PhD, both of AbbVie.
Competing interests DLB has received research grants, consulting fees, and speaker's fees from AbbVie, BMS, Boehringer Ingelheim, Centocor, Janssen, MSD, Novartis, Pfizer, and UCB. PM has received research grants, consulting fees, and speaker's fees from AbbVie, Amgen, Biogen Idec, Bristol Myers, Celgene, Genentech, Janssen, Lilly, Merck, Novartis, Pfizer, UCB, and Vertex. RL has served as consultant/participated in advisory boards for Abbott/AbbVie, Ablynx, Amgen, AstraZeneca, BMS, Janssen (formerly Centocor), GSK, Merck, Novo-Nordisk, Novartis, Pfizer, Roche, Schering-Plough, TiGenics, UCB, and Wyeth; is Director of Rheumatology Consultancy BV, a registered company under Dutch law; has received research grants from Abbott, Amgen, Centocor, Novartis, Pfizer, Roche, Schering-Plough, UCB, Wyeth; has received speaker fees from Abbott, Amgen, Centocor, Novartis, Pfizer, Roche, Schering-Plough, UCB, Wyeth. I-HS and ALP are full-time employees of AbbVie and may hold stock and/or options.
Ethics approval The ABILITY-2 study is a phase III, randomized, placebo-controlled, double-blind trial being conducted at 28 centers in Australia, Canada, Europe, and the US. The study has been performed in accordance with the International Conference on Harmonisation Guidelines for Good Clinical Practice and the Declaration of Helsinki. Approval of an institutional ethics review board and voluntary written informed consent were obtained prior to the initiation of study procedures. The TIPES study was conducted with the approval of the ethics committee of the Academic Medical Center/University of Amsterdam, The Netherlands.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All data used in the development of this manuscript were shared with all of the authors involved. The data in this manuscript are the results of a post hoc analysis.