

Performance of response criteria for assessing peripheral arthritis in patients with psoriatic arthritis: analysis of data from randomised controlled trials of two tumour necrosis factor inhibitors
J Fransen (1), C Antoni (2), P J Mease (3), W Uter (4), A Kavanaugh (5), J R Kalden (2), P L C M Van Riel (1)

(1) Department of Rheumatology, University Medical Centre St Radboud, Nijmegen, The Netherlands
(2) Department of Medicine III, Friedrich-Alexander University, Erlangen, Germany
(3) Seattle Rheumatology Associates, Swedish Medical Center, Seattle, Washington, USA
(4) Department of Medical Informatics, Friedrich-Alexander University, Erlangen, Germany
(5) Division of Rheumatology, Allergy, and Immunology, University of California, San Diego, California, USA

Correspondence to: J Fransen, Department of Rheumatology, Radboud University Nijmegen Medical Centre, PO Box 9101, NL-6500HB Nijmegen, The Netherlands; j.fransen@reuma.umcn.nl

Abstract

Background: In recent clinical trials in patients with psoriatic arthritis (PsA), the response criteria and disease activity measures that have been used were those developed for rheumatoid arthritis. However, these have not yet been validated in PsA.

Objective: To compare the responsiveness and discriminative capacity of the psoriatic arthritis response criteria (PsARC), American College of Rheumatology (ACR) and European League Against Rheumatism (EULAR) response criteria and the Disease Activity Score (DAS) and core-set measures in patients with PsA and peripheral arthritis, using the data from two randomised placebo-controlled trials of tumour necrosis factor inhibitors.

Methods: In an infliximab trial, 104 patients with active PsA were randomised to receive placebo or infliximab for 16 weeks. In an etanercept trial, 60 patients with active PsA were randomised to receive placebo or etanercept for 12 weeks. Data from baseline and the end of the intervention phase were used from each study. Responsiveness was assessed using the standardised response mean and effect size. Capacity to discriminate between the active drug and placebo was assessed using t values or a χ2 test. Measures were ranked in order of their t value or χ2 value.

Results: The EULAR criteria performed better in discriminating the active drug from placebo than the ACR20 improvement criteria, which in turn performed better than the PsARC. It was also found that the pooled indices (DAS and DAS28) were generally more responsive, and performed better in discriminating active drug from placebo, than the single core-set measures.

Conclusion: Response criteria and pooled indices developed for rheumatoid arthritis are useful for the assessment of arthritis in PsA clinical trials.

  • ACR, American College of Rheumatology
  • DAS, Disease Activity Score
  • DIP, distal interphalangeal
  • EULAR, European League Against Rheumatism
  • GST, global statistical test
  • HAQ, Health Assessment Questionnaire
  • IMPACT, Infliximab Multinational Psoriatic Arthritis Controlled Trial
  • OMERACT, Outcome Measures in Rheumatoid Arthritis Clinical Trials
  • PASI, Psoriasis Area and Severity Index
  • PhGA, physician global assessment
  • PsA, psoriatic arthritis
  • PsARC, psoriatic arthritis response criteria
  • PtGA, patient global assessment
  • RAI, Ritchie Articular Index
  • SJC, swollen joint count
  • SRM, standardised response mean
  • TJC, tender joint count
  • TNF, tumour necrosis factor
  • VAS, Visual Analogue Scale


There is growing awareness of the severity of psoriatic arthritis (PsA), as regards both the effect of the arthritis and the other manifestations of the disease. A greater understanding of the pathophysiology of PsA, combined with progress in biotechnology, has resulted in the development of new, highly effective agents for the treatment of PsA. Although the peripheral arthritis of PsA shares some clinical characteristics with that of rheumatoid arthritis, PsA shows some distinct features. Nevertheless, outcome measures for assessing the response of peripheral arthritis to treatment have been largely borrowed from those developed for rheumatoid arthritis.1–3 These include the individual rheumatoid arthritis core-set measures, the American College of Rheumatology (ACR) improvement criteria, the Disease Activity Score (DAS) and the European League Against Rheumatism (EULAR) response criteria.4–9 These measures, as well as the composite criteria suggested for PsA (commonly called the psoriatic arthritis response criteria (PsARC)),10,11 have never been validated in PsA. Therefore, we compared the responsiveness and discriminative capacity of these rheumatoid arthritis response criteria and activity measures in patients with PsA having peripheral arthritis, using the data from two randomised placebo-controlled trials of tumour necrosis factor (TNF) inhibitors.

METHODS

Data

The data were derived from two recently published phase II randomised placebo-controlled trials of TNF inhibitors in patients with PsA.11,12 For the analysis, only data from baseline and the end of the intervention phase were used for each study.

In the trial by Antoni et al12 (Infliximab Multinational Psoriatic Arthritis Controlled Trial (IMPACT)), 104 patients with active psoriatic arthritis (defined as ⩾5 swollen joints and ⩾5 tender or painful joints) were randomised to receive placebo or infliximab for 16 weeks.

In the trial by Mease et al11 (etanercept), 60 patients with active psoriatic arthritis (defined as ⩾3 swollen joints and ⩾3 tender or painful joints) were randomised to receive placebo or etanercept 25 mg twice weekly for 12 weeks. In both trials, no patient had exclusively distal interphalangeal (DIP) involvement.13 According to the definitions of Helliwell et al,14 95% of the patients had polyarthritis (>5 joints affected) at baseline. The relevant ethics committees approved the trials and all patients provided written informed consent.

Assessments and measures

The patient assessments, physical examinations and laboratory assessments were similar in both trials. Arthritis pain and patient global assessment (PtGA) of disease activity were rated on a Likert scale (0–5) and, in the IMPACT trial, also on a Visual Analogue Scale (VAS, 0–100). Disability was assessed using the Disability Index (0–3) of the Health Assessment Questionnaire (HAQ). Duration of morning stiffness was recorded in minutes. Physician global assessment (PhGA) of disease activity was rated on a Likert scale (0–5) and, in the IMPACT trial, also on a VAS (0–100). A 76 swollen-joint count (SJC76, 0–76) and a 78 tender-joint count (TJC78, 0–78) were used in the etanercept trial, and a 66 SJC (SJC66, 0–66) and a 68 TJC (TJC68, 0–68) in the IMPACT trial. All counts included the DIP joints of the hands and feet. Reduced joint counts (TJC28 and SJC28) were calculated from the extended joint counts. The Ritchie Articular Index (RAI, 0–75) was calculated using the weighted TJC, omitting the subtalar joints, which were not assessed. The Psoriasis Area and Severity Index (PASI, 0–72) was assessed to determine skin involvement. Erythrocyte sedimentation rate (ESR, mm/h) and C reactive protein (CRP, mg/l) were determined as measures of the acute-phase response. We refer readers to the original publications for more details.11,12
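The derivation of reduced counts from an extended examination can be illustrated with a short Python sketch. It is not the procedure used in either trial: the `(side, joint)` labels and the `reduced_count` helper are hypothetical, and the 28-joint subset is assumed to be the standard DAS28 set (shoulders, elbows, wrists, metacarpophalangeal joints 1–5, proximal interphalangeal joints 1–5 including the thumb interphalangeal joint, and knees, on both sides).

```python
# Hypothetical joint labels: (side, joint). Assumption: the 28-joint set is the
# standard DAS28 set, i.e. shoulder, elbow, wrist, knee, MCP1-5 and PIP1-5
# (thumb IP counted as PIP1) on each side.
JOINTS_28_PER_SIDE = ["shoulder", "elbow", "wrist", "knee"] + [
    f"{name}{i}" for name in ("mcp", "pip") for i in range(1, 6)
]
JOINTS_28 = {(side, joint) for side in ("L", "R") for joint in JOINTS_28_PER_SIDE}


def reduced_count(findings: dict[tuple[str, str], bool]) -> int:
    """Return the 28-joint count from an extended joint examination.

    `findings` maps (side, joint) to True when the joint is tender (or swollen,
    for an SJC28). Joints outside the 28-joint set, such as the DIP joints and
    the joints of the feet, are ignored.
    """
    return sum(1 for joint, positive in findings.items() if positive and joint in JOINTS_28)


# Usage: a knee and an MCP joint count; a DIP joint and an MTP joint do not.
exam = {("L", "knee"): True, ("R", "mcp2"): True, ("R", "dip2"): True,
        ("L", "mtp1"): True, ("R", "elbow"): False}
print(reduced_count(exam))  # 2
```

The same helper serves for both the TJC28 and the SJC28, since only the meaning of the boolean changes.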

Indices for arthritic activity that were calculated from the data were the DAS and the Disease Activity Score of 28 joint counts (DAS28).6,7 The DAS and the DAS28 were calculated as

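$$\mathrm{DAS} = 0.53938\sqrt{\mathrm{RAI}} + 0.06465\,\mathrm{SJC44} + 0.330\ln(\mathrm{ESR}) + 0.0072\,\mathrm{PtGA}$$

and

$$\mathrm{DAS28} = 0.56\sqrt{\mathrm{TJC28}} + 0.28\sqrt{\mathrm{SJC28}} + 0.70\ln(\mathrm{ESR}) + 0.014\,\mathrm{PtGA}$$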

Appropriate formulas omitting the global assessment were used for the etanercept trial, because PtGA was not assessed on a VAS.6,7
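For illustration, the two four-variable formulas can be written as Python functions. This is a minimal sketch with hypothetical function names; it assumes that the ESR is in mm/h (it must be positive, because of the natural logarithm) and that the PtGA is on the 0–100 VAS, and it does not reproduce the three-variable formulas used for the etanercept trial.

```python
import math


def das(rai: float, sjc44: int, esr: float, ptga_vas: float) -> float:
    """Four-variable DAS with the coefficients given in the text (ESR in mm/h, PtGA VAS 0-100)."""
    return (0.53938 * math.sqrt(rai) + 0.06465 * sjc44
            + 0.330 * math.log(esr) + 0.0072 * ptga_vas)


def das28(tjc28: int, sjc28: int, esr: float, ptga_vas: float) -> float:
    """Four-variable DAS28 with the coefficients given in the text."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr) + 0.014 * ptga_vas)


# Usage with made-up values: 9 tender and 4 swollen joints (of 28), ESR 36 mm/h, PtGA 50 mm.
print(round(das28(9, 4, 36, 50), 2))  # 5.45
```

`math.log` is the natural logarithm, matching the ln in both formulas.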

The improvement criteria calculated (table 1) were the EULAR response criteria,8,9 the ACR improvement criteria with 20%, 50% and 70% cut-off points5 and the PsARC.10,11 The full available joint counts, thus including the DIP joints, were used for the PsARC and ACR criteria. The PsARC use the PtGA and PhGA in a Likert-scale format (improvement = decrease by ⩾1 category, worsening = increase by ⩾1 category) and the TJC and SJC (improvement = decrease by ⩾30%, worsening = increase by ⩾30%).10 Accordingly, patients were classified as responders when at least two of the four measures improved, one of which had to be a joint count, and none of the four measures worsened.

As additional low-disease activity criteria, we calculated the criteria for minimal disease activity in rheumatoid arthritis proposed at the Outcome Measures in Rheumatoid Arthritis Clinical Trials (OMERACT) conference in 2004. These define minimal disease activity as the absence of swollen or tender joints together with an ESR ⩽10 mm/h, or low values for at least 5 of the 7 core-set measures: ⩽1 tender joint (of 28), ⩽1 swollen joint (of 28), pain (VAS) ⩽20 mm, PtGA (VAS) ⩽20 mm, PhGA (VAS) ⩽15 mm, ESR ⩽20 mm/h and HAQ ⩽0.5 points.15 We also calculated the low-disease activity criteria of the DAS (⩽2.4) and the DAS28 (⩽3.2),8,9 and “near-remission” low-disease activity by the DAS (<1.6) and the DAS28 (<2.6).16,17

In addition, a global statistical test (GST) was calculated from the seven available measures of the core set for rheumatoid arthritis randomised controlled trials.18 A GST is sometimes recommended for trials with multiple end points to increase statistical power. The GST was calculated by ranking the patients according to their response on each core-set measure separately and then adding the seven rank numbers for each patient.18 The resulting summed rank values were analysed using a t test. The GST approach is statistically very powerful in randomised controlled trials, but it is a measure of change intended solely for statistical testing: the summed rank values are not clinically interpretable and cannot be compared across randomised controlled trials.
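The rank-sum construction of the GST can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, tied values are assumed to receive average ranks, patients from both arms are assumed to be ranked together, and each change score must first be oriented so that a larger value means more improvement (for example, baseline minus end value for a joint count).

```python
import math
from statistics import mean, variance


def average_ranks(values: list[float]) -> list[float]:
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # mean of 1-based positions
        start = end + 1
    return ranks


def gst_scores(improvements: list[list[float]]) -> list[float]:
    """Sum, per patient, the ranks of the improvement on each core-set measure.

    `improvements` holds one list per measure, each with one value per patient
    (both trial arms together), oriented so that larger = more improvement.
    """
    totals = [0.0] * len(improvements[0])
    for measure in improvements:
        for patient, rank in enumerate(average_ranks(measure)):
            totals[patient] += rank
    return totals


def pooled_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Usage with two made-up measures for four patients (first two on active drug).
scores = gst_scores([[3, 1, 3, 2], [4, 3, 2, 1]])
print(scores)  # [7.5, 4.0, 5.5, 3.0]
print(pooled_t(scores[:2], scores[2:]))
```

In a real analysis all seven core-set measures would be passed in, and the t value of the summed ranks would then serve as the reference against which the other measures are ranked.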

Table 1

 An overview of the response criteria
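Because table 1 is not reproduced here, the PsARC, ACR and EULAR classification rules can be sketched in Python. The PsARC rules follow the Methods above; the ACR rule (improvement in both joint counts and in at least three of patient global, physician global, pain, HAQ and acute-phase reactant) and the EULAR cut-offs (improvement >1.2 or >0.6; attained DAS28 ⩽3.2 or ⩽5.1, or DAS ⩽2.4 or ⩽3.7) are taken from the published definitions of those criteria rather than from this article. Function names, dictionary keys and the handling of zero baseline values are hypothetical choices.

```python
def psarc_responder(base: dict, end: dict) -> bool:
    """PsARC as described in the Methods.

    `base` and `end` hold 'ptga' and 'phga' (Likert categories) and 'tjc' and
    'sjc' (integer joint counts). Assumption: a joint count with a baseline
    of 0 cannot improve and worsens if any joint becomes involved.
    """
    improved, worsened = {}, {}
    for k in ("ptga", "phga"):
        improved[k] = end[k] <= base[k] - 1  # drop of >=1 category
        worsened[k] = end[k] >= base[k] + 1  # rise of >=1 category
    for k in ("tjc", "sjc"):
        b, e = base[k], end[k]
        improved[k] = b > 0 and 10 * e <= 7 * b  # decrease of >=30%
        worsened[k] = (10 * e >= 13 * b) if b > 0 else e > 0  # increase of >=30%
    return (sum(improved.values()) >= 2
            and (improved["tjc"] or improved["sjc"])
            and not any(worsened.values()))


def acr_responder(base: dict, end: dict, pct: int = 20) -> bool:
    """ACR20/50/70: >=pct% improvement in TJC and SJC and in >=3 of the other 5 measures.

    'apr' is the acute-phase reactant (ESR or CRP). Assumption: a measure with
    a baseline of 0 is treated as not improved.
    """
    def improved(k: str) -> bool:
        return base[k] > 0 and (base[k] - end[k]) * 100 >= pct * base[k]

    others = ("ptga", "phga", "pain", "haq", "apr")
    return improved("tjc") and improved("sjc") and sum(map(improved, others)) >= 3


def eular_response(base: float, end: float, low: float = 3.2, high: float = 5.1) -> str:
    """EULAR response from the DAS28 change; pass low=2.4, high=3.7 for the DAS."""
    improvement = base - end
    if improvement > 1.2:
        return "good" if end <= low else "moderate"
    if improvement > 0.6:
        return "moderate" if end <= high else "none"
    return "none"


# Usage with made-up values.
print(psarc_responder({"ptga": 4, "phga": 4, "tjc": 20, "sjc": 10},
                      {"ptga": 3, "phga": 4, "tjc": 12, "sjc": 9}))  # True
print(eular_response(5.9, 3.0))  # good
```

Integer arithmetic (for example `10 * e <= 7 * b`) is used for the 30% thresholds so that borderline joint counts are not misclassified by floating-point rounding.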

Statistical analysis

Responsiveness of the continuous outcome measures was assessed using the standardised response mean (SRM, calculated as mean change/SD of change) and the effect size (calculated as mean change/SD at baseline). SRM and effect size can be seen as signal-to-noise ratios. Within a trial, higher values of SRM or effect size indicate greater sensitivity to change (responsiveness) of a measure. It is not clear whether the SRM or the effect size is the better responsiveness statistic, although we prefer the SRM because change appears in both its numerator and its denominator. The discriminative capacity of the continuous outcome measures was tested using the t value of the difference in change between the active drug and placebo groups. Larger t values indicate better discrimination and correspond to smaller p values for the difference between active drug and placebo. All measures were then ranked by their t values, as the aim of the trials was to discriminate the active drug from placebo.19 The t value of the GST can serve as a reference, because a GST is intended to represent the maximal power obtainable by combining existing measures.18 The discriminative capacity of the different response criteria and low-disease activity criteria was tested using the Mantel–Haenszel χ2, which allows dichotomous and trichotomous measures to be compared with each other. No attempt was made to pool the data of the two trials in these analyses, mainly because of the differences in drugs and in the timing of follow-up. SRMs, t values and χ2 values are therefore best interpreted within a trial, not across trials; however, the relative ranking of measures was compared across the trials.
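The four statistics can be sketched with Python's standard `statistics` module. This is a hypothetical illustration rather than the analysis code used here; it assumes sample SDs, a pooled-variance Student t test, and scores of 0/1 (placebo/active) for the treatment arm and ordinal scores for the response categories, so that the Mantel–Haenszel χ2 for linear association equals (N − 1)r², where r is the Pearson correlation between the two scores.

```python
import math
from statistics import mean, stdev, variance


def srm(change: list[float]) -> float:
    """Standardised response mean: mean change / SD of change."""
    return mean(change) / stdev(change)


def effect_size(change: list[float], baseline: list[float]) -> float:
    """Effect size: mean change / SD of the baseline values."""
    return mean(change) / stdev(baseline)


def t_value(change_active: list[float], change_placebo: list[float]) -> float:
    """Student t for the difference in mean change between the two arms (pooled SD)."""
    na, nb = len(change_active), len(change_placebo)
    sp2 = ((na - 1) * variance(change_active)
           + (nb - 1) * variance(change_placebo)) / (na + nb - 2)
    diff = mean(change_active) - mean(change_placebo)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))


def mantel_haenszel_trend(arm: list[int], category: list[int]) -> float:
    """Mantel-Haenszel chi-square for linear association, (N - 1) * r**2.

    `arm` is 1 for active drug and 0 for placebo; `category` is an ordinal
    response score, e.g. 0/1 for ACR20 or 0/1/2 for EULAR none/moderate/good.
    """
    n = len(arm)
    mx, my = mean(arm), mean(category)
    sxy = sum((x - mx) * (y - my) for x, y in zip(arm, category))
    sxx = sum((x - mx) ** 2 for x in arm)
    syy = sum((y - my) ** 2 for y in category)
    return (n - 1) * sxy ** 2 / (sxx * syy)


# Usage with made-up change scores.
print(srm([1, 2, 3]))  # 2.0
```

With a dichotomous criterion such as the ACR20, this statistic equals (N − 1)/N times the Pearson χ2 for the corresponding 2×2 table.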

RESULTS

As indicated by the mean changes (tables 2 and 3), all measures improved in the active drug group in both trials. In the IMPACT trial, for example, the DAS28 showed a mean improvement of 2.6 points, the TJC68 a mean improvement of 13 joints and the HAQ a mean improvement of 0.56 points. In the etanercept trial, the DAS28 decreased by a mean of 1.7 points, the TJC68 by a mean of 12 joints and the HAQ by a mean of 0.73 points. In both active drug groups, the responsiveness statistics exceeded 1.0 for most measures.

Table 2

 Responsiveness of individual items and indices ranked by t value in the Infliximab Multinational Psoriatic Arthritis Controlled Trial

Table 3

 Responsiveness of individual items and indices, ranked by t value in the etanercept trial

Placebo responses in terms of improvement, which are indicated by negative SRMs in tables 2 and 3, were largest in ESR, CRP, SJC and the DAS28 in the IMPACT trial, and in PhGA and the SJC in the etanercept trial.

In both trials, all indices of disease activity performed better than the single variables, both in responsiveness (SRM and effect size) and in discriminating change in the active drug group from change in the placebo group (t value) (tables 2 and 3). The SRM and effect size in the active drug group represent the responsiveness to change of a measure, whereas the t value reflects its ability to discriminate between change in the active drug group and any change in the placebo group. A t value >1.96 indicates a significant (p<0.05) difference between the treatment groups. According to the ranking by t value, the best-performing clinical index was the DAS28, whose performance was similar to that of the global statistical test (GST). The best-performing single variable was the PhGA of disease activity. In the IMPACT trial, but not in the etanercept trial, patient ratings of pain and global disease activity were more responsive than joint counts, irrespective of their format (0–5 or 0–100). TJCs were more responsive than SJCs and the measures of the acute-phase response, and extended joint counts were generally more responsive than reduced ones.

In both trials, the EULAR response criteria discriminated the active drug from placebo better (χ2) than the ACR20 or PsARC criteria did (table 4). The PsARC performed similarly to or less well than the ACR20 criteria. Using data from the IMPACT trial, the ACR improvement criteria were recalculated with the more responsive VAS ratings, instead of the 0–5 rating scales, for global assessments and pain; both methods of calculation gave identical results (not shown).

Table 4

 Discriminative capacity of response criteria

The DAS28 low-disease activity criterion performed as well as the ACR20, PsARC and OMERACT criteria.

DISCUSSION

In this study, we showed that response criteria and pooled indices initially developed for rheumatoid arthritis are useful for assessing arthritis in PsA clinical trials. In this analysis we considered only measures of peripheral arthritis. Other important core domains of PsA that may be assessed in clinical trials include involvement of the skin and nails, axial arthritis and involvement of periarticular structures such as entheses.3 The development and validation of useful outcome measures for these domains will facilitate a more complete definition of the effect of treatment, comparisons among therapeutic approaches and understanding of the course of PsA.3 Peripheral joint arthritis in PsA clinical trials is assessed with several measures that were primarily developed for use in rheumatoid arthritis, but the value of these measures in PsA had not been shown. Therefore, we compared the responsiveness and discriminative capacity of existing response criteria and arthritic activity measures in two randomised placebo-controlled trials of TNF antagonists in PsA.11,12 In the trials we studied, the ACR20 improvement criteria, the EULAR response criteria and the PsARC all showed significant differences between the active drug and placebo (p<0.001). When the response measures were ranked by discriminative capacity, the EULAR criteria discriminated the active drug from placebo better than the ACR20 improvement criteria, which in turn performed better than or as well as the PsARC.

This does not necessarily imply that the EULAR criteria will be “the best” response measure for all future PsA clinical trials. In rheumatoid arthritis, the discriminative capacity of the ACR20 and EULAR criteria was reported to be similar in trials of conventional disease-modifying antirheumatic drugs.20 In this study, we used clinical trials of highly effective biological agents. One factor contributing to the performance of the EULAR criteria in these trials is that many active drug-treated patients, but few placebo-treated patients, reached the low-disease activity criterion of the EULAR response criteria. Although the ACR50 and ACR70 criteria have been proposed as meaningful end points when testing very effective treatments, these stringent measures were less discriminative than the ACR20 criteria in this study, which is consistent with findings in rheumatoid arthritis.21 The low-disease activity criteria tested in this study (eg, DAS28 ⩽3.2) discriminated well between the active drug and placebo. However, as the cut-off point was lowered (eg, DAS28 <2.6 or the OMERACT minimal disease activity criteria), fewer active drug-treated patients fulfilled the criteria and the discrimination was less clear.

The pooled indices (GST, DAS and DAS28) were generally more responsive and discriminating than single measures in the trials we studied. The DAS28 performed better than the DAS, which was unexpected because the DAS includes a larger number of counted joints. A reason for the lower discriminative ability of the DAS compared with the DAS28 may be that the DAS gives relatively heavy weight to the RAI, which in both trials did not perform as well as the ungraded TJCs, including the TJC28. The discriminative capacity of the RAI was also lower than that of the TJC in a study comparing the responsiveness of outcome measures in early rheumatoid arthritis.19 The RAI may be less sensitive to change because of observer error in grading tenderness, and because the proximal interphalangeal joints of each hand, the metacarpophalangeal joints of each hand and the metatarsophalangeal joints of each foot are each scored as a single unit. The SJC28 and TJC28 showed less discriminative ability than the more extended joint counts. However, combining them in the DAS28 seemed advantageous, as the discriminative ability of the DAS28 was close to that of the GST, which included the full joint counts. Not all discriminative information is lost when the feet and DIP joints are omitted from a full joint count, because changes in extended and reduced joint counts are related. In individual patients, however, excluding the feet and DIP joints from joint counts may lead to underestimation of disease activity, especially in patients with monoarthritis or oligoarthritis and in those with predominantly or exclusively DIP involvement. In larger randomised controlled trials, underestimation of disease activity due to omitted joints is not a major issue, because it will occur at random across the arms of the trial.
Most of the patients in the trials we studied had involvement of at least five joints (tender or swollen), so we could not study the response in patients with oligoarticular disease. In those patients, it may be more difficult to reach and measure response.

The results of our study apply only to clinical trials. In practice, the principal goal of treatment is to reach low disease activity; response criteria are therefore generally less useful there, because they describe change in status, whereas the interest lies in the absolute status. The DAS and DAS28 express an absolute level and could therefore be of interest for assessing arthritic activity in PsA in practice, but they express disease activity as if the patient had rheumatoid arthritis. Use of the DAS and DAS28 in patients with PsA in practice might be more appropriate in rheumatoid arthritis-like polyarthritis, but it is still not clear what a given DAS or DAS28 value means in PsA. In particular, it is unclear whether the DIP joints need to be included in the joint counts, even though isolated DIP involvement is seldom seen,22,23 whether the weights for the joint counts, ESR and PtGA should be changed, and whether the ESR can be omitted.

It can be concluded that response criteria and pooled indices developed for rheumatoid arthritis may also be used in PsA clinical trials, at least in patients with active polyarticular disease, the presentation that predominates in trials of highly effective TNF inhibitors. Further research may help to establish whether measures developed specifically for PsA would perform better.

REFERENCES


Footnotes

  • Published Online First 27 April 2006

  • Competing interests: None declared.

  • CA is currently working at the Schering-Plough Research Institute, Kenilworth, New Jersey, USA.
