Psoriatic arthritis assessment tools in clinical trials
- 1Seattle Rheumatology Associates and Swedish Medical Center Rheumatology Research Division, University of Washington School of Medicine, Seattle, WA, USA
- 2Department of Medicine III, Friedrich-Alexander University Erlangen-Nuernberg, Germany
- 3Toronto Western Research Institute, Psoriatic Arthritis Program, University Health Network, Centre for Prognosis Studies in the Rheumatic Diseases, Toronto Western Hospital, Toronto, Ontario, Canada
- 4Department of Medicine, Wellington School of Medicine and Health Sciences, University of Otago, Wellington, New Zealand
- Correspondence to:
Dr P J Mease
Seattle Rheumatology Associates, 1101 Madison St, 10th floor, Seattle, WA 98104, USA
In order to measure disease activity, progression, and change with therapy in psoriatic arthritis (PsA), it is important to use accurate, reliable, and feasible outcome measures that can ideally be employed in longitudinal cohorts, clinical trials, and clinical practice. Until recently, there has been little focus on this methodology in PsA. Clinical trials and long term clinical registries have used disparate outcome measures. With emerging therapies, the focus on the methodology of outcome assessment has increased to ensure that discriminant and responsive instruments are used. The Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA), in conjunction with the society, Outcome Measures in Rheumatology (OMERACT), is focused on refining and developing outcome measures for a variety of disease domains reviewed in this report. Key domains to assess include joints, skin, enthesitis, dactylitis, spine, joint damage as assessed radiologically, quality of life, and function. These domains can be assessed by individual and composite measures. A number of measures have been “borrowed” from the fields of rheumatoid arthritis, ankylosing spondylitis, and psoriasis and adapted to PsA. Others are being developed specifically for PsA. Few are validated but most have been shown to perform well in distinguishing placebo from treatment response. This report reviews the current state of the art of assessment in PsA and points toward future directions of development of this field.
- ACR, American College of Rheumatology
- BASDAI, Bath Ankylosing Spondylitis Disease Activity Index
- CRP, C-reactive protein
- DAS, disease activity score
- ESR, erythrocyte sedimentation rate
- EULAR, European League Against Rheumatism
- GRAPPA, Group for Research and Assessment of Psoriasis and Psoriatic Arthritis
- HAQ, Health Assessment Questionnaire
- OMERACT, Outcome Measures in Rheumatology
- PsA, psoriatic arthritis
- PsARC, psoriatic arthritis response criteria
- RA, rheumatoid arthritis
- VAS, visual analogue scale
Accurate, reliable, and reproducible assessment of disease activity and of change with time and/or therapy in psoriatic arthritis (PsA) is important for understanding its natural history and the relative effectiveness of therapies. In this report we review assessments that have been used in PsA trials as well as those in development, particularly through the work of the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) and Outcome Measures in Rheumatology (OMERACT). (The group was founded as Outcome Measures in Rheumatoid Arthritis Clinical Trials, hence the acronym, but has since expanded its mission across the spectrum of rheumatology intervention studies.) Most of the assessment methodologies discussed have been adapted from clinical trials in rheumatoid arthritis (RA), ankylosing spondylitis, or psoriasis; however, these methodologies have not yet been validated for use in PsA. Skin and radiological assessments are covered in separate articles in this supplement.1–3
Assessment begins with inclusion of patients who have an accurate diagnosis. The problem of classification of PsA has been addressed elsewhere in this supplement.4 Until the international ClASsification of Psoriatic ARthritis (CASPAR) study group has developed an updated schema, the traditional Moll and Wright criteria5 are being used. A number of challenges regarding the application of outcome measures used in other diseases, such as RA, become readily apparent when considering the subsets of PsA identified by Moll and Wright: oligoarticular, polyarticular, distal interphalangeal joint arthritis, spondylitis, and arthritis mutilans. For example:
Will the American College of Rheumatology (ACR) responder criteria or the disease activity score (DAS) based responder criteria, developed for a primarily polyarticular disease such as RA, function in a valid way for oligoarticular variants of PsA?
Will measurement of spinal disease developed in ankylosing spondylitis, wherein all patients have, by definition, spinal involvement, function in a reliable way in PsA, where spine involvement is generally less severe and variable in expression?
How does one approach measurement of unique clinical features such as dactylitis and enthesitis?
Without a well documented understanding of the natural progression of joint damage, such as we have in RA, how are we to know what to expect in the placebo arm of a study of a therapy that may beneficially affect disease progression?
What key core domains of PsA should be measured in every trial (such as joint tenderness and swelling, skin involvement, enthesitis, functional impairment, fatigue, etc.) and how should these be weighted in a composite responder index?
ARTICULAR AND COMPOSITE DISEASE ASSESSMENT
A key assessment in rheumatology clinical trials is the counting of the number of tender and swollen joints in a patient. Variable numbers of joints are counted in various systems, such as the ACR joint count of 68 tender and 66 swollen (excluding the hips from assessment of swelling), developed for the evaluation of RA.8 These joint counts are then combined with other core measures of disease activity, such as patient and physician assessed global health, pain, function, and inflammation markers in different formulas. Such composite outcome measures of clinical response, developed in clinical trials of RA, include the ACR response criteria9 and the DAS,10 which forms part of the European League Against Rheumatism (EULAR) response criteria,11 discussed in detail below. The DAS has been validated with several different numbers of assessed joints, now commonly 28. The Ritchie Articular Index is an older articular assessment methodology not currently used.12
An early attempt to develop a composite index of PsA disease activity included the Ritchie Index, a visual analogue scale (VAS) pain score, duration of morning stiffness, grip strength, and haemoglobin.13 Tested in varied subsets of PsA patients, it was only reliable in those with symmetrical polyarthritis, thus it has not been adopted as a responder index. The Ritchie Index was not validated.
PsARC and ACR response criteria
In the Veterans Administration study of sulfasalazine in PsA, a PsA specific response index was developed. To achieve response, a patient had to achieve two of the following, one of which had to be a joint count, and no worsening of any measure: tender or swollen joint count improvement of at least 30%, patient global improvement by one point on a five point Likert scale, or physician global improvement on the same scale.14 In this study, none of the individual measures showed statistically significant differences between the treated and placebo groups but the composite measure was able to do so marginally.
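The response rule described above can be sketched in code. This is a minimal illustration, not the trial's analysis program: the field names, the direction of the Likert scale (1 best, 5 worst), and the definition of "worsening" as the mirror image of improvement (a rise of at least 30% in a joint count, or one point worse on a global scale) are assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical record of the four PsARC measures at one visit."""
    tender_joints: float
    swollen_joints: float
    patient_global: int    # assumed 1 (best) to 5 (worst)
    physician_global: int  # assumed 1 (best) to 5 (worst)


def _fractional_change(baseline: float, follow_up: float) -> float:
    """Change relative to baseline; negative values mean improvement."""
    if baseline == 0:
        # No baseline involvement: any new involvement counts as worsening.
        return 0.0 if follow_up == 0 else float("inf")
    return (follow_up - baseline) / baseline


def psarc_response(baseline: Visit, follow_up: Visit,
                   joint_threshold: float = 0.30) -> bool:
    """True if at least two measures improve, at least one of them a
    joint count, and no measure worsens (worsening rule is an assumption)."""
    tjc = _fractional_change(baseline.tender_joints, follow_up.tender_joints)
    sjc = _fractional_change(baseline.swollen_joints, follow_up.swollen_joints)

    joint_improved = [tjc <= -joint_threshold, sjc <= -joint_threshold]
    joint_worsened = [tjc >= joint_threshold, sjc >= joint_threshold]

    global_pairs = [
        (baseline.patient_global, follow_up.patient_global),
        (baseline.physician_global, follow_up.physician_global),
    ]
    global_improved = [after <= before - 1 for before, after in global_pairs]
    global_worsened = [after >= before + 1 for before, after in global_pairs]

    n_improved = sum(joint_improved) + sum(global_improved)
    any_worse = any(joint_worsened) or any(global_worsened)
    return n_improved >= 2 and any(joint_improved) and not any_worse
```

Under these assumptions, a patient whose tender joint count falls from 20 to 12 and whose patient global improves by one point, with nothing worsening, is a responder. A patient who improves only on the two global scales is not, because no joint count has improved.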
The first published study with a biological agent (etanercept)15 honoured historical precedent by using the same measure used in the Veterans Administration study as its primary outcome measure, and named it for the first time as the PsA response criteria (PsARC). As a secondary outcome measure, a modified version of the ACR joint count was used. The distal interphalangeal joints of the feet and carpometacarpal joints of the hands were added to the usual ACR joint count of 68 tender and 66 swollen, to yield a 78 and 76 joint count, respectively. Thus, the joints assessed for tenderness included the distal interphalangeal, proximal interphalangeal and metacarpophalangeal joints of the hands, and metatarsophalangeal joints of the feet, the carpometacarpal and wrist joints (counted separately), the elbows, shoulders, acromioclavicular, sternoclavicular, hip, knee, talo-tibial, and mid-tarsal joints. All of these except for the hips were assessed for swelling. Also, in this trial, joint tenderness and swelling were graded 1–3. The other individual elements in the ACR scoring system (VAS scores of patient pain, patient global, and physician global, the Health Assessment Questionnaire (HAQ), and an acute phase reactant, C-reactive protein (CRP) or erythrocyte sedimentation rate (ESR)) were unchanged from the way they are used in traditional RA trials. To achieve an ACR 20, 50, or 70 response, a patient must show at least 20%, 50%, or 70% improvement, respectively, in both tender and swollen joint counts and in three of the five individual elements (VAS scores of patient pain, physician and patient global assessment, a disability measure (HAQ), and an acute phase reactant (ESR or CRP)). Both the PsARC and ACR scoring systems were clearly able to distinguish patients treated with and without etanercept.
In one other trial, that of leflunomide in PsA,16 the PsARC was used as the primary outcome measure. This measure was able to discriminate treatment from placebo response, as was the ACR 20. As will be seen in data from biological trials,17 frequency of ACR 20 response is typically lower than PsARC response, even though PsARC requires ⩾30% improvement in joint score, possibly because improvement in either the tender or the swollen joint count is required, not both, and perhaps because of the absence of the elements of acute phase reactant and the HAQ score.
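The ACR 20/50/70 rule can be illustrated with a short sketch. The dictionary keys and the convention that every measure is scored so that lower values are better are assumptions made for this example; the handling of a zero baseline (treated as no measurable improvement) is also an arbitrary choice.

```python
# Hypothetical keys for the five non-joint elements of the ACR core set.
CORE_FIVE = (
    "pain_vas",
    "patient_global_vas",
    "physician_global_vas",
    "haq",
    "acute_phase",  # ESR or CRP
)


def pct_improvement(baseline: float, follow_up: float) -> float:
    """Percentage improvement from baseline (lower scores assumed better)."""
    if baseline <= 0:
        return 0.0  # assumption: nothing to improve from
    return 100.0 * (baseline - follow_up) / baseline


def acr_response(baseline: dict, follow_up: dict) -> int:
    """Return the highest ACR level met (70, 50, or 20), or 0 if none."""
    tender = pct_improvement(baseline["tender"], follow_up["tender"])
    swollen = pct_improvement(baseline["swollen"], follow_up["swollen"])
    others = [pct_improvement(baseline[k], follow_up[k]) for k in CORE_FIVE]

    for level in (70, 50, 20):
        enough_others = sum(x >= level for x in others) >= 3
        if tender >= level and swollen >= level and enough_others:
            return level
    return 0
```

Because both joint counts must improve, a patient with a large fall in tender joints but little change in swollen joints fails ACR 20 while possibly still meeting the PsARC. This is one way to see why PsARC response rates tend to be higher.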
Subsequently, in the phase III etanercept trial,18 and phase II19 and III infliximab20 and adalimumab trials,21 the primary outcome measure was the ACR 20 response. This was chosen as a “more stringent” outcome measure than the PsARC (based on lower percentage response in placebo treated patients in previous trials). In all of the trials with antitumour necrosis factor (anti-TNF) agents, there was a highly statistically significant treatment response with p values <0.001. ACR 50 and 70 responses were also significantly improved in the treatment groups.
There has been a great deal of interest in the DAS, developed in Europe for both assessment of disease activity state, as well as change of disease activity with therapy in RA.10 The original DAS used the Ritchie Articular Index (RAI),12 swollen joint count (SJC), ESR, and general health status (GH) (VAS). The DAS is calculated as follows:
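In the form usually published for the original four-variable DAS,10 with coefficients rounded, the calculation is as shown below. The joint set for the SJC (44 joints), the ESR units (mm/h), and the 0–100 mm scale for GH are taken from the standard definition of the score and are assumptions as far as this text is concerned.

$$
\mathrm{DAS} = 0.54\sqrt{\mathrm{RAI}} + 0.065\,\mathrm{SJC} + 0.33\,\ln(\mathrm{ESR}) + 0.0072\,\mathrm{GH}
$$

As an illustration with hypothetical values (RAI 9, SJC 10, ESR 30 mm/h, GH 50):

$$
0.54\sqrt{9} + 0.065(10) + 0.33\ln(30) + 0.0072(50) \approx 1.62 + 0.65 + 1.12 + 0.36 = 3.75
$$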
The scale range is approximately 0–10. The DAS was derived from a calculated weight of key elements in RA activity as observed in clinical trials. It has been modified to a 28 joint count for simplification purposes, and also has been employed with and without the general health status question, all of which correlate with each other and with the ACR score.10,22–28
As mentioned, the DAS has proved to be a useful instrument because, unlike the ACR criteria, which measure only change in disease activity, the DAS can characterise the current amount of disease activity. For example, changing from a “very severe” to a “severe” level of disease may be perceived as having different significance than changing from a “modest” to “low” disease state.
The DAS is being used as a basis for the EULAR response criteria, which are derived from a discriminant function analysis of RA patients with active and non-active disease11 in whom experienced rheumatologists judged whether important treatment changes had occurred. The patients are divided into three groups: non, moderate, and good responders. A non-response is a reduction in the DAS of ⩽0.6, with an endpoint DAS of >3.7. A moderate response is a reduction in DAS between 0.6 and 1.2, with an endpoint DAS of >2.4 and ⩽3.7. A good response is an improvement of >1.2 and endpoint DAS of ⩽2.4. For DAS 28, the reduction amounts are the same, and the endpoint scores are >5.1 for a non-response, >3.2 and ⩽5.1 for a moderate response, and ⩽3.2 for a good response.26 These measures have been highly discriminant and responsive in RA trials. A state of remission in RA is considered to be a DAS score of <1.6.27
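The three response categories can be expressed as a small decision rule. The summary above does not say how to classify, for example, an improvement of more than 1.2 with an endpoint DAS above 2.4; the sketch below fills such cases using the usual EULAR response grid, which is an assumption relative to this text. The function and parameter names are hypothetical.

```python
def eular_response(baseline_das: float, endpoint_das: float,
                   low: float = 2.4, high: float = 3.7) -> str:
    """Classify response as 'good', 'moderate', or 'none'.

    Defaults are the cut-offs quoted for the original DAS; pass
    low=3.2 and high=5.1 for the DAS 28.
    """
    improvement = baseline_das - endpoint_das

    if improvement > 1.2:
        # Large improvement: good only if low activity is reached.
        return "good" if endpoint_das <= low else "moderate"
    if improvement > 0.6:
        # Intermediate improvement: moderate unless activity stays high.
        return "moderate" if endpoint_das <= high else "none"
    return "none"


def in_remission(das: float, cutoff: float = 1.6) -> bool:
    """Remission as quoted in the text for the original DAS (< 1.6)."""
    return das < cutoff
```

For example, a fall from 5.0 to 2.0 is a good response, a fall from 4.5 to 3.5 is moderate, and a fall from 5.0 to 4.2 is a non-response because the endpoint remains above 3.7.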
Whereas the ACR criteria and PsARC have been used in recent PsA trials, for example with leflunomide and the biological agents, the DAS has only been reported in trials with infliximab, where it was shown to be discriminant and responsive.19 Indeed, in a post hoc analysis of the data from the phase II trials of infliximab and etanercept (where the DAS was calculated from the elements of the ACR criteria), presented at OMERACT 7,29 all of these instruments were found to be discriminant and responsive, including all the variants of the DAS (CE Antoni, personal communication). It is important to note that if a 28 joint count is used as part of entry criteria to a trial, a number of PsA patients would be excluded, especially if they have predominantly DIP or lower extremity disease. Thus, we would suggest that if a DAS 28 is used, the joint count to determine inclusion in the trial be at least the 68/66 count.
In several recent RA trials, instruments which demonstrate more continuous data than the ACR 20, 50, 70, such as the ACR-N, or summary measures, such as the area under the curve analysis of the ACR-N, have been used.30,31 Schiff has provided a review of this methodological approach;32 however, this methodology has not yet been applied in PsA. A committee authorised by the American College of Rheumatology is working on updating the ACR scoring system and incorporating informative elements from alternative criteria sets. This work will be reviewed and considered for incorporation into assessments of PsA.
An interesting observation is that PsA patients may experience less tenderness than RA patients when their joints are palpated. This was noted in a dolorimeter based study in Toronto.33 This should be kept in mind as we observe the rating of both the patient and physician regarding the relative severity of PsA.
Spine involvement has been reported in 51% of PsA patients.34 Sacroiliitis has been reported in up to a quarter of patients in several series.35–37 However, in one study of 221 patients, all with pelvis films, sacroiliac involvement was noted in up to 78%.14 Unlike ankylosing spondylitis, wherein axial involvement is present in all patients and tends to be more consistent in severity and in imaging findings, axial PsA is more inconsistent and heterogeneous. Because of this, current studies with biologicals have enrolled patients with active peripheral disease and have not attempted to measure axial involvement, even though the previous study by Clegg et al14 did attempt to measure spinal pain via the Dougados Spondylitis Articular Index.38
The ASsessment in Ankylosing Spondylitis (ASAS) working group has recommended the use of a number of outcome measures for spinal involvement in ankylosing spondylitis, which include the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI),39 the Bath Ankylosing Spondylitis Functional Index (BASFI)40 and the Bath Ankylosing Spondylitis Metrology Index (BASMI).41 A more detailed description of spinal physical examination has been recently reviewed.6 The ASAS group has developed a responder index (the ASAS 20, 50, and 70), based on elements of the above indices and other patient assessment scales, for use in ankylosing spondylitis trials. It is not clear if these instruments will perform well in PsA.
There are a number of difficulties in measuring spine disease in PsA clinical trials. As mentioned before, spinal involvement in PsA occurs less frequently and with greater variability of expression than is seen in ankylosing spondylitis. In an older patient, it is often difficult to distinguish pain from inflammatory spine disease versus pain from the very common presence of degenerative spine disease, even with imaging studies. Thus, these measures have not been attempted in most recently reported studies. In Clegg’s study of sulfasalazine,14 duration of morning stiffness, spinal and nocturnal pain, and the Dougados Spondylitis Articular Index38 were measured. The BASDAI items, as mentioned, have not been used in intervention trials in PsA. When PsA patients with and without spinal disease were assessed in a clinic cohort, the BASDAI scores did not differ.42 A principal components factor analysis in a different cohort of PsA patients showed that the BASDAI correlated more with measures of self reported wellbeing than with measures of disease activity, whereas the Dougados Articular Index showed the opposite, leading to a suggestion that this index should be further evaluated as a measure of spine inflammation in PsA.43 At the present time, the GRAPPA initiative is making an effort to determine a feasible and appropriate measure to distinguish the presence of axial disease due to PsA in these patients and to measure its change in clinical trials.
Enthesitis, a common feature of PsA, is characterised by inflammation at sites of tendon, ligament, and joint capsule insertion into bone. Common symptomatic areas include insertions of the Achilles’ tendon and plantar fascia to the calcaneus, ligamentous insertions, and insertions into the pelvis and to the bones of the thorax and spine. The enumeration of patients with or without enthesitis, before and after treatment, has been studied in three PsA trials.14,19,44 Neither sulfasalazine nor azathioprine showed response in this domain, whereas in both studies with the anti-TNF medicine, infliximab, a response was seen. This particular domain has not been studied with etanercept, and results from trials of adalimumab are pending.
For spondyloarthropathies in general, two enthesitis measures have been developed, the Mander index45 and the Maastricht Ankylosing Spondylitis Enthesis Score (MASES).46 The Mander index assesses 66 sites; it is generally accepted that this is too many to feasibly and reliably assess in clinical trials. The MASES index, which represents a culling of the 13 most specific and sensitive sites from the Mander index in an assessment of patients with ankylosing spondylitis over two years, correlated well with the Mander index. The MASES is a more feasible instrument, but it has not yet been assessed in PsA trials. A theoretical issue with these instruments is whether they will adequately discriminate entheseal tenderness from fibromyalgia tender points in patients with fibromyalgia as a comorbid condition with PsA. Gladman et al have reported on the performance of investigators from the Spondyloarthritis Research Consortium of Canada (SPARCC) in their ability to reliably assess enthesitis areas: the plantar fascia, Achilles’ tendon, tibial tuberosity, and rotator cuff insertions. In the first three locations, observer agreement was “moderate” and in the fourth, “poor”.6,47 Both the degree to which this element of PsA contributes to disease burden and the prospect of its meaningful improvement with newer therapies argue for focus on reliable assessment and measurement of change to further understand the impact of therapy.
Dactylitis is characterised by swelling of a whole digit and represents a combination of synovitis and inflammation of tendon and ligament insertions. It occurs in approximately half of all patients with PsA.48 Presence or absence of dactylitis before and after drug treatment has been reported in three trials,9,18,19 with statistically significant change demonstrated in the two anti-TNF trials. A simple quantitative assessment on a 0–3 scale is being employed in a current trial in PsA (PJ Mease, personal communication). A more quantitative scoring system is being developed, but has not yet been used in a clinical trial (P Helliwell, personal communication).
QUALITY OF LIFE, FUNCTION, “PARTICIPATION”
Increasingly, assessment of quality of life, physical function, and “participation”—the capacity to engage meaningfully and capably in activities of life—is important when judging the impact of disease and improvement with treatment. New therapies have shown significant effect in this arena. Treatment expense increasingly requires justification: Does it not only ameliorate symptoms and prevent disease progression but does it also enhance quality of life, reduce work absenteeism, improve functional capacity, and allow for return to a “normal life”? The methodology in this field is complex and draws input from such diverse fields as psychology, ergonomics, pharmacoeconomics, psychometrics, and sociology.
Quality of life measures can be generic, crossing multiple disease states, or specific, developed for a single condition. An example of the former, across multiple disease states and thus comparable between them, is the Medical Outcomes Study Short Form 36 (SF-36).49 This patient self-administered questionnaire assesses eight domains of health status: physical functioning, pain, vitality, social functioning, psychological functioning, general health perceptions, and role limitations due to physical and emotional problems. This has been employed in most recent rheumatology trials, including PsA, has shown significant improvements in PsA patients with effective treatment,15,18 and has been validated in PsA.50,51 The EuroQol-5D52 has shown no difference in the impact of RA and PsA on quality of life despite greater physical damage in the comparator RA group.53 The Arthritis Impact Measurement Scales (AIMS and AIMS2)54 instruments assess, in arthritis patients, physical, emotional, and social wellbeing, and have been validated and critiqued in PsA.55,56 Gladman et al suggest that although broadly used generic questionnaires allow comparison across patient groups and are worthwhile in both clinical trials and longitudinal cohorts, disease specific quality of life instruments are likely to yield larger effects in clinical trials.6
The Psoriatic Arthritis Quality of Life (PsAQoL) measure, the first patient derived instrument specific for PsA, has shown reliability and construct validity,57 but has not yet been used in a clinical trial. Several measures have been developed to assess quality of life in patients with psoriasis and/or dermatological disease in general. The Dermatology Life Quality Index (DLQI)58 is a 10 item instrument developed as a measure of disability for a wide range of dermatological conditions. It has been the most used and validated instrument in psoriasis, and has been used in some studies of PsA to consistently show discriminant ability.18 A complementary instrument used to assess psychosocial wellbeing is the Dermatology Quality of Life Scale (DQoLS).59 This has not been used in PsA studies. Some psoriasis specific instruments, also not yet used in PsA trials, are the Koo-Menter Psoriasis Instrument (KMPI),60 the Psoriasis Disability Index (PDI),61 the Psoriasis Life Stress Inventory (PLSI),62 Psoriasis Quality of Life instrument (PsoriQoL)63 and the Salford Psoriasis Index (SPI).64 Quality of life, its measurement, and effects of therapy in both psoriasis and PsA have been recently reviewed.65
The Health Assessment Questionnaire (HAQ), originally developed to assess disability in RA66 by focusing on physical disability and pain, has been used widely in inflammatory arthritis clinical trials, including PsA. The HAQ has been modified for spondyloarthropathies in a version that includes two spinal domains (HAQ-S),67 and further modified with a skin component for patients with psoriasis (HAQ-SK).68,69 These were shown to perform similarly, and thus the original instrument has been used in clinical trials. Gladman et al have emphasised that not only should this instrument be used in clinical trials, but it is also useful as a measure of health status in longitudinal clinical cohorts.6 Since the HAQ focuses on physical disability, even with the skin modification, it may not adequately capture disability in patients with predominantly skin disease. The HAQ will presumably show less change in the context of treatment that has a predominant effect on the skin and not the joints. Taylor has further observed that it may not adequately measure the activities affected in some subsets of PsA patients; by applying a measurement modelling technique known as Rasch analysis,70 we may better determine if the HAQ or its modifications are performing adequately in patients with different patterns of PsA.7
An important question in the field of functional evaluation is how much improvement in functional status is considered important by patients. The “minimal clinically important difference” (MCID) for a measurement instrument is a key construct, which helps us determine if a change with treatment is not only statistically significant, but is also meaningful to a patient. The methodological science of determining MCID is complex and can be approached in a number of ways.71 A generally accepted value for MCID of the HAQ in RA is 0.22.72 The first attempt at determining such a value for PsA, based on analysis of the HAQ data from the phase III trial of etanercept in PsA, found that the MCID was 0.3 when calculated by a patient rating method, conjectured as being more valid, and 0.4 when based on the standard error of measurement method.73 This observation needs to be further confirmed by analysis in other trials and across trials before it can be fully accepted.
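To make the practical consequence of the choice of MCID concrete, the following sketch counts how many patients in a set of hypothetical HAQ scores improve by at least each threshold quoted above. The patient values are invented for illustration only, and the function names are assumptions, not part of any published analysis.

```python
# MCID values for the HAQ as quoted in the text.
HAQ_MCID = {
    "RA": 0.22,
    "PsA, patient rating method": 0.3,
    "PsA, standard error of measurement method": 0.4,
}


def meets_mcid(baseline_haq: float, follow_up_haq: float, mcid: float) -> bool:
    """True if the fall in HAQ (lower is better) is at least the MCID."""
    return (baseline_haq - follow_up_haq) >= mcid


def proportion_meeting_mcid(pairs, mcid: float) -> float:
    """Fraction of (baseline, follow_up) pairs that reach the MCID."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(meets_mcid(b, f, mcid) for b, f in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical (baseline, follow-up) HAQ scores for four patients.
    cohort = [(1.5, 1.25), (2.0, 1.5), (1.0, 1.0), (1.25, 0.875)]
    for label, mcid in HAQ_MCID.items():
        share = proportion_meeting_mcid(cohort, mcid)
        print(f"{label}: {share:.0%} reach an MCID of {mcid}")
```

In this invented cohort the same data yield three, two, or one "meaningful" improvers out of four depending on the threshold. This shows why the PsA value needs confirmation before it is adopted.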
As noted earlier, there is a new domain in disease assessment, “participation”. This domain addresses not just a person’s ability to perform a task or action, but it takes into account their actual performance in life situations. For example, if a person is physically capable of working in a certain job capacity or visiting friends, but is embarrassed to do so because of unsightly psoriasis lesions, then their “participation” may be impaired even if not due to change in body function or structure. A World Health Organization initiative, the International Classification of Functioning, Disability and Health,74 is stimulating development of assessment tools, not just a self-report questionnaire, to capture this complex interaction between disease state, body function and structure, activities, participation, and environmental and personal factors.75 It is anticipated that such measurement will give us a better idea of the true impact of a disease on the individual, their family, and society, as well as inform us of the full impact of therapies in this global domain. The GRAPPA group was charged with developing such an instrument at OMERACT 7 and is proceeding forthwith.
Although some would argue that fatigue is a domain subsumed under global patient assessment, others, including patients, would suggest that it is an important domain in and of itself. It is notable that when asked about improvement with the newer biological agents, fatigue is a dimension, along with pain and function, often mentioned by patients as demonstrating significant improvement. A number of instruments to measure fatigue have been developed. One approach is to have patients rate fatigue on a VAS. Multidimensional measures have been developed to try to capture various aspects of fatigue, such as physical and emotional fatigue. Examples of these include the Multidimensional Fatigue Inventory (MFI),76 the Fatigue Severity Scale (FSS),77 the Functional Assessment of Chronic Illness Therapy (FACIT) scale,78 and the Multidimensional Assessment of Fatigue (MAF) index, referred to as MFI.79 The MFI has been used in patients with ankylosing spondylitis and was found to be highly correlated with the single question on fatigue in the BASDAI. It also provided more complex information about fatigue in this patient group. The FSS has been studied in PsA patients, and it was able to distinguish patients from controls and showed correlation with disease activity.80 The FACIT is currently being employed in clinical trials of PsA patients. An advantage of the FACIT is that it can be adapted to a variety of chronic disease states, so results can be compared across diseases.
Simple measures of inflammation, such as the acute phase reactants, ESR and CRP, are not reliably elevated in patients with PsA, even with active inflammation. In a recent analysis of the phase II etanercept and infliximab studies in PsA, a receiver operating characteristic (ROC) curve analysis showed that these measures were not highly specific in discriminating placebo from treatment response (CE Antoni, personal communication). Thus, these may not be reliable markers to assess baseline disease activity or response to therapy. However, it has been noted that high ESR at presentation does correlate with progression of joint disease and early mortality.81
Translational studies of histological and immunohistochemical changes in PsA synovium and skin due to therapeutic intervention, recently reviewed,82 are a mode of outcome assessment, albeit impractical for general clinical use. It is through this work that we are learning more about the basic pathophysiology of joint and skin inflammation in PsA and the specific change that accrues from targeted therapy. A subgroup of GRAPPA is chartered to develop this area and standardise specific biomarkers assessed in different research centres.
FUTURE DEVELOPMENT OF PSA SPECIFIC COMPOSITE RESPONDER CRITERIA
It will be noted that the ACR criteria, the PsARC, and the EULAR response criteria are all considered composite responder indices and have proved effective in discriminating between placebo and treatment response. However, they do not incorporate what some might consider a full set of core domains to be assessed, such as skin, spine, and entheseal involvement. A future project in the development of a more comprehensive responder index will be to measure individual core domains and then to statistically derive a fuller combination of measures that is both responsive and discriminant.
Outcome measures which measure clinical response, disease progression, and quality of life in PsA have been shown to be discriminant and responsive in recent clinical trials of emerging therapeutics. These measures will be undergoing further refinement through the efforts of GRAPPA and OMERACT. Methods to assess less fully explored domains, such as enthesitis, dactylitis, and “participation”, will be developed and applied in future clinical trials. The availability of effective outcome measures allows us to define more fully the impact of therapy, compare therapeutic approaches, and understand the course of PsA.