Abstract
Objective: To evaluate the responsiveness of patient reported outcomes (PROs), including fatigue, sleep, activity limitation, and quality of life, in patients with rheumatoid arthritis (RA).
Methods: Data were considered from a randomised controlled trial comparing abatacept (n = 258) with placebo (n = 133) on a background of DMARD treatment in RA patients who were inadequate responders to anti-TNF therapy (ATTAIN study). PROs assessed included SF-36, activity limitation, fatigue, and sleep. For each outcome the treatment difference, relative per cent improvement, standardised response mean (SRM), and relative efficiency for assessing an outcome’s ability to detect a treatment effect relative to tender joint count (TJC) were calculated. A relative efficiency >1 suggests a measure that is more efficient than TJC in detecting treatment effect.
Results: Moderate to large SRMs (⩾0.6) were observed for the PRO measures. In particular, SRMs (95% confidence interval) were: physician global, 0.72 (0.51 to 0.94); HAQ, 0.63 (0.42 to 0.85); SF-36 physical component score, 0.62 (0.40 to 0.83); SF-36 bodily pain, 0.68 (0.46 to 0.90); and fatigue, 0.59 (0.38 to 0.81). Physician global (relative efficiency 1.6), SF-36 bodily pain domain (1.4), pain intensity (1.4), HAQ (1.2), SF-36 physical component score (1.2), fatigue (1.1), and patient global assessment (1.04) were all more responsive than TJC. The SF-36 mental component score (0.3), swollen joint count (0.6), activity limitation (0.8), sleep (0.7), and C reactive protein (0.9) were less responsive.
Conclusions: Using PROs for evaluating treatments for RA can detect improvements and will identify changes that are important to patients. In general, physical assessments are more responsive to an effective treatment than mental assessments.
Patient reported outcomes (PROs) assess health, wellbeing, and treatment response from the patient’s perspective—aspects of the disease that may differ from the clinical manifestations that are most commonly measured by the clinician. These outcomes can range in complexity from simple concepts such as symptom assessment, to more complex concepts such as activities of daily living and quality of life. Consideration of PROs is important where changes in clinical measurements or imaging results may not translate into recognisable benefits to the patient, and in clinical trials where treatments may have similar effects in controlling disease but different effects on symptoms, function, or quality of life.
Sleep quality, fatigue, and work productivity have been identified at different OMERACT meetings as important aspects of the health and wellbeing of patients with arthritis.1–3 Individuals with a variety of common medical illnesses including arthritis frequently experience sleep disturbances. It is recognised that medical illnesses can adversely affect sleep quality, and that pain, infection, and inflammation can induce symptoms of excessive daytime sleepiness and fatigue.4 In particular, this is true for patients with RA.5 6 Fatigue is also an important issue for RA patients. Wolfe found that fatigue was “common across all rheumatic diseases, associates with all measures of distress, and is a predictor of work dysfunction and overall health status.”7 In particular for rheumatoid arthritis, Huyser and colleagues noted that fatigue related to RA appears to be associated with psychosocial variables.8 Patients with arthritis have higher unemployment rates than those with other chronic diseases and lose more time from work.9–11 Although more elaborate measures of work productivity exist, days of activity limitation can be used as a simple indicator of activities of daily living.
Simple assessments of specific aspects of RA from the patients’ perspective have been designed to measure changes in fatigue, sleep quality, and activity resulting from the treatment of this disease, but there is limited information on the treatment difference and relative per cent improvement expected, the effect size, and the relative efficiency of these measures in detecting treatment effects. It is critical to know if treatments we believe are effective in RA show meaningful improvement in these outcome measures, as well as in generic quality of life measures. Are they responsive to RA treatment?
The objective of this study was to evaluate the responsiveness of PROs including the short form 36-item health survey (SF-36), activity limitation, fatigue, and sleep with respect to clinical measures in RA patients.
METHODS
The data used for this evaluation came from a randomised, double blind, placebo controlled trial of an effective drug in patients with active rheumatoid arthritis. The ATTAIN study was a phase III multicentre, six month trial evaluating the efficacy and safety of abatacept on a background of disease modifying antirheumatic drug (DMARD) treatment in patients with active RA, who were anti-tumour necrosis factor (anti-TNF) treatment failures.12 Patients were randomised 2:1 to receive abatacept (n = 258) or placebo (n = 133) on a background of DMARDs. The core set measures assessed were: tender joint count (TJC) (0–68), swollen joint count (0–66), patient assessment of disease (0–100), physician assessment of disease (0–100), pain assessment (0–100), health assessment questionnaire (HAQ) (0–3; 0 = best, 3 = worst), C reactive protein, and erythrocyte sedimentation rate (ESR). Higher scores denote a worse outcome. The response criteria ACR20, ACR50, and ACR70 were assessed, and the EULAR disease states (good, moderate, none) determined.
Multiple patient reported outcomes were assessed. Health related quality of life was measured by the SF-36, which includes eight domain scores: physical functioning, role limitation due to physical problems (RP), bodily pain, general health perception, vitality, social functioning, role limitation due to emotional problems (RE), and mental health; and two SF-36 component scores: physical component score, and mental component score. The scores are 0–100, with higher scores indicating better quality of life. Pain and fatigue severity were assessed using a visual analogue scale (VAS) (0–100 mm, with higher scores indicating a greater degree of severity; 0 = none, 100 = complete).13 Sleep quality was assessed using the medical outcomes study sleep module (MOS-Sleep),14 a validated instrument measuring sleep problems (for example, sleep disturbance, quantity, and adequacy). A sleep problems index (0–100, with higher scores indicating more problems) was also generated. Activity limitation was measured by a questionnaire assessing the number of days in the past 30 days on which a patient was unable to carry out their usual daily activities (defined as work (paid or unpaid) and any other daily activities (such as household chores or personal care)) because of RA.
Comparison of treatments (abatacept vs placebo) included: baseline comparison of demographic/clinical variables, study outcomes, and comparison of mean change from baseline. The ability to detect a treatment effect in the study outcomes was evaluated using four measures: treatment difference (the difference between the mean change in the abatacept group and mean change in the placebo group); relative per cent improvement (the ratio of the treatment difference to the pooled baseline scores); standardised response mean (SRM) (the ratio of the treatment difference to the pooled standard deviation of the mean change scores; see the appendix); and relative efficiency in relation to the TJC (the square of the ratio of the t statistic15 which corresponds to squaring the ratio of the SRM for the outcome to the SRM for the TJC). A relative efficiency >1 would imply that the outcome is more efficient than the TJC in detecting a treatment effect. Concordance between patients who were ACR responders and achieved improvement in study outcomes (positive agreement) and between those who were ACR non-responders and did not achieve improvement (negative agreement) was assessed. For each of the ACR response criteria (ACR20, ACR50, and ACR70), positive agreements were expressed as the percentage of patients who were classified as ACR responders who achieved >20% improvement in a study outcome, and negative agreements were expressed as the percentage of patients who were classified as ACR non-responders who had <20% improvement in a study outcome. The mean response change of the PROs across the response categories of the EULAR and ACR criteria was assessed.
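The four measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the pooled standard deviation is assumed to take the conventional two-sample form, the "pooled baseline score" is assumed to be the mean baseline score over both groups combined, and change scores are assumed to be oriented so that positive values mean improvement.

```python
import math
from statistics import mean, stdev


def pooled_sd(sd_a, n_a, sd_c, n_c):
    """Pooled standard deviation of two groups (conventional two-sample form; an assumption)."""
    return math.sqrt(((n_a - 1) * sd_a ** 2 + (n_c - 1) * sd_c ** 2) / (n_a + n_c - 2))


def responsiveness(change_a, change_c, baseline_a, baseline_c):
    """Return (treatment difference, relative % improvement, SRM) for one outcome.

    change_a / change_c: per-patient change scores (active / control),
    oriented so that a positive value is an improvement.
    baseline_a / baseline_c: per-patient baseline scores.
    """
    # Treatment difference: mean change in active group minus mean change in control group.
    diff = mean(change_a) - mean(change_c)
    # Relative per cent improvement: treatment difference over the pooled baseline mean (assumed).
    pooled_baseline = mean(list(baseline_a) + list(baseline_c))
    rel_improvement = 100.0 * diff / pooled_baseline
    # SRM: treatment difference over the pooled SD of the change scores.
    srm = diff / pooled_sd(stdev(change_a), len(change_a), stdev(change_c), len(change_c))
    return diff, rel_improvement, srm


def relative_efficiency(srm_outcome, srm_reference):
    """Square of the ratio of an outcome's SRM to the reference SRM (here, the TJC)."""
    return (srm_outcome / srm_reference) ** 2


# Toy data for illustration only; these are not trial values.
diff, rel, srm = responsiveness([10, 20, 30], [0, 5, 10], [50, 60, 70], [40, 60, 80])
print(diff, rel, round(srm, 2))
```

A relative efficiency above 1 from `relative_efficiency` corresponds to an outcome that detects the treatment effect more efficiently than the reference measure.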
RESULTS
The baseline demographic and clinical characteristics, as well as the baseline study outcomes, are given in table 1 by treatment group. There were no substantive differences between the study groups. For the patient reported outcomes, the overall scores for activity limitation, fatigue, and sleep were 17, 73 and 48, respectively, indicating great impairment at baseline. The SF-36 scores were well below the population norm of 50, denoting impaired quality of life. For all outcomes, the average scores were generally between the first and the third quartile of the range of possible values, and floor and ceiling effects were not evident.
The mean changes from baseline in the study outcomes were statistically significantly larger for the abatacept group, with the absolute changes ranging from 4.1 to 22.3 for abatacept and 0.7 to 5.3 for control. The treatment difference with its 95% confidence interval (CI) and the relative per cent improvement for each of the PROs are given in table 2. The largest relative per cent improvements were found for the acute phase reactants (C reactive protein 52%, ESR 38%) and the PRO activity limitation (35%). The smallest relative per cent improvements were found for the more mental PROs (RE (12%), mental health (8%), mental component score (9%)) and general health (9%). In particular, the PROs patient global assessment (24%), pain assessment (28%), and fatigue (23%) had larger relative per cent improvements than the generic quality of life outcomes SF-36 domains and component scores, which ranged from 8% to 21%. Within the generic measures the physical attributes had larger relative per cent improvements than the mental attributes; in particular, the physical component score (20%) was twice as large as the mental component score (9%).
The standardised response means and 95% confidence intervals for the outcomes are given in table 2. Moderate to large effect sizes (⩾0.60) were found for ESR and physician global assessment, as well as for the PROs pain assessment, HAQ, bodily pain, and physical component score. Fatigue (0.59) and patient global assessment (0.58) were in close proximity to this level. Many of these outcomes correspond directly to physical attributes, whereas all the small response means (<0.40) were found for the mental attributes mental health, mental component score, and RE.16 In particular, the response mean for physical component score (0.62) was nearly twice as large as the response mean for mental component score (0.33). Most response means were around 0.5, a commonly used threshold for a moderate response mean.
The relative efficiencies in relation to the TJC are provided in table 2 and shown in fig 1 in decreasing order of magnitude. The outcomes ESR, physician global assessment, bodily pain, pain assessment, HAQ, physical component score, fatigue, and patient global assessment were more efficient than TJC in detecting a treatment effect. The least efficient outcomes were for the mental attributes mental health, mental component score, and RE.
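As a rough consistency check on the reported values, the relation between SRM and relative efficiency can be worked through by hand. The SRM for the TJC is not quoted in the text, so the value below is only inferred from the rounded figures given in the abstract for physician global assessment (SRM 0.72, relative efficiency 1.6) and should be read as approximate.

```latex
\[
\text{Relative efficiency} = \left(\frac{\text{SRM}_{\text{outcome}}}{\text{SRM}_{\text{TJC}}}\right)^{2}
\quad\Longrightarrow\quad
\text{SRM}_{\text{TJC}} \approx \frac{0.72}{\sqrt{1.6}} \approx 0.57
\]
\[
\text{HAQ: } \left(\frac{0.63}{0.57}\right)^{2} \approx 1.2,
\qquad
\text{fatigue: } \left(\frac{0.59}{0.57}\right)^{2} \approx 1.1
\]
```

Both results agree with the relative efficiencies reported for HAQ (1.2) and fatigue (1.1), within rounding.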
Changes in individual patients in the study outcomes and the ACR responder criteria were assessed, and positive and negative agreements between improvements in study outcomes and ACR criteria are presented for abatacept treatment in table 3. For the PROs, the level of concordance was high, with only mental health, mental component score, RE, and general health perception falling below 50% agreement for the ACR20 responders, and activity limitation below 50% for ACR70 non-responders. For patients classified as an ACR responder, the percentage who improved on an outcome increased as the ACR response criteria increased. For example, the percentage showing >20% improvement on the physical component score was 59, 86, and 86 for the ACR20, ACR50, and ACR70, respectively. Correspondingly, for patients that did not satisfy the ACR responder criteria, the percentage of these patients who did not improve decreased with increasing ACR criteria. For example, the percentage showing <20% improvement on the physical component score was 79, 71, and 66 for the ACR20, ACR50, and ACR70, respectively.
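The positive and negative agreement percentages described above can be computed along the following lines. This is a sketch under stated assumptions: the function name and the input layout (one ACR responder flag and one per cent improvement value per patient) are hypothetical, and a patient with exactly 20% improvement counts towards neither agreement, following the strict >20% and <20% definitions in the Methods.

```python
def agreement(acr_responder, pct_improvement, threshold=20.0):
    """Return (positive agreement %, negative agreement %) for one outcome.

    acr_responder: iterable of booleans, True if the patient met the ACR criterion.
    pct_improvement: iterable of per cent improvements in the study outcome.
    Assumes at least one responder and one non-responder are present.
    """
    pairs = list(zip(acr_responder, pct_improvement))
    responders = [p for is_resp, p in pairs if is_resp]
    non_responders = [p for is_resp, p in pairs if not is_resp]
    # Positive agreement: ACR responders who improved by more than the threshold.
    positive = 100.0 * sum(p > threshold for p in responders) / len(responders)
    # Negative agreement: ACR non-responders who improved by less than the threshold.
    negative = 100.0 * sum(p < threshold for p in non_responders) / len(non_responders)
    return positive, negative


# Toy data for illustration only; these are not trial values.
flags = [True, True, True, True, False, False]
improvements = [30, 25, 10, 50, 5, 40]
print(agreement(flags, improvements))
```

Running the same function once per criterion (ACR20, ACR50, ACR70) gives one pair of percentages per criterion for each outcome.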
Changes in the PROs by EULAR response criteria and the ACR responder criteria for the abatacept treated group are presented in tables 4 and 5, respectively. For all the PROs there was an orderly improvement across the “none”, “moderate”, and “good” EULAR states (table 4). The only exception was for sleep quality in which a worsening from the “none” to the “moderate” state was found. Similarly for the ACR responder criteria, changes in the PROs showed consistently greater improvement for patients satisfying the ACR20, ACR50, or ACR70 (table 5).
DISCUSSION
The specific instruments used for the PROs in this evaluation are well described in published reports. The MOS-Sleep17 is one of several instruments designed to assess sleep and has been shown to have good psychometric properties.14 For measuring fatigue, Wolfe18 reviewed fatigue scales of suitable length or previously used for rheumatic diseases and concluded that the VAS fatigue measure was just as suitable for a simple assessment of fatigue as more extensive scales. However, when it is of interest, a greater understanding of fatigue may be found by evaluating and comparing different domains of fatigue, requiring a multidomain fatigue instrument. Preliminary work has shown that “days of limited activity” is a simple activity participation measure reflecting real changes in patient clinical status and quality of life, and it is valid, reliable, and sensitive to change.19 Assessment of quality of life is a central PRO. There is a consensus that the important domains include physical function, emotional/psychological function, and social function.20–22 The widely used SF-36 is a multidimensional, patient centred, generic instrument that attempts to assess these various aspects of quality of life.23
Treatment groups were similar for baseline demographic, clinical, and study outcomes, and no evidence of floor/ceiling effects was found. Significant decrements in PROs were apparent at baseline. As reported elsewhere,12 significant treatment differences between abatacept and placebo were found for all core set measures and PROs. The differences were substantive in most cases, with significant but smaller differences noted for PROs associated with the more mental attributes (that is, RE, mental health, mental component score) and general health perception.
PROs, including the SF-36, activity limitation, fatigue, and sleep, are in general responsive measures in RA patients. The distinction, in particular, of the mental attributes RE, mental health, and mental component score from the other outcomes was also observed in this evaluation. The relative percentage improvement compared with placebo in each of the PROs was substantive: all were ⩾15% except for general health perception, RE, mental health, and mental component score. The response mean for many of the study outcomes was in the moderate range, with the effect size for physical attributes being generally larger than for the mental attributes. Moderate to large response means (⩾0.60) were found for patient global assessment, pain assessment, HAQ, bodily pain, physical component score, and fatigue; and small response means (<0.40) were found for the mental attributes mental health, mental component score, and RE. The PROs patient global assessment, pain assessment, HAQ, bodily pain, physical component score, and fatigue were more efficient than the TJC in detecting a treatment effect. The least efficient outcomes were for the mental attributes mental health, mental component score, and RE. In previous work comparing leflunomide and methotrexate with placebo,24 similar results were found, with measures related to mental aspects (mental health, RE, mental component score) usually being less affected by drug treatments for RA than measures related to physical aspects. When interpreting the clinical importance of the response of mental aspects observed, or when determining the sample size in designing a study, the limited effect expected should be taken into consideration.
In evaluating changes in individual patients, concordance was found between patients who were ACR responders and achieved improvement in study outcomes (positive agreement) and those who were ACR non-responders and did not achieve improvement (negative agreement). Concordance was found between changes in patient reported outcomes and the EULAR response criteria and ACR responders.
Several apparent pairings of generic measures from the SF-36 with disease specific measures can be identified, with similar responsiveness. Generic measures theoretically are less responsive than disease specific measures, but subtle differences in what is being assessed partially explain the lack of a difference in responsiveness. For example, the SF-36 bodily pain and the pain VAS both measure pain and have similar SRMs; however, bodily pain assesses pain impact and not intensity, which is assessed by pain VAS. The SF-36 vitality and fatigue VAS have the same SRM, but may be different concepts: “as an absence of fatigue does not necessarily mean the presence of energy, it is doubtful whether the SF-36 vitality subscale measures fatigue.”25 Although the SRMs for HAQ and SF-36 physical component score are similar (0.63 and 0.62, respectively), it is the SF-36 physical functioning that is more closely related to HAQ, and it is less responsive (0.43). The physical component score primarily consists of physical functioning, bodily pain, and RP, and the last two domains have higher SRMs (0.68 and 0.57, respectively) contributing to the higher SRM for the physical component score.
In this paper, the focus has been on the response of the outcome measures observed at the end of the study. When outcomes are assessed at frequent intermediate time points during the conduct of the study, additional aspects of the measures are of interest. In particular we should ask at what time point a clinically important change is obtained (time to onset), at what time point maximum efficacy is obtained (time to maximum effect), and how long the important change is maintained (durability). These important concepts could not be properly evaluated in this study as data were only available for three time points (baseline, three months, and six months).
Traditional treatment goals for RA patients include relieving signs and symptoms, improving physical functioning, and inhibiting progression of joint damage. Outcomes that are important from the perspective of the patient and are value added to these traditional outcomes would be beneficial in assessing treatment effects. As a generic quality of life measure, the SF-36 will permit comparisons of physical and mental aspects of quality of life of the RA patient group of interest with other patient groups and the general population. This contribution is unique and value added, when issues of quality of life are important. Fatigue and sleep have consistently been raised as important issues in their own right in RA patient workshops. The relation of fatigue and sleep to other outcomes, and to each other, is complex. For example, when determining the added value of including fatigue and sleep when assessing quality of life using the SF-36, it is of interest that a factor analysis found sleep to be consistently and highly correlated with the mental domains, and fatigue correlated with both the physical and mental domains.26 When a single overall response such as patient global assessment is used, it is unknown what drives the response. This is of particular relevance in situations in which the treatment may affect different PROs in different ways, and the complex nature of global measures raises questions about interpretation and reporting results in a misleading way. Separate measures of fatigue and sleep are often desirable, as both are too important to RA patients to be misinterpreted and misreported. Days of activity limitation is a simple measure of activity participation which is highly related to HAQ and the physical domains of the SF-36,26 and its unique contribution and validation is currently under investigation.19 27
With the growing interest in patient participation and with the endorsement of the International Classification of Functioning, Disability, and Health, there is increasing interest in validation of scales that assess other aspects of participation such as work productivity, fatigue, and sleep. Many patients were upset about the exclusion of these items in the OMERACT core set1 so this project is important in rectifying that omission. The challenge has been to assess these measures using psychometrically validated scales that have sufficient responsiveness. This project has done that. The implications of this study for practising rheumatologists are twofold. First, we demonstrate the efficacy of abatacept in treating RA symptoms of importance to patients and health care professionals. Second, the analysis we describe raises the value and prominence of formal patient reported measures in providing a holistic view of patients’ response to treatment.
In conclusion, using patient reported outcomes for evaluating treatments for RA patients can detect improvements and will identify changes that are important to patients. In general, physical assessments are more responsive to an effective treatment than mental assessments.
Acknowledgments
This study was supported in part by an unrestricted research grant-in-aid from Bristol-Myers Squibb.
APPENDIX
Relative per cent improvement from baseline (table 2) is given by the formula:
The standardised response mean is given by the formula:
where ma and sa are the mean and standard deviation, respectively, of the change scores from baseline in the abatacept group, and na is the number of patients in this group. Similarly, mc, sc, nc are the corresponding values for the control group.
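Written out from the definitions in the Methods, the two quantities can be sketched as follows. The symbols m_a, s_a, n_a, m_c, s_c, and n_c are those defined above. The symbol \bar{b} for the pooled baseline mean and the conventional two-sample form of the pooled standard deviation s_p are assumptions; the text states only that pooled values were used.

```latex
\[
\text{Relative per cent improvement} = 100 \times \frac{m_a - m_c}{\bar{b}}
\]
\[
\text{SRM} = \frac{m_a - m_c}{s_p},
\qquad
s_p = \sqrt{\frac{(n_a - 1)\,s_a^{2} + (n_c - 1)\,s_c^{2}}{n_a + n_c - 2}}
\]
```

For outcomes on which a lower score is better, the sign of the change scores would need to be oriented so that improvement is positive before these expressions are applied.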
Footnotes
Competing interests: None declared