Abstract
Objective A patient-derived composite measure of the impact of rheumatoid arthritis (RA), the rheumatoid arthritis impact of disease (RAID) score, takes into account pain, functional capacity, fatigue, physical and emotional wellbeing, quality of sleep and coping. The objectives were to finalise the RAID and examine its psychometric properties.
Methods An international multicentre cross-sectional and longitudinal study of consecutive RA patients from 12 European countries was conducted to examine the psychometric properties of the different combinations of instruments that might be included within the RAID scale (numeric rating scales (NRS) or various questionnaires). Construct validity was assessed cross-sectionally by Spearman correlation, reliability by intraclass correlation coefficient (ICC) in 50 stable patients, and sensitivity to change by standardised response means (SRM) in 88 patients whose treatment was intensified.
Results 570 patients (79% women, mean±SD age 56±13 years, disease duration 12.5±10.3 years, disease activity score (DAS28) 4.1±1.6) participated in the validation study. NRS questions performed as well as longer combinations of questionnaires: the final RAID score is composed of seven NRS questions. The final RAID correlated strongly with patient global (R=0.76) and significantly also with other outcomes (DAS28 R=0.69, short form 36 physical −0.59 and mental −0.55, p<0.0001 for all). Reliability was high (ICC 0.90; 95% CI 0.84 to 0.94) and sensitivity to change was good (SRM 0.98 (0.96 to 1.00) compared with DAS28 SRM 1.06 (1.01 to 1.11)).
Conclusion The RAID score is a patient-derived composite score assessing the seven most important domains of impact of RA. This score is now validated; sensitivity to change should be further examined in larger studies.
Rheumatoid arthritis (RA) is traditionally assessed by physical examination, laboratory tests and radiographs, in keeping with a ‘biomedical model’, the dominant paradigm of 20th century medicine. However, there has been growing interest in the assessment of RA from the patient's perspective. Patient reported outcomes (PRO) have been found to be as informative as joint counts, radiographic and laboratory data for the assessment of baseline status and of change during interventions, and are predictive of long-term outcomes.1–4 Furthermore, PRO bring additional information in the assessment of RA, as there is a discordance between the patient's and the physician's perspective.5 6 Current standard assessment of RA includes some dimensions or domains assessed by PRO, namely patient assessment of pain, functional disability and/or patient global assessment.7–9 However, current scores mainly include only these three PRO,7–13 and these domains are the only PRO usually reported,14 while other domains of health appear important from the patient's perspective, such as fatigue, wellbeing and sleep pattern.15–23
In this context, through the European League Against Rheumatism (EULAR), an international task force comprising 10 people with RA and 12 rheumatologists/health professionals elaborated a new composite response score for clinical trials in RA, based on the patients' perception of the impact of the disease on domains of health: the patient-derived rheumatoid arthritis impact of disease (RAID) score.23 This score is planned to be used in clinical trials as a measure of the impact of RA. Seven domains of health were chosen during the elaboration phase, and relative weights based on the patients' assessment of relative importance were obtained (table 1). The preliminary RAID included pain, functional capacity, fatigue, physical and emotional wellbeing, quality of sleep and coping. At the final stage of the elaboration study, these seven domains could be assessed by 12 different questions or questionnaires (table 1).
After the elaboration of the RAID, several questions remained: how would the score perform in terms of psychometric properties, would the choice of the seven domains be confirmed (ie, would each of these domains be sufficiently sensitive to change) and which questions or questionnaires should be chosen? To answer these questions, a large international study was conducted with the following three main objectives: (1) to assess the psychometric properties of the RAID, as defined by the outcome measures in rheumatoid arthritis clinical trials (OMERACT) filter;24 (2) to make the final choice of domains; and (3) to make the final choice of questions or questionnaires (with the aim of bringing the number of questions or questionnaires down to one per domain).
Materials and methods
Overall organisation
First, a cross-sectional study with a longitudinal component for reliability and sensitivity to change was performed. This international observational study was conducted in 12 countries. All applicable regulations were respected, and the project was accepted by ethical committees in participating countries. The inclusion criteria and data collected are described below. After the validation study, a meeting was held with the investigators and the patients who had participated in the initial phases of the elaboration of the RAID, to discuss the results and take final decisions. At this meeting in April 2009, six physicians (LC, LG, TKK, EMM, TS, GAW), five persons with RA (GJA, MdeW, CH, MS, GvonK), one nurse (TH) and one psychologist (ME) were present. Therefore, final decisions regarding the RAID were both data driven and expert opinion driven, with important input from persons with RA.
Patients
Outpatients seen for RA in the rheumatology departments of the participating tertiary care centres (in Estonia, Germany, Finland, France, Greece, Italy, The Netherlands, Norway, Spain, Romania, Turkey and the UK) were included between March 2008 and July 2009. It was planned to include 600 patients (50 from each country). Selection criteria were: definite RA,25 ability to fill in a questionnaire and signed informed consent.
Psychometric properties
Psychometric properties were examined according to the OMERACT filter,24 which checks that a potential outcome measure is: (1) feasible; (2) truthful, ie, reflects what it is supposed to reflect (validity); and (3) discriminant, which includes reliability and sensitivity to change.
Assessment of validity (‘truth’) of the RAID
The patients filled in a questionnaire comprising the numeric rating scales (NRS) for each of the seven domains and additional questionnaires for five of the other domains (table 1): short form 36 (SF-36) bodily pain,26 the health assessment questionnaire (HAQ)27 with possibility to derive the shorter, modified HAQ (mHAQ),28 the medical outcomes study (MOS) sleep disturbance subscale29 30 and a coping questionnaire.31 The following other variables were also collected: demographic data (age, sex, symptom duration, work status), patient global assessment by visual analogue scale (VAS) and the SF-36.26 In parallel, demographic and disease variables were collected (rheumatoid factor and anti-cyclic citrullinated peptide status, disease duration, structural severity, current treatment), and also joint counts and laboratory tests that allowed calculation of the disease activity score (DAS28 with erythrocyte sedimentation rate, ESR).32
Assessment of reliability of the RAID
Patients for whom RA treatment was not changed and who were considered in a stable state by the physician were included in the reliability arm of the study. For that purpose, they were assessed a second time 2–10 days after the baseline assessment. The objective was to include 60 patients, five per centre.
Assessment of sensitivity to change of the RAID
Patients who required a therapeutic change because of unacceptable clinical disease activity were included in the sensitivity to change arm of the study. The therapeutic change could be the initiation of a synthetic or biological disease-modifying drug. Concomitant modifications of corticosteroids and/or non-steroidal symptom-modifying drugs were allowed. Patients were reassessed 10–14 weeks after the treatment change. It was planned to include 120 patients (10 per centre) in this part of the study.
Final choice of RAID domains
Although ‘coping’ was included in the initial construction of the proposed RAID,23 the group had concerns regarding a possible lack of sensitivity to change for the domain ‘coping’, ie, that an efficacious treatment lowering global disease impact might not modify coping. Alternatively, coping may represent an ability to manage the symptoms and effects of RA, and so changes in coping might be reflected in the RAID score. In the first case, including the coping domain in the RAID would lower the sensitivity to change of the whole RAID score, while in the second it would improve sensitivity to change. As it was anticipated that coping might be deleted from the RAID, the relative importance of the domains after the exclusion of coping was obtained based on the patients' perspective, as explained in supplementary file 1 (available online only). Two separate weighting systems were thus obtained, one with coping (during the elaboration study)23 and one without coping (during the validation study described here).
Final choice of RAID questions or questionnaires
The final choice to bring down the number of questions or questionnaires from 12 to seven, ie, one per domain, was based on the comparisons of the psychometric properties of the different combinations and on expert opinion.
Statistical analyses
Because of the 12 different tools included in the preliminary RAID, there were many possible combinations in the RAID (24 possible combinations in the RAID with seven domains, and 12 with only six domains—excluding coping). Each of these combinations was assessed for psychometric properties. SAS version 9.1 was used for data management and statistical analyses.
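The counts of 24 and 12 combinations follow from the number of candidate tools per domain. The following is a minimal Python sketch of that enumeration, not the authors' SAS code; the dictionary `TOOLS` and the function `enumerate_combinations` are hypothetical names introduced for illustration, and the assignment of each questionnaire to a domain is taken from the description of the questionnaires in the Methods.

```python
from itertools import product

# Candidate tools per domain: seven NRS plus five questionnaires (12 tools).
# Domain-to-tool assignment is read from the Methods; names are illustrative.
TOOLS = {
    "pain": ["NRS", "SF-36 bodily pain"],
    "function": ["NRS", "HAQ", "mHAQ"],
    "fatigue": ["NRS"],
    "physical wellbeing": ["NRS"],
    "sleep": ["NRS", "MOS sleep disturbance"],
    "emotional wellbeing": ["NRS"],
    "coping": ["NRS", "coping questionnaire"],
}

def enumerate_combinations(domains):
    """Return every way of picking exactly one tool for each listed domain."""
    choices = product(*(TOOLS[d] for d in domains))
    return [dict(zip(domains, choice)) for choice in choices]

seven_domain = enumerate_combinations(list(TOOLS))
six_domain = enumerate_combinations([d for d in TOOLS if d != "coping"])

print(len(seven_domain), len(six_domain), len(seven_domain) + len(six_domain))
# prints: 24 12 36
```

The seven-domain count is 2 × 3 × 2 × 2 = 24 (pain, function, sleep and coping each have more than one candidate tool), and dropping coping halves it to 12.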
Feasibility
Feasibility was assessed in the cross-sectional study using the percentage of missing data for each of the questions/questionnaires.
Truth
Internal consistency was evaluated in the cross-sectional study using Cronbach's α coefficient. A Cronbach's α value greater than 0.7 is generally regarded as satisfactory.33 Construct validity was determined in the cross-sectional study by Spearman's correlation between the RAID combinations and other measures of disease activity/impact (patient global assessment VAS, SF-36 global scale: question 1, SF-36 summary values (physical, PCS and mental, MCS) and DAS28).
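For readers unfamiliar with Cronbach's α, it can be computed from the item variances and the variance of the summed score. The sketch below uses the standard textbook formula with sample variances; it is an illustration in Python rather than the SAS procedure used in the study, and the function name `cronbach_alpha` and the example scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item (here, per domain),
    each with one entry per patient, all of the same length.
    """
    k = len(items)
    # Total score of each patient across all items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# Hypothetical NRS scores for three domains in five patients.
example_items = [
    [6, 3, 8, 5, 2],
    [5, 4, 7, 5, 3],
    [7, 2, 8, 4, 2],
]
print(round(cronbach_alpha(example_items), 2))
```

Values close to 1 indicate that the items vary together across patients; the 0.7 threshold quoted above applies to the value returned here.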
Reliability
Reliability was tested with the intraclass correlation coefficient (ICC) (two-way model, single measure) with a 95% CI. An ICC of more than 0.8 is usually considered to be indicative of excellent reliability.34 Agreement was evaluated by the Bland and Altman approach.35
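The Bland and Altman approach summarises agreement between two assessments by the mean difference (bias) and the limits of agreement. The following Python sketch assumes the conventional limits of bias ± 1.96 SD of the paired differences, which the text does not state explicitly; the function name `bland_altman` and the example scores are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired assessments.

    Assumes the conventional 95% limits of agreement: bias +/- 1.96 SD
    of the paired differences.
    """
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width

# Hypothetical RAID scores at the first and second assessment.
first_visit = [3.8, 5.1, 2.4, 6.0, 4.2]
second_visit = [3.5, 5.4, 2.0, 5.7, 4.4]
print(bland_altman(first_visit, second_visit))
```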
Sensitivity to change
The standardised response mean (SRM), ie, the mean change between baseline and 3 months after the treatment change divided by the SD of the change, was calculated. An SRM greater than 0.8 is considered large. CI were calculated by boot-strap.
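The SRM definition above can be illustrated as follows. This is a Python sketch under stated assumptions, not the SAS code used in the study: change is taken as baseline minus follow-up so that improvement (a lower score) gives a positive SRM, and the CI uses a simple percentile bootstrap, whereas the text only states that a bootstrap was used. The names `srm` and `bootstrap_ci` and the example scores are hypothetical.

```python
import random
from statistics import mean, stdev

def srm(baseline, follow_up):
    """Mean change divided by the SD of the change (paired scores)."""
    # Assumed sign convention: baseline minus follow-up.
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

def bootstrap_ci(baseline, follow_up, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling patients with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(baseline, follow_up))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        if len({b - f for b, f in sample}) < 2:
            continue  # SD of change is zero, so the SRM is undefined
        resampled_baseline, resampled_follow_up = zip(*sample)
        estimates.append(srm(resampled_baseline, resampled_follow_up))
    estimates.sort()
    n = len(estimates)
    lower = estimates[int((1 - level) / 2 * n)]
    upper = estimates[min(n - 1, int((1 + level) / 2 * n))]
    return lower, upper

# Hypothetical scores before and after a treatment change.
baseline = [6.1, 5.4, 7.2, 4.8, 6.6, 5.9, 7.5, 5.0]
follow_up = [4.0, 4.9, 4.1, 3.9, 5.2, 3.0, 6.3, 4.4]
print(srm(baseline, follow_up), bootstrap_ci(baseline, follow_up))
```

Resampling whole patients, rather than individual scores, keeps each baseline value paired with its own follow-up value.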
Relative weights of the domains
Mean and median weights for each domain were computed and linearly transformed to a 0–100 range, as explained in supplementary file 1, available online only.
Results
Patient characteristics
In total, 570 patients (79% women, mean±SD age 56±13 years, DAS28 4.1±1.6, HAQ 1.1±0.8) participated in the validation study (table 2). More than half of the patients were on biological disease-modifying antirheumatic drugs (53.8%) and 21.0% were in DAS28 remission (DAS28 <2.6). The 50 patients participating in the reliability study had, as expected, milder disease, and the 88 patients participating in the sensitivity to change study had more active disease (table 2). Reliability was assessed after a mean interval of 7.0±4.6 days (range 1–27); sensitivity to change was assessed after the introduction of disease-modifying antirheumatic drugs (N=35) and/or biologicals (N=51) and/or a steroid intravenous pulse (N=5), after a mean interval of 94.7±40.3 days (range 26–273).
Psychometric properties
The psychometric properties of the final RAID, composed of seven NRS, and of some of the combinations assessed, are presented in tables 3 and 4. Supplementary figure 1A,B (available online only) shows two selected Bland and Altman plots. Detailed psychometric properties of the 36 possible combinations of the RAID are available from the first author.
Final choice of RAID domains
The comparisons of combinations of domains with versus without coping (tables 3 and 4) indicated that assessing coping as part of the RAID led to better psychometric properties, in particular better sensitivity to change. Furthermore, Cronbach's α was higher when assessing the seven domains (0.93) than when taking out coping (0.91). The group decided to continue to include coping within the RAID. This decision was based on the psychometric data (tables 3 and 4) and on the general consensus that coping/self-management is an important aspect of the impact of RA. Therefore, the final RAID comprises seven domains as shown in table 5. The relative weights given by patients to the domains were very similar in the elaboration study23 and in the present study (table 1).
Final choice of RAID questions or questionnaires
Choice of tool for pain
The NRS was chosen over the SF-36 pain questions, as a result of better psychometric properties (tables 3 and 4) and issues related to feasibility, simplicity and copyright.
Choice of tool for sleep
The psychometric properties of the MOS sleep questionnaire were quite similar to those of the NRS for sleep (tables 3 and 4), and correlation between both scores was substantial (Spearman's R=0.69, p<0.0001). After discussion it was decided to retain the sleep NRS in the RAID, mainly for feasibility issues, simplicity and copyright.
Choice of tool for coping
The psychometric properties of the coping 18-question questionnaire (analysed as one summary result) were not better than the coping NRS for correlation and reliability. Sensitivity to change appeared higher for the questionnaire, but this was assessed on the 50 (of 88) patients without missing data, and for these 50 patients the sensitivity to change of the 7-NRS RAID was also higher (SRM 1.18). After discussion it was decided to retain the coping NRS in the RAID, mainly for feasibility issues as reflected by the high rate of missing data with the longer coping questionnaire; furthermore, the coping questionnaire was not designed to be presented as one summary value (which is needed for integration into a composite score).
Choice of tool for functional disability
The psychometric properties of the NRS were similar to those of the HAQ and mHAQ for correlation with other measures of disease activity as well as for reliability and sensitivity to change (tables 3 and 4). Furthermore, correlations between the HAQ and mHAQ versus the function NRS were strong (Spearman's R=0.72 and 0.75, respectively; both p<0.0001). Therefore, after discussion, and although the HAQ is widely used, it was decided to keep the NRS as the tool to assess function in the RAID. This decision was based on the similar psychometric properties and on the fact that all other domains in the final RAID were assessed by NRS.
Psychometric properties of the final RAID
The final validated RAID is presented in table 5 with scoring rules, and its distribution in the population is presented in supplementary file 2, available online only; the validated translations are available as supplementary file 3, also available online only.
As shown in tables 3 and 4, the final RAID composed of seven NRS correlated strongly with patient global VAS (R=0.76) and significantly with other outcomes (DAS28 R=0.69, SF-36 physical −0.59 and mental −0.55, p<0.0001 for all). Reliability was very high (ICC 0.90; 95% CI 0.84 to 0.94) with mean scores of 3.8±2.2 and 3.6±1.9 at the first and second assessments. Sensitivity to change was also large (SRM 0.98; 95% CI 0.96 to 1.00) with mean scores of 5.8±1.8 and 3.9±1.9 at the first and second assessments. For comparative purposes, the reliability of DAS28–ESR and HAQ in the same population was, respectively, ICC 0.84 (95% CI 0.72 to 0.91) and ICC 0.96 (95% CI 0.93 to 0.98), and the SRM of the same scores were 1.06 (95% CI 1.01 to 1.11) and 0.91 (95% CI 0.89 to 0.93), respectively.
Correlations between the seven NRS included in the RAID were highest for pain versus function (Spearman's R=0.85, p<0.0001) and lowest for sleep versus function (Spearman's R=0.52, p<0.0001). An imputation rule was devised for missing data within the RAID (table 5), and the psychometric properties of the RAID with imputation of missing data were similar to those of the original RAID (data not shown).
Research agenda
The RAID is composed of seven NRS questions; however, the group felt that additional work will be needed to support the use of RAID further as a fully validated tool. The research agenda includes work on an alternative, longer format of the RAID in which the same domains could be assessed through more comprehensive scales, if their psychometric properties were shown to be at least as good as those of the NRS. In particular, the working group recommended further evaluation of the following domains: (1) Sleep: the wording of the NRS might be improved by adding issues of falling asleep and staying asleep (instead of ‘resting at night’). Other, alternative tools may also be tested, for example, the Athens sleep questionnaire.30 (2) Coping: more work is needed on the coping tool. It is suggested to work on the longer questionnaire,31 and in parallel to assess other tools such as, for example, the arthritis helplessness index, which comprises five questions.36 (3) Function: the group felt the need for further assessment of performance of the function NRS versus the HAQ/mHAQ, in particular in terms of sensitivity to change. We therefore suggest the inclusion of the function NRS on top of the HAQ in trials and studies, and comparative assessment of these tools. Furthermore, sensitivity to change of the final RAID needs to be assessed further in comparative intervention studies. Differential performances in subgroups of patients (eg, according to disease duration or disease severity) should be evaluated in further studies and different datasets (T Heiberg, unpublished observations). Importantly, much of the information captured by the RAID may already be captured by the RA core set, as indicated by the high correlations with these measures; therefore studies to characterise further the unique information, ie, that not conveyed by the core set, are needed.
RAID scoring and calculation rules
The RAID is calculated from seven numerical rating scale (NRS) questions. Each NRS is assessed as a number between 0 and 10. The seven NRS correspond to pain, function, fatigue, sleep, emotional wellbeing, physical wellbeing and coping/self-efficacy (questions above).
1. Calculation
RAID final value = (pain NRS value (range 0–10) × 0.21) + (function NRS value (range 0–10) × 0.16) + (fatigue NRS value (range 0–10) × 0.15) + (physical wellbeing NRS value (range 0–10) × 0.12) + (sleep NRS value (range 0–10) × 0.12) + (emotional wellbeing NRS value (range 0–10) × 0.12) + (coping NRS value (range 0–10) × 0.12).
Thus, the range of the final RAID value is 0–10 where higher figures indicate worse status.
2. Missing data imputation
If one of the seven NRS values composing the RAID is missing, the imputation is as follows:
Calculate the mean value of the six other (non-missing) NRS (range 0–10)
Impute this value for the missing NRS
Then, calculate the RAID as explained above.
If two or more of the NRS are missing, the RAID is considered as missing value (no imputation).
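The calculation and imputation rules above can be sketched in a few lines of Python. The weights are those given in the calculation rule; the function name `raid_score`, the dictionary keys and the use of `None` to mark a missing answer are assumptions made for illustration, not part of the published instrument.

```python
# Domain weights from the calculation rule (they sum to 1.00).
WEIGHTS = {
    "pain": 0.21,
    "function": 0.16,
    "fatigue": 0.15,
    "physical_wellbeing": 0.12,
    "sleep": 0.12,
    "emotional_wellbeing": 0.12,
    "coping": 0.12,
}

def raid_score(nrs):
    """Weighted RAID score (0-10, higher is worse), or None if not computable.

    `nrs` maps each domain to its 0-10 NRS value; a missing answer is
    represented by None or by an absent key (an illustrative convention).
    """
    missing = [d for d in WEIGHTS if nrs.get(d) is None]
    if len(missing) >= 2:
        return None  # two or more missing NRS: no imputation
    values = dict(nrs)
    if missing:
        # One missing NRS: impute the mean of the six non-missing values.
        present = [nrs[d] for d in WEIGHTS if nrs.get(d) is not None]
        values[missing[0]] = sum(present) / len(present)
    return sum(WEIGHTS[d] * values[d] for d in WEIGHTS)

# Hypothetical patient answers.
answers = {
    "pain": 6, "function": 5, "fatigue": 7, "physical_wellbeing": 4,
    "sleep": 3, "emotional_wellbeing": 5, "coping": 4,
}
print(round(raid_score(answers), 2))  # prints: 5.03
```

For the example answers the weighted sum is 1.26 + 0.80 + 1.05 + 0.48 + 0.36 + 0.60 + 0.48 = 5.03.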
Discussion
In this report, a patient-derived score to assess the impact of RA from the patient's perspective has been finalised and validated. The score includes seven domains and the domains of highest importance to patients are pain, functional disability and fatigue. The four other domains are emotional and physical wellbeing, sleep disturbance and coping/self-management. The questionnaire is very simple because it is composed of seven questions assessed by NRS.
The three domains found here to be of utmost importance to patients, pain, function and fatigue, are regularly reported as essential by people with RA,15–20 and are considered as core domains.7 37 Other domains reported in the literature as important include wellbeing, sleep disturbance, coping, social life, professional status (ability to work) and satisfaction with health care.15–22 38 39 In the present study, sleep, physical and emotional wellbeing and coping were also selected. The international classification of functioning, disability and health is a generally accepted framework to assess the bio-psycho-social model of disease; however, it was developed without patient input. It is interesting to note that the domains selected in the RAID were also selected in international classification of functioning, disability and health-based focus groups, except for wellbeing.20 21 40
An original technique was developed in the elaboration of the RAID, to obtain a relative assessment of importance for the different domains included in the RAID score.23 The relative weights correspond to the relative importance of the domains for the patients, and allow combination of the NRS values for each domain into a unique score. It is interesting to note that the relative weights of the domains were very similar in the elaboration study and in the present study in which the importance of coping was not assessed (table 1). The result given as one unique figure is the essential part of RAID when analysing or reporting the results of the RAID at the group level (eg, in trials). Although the RAID domains could be simply summed together (data not shown), we believe using the weights heightens the face validity of the RAID, as the weights reflect the importance of the domains, from the patient's perspective.
To our knowledge, there is only one other patient-reported questionnaire that includes more than three core set domains. In addition to function, pain and global, the multidimensional HAQ includes fatigue, morning stiffness, psychological dimensions and patient self-reported joint pain.41 Other interesting measures are utility measures such as SF-6D and EQ-5D; in another study, strong correlations have been found between RAID and these utility measures supporting that RAID is a measure of global health (T Heiberg, unpublished observations).
At the individual level, it may be important to analyse the different domains separately, as this is more informative than only a summary score value. In clinical practice, it may thus be worthwhile to assess the seven domains of the RAID through the seven NRS presented in table 5. The potential usefulness of the RAID in clinical practice, however, warrants further studies. A limitation of the RAID, in particular for clinical practice, is that although the questionnaire is simple (seven NRS), the scoring is quite complex (table 5); however, it is also possible not to calculate the combined score in clinical practice.
The choice of questions or questionnaires to assess each of the domains in the RAID was a challenge, and was partly based on a data-driven approach, partly after discussions in the working group, which included persons with RA. The choice of NRS (vs VAS) and the time-frame chosen (1 week) are discussed elsewhere.23 The assessment of coping was a particular challenge. First, the notion of coping is not as easy to understand as some other domains (eg, pain). Furthermore, coping was not previously usually reported in published qualitative studies as an essential issue for patients,15–22 although a recent study did find coping to be important.38 39 Therefore, the group felt more qualitative work on the notions of coping, self-management and helplessness was warranted. Second, the assessment of coping also presents difficulties, as many coping questionnaires are available, but there is no consensus on which tool is most appropriate in RA.42 In the end, we decided to keep coping in the final RAID, because there was a consensus that this domain was important for patients, had been selected in the elaboration study,23 and as the psychometric properties of the coping NRS appeared satisfactory. However, the group did conclude that more work was needed around coping, both regarding the concept and the instrument.
The assessment of physical functioning was also a challenge, as the HAQ is among the best-validated measures in RA. Nevertheless, due to some limitations in the HAQ,43 and because each of the other domains was assessed by one single NRS, the use of the HAQ might give function a ‘disproportionate weighting’ due to the disequilibrium in the number of questions. In this study, a single NRS question was found to have similar psychometric properties to the HAQ; Wolfe et al44 also used a single question for function in RA. Therefore, the group decided to assess function in the RAID using this single-question NRS; but we did conclude that further assessment of the function NRS versus the HAQ or mHAQ, was needed, and in particular with regard to sensitivity to change.
The findings from this study must be considered in the light of its limitations: first, participating subjects might not be representative of the entire spectrum of RA patients as they were mostly recruited from tertiary care centres. Indeed, a high percentage of patients were on biological drugs. Furthermore, the mean DAS was quite high; therefore the present conclusions may warrant further study in patients in remission (although close to 20% of the patients assessed here were in remission). Most of the patients in both the elaboration study23 and the current validation study had established disease, and it may be possible that selection and relative weights of domains may differ between patients with recent-onset and established RA. It is well known that people with chronic diseases adapt to their conditions, but a previous study did not indicate that prioritised areas for improvement change over time.45 However, the international nature of this study, with the inclusion of people with RA from 12 countries with different cultures and socioeconomic backgrounds, is a strength. Furthermore, the performance of the RAID has been confirmed in another large representative registry dataset (T Heiberg, C Austad, TK Kvien, et al, unpublished observations). Sensitivity to change is a key psychometric property of a measurement tool designed for clinical trials, but was only assessed on a limited number of patients. Therefore, we suggest further assessment of sensitivity to change of the RAID score and of other possible combinations of questions or questionnaires for assessment of the RAID domains. Another strength of this study was the central involvement of patients in the elaboration, validation and finalisation processes of the RAID, and the large number of patients (>1000) who participated in the various studies leading to its validation. However, much of the information captured by the RAID may already be captured by the RA core set. Further studies are needed to assess the redundancy of this tool compared with already-assessed measures in RA trials. The patients also took active part in the translation process of the RAID instrument;46 47 the RAID is available and validated in 12 languages (see supplementary file 3, available online only).
Conclusion
In this study we propose a patient-derived weighted score to assess the impact of RA. We believe that the RAID score will be of value in clinical trials, in which its use, in addition to traditional clinical measures of disease activity, will capture information that is relevant for patients. Its ease of use will allow a better assessment of the patient's perspective. Capturing the patient perspective of important changes from pharmacological therapies has the potential to enhance decision-making in clinical practice and influence the research agenda. In further work, it may be useful to derive cut-offs for the RAID related to the patient acceptable symptom state and to minimal clinically important improvements. Other possible developments of the RAID could include assessment of its usefulness in clinical practice, and assessment of a patient-derived definition of ‘patient remission’ based on the RAID, which could be a substitute for the patient global component of the recently proposed American College of Rheumatology/EULAR remission criteria. Further assessment of sensitivity to change of the RAID is warranted, especially in intervention studies with a control group.
Acknowledgments
This project was convened by TKK, facilitated by LG, and has as a steering committee one other rheumatologist (MD), one person with RA (MdeW), one epidemiologist (LC) and one allied health professional (TH). The authors wish to thank the other patients who participated in the elaboration of the RAID for their input and support: A Celano (Italy), A Dudkin (Estonia), K Koutsogianni (Greece), F Nilgun Akca (Turkey), A M Petre (Romania) and P Richards (UK). Twelve countries were involved in the validation of the RAID: Estonia, Germany, Finland, France, Greece, Italy, The Netherlands, Norway, Spain, Romania, Turkey and the UK. The authors wish to acknowledge all personnel who participated in data collection and in particular, in Crete, Dr Herakles Kritikos and Eva Choustoulaki.
Footnotes
Funding This project was supported financially by EULAR (grant CLI.013).
Competing interests None.
Patient consent Obtained.
Ethics approval This study was conducted with approval of the ethics committees in participating countries.
Provenance and peer review Not commissioned; externally peer reviewed.