Objective: The outcome measures recommended in osteoarthritis are standardised scales whose items are identical for all patients. As patient-specific scales are attracting increasing interest as a way of taking patient priorities into account in outcome assessment, this study aimed to validate individualised forms of the Western Ontario and McMaster Universities osteoarthritis index (WOMAC) function subscale.
Patients and Methods: WOMAC function subscale data were prospectively obtained from 1218 outpatients with hip or knee osteoarthritis requiring non-steroidal anti-inflammatory drugs. Patients also rated the importance of removing disability in each activity of the WOMAC function subscale and selected the five activities they considered most important to improve. After 4 weeks of treatment, patients again completed the WOMAC function subscale. Several individualisation methods were evaluated: methods in which the score of each item is multiplied by, or added to, its importance, and a method based on the five most important activities (WOMAC top 5). Psychometric properties of the individualised scales were compared with those of the WOMAC function subscale.
Results: The missing data rates were 11%, 13% and 2% for the WOMAC function subscale, its individualised forms and the WOMAC top 5, respectively. Combining the severity and importance of each item did not improve the properties of the scales. The WOMAC top 5 was the most responsive scale (standardised response mean 0.96 vs 0.80 for the WOMAC function subscale, p<0.001).
Conclusion: Because of its better responsiveness, ease of use, low missing data rate and ability to highlight patient priorities, the WOMAC top 5 could be a useful tool for therapeutic evaluation in hip or knee osteoarthritis.
Patient-reported outcomes are of increasing interest in clinical practice and clinical research. In osteoarthritis (OA), the OMERACT (Outcome Measures in Rheumatology Clinical Trials) group has defined a core set of outcome measures for phase III trials, in which three domains should systematically be included: pain, physical function and patient global assessment.1 The function subscale of the Western Ontario and McMaster Universities osteoarthritis index (WOMAC) is a valid, reliable and responsive measure of functional impairment in hip and knee OA.2–4 The WOMAC is the most widely used condition-specific index, and its function subscale is recommended for inclusion in all hip and knee OA trials.1 However, the WOMAC function subscale is a standardised instrument with a fixed number of identical items, all carrying the same weight in the final score. Although the mean score measures global functional impairment at the group level, it does not take into account how important it is to each patient to be able to perform a particular activity. For example, the ability to climb stairs may matter little to a patient who always takes a lift, and getting out of the bath may be of no importance to a patient who always takes showers.
Questionnaires that focus principally on each patient’s priorities, so-called patient-specific or individualised instruments, have been developed and/or used in rheumatology.5–12 These scales identify the issues relevant to each individual and allow the evaluation to focus on what matters to each patient. Some have shown better sensitivity to change than classical instruments.6 7 13 However, few patient-specific scales have been applied in hip or knee disorders7 14 and none in lower limb OA requiring a medical intervention. As patient perspectives are of increasing importance,15 16 this prospective study aimed to develop individualised scales assessing functional impairment in patients with hip or knee OA, derived from the WOMAC function subscale by several individualisation methods, and to validate them by comparing their psychometric properties with those of the WOMAC function subscale.
PATIENTS AND METHODS
Data were obtained from a 4-week prospective cohort study involving 1362 outpatients with hip (n = 343) or knee (n = 1019) OA as defined by the American College of Rheumatology (ACR) criteria.17 18 Between 12 April and 31 July 2002, patients were recruited by 399 French rheumatologists in private practice, each of whom was required to include three patients with knee OA and one with hip OA. To be included, patients had to have OA pain of at least 30 mm on a visual analogue scale (VAS, 0–100 mm). Each patient gave informed consent. Inclusion took place at a visit to the rheumatologist in charge of the patient and coincided with either the initiation of a non-steroidal anti-inflammatory drug (NSAID) or a switch from one NSAID to another. A final visit to the same rheumatologist was scheduled 4 weeks later.
At the baseline visit, patients completed three self-administered questionnaires (fig 1):
The French-Canadian version of the WOMAC function subscale (5-point Likert version),2 referred to in this paper as the “severity questionnaire”, a 17-item scale addressing the degree of difficulty in performing 17 activities of daily living.
The “importance questionnaire”: patients rated how important it was to them to remove the disability in each activity addressed by the WOMAC function items (from “not important at all” to “extremely important”). Patients were randomly assigned to three groups that used different response modalities for this questionnaire: a 5-point (1–5) Likert scale (Likert 5 group), a 3-point (1–3) Likert scale (Likert 3 group) or a VAS (0–100 mm) (VAS group).
The “preference questionnaire”: patients selected the five WOMAC function items they considered most important by answering the following question: “Could you choose from the 17-item list, the 5 you consider the most important to be improved upon?”
Patients also rated their pain and global disease activity on a VAS (0–100 mm), and practitioners rated each patient’s global disease activity on a VAS (0–100 mm). At the final visit, patients again completed the WOMAC function subscale.2
To assess test–retest reliability, a subsample of 93 patients, all from the Likert 5 group, were asked to complete the WOMAC function subscale and the “importance” and “preference” questionnaires a second time and to return them by mail within 48 h, before starting NSAID therapy.
Methods of individualisation
Several methods for individualisation were used:
Individualised scales based on the importance questionnaire rated with a 5- or 3-point Likert scale or VAS. These scales were derived from the WOMAC function subscale (17 items) and the WOMAC short-form subscale (8 items).19
With multiplicative methods: for each item, the severity score was multiplied by the importance score;
With additive methods: for each item, the severity score was added to the importance score.
The WOMAC top 5, based on the preference questionnaire, comprising the five items most important to each patient. The items of the WOMAC top 5 therefore differ between patients.
Because the psychometric performance of each scale was compared with that of the WOMAC function subscale, only patients who had completed this subscale with no missing data at the baseline visit (n = 1218) were included in the development of the individualised forms (fig 1). Each scale was linearly transformed to a 0–100 scale, where 0 indicates no disability and 100 the maximum possible disability. For each item, the Spearman rho correlation coefficient between the severity and importance scores was computed.
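The scoring rules above can be sketched in Python as follows. This is a minimal illustration, not the authors’ code: it assumes severity items are scored 0–4 (the usual coding of the 5-point Likert WOMAC), importance is scored 1–5 (Likert 5 group), and each total is rescaled linearly between its theoretical minimum and maximum. The article does not state the exact bounds used, so these constants are assumptions, and the zero-based item indices for the WOMAC top 5 are hypothetical.

```python
from typing import Sequence

# Assumed codings (not specified in the article): severity 0-4, importance 1-5.
SEV_MIN, SEV_MAX = 0, 4
IMP_MIN, IMP_MAX = 1, 5


def rescale(raw: float, lo: float, hi: float) -> float:
    """Linearly map a raw total from [lo, hi] onto 0-100 (0 = no disability)."""
    return 100.0 * (raw - lo) / (hi - lo)


def womac_function(sev: Sequence[int]) -> float:
    """Unweighted score: every item carries the same weight."""
    n = len(sev)
    return rescale(sum(sev), n * SEV_MIN, n * SEV_MAX)


def multiplicative(sev: Sequence[int], imp: Sequence[int]) -> float:
    """Item score = severity x importance, summed over items."""
    n = len(sev)
    raw = sum(s * i for s, i in zip(sev, imp))
    return rescale(raw, n * SEV_MIN * IMP_MIN, n * SEV_MAX * IMP_MAX)


def additive(sev: Sequence[int], imp: Sequence[int]) -> float:
    """Item score = severity + importance, summed over items.

    With fixed theoretical bounds, a patient with no disability but high
    importance ratings scores above 0; the article does not say how 0 was anchored.
    """
    n = len(sev)
    raw = sum(s + i for s, i in zip(sev, imp))
    return rescale(raw, n * (SEV_MIN + IMP_MIN), n * (SEV_MAX + IMP_MAX))


def top5(sev: Sequence[int], chosen: Sequence[int]) -> float:
    """Severity of the five items the patient selected (hypothetical 0-based indices)."""
    if len(set(chosen)) != 5:
        raise ValueError("exactly five distinct items must be selected")
    raw = sum(sev[j] for j in chosen)
    return rescale(raw, 5 * SEV_MIN, 5 * SEV_MAX)


# Hypothetical patient: moderate difficulty (2) on all 17 items, all rated very important (5).
severity = [2] * 17
importance = [5] * 17
print(womac_function(severity))              # 50.0
print(multiplicative(severity, importance))  # 50.0
print(additive(severity, importance))        # 75.0
print(top5(severity, [0, 4, 8, 12, 15]))     # 50.0
```

The additive result illustrates why the choice of bounds matters: the same severity profile scores higher once importance is added, whereas the multiplicative score is unchanged when every item has maximal importance.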
Psychometric properties of each scale were evaluated, and those of the individualised scales were compared with those of the WOMAC function subscale.
Construct validity was assessed by the Spearman rho correlation coefficient between the score of each scale and that of the WOMAC function subscale. Divergent validity was examined with Spearman correlation coefficients between the scores of the individualised scales and the other measures collected in this study (pain, and patient and practitioner global assessment of disease activity). Internal consistency was assessed with Cronbach alpha coefficient20 where it could be estimated, that is, for fixed-item scales. Confidence intervals for Cronbach alpha coefficients were estimated, and the coefficients compared, by bootstrapping with 1000 replications.21 Test–retest reliability was assessed with the intraclass correlation coefficient (ICC). ICC values range from 0 (totally unreliable) to 1 (perfectly reproducible); an ICC⩾0.75 is regarded as excellent.22 ICC confidence intervals were estimated by bootstrapping with 1000 replications,21 and ICCs were compared with the likelihood ratio test.23 Responsiveness was assessed by the standardised response mean (SRM), defined as the mean change in score between the baseline and final visits divided by the standard deviation of that change.24 SRM values can be considered large (>0.8), moderate (0.5–0.8) or small (<0.5).25 26 SRM confidence intervals and SRM comparisons were obtained by bootstrapping with 1000 replications.21 Because final scores were calculated using the baseline importance questionnaire, the importance terms cancel out of the change score on an additive scale, so its SRM is arithmetically identical to that of the scale from which it is derived; these SRMs were therefore not compared.
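As an illustration of two of the methods named above, the sketch below computes Cronbach alpha from a patients × items matrix and a percentile bootstrap confidence interval obtained by resampling patients 1000 times. It is a minimal plain-Python sketch, not the SAS or R code used in the study; the percentile method and the fixed seed are assumptions, since the article does not say which bootstrap interval was used.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]  # one patient's item scores


def cronbach_alpha(rows: Sequence[Row]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    items = list(zip(*rows))  # one tuple of scores per item
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - item_var / total_var)


def bootstrap_ci(rows: Sequence[Row], stat: Callable[[Sequence[Row]], float],
                 reps: int = 1000, level: float = 0.95,
                 seed: int = 1) -> Tuple[float, float]:
    """Percentile interval: resample patients with replacement, recompute stat."""
    rng = random.Random(seed)
    n = len(rows)
    estimates: List[float] = sorted(
        stat(rng.choices(rows, k=n)) for _ in range(reps))
    lo = int((1 - level) / 2 * reps)      # index 25 for 1000 replications
    hi = int((1 + level) / 2 * reps) - 1  # index 974 for 1000 replications
    return estimates[lo], estimates[hi]


# Hypothetical data: 40 patients, 4 items that always agree, so alpha = 1.
rows = [[p % 5] * 4 for p in range(40)]
print(cronbach_alpha(rows))
print(bootstrap_ci(rows, cronbach_alpha))
```

Alpha needs every patient to answer the same items, which is why it cannot be computed for the WOMAC top 5, whose items differ from one patient to the next.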
Statistical analyses were performed with the SAS (SAS Institute Inc., Cary, North Carolina, USA) V.9.1 and R (R Foundation, http://www.r-project.org) V.2.2.1 statistical software packages.
RESULTS
At baseline, the missing data rates were 11% (n = 144), 13% (n = 174) and 2% (n = 21) for the WOMAC function subscale, its individualised derived forms and the WOMAC top 5, respectively. Baseline characteristics of the 1218 patients included in the analyses are reported in table 1 and were similar to those of the 144 patients with incomplete WOMAC function subscale data. Among these 144 patients, 1 did not complete the questionnaire at all, and 71 (49.3%), 53 (36.8%), 11 (7.6%) and 8 (5.6%) left 1, 2, 3 and more than 3 items unanswered, respectively. Baseline characteristics did not differ between the Likert 5, Likert 3 and VAS groups.
Mean scores for each scale and mean changes in score over the 4-week period are reported in table 2. Neither the WOMAC function subscale nor the individualised scales showed a substantial ceiling or floor effect. For each item, severity and importance scores were significantly correlated in each group (rho ranging from 0.34 to 0.67). However, for every activity of the WOMAC function subscale, some patients with a low severity score rated the item as very important, whereas only a few patients with a high severity score rated it as being of little importance. The exceptions were item 13 (“getting in/out of the bath”) and item 16 (“performing heavy domestic duties”), for which some patients with a high severity score rated the item as being of little importance (fig 2).
All individualised scales were highly convergent with the WOMAC function subscale (rho⩾0.75). Individualised scales involving all 17 items correlated more strongly with the WOMAC function subscale than shorter scales such as the WOMAC top 5 or the scales derived from the WOMAC short form. However, the additive scale using a VAS to rate importance was less correlated with the WOMAC function subscale than all other scales. All scales, which measure functional status, correlated less strongly (rho<0.5) with pain and with patient and practitioner global assessment of disease activity (table 3).
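Spearman rho, used here for convergent and divergent validity and for the item-level severity–importance correlations, is the Pearson correlation of the ranks, with tied values given their average rank. The stdlib-only sketch below illustrates the coefficient; it is not the statistical software used in the study (`scipy.stats.spearmanr` would be the usual library choice), and the example vectors are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical severity (0-4) and importance (1-5) ratings for one item.
severity = [0, 1, 2, 3, 4]
importance = [1, 3, 2, 5, 4]
print(spearman_rho(severity, importance))  # 0.8
```

Because only ranks enter the calculation, rho is unaffected by the 0–100 rescaling and is suited to ordinal Likert responses.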
Cronbach alpha coefficients of individualised scales involving all 17 items did not significantly differ from that of the WOMAC function subscale and ranged from 0.91 to 0.94 (table 3). For scales involving eight items (derived from the WOMAC short form), Cronbach alpha coefficients were significantly lower, ranging from 0.82 to 0.86. For the WOMAC top 5, as the items involved in the scale were different for each patient, Cronbach alpha coefficients were not estimable.
Among the 93 patients in the test–retest subsample, the data needed to compute the WOMAC function subscale, its individualised forms and the WOMAC top 5 were available for 71 (76%), 64 (69%) and 78 (84%) patients, respectively. The mean (SD) change in score between test and retest was 4.76 (11.46) for the WOMAC function subscale, 4.24 (9.12) and 4.24 (9.23) for its individualised forms using multiplicative and additive methods, respectively, and 6.42 (16.48) for the WOMAC top 5. The ICCs of the individualised scales (long and short forms) did not significantly differ from that of the WOMAC function subscale (table 3) and were >0.75, except for the WOMAC top 5 (ICC = 0.58).
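The article does not state which ICC form was used. As an assumption, the sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1)) from two administrations; this form penalises a systematic shift between test and retest as well as random disagreement. Confidence intervals could be added by resampling patients, as described in the methods.

```python
from typing import Sequence


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """Two-way random effects, absolute agreement, single measure (k = 2)."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(test) / n, sum(retest) / n]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (x - rm - cm + grand) ** 2
        for (a, b), rm in zip(rows, row_means)
        for x, cm in ((a, col_means[0]), (b, col_means[1]))
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical scores: identical test and retest give perfect agreement,
# while a uniform 5-point shift lowers the absolute-agreement ICC.
print(icc_2_1([10, 20, 30, 40], [10, 20, 30, 40]))
print(icc_2_1([10, 20, 30], [15, 25, 35]))
```

A consistency-type ICC would ignore the uniform shift in the second example; the absolute-agreement form is used here because a systematic change between two administrations is itself a reliability concern.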
The SRM of the WOMAC function subscale was 0.80 in the overall population (table 3), and 0.78, 0.86 and 0.77 in the Likert 5, Likert 3 and VAS groups, respectively. The SRMs of the 17-item individualised forms using multiplicative methods did not significantly differ from that of the WOMAC function subscale, except for the scale using a VAS to rate importance (0.85 vs 0.77, p = 0.01). The WOMAC short form was more responsive than the WOMAC function subscale (0.84 vs 0.80, p = 0.002). However, the greatest sensitivity to change was obtained with the WOMAC top 5 (0.96 vs 0.80, p<0.001).
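The SRM and a bootstrap comparison of two SRMs measured on the same patients can be sketched as follows. Resampling whole patients keeps the pairing between scales, which matters because both scores come from the same people. The sign convention (baseline minus final, so that improvement is positive on a 0–100 disability scale), the percentile-based p value and the skipping of degenerate resamples are assumptions of this sketch; the article does not detail how its bootstrap comparisons were computed. The example also checks that adding a per-patient constant, as an additive scale does with baseline importance, leaves the SRM unchanged.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (baseline score, final score) for one patient


def srm(pairs: Sequence[Pair]) -> float:
    """Mean change divided by the SD of change; positive = less disability."""
    changes = [base - final for base, final in pairs]
    return statistics.mean(changes) / statistics.stdev(changes)


def compare_srm(a: Sequence[Pair], b: Sequence[Pair], reps: int = 1000,
                seed: int = 1) -> Tuple[float, Tuple[float, float], float]:
    """Observed SRM(a) - SRM(b), 95% percentile CI and a two-sided p value.

    a[i] and b[i] must belong to the same patient.
    """
    rng = random.Random(seed)
    n = len(a)
    diffs: List[float] = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients jointly
        try:
            diffs.append(srm([a[i] for i in idx]) - srm([b[i] for i in idx]))
        except ZeroDivisionError:
            continue  # resample with zero SD of change; rare with real samples
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    m = len(diffs)
    ci = (diffs[int(0.025 * m)], diffs[int(0.975 * m) - 1])
    share_le = sum(d <= 0 for d in diffs) / m
    share_ge = sum(d >= 0 for d in diffs) / m
    p = min(1.0, 2 * min(share_le, share_ge))
    return srm(a) - srm(b), ci, p


# Hypothetical 0-100 scores for six patients.
womac = list(zip([60, 55, 70, 40, 65, 50], [50, 52, 55, 38, 60, 41]))
# Additive-style scale: the same per-patient constant is added at both visits.
shift = [3, 7, 1, 9, 4, 2]
additive_like = [(x + c, y + c) for (x, y), c in zip(womac, shift)]
print(compare_srm(womac, additive_like))  # (0.0, (0.0, 0.0), 1.0)
```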
When no more than three items were missing, WOMAC function scores were computed after replacing each missing item with the patient’s average value for the subscale, as recommended in the WOMAC user guide.27 Analyses performed in this population of 1353 patients (the 1218 with complete data plus the 71, 53 and 11 patients with one, two and three missing items) gave results similar to those obtained in the 1218 patients and led to the same conclusions.
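The imputation rule can be sketched as follows. The cut-off on missing items is exposed as a parameter with a default of three, chosen because the 1218 complete cases plus the 71, 53 and 11 patients with one, two and three missing items give the 1353 patients analysed. The 0–4 item coding, the 0–100 rescaling and the function name are assumptions made for illustration.

```python
from typing import Optional, Sequence


def womac_function_imputed(items: Sequence[Optional[int]],
                           max_missing: int = 3,
                           item_max: int = 4) -> Optional[float]:
    """Replace each missing item (None) by the mean of the answered items.

    Returns None when more than `max_missing` items are missing, i.e. when no
    score is computed. Assumes items coded 0..item_max and a 0-100 output.
    """
    answered = [x for x in items if x is not None]
    if len(items) - len(answered) > max_missing or not answered:
        return None
    mean_answered = sum(answered) / len(answered)
    raw = sum(x if x is not None else mean_answered for x in items)
    return 100.0 * raw / (item_max * len(items))


print(womac_function_imputed([2] * 15 + [None, None]))  # 50.0
print(womac_function_imputed([2] * 13 + [None] * 4))    # None
```

Imputing the patient’s own mean is equivalent to averaging the answered items and rescaling that average, so a patient’s score is not pulled towards 0 by unanswered items.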
DISCUSSION
This study aimed to validate individualised measures of functional impairment in hip or knee OA that reflect the patient’s perspective by highlighting each patient’s priorities for functional improvement. All scales were derived from the WOMAC function subscale. Adding a measure of “importance” to the measure of “severity” provided complementary information. However, scales combining the severity and importance of each item did not have better properties than the WOMAC function subscale, except when a VAS was used to rate importance. The WOMAC top 5 was the best scale in terms of responsiveness and missing data rate.
Psychometric properties of the WOMAC function subscale found in this study are consistent with those previously reported.3 4 28–31 This study enrolled a large sample of patients with a wide range of disease severity, which may allow the conclusions to be generalised to a broad spectrum of patients with hip or knee OA.
Because there is no consensus on the best way to individualise functional status instruments, we evaluated two of the main individualisation methods. In the first, prespecified items, identical for each patient, were retained in the final score, and individualisation consisted of combining, for each item, the ratings of “severity” and “importance”.7 In the second, individualisation was based on a selection process: patients were asked to specify or choose a number of areas, limited or not, that they considered most in need of improvement,6 9 32 and were then followed up on these selected items only. The two methods can also be combined, for example by rating the importance of the selected items.
In this study, the WOMAC top 5 had a low rate of missing data and good responsiveness. In addition, it is probably the most patient-specific of the scales, because the selection process ensures that only activities relevant to each patient are included. Moreover, an SRM of 0.96 is large for a pharmacological intervention (NSAIDs); higher responsiveness has been reported mainly when the intervention was surgical joint replacement and when response was integrated over multiple observation time points.4 33 The WOMAC top 5 displayed only fair reliability. Nevertheless, its ICC exceeded 0.5, the minimum required for appropriate use of a change score and evaluation of responsiveness.34 One possible explanation for the lower ICC of the WOMAC top 5 is genuine within-patient change during the 48 h between test and retest. However, this is unlikely to account entirely for the phenomenon: the change in score over this period was moderate (mean change of 6.42 for the WOMAC top 5 versus 4.24 to 4.76 for the other scales), and Bland–Altman plots did not reveal any major systematic change in score (data not shown). Guyatt et al distinguished two kinds of measurement instruments: discriminative instruments, which measure differences between subjects, and evaluative instruments, which measure change over time and treatment effects.35 36 In OA, outcome measures such as the WOMAC function subscale or its individualised forms are primarily intended for evaluative purposes. The key issue in developing evaluative instruments is improving their responsiveness, which allows smaller sample sizes in clinical trials and more precise treatment comparisons. On this basis, the most valuable scale is the WOMAC top 5.
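To illustrate why higher responsiveness can reduce sample sizes, the sketch below uses the normal-approximation sample size for detecting a mean within-patient change in a single group, where the standardised effect is the SRM: n = ((z(1 − α/2) + z(1 − β)) / SRM)². This is a simplification under stated assumptions (one group, two-sided α = 0.05, 80% power); sample sizes for a two-arm trial depend on the between-group difference in change and are not computed here.

```python
from math import ceil
from statistics import NormalDist


def n_single_group(srm: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed to detect a mean change of `srm` SDs (two-sided test)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / srm) ** 2)


for label, value in [("WOMAC function", 0.80), ("WOMAC top 5", 0.96)]:
    print(label, n_single_group(value))
# WOMAC function 13
# WOMAC top 5 9
```

Because n scales with 1/SRM², the ratio (0.80/0.96)² ≈ 0.69 implies roughly 30% fewer patients under these assumptions.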
Shorter tools such as the WOMAC top 5 and the WOMAC short form could be of interest in terms of feasibility (ease of use, completion time, lower missing data rate) but also of relevance of content. Indeed, four of the 17 activities addressed in the WOMAC function subscale are not performed by 5% to 30% of patients in their daily life37 and are therefore not relevant to them. Three of these four items have been excluded from the WOMAC short form.19 These items generated more missing data than the others and, when several of them were left unanswered, increased the missing data rate for the overall score. The WOMAC top 5 avoids both drawbacks by including only activities relevant to each patient. These findings support the view that shorter scales in case report forms could result in better data quality.
The importance of improving the ability to perform an activity was related to the level of disability in that activity (rho 0.34 to 0.67; fig 2). When rating severity, patients may also take into account how important it is to them to be able to perform each activity, not just their ability to perform it. This could explain why most of the scales combining severity and importance for each item showed psychometric performance not significantly different from that of the scale from which they were derived (long or short form). However, the data showed that the overlap between the ratings of severity and importance was incomplete (fig 2). Thus, in clinical practice, assessing the importance of improvement in each area of function may provide information complementary to the assessment of functional status alone. Moreover, such a patient-centred approach might reinforce the patient–physician relationship.38
We evaluated the impact of the response modality (Likert 5, Likert 3 or VAS) used to rate importance. In most studies comparing VAS and multiple-response scales,2 33 39–41 both formats gave similar results. Some authors recommend Likert scales41–43 because they are easy to use and interpret; others recommend the VAS26 for its greater precision and sensitivity to change.44 45 We found that scales involving a VAS were more sensitive to change. However, the gain in responsiveness was small relative to the additional data-management complexity of this method.
Individualised versions of the WOMAC function subscale, such as those developed here, make it possible to highlight patient concerns not only in therapeutic evaluation but also when identifying priorities for improvement in clinical practice. Nevertheless, in our study, the use of a pre-established list of items did not allow patients to add items of relevance to them. This may be a limitation of our individualisation methods, but it is more practical, particularly in case report forms for clinical trials. These scales make it possible to determine whether each patient’s priorities for functional improvement, established before treatment initiation, are attained. However, a patient’s priorities may change over time, because of response shift or, for instance, with improvement, deterioration or a change in physical environment. We did not investigate such changes in this study. Another potentially important question is the optimal number of items for maximising responsiveness while preserving measurement precision. This could not be investigated here, because patients had to select exactly five items, but it could be an interesting topic for future research.
CONCLUSION
Individualisation by combining severity and importance for each WOMAC function item did not improve the psychometric performance of the scales, but it provided complementary information on patient priorities that could be relevant in clinical practice. Among all the scales, because of its better sensitivity to change, ease of use, lower rate of missing data and better reflection of patient concerns, the WOMAC top 5 could be a useful tool for therapeutic evaluation and decision-making in patients with hip or knee OA. It could easily be included in OA trials alongside the WOMAC function subscale, by adding only the preference questionnaire. The WOMAC top 5 requires further validation in independent samples from the target population.
We thank all the rheumatologists who recruited patients for this study.
Funding: This study was supported by an unrestricted grant from Merck, Sharp and Dohme Chibret Laboratories, France.
Competing interests: None declared.