BACKGROUND Physical disability is part of the end point measures in rheumatoid arthritis clinical trials. The Stanford Health Assessment Questionnaire Disability Index (HAQ DI) is often used for this purpose but lacks international uniformity owing to variations in the translated and adapted questionnaires and variations in its calculation. To study the consequences of these variations the previous Dutch HAQ (HAQ90) was revised, resulting in a new Dutch HAQ (HAQ99).
OBJECTIVE To compare DI scores from the two versions, and to study the consequences of applying different calculation methods for the DI score.
METHODS 78 patients completed both the HAQ99 and the HAQ90. To compare the use of different category score calculation methods a post hoc analysis on prospectively collected data obtained in clinical trials was performed.
RESULTS No statistically significant differences were observed between the DI scores of the HAQ90 and the HAQ99 using the alternative method (that is, without correcting for aid and devices). However, correcting for the use of aid or devices or not did result in statistically significant different DI scores. The systematic shift when using the maximum or mean item score for calculation of the category score resulted in non-comparable absolute DI scores.
CONCLUSION The use of HAQ DI questionnaires with different numbers of items and/or categories does not hinder international comparability, except when these variations interfere with the calculation method of the DI (as in the case of questionnaires without a section correcting for devices). For the sake of international uniformity the HAQ or any validated translation should be used and calculated in a standard way, including correcting for the use of aid and devices, and taking the maximum within each category as the category score.
- Health Assessment Questionnaire
- disability index
In the past two decades the need for standardisation of measurement procedures in rheumatoid arthritis clinical trials has been recognised. As a consequence a worldwide consensus about a “core set of end point measures in rheumatoid arthritis (RA) clinical trials” was established.1 Physical disability, as measured by self reported questionnaires, is part of this core set.
The questionnaire most frequently used worldwide to measure physical disability is the Stanford Health Assessment Questionnaire Disability Index (HAQ DI).2 This questionnaire has been validated and shown to be reproducible. Since its publication in 1980 the Stanford HAQ,3 which consists of several sections with questions about, for example, functional disability, pain, drug side effects, and economic aspects, has been modified several times. The disability index section (HAQ DI), which measures functional disability, is the only standardised section and has remained the same since 1982. It contains 20 questions in eight categories, and includes a section about aid from other people or the use of devices to correct, if necessary, the answers given to the 20 questions. Since the introduction of this questionnaire there have been numerous translations and modifications, resulting in a wide range of versions.4 These versions differ in number of items, number of categories, number of items in each category, and the presence or absence of a section correcting for the use of devices. Table 1 shows these differences, which may occur even within countries.
The frequently used modified HAQ (MHAQ) DI with eight items is based on a different (transition) question than the one used in the original HAQ DI, and provides three instead of four alternatives to choose from (less difficult, equally difficult, or more difficult than before).5 ,6
In addition to the existence of different HAQ DI versions, the way they are calculated has not been consistent. According to the manual, the DI score can be calculated using either a so-called standard method or an alternative method. In the standard method the eight category scores are corrected using the section on aid and devices at the bottom of the questionnaire. Whenever aid by others or the use of devices is required to perform a certain activity, the corresponding category score (range 0–3, 0 = best, 3 = worst) is increased to 2 when that score was 0 or 1. In the alternative method no correction for aid and devices is made.
In both standard and alternative methods the DI should be calculated by taking the maximum item score within each category as the category score, and then calculating the mean of the eight category scores. Others prefer taking the mean within a category, and then calculating the mean of the category scores,7 or just calculating the mean of the 20 items. In most papers, however, the methods section does not reveal at all which HAQ DI was used and which method was applied to the DI score.
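The calculation rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the official scoring procedure: the function name `haq_di`, its argument layout (one list of item scores per category, plus the set of category indices flagged in the aid/devices section), and the example answers are all invented here. Applying the aid/devices correction to mean-based category scores is also an assumption, and missing answers are not handled.

```python
from statistics import mean


def haq_di(categories, aided=frozenset(), method="standard", category_score="max"):
    """Hypothetical helper sketching the HAQ DI calculation.

    categories     -- list of lists; each inner list holds one category's item scores (0-3)
    aided          -- indices of categories for which aid or devices were reported
    method         -- "standard" (correct for aid/devices) or "alternative" (no correction)
    category_score -- "max" (as in the manual) or "mean" (variant used by some authors)
    """
    if method not in ("standard", "alternative"):
        raise ValueError("method must be 'standard' or 'alternative'")
    if category_score not in ("max", "mean"):
        raise ValueError("category_score must be 'max' or 'mean'")
    reduce_items = max if category_score == "max" else mean

    scores = []
    for index, items in enumerate(categories):
        score = reduce_items(items)
        # Standard method: a category score below 2 is raised to 2 when aid or
        # devices are needed (assumption: the same rule is applied to mean scores).
        if method == "standard" and index in aided and score < 2:
            score = 2
        scores.append(score)
    # The DI is the mean of the category scores.
    return mean(scores)


# Invented answers: 20 items in eight categories, scored 0 (best) to 3 (worst).
answers = [[0, 1], [1, 1], [0, 0, 2], [1, 0], [0, 1, 1], [2, 1], [0, 0, 1], [1, 3, 2]]

alternative_di = haq_di(answers, method="alternative")          # 1.5
standard_di = haq_di(answers, aided={0, 3}, method="standard")  # 1.75
mean_based_di = haq_di(answers, method="alternative", category_score="mean")
```

With these invented answers the three variants give three different DI values for the same patient, which is the comparability problem discussed in the text.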
As a result, in practice, the HAQ appears to lack international uniformity owing to variations in the translated and adapted questionnaire itself as well as variations in calculation of the DI.
To obtain an instrument to study possible consequences of using different HAQ versions and different calculation methods the previous Dutch HAQ (HAQ90) was revised, resulting in a new Dutch HAQ (HAQ99). The HAQ99 is based on an accurate translation of the Stanford HAQ DI and does not have any modifications. This is in contrast with the two other HAQ DI versions that are used in the Netherlands, one of which is the validated HAQ90.8 This HAQ90 differs from the Stanford HAQ DI in a number of ways: (a) it contains 23 questions; (b) it has nine categories; and (c) it has no section on need for aid and/or devices, but in answering the 23 questions, the third of four alternatives is “yes, but with a need for devices or aid from others”. This was done because in some cases it was unclear which category should be corrected when a patient mentioned the use of a specific device. Finally, the number of items within each category differs in comparison with the Stanford HAQ DI. This is partly because when the HAQ90 was developed, questions were added especially for the Dutch situation.
This study aimed at comparing DI scores obtained from the HAQ99 with those resulting from the HAQ90, and studying the consequences of applying different calculation methods for the DI. Furthermore, patients' problems when filling in the HAQ DI were listed.
REVISION OF THE DUTCH HEALTH ASSESSMENT QUESTIONNAIRE
Two rheumatology researchers translated the Stanford HAQ DI. Both translations were then back translated from Dutch into English by one native English speaker and one English teacher, independently of each other and without consulting the other translators. The resulting four back translations, two translations, and the original Stanford HAQ DI were then compared by all participants. Any differences in the versions were discussed and a consensus was reached about the alternative that best matched the original Stanford HAQ DI. This resulted in the HAQ99, consisting of 20 questions in eight categories and including a section on aid/devices.
COMPARISON OF THE STANFORD HAQ AND DUTCH HAQ99 RESULTS
Nine bilingual patients with RA (that is, patients who watched BBC regularly) were asked to complete the HAQ99 at the outpatient clinic and the English Stanford HAQ DI at home. DI scores of both questionnaires were calculated in both alternative (without correcting for the use of aid or devices) and standard (including this correction) ways and compared using a paired t test.
COMPARISON OF THE HAQ90 AND HAQ99
During three weeks 92 consecutive patients with RA were asked to complete the HAQ99 directly after consulting their rheumatologist at the outpatient clinic, and to complete the HAQ90 within a week at home as well. Only the results of patients who filled in both questionnaires were analysed. DI scores were calculated according to the manual, that is, taking the maximum item score within each category as the category score. Firstly, the results without correcting for devices (that is, the alternative method) of both HAQ versions were compared using a paired t test. Then the standard method (that is, including correction for use of devices or aid from others) was used to calculate the DI score of the HAQ99. This score was also compared with the HAQ90 score using a paired t test; because the HAQ90 lacks a section on aid/devices, correcting for use of aid/devices was not possible for this version.
Furthermore, the DI scores obtained from the HAQ99 after correction and without correction for aid/devices (standard and alternative methods, respectively) were compared.
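A paired t test of this kind works on the within-patient differences between two sets of DI scores. The sketch below computes only the t statistic and the degrees of freedom with the Python standard library; the function name `paired_t` and the scores are invented for illustration and are not study data. Converting the statistic to a p value requires the t distribution (for example, from a statistics package) and is left out here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    # The test is carried out on the per-patient differences.
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented DI scores for five patients who completed both questionnaires.
haq90_scores = [1.0, 0.5, 2.0, 1.5, 0.75]
haq99_scores = [0.875, 0.5, 2.125, 1.25, 0.75]

t_value, dof = paired_t(haq90_scores, haq99_scores)
```

A t value close to zero relative to its degrees of freedom indicates that the two versions yield similar scores on average in the same patients.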
INVESTIGATION OF DIFFICULTIES ENCOUNTERED WHEN FILLING IN THE QUESTIONNAIRE
After completing the HAQ99 at the outpatient clinic, 62 consecutive patients were interviewed and asked if they had any problems when answering the items of the questionnaire. After this, any complaints or remarks were categorised in main problem fields and frequencies were calculated.
CONSEQUENCES OF DIFFERENT CALCULATION METHODS
The consequences of different calculation methods were studied by a post hoc analysis on prospectively collected data of completed HAQ90s of two different patient groups. The groups included patients with refractory RA in a clinical trial with an intravenous anti-tumour necrosis factor α drug (group A, n=39), and patients with early RA participating in another clinical trial in which sulfasalazine and methotrexate were evaluated (group B, n=103). In each group the DI score was calculated using the maximum score within a category as the category score (that is, according to the Stanford HAQ manual, maxHAQ), as well as using the mean score within a category as the category score (meanHAQ). In both calculation methods the DI score was obtained by taking the mean of the eight category scores and using the alternative method—that is, without correction for the use of aid or devices. DI scores were calculated at two different sequential time points in group A (0 and 20 weeks) and in group B (0 and 24 weeks). Differences between the DI scores measured at the two time points were statistically analysed using paired t tests. Furthermore, in both groups the absolute changes of DI scores over the 20 week period, calculated using the maxHAQ calculation method, were compared with the absolute DI score changes calculated with the meanHAQ method using Wilcoxon signed rank tests. The same was done for relative changes of DI scores.
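The comparison of the maxHAQ and meanHAQ calculations, and of absolute versus relative change between two time points, can be illustrated with a short Python sketch. The helper names `di` and `changes` and the answers are invented for illustration; only three categories are used to keep the example short, and no correction for aid or devices is applied (alternative method).

```python
from statistics import mean


def di(categories, category_score=max):
    """DI as the mean of the category scores (no aid/devices correction)."""
    return mean([category_score(items) for items in categories])


def changes(baseline, follow_up, category_score=max):
    """Return (absolute change, relative change) of the DI between two time points."""
    start = di(baseline, category_score)
    end = di(follow_up, category_score)
    absolute = end - start
    # The relative change is undefined when the baseline DI is zero.
    relative = absolute / start if start else None
    return absolute, relative


# Invented item scores for one patient at baseline and at follow up.
baseline = [[2, 1], [3, 1, 1], [2, 2]]
follow_up = [[1, 1], [2, 0, 1], [1, 1]]

max_abs, max_rel = changes(baseline, follow_up, category_score=max)     # maxHAQ
mean_abs, mean_rel = changes(baseline, follow_up, category_score=mean)  # meanHAQ
```

In this invented example the absolute change is larger with the maximum-based calculation than with the mean-based one, whereas the relative changes are close to each other.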
COMPARISON OF THE STANFORD HAQ AND DUTCH HAQ99 RESULTS
All nine patients who were asked filled in both the HAQ99 and the English Stanford HAQ DI. Of these patients (mean age 47, seven female, two male) all were rheumatoid factor (RF) positive and seven (78%) had an erosive arthritis. The mean erythrocyte sedimentation rate (ESR) of this group was 13 mm/1st h. There were no significant differences between the two versions in both ways of calculation (standard as well as alternative, p=0.80 and p=0.88, respectively). Table 2 summarises these findings.
COMPARISON OF THE HAQ90 AND HAQ99
Both the HAQ90 and the HAQ99 were completed by 78/92 (85%) patients. Of these patients (mean age 61.5, 48 female, 30 male), 58 (74%) were RF positive and 35 (45%) had an erosive arthritis. The mean ESR for this group was 17 mm/1st h. When the alternative method was used (that is, without correcting for devices) the mean DI scores were almost equal: 0.96 (HAQ90) and 0.92 (HAQ99) (p=0.36). The HAQ99 using the standard method (that is, corrected for the use of devices or aid by other people) had a mean DI score of 1.14, which was significantly higher than the DI score of the HAQ90 (p<0.001) using the alternative method (table 3). Also, within the HAQ99 there was a clear difference between the two DI scores (using standard and alternative methods) (table 3).
INVESTIGATION OF ENCOUNTERED DIFFICULTIES WHEN FILLING IN THE QUESTIONNAIRE
Table 4 presents the six main problem fields reported by patients when completing the questionnaire.
CONSEQUENCES OF DIFFERENT CALCULATION METHODS
Table 5 summarises the results obtained in the post hoc analysis on prospectively collected data of completed HAQ90s, comparing the two different calculation methods for the DI score. In both patient groups a significant decrease of DI scores at the final time point was found for both calculation methods. Furthermore, in the patients with refractory RA (group A) absolute changes of DI scores over the 20 week period, calculated by using the maxHAQ as compared with the meanHAQ calculation method, did not differ significantly (p=0.94). In the patients with early RA (group B), however, absolute DI score changes, measured by the two different calculation methods, were significantly different (p<0.01). For the relative DI score changes no significant differences were seen in either patient group (p=0.23 and p=0.07, respectively).
Furthermore, of particular interest was the observation that in the patients with refractory RA (group A) the change of DI scores calculated using the meanHAQ instead of the maxHAQ calculation method was about equal to the effect of the therapeutic intervention (table 5).
Since 1982 the HAQ has been widely used to measure functional disability in patients with RA. Patient-assessed function, which is often measured with the HAQ, is part of the American College of Rheumatology (ACR) response criteria set and is frequently used in clinical trials. Partly because of the international character of clinical trials, a large number of translations and local adaptations of the HAQ have been developed and validated. This has led to the use of many different versions of the questionnaire even within countries. Also, within our country several different versions of the HAQ DI, which have different numbers of questions and different ways of correcting for use of devices,8 ,9 are being used. To improve uniformity and to obtain an instrument to study possible consequences of these differences we revised the validated HAQ90,8 producing the HAQ99. In this study it was shown that the exact number of questions (20 or 23) and/or categories (eight or nine) in the HAQ DI used has no major influence on the DI score when calculated according to the manual. For international uniformity, however, and to facilitate comparison of results, it is better to strive for maximum similarity to the original Stanford HAQ DI of all local versions. In contrast with the number of questions and/or categories, correcting for the use of devices or not (that is, applying the standard or alternative method) leads to significantly different absolute DI scores. This is of particular interest because case report forms used in clinical trials often do not contain this section, probably to facilitate data entry. Thus to enable exchange of international results, it should be clear which method (standard or alternative) is used. It would be even more useful to reach an international consensus about whether the section on aid and devices should be included or not.
Apart from having a different number of questions (eight items, one taken from each category), the commonly used MHAQ differs from the Stanford HAQ DI in asking a different kind of (transition) question and offering three instead of four alternative responses to this question. Therefore it was not investigated in this study. In addition to patient satisfaction, the MHAQ measures transitions in functional disability from visit to visit in clinical trials instead of absolute scores of functional disability at any time point. The rationale behind this modification is understandable and the MHAQ has been reported to be a better instrument than the original Stanford HAQ DI for measuring changes in disability during clinical trials.6 But one should realise that the MHAQ cannot be used as part of the ACR response criteria.10 In addition, theoretically, by reducing the number of questions, the MHAQ may fail to detect clinically relevant changes in activities of daily living in patients with relatively little impairment. When these major differences are taken into account the MHAQ does not contribute to international uniformity.
Another aspect which lacks international uniformity is the way in which the eight category scores are calculated. According to the manual the DI score should be calculated by taking the mean of the category scores, with the category score being the maximum score of the questions in that category (here called maxHAQ). However, a method in which the mean score within a category is taken as the category score is also being used (sometimes referred to as the alternative HAQ method;7 here named the meanHAQ). In practice, even a third alternative is sometimes used by taking the mean of all the questions together as the DI. In the post hoc analysis on prospectively collected data the chosen method (maxHAQ/meanHAQ) for calculation of the category scores had no effect on monitoring the clinical course, but as was expected resulted in different absolute DI scores. In our dataset the effect of applying the maxHAQ compared with the meanHAQ calculation method was equal to the effect of the therapeutic intervention in the patients with refractory RA (table 5). Thus calculating the category score in one way rather than another does not affect monitoring the course of functional disability within a group, but because one method leads to higher absolute DI scores than the other method this undermines the possibilities of worldwide between-group comparison of measured DI scores; this was also noted by Ramey et al.4
Theoretically, differences in ACR20 response rates could occur when the HAQ is used as a response criterion, because different category score calculation methods (mean versus maximum) and DI score calculation methods (standard versus alternative) have been shown to lead to different absolute DI scores. In this study, however, in contrast with the observed absolute DI score changes, the relative DI score changes measured using the maxHAQ and meanHAQ calculation methods did not differ significantly. As the ACR20 response rate is a relative change parameter (20% improvement from baseline score), in this study ACR20 responses measured as patients' assessed function did not differ significantly between the calculation methods. However, further investigations are needed to exclude possible consequences of applying different DI calculation methods for ACR response rates.
From the patients' point of view some general problems appear to remain when using the HAQ. Although an investigation of the difficulties encountered by patients showed that about 40% of them had no difficulties at all, a few problems were noted frequently by the remaining patients. One in five patients has difficulty in choosing the correct answer because of day to day variations in disability. In the interviews most of them indicated that they choose a “mean answer” rather than a “bad answer” on bad days. In fact to choose between “some difficulty” and “much difficulty” was noted as a problem on its own in 15% of the patients. About 8% of the patients have difficulties when filling in the section on aid and devices, and an almost equal number of patients mention that there is an item with two questions in one. Because most of these problems can be solved by a clear, uniform instruction to the patient, the instructions given are of special importance to get accurate data from the HAQ. The Stanford Institute also provides a manual for the investigator. We recommend this manual to establish a standardised patient instruction method.
Taking all this together, the use of HAQ DI questionnaires with different numbers of items and/or categories does not appear to hinder international comparability, except when these variations interfere with the calculation method of the DI (as in the case of questionnaires without a section correcting for devices). For the sake of international uniformity we suggest that the HAQ or any validated translation should be used and calculated in a standard way, including correcting for the use of aid and devices and taking the maximum within each category as the category score.
We thank A Haagmans, an English teacher, and P Donnelly, a native English speaker, for their help in back translating the translated HAQ. We also thank A den Broeder and C Haagsma for providing data of clinical trials.