Background The HAQ disability index is the most widely used measure of function in rheumatological conditions. Translated versions of the original HAQ and several revisions are increasingly used to make comparisons between countries in epidemiological studies. Such comparisons assume cross-national measurement equivalence: that total scores are not biased by subtle differences across languages or cultures in how individual items are interpreted. Despite the wide use of the HAQ, this measurement property has not been tested across a wide range of countries.
Objectives To examine the cross-national measurement equivalence of four versions of the HAQ: modified HAQ (MHAQ), multidimensional HAQ (MDHAQ), functional HAQ (FNHAQ) and HAQ-II.
Methods Data are from the QUEST-RA cross-sectional survey of 10,150 patients with RA from 34 countries (74% female, mean age 55 years). Respondents completed an item pool of 33 items, excluding devices, relating to the original HAQ and four revisions. Item response theory (IRT) models were estimated. Likelihood ratio tests compared models for each HAQ version fixing IRT parameters to be the same across all countries (equivalent) or estimating separate parameters for each country (non-equivalent). Further analysis with Bayesian multiple-group IRT models identified items with non-equivalent IRT parameters for each country using an alignment method.
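The distinction between equivalent and non-equivalent item parameters can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model for a dichotomised item, which is a simplification: the abstract does not state which IRT model was fitted. The function name and all parameter values below are hypothetical and are not taken from the QUEST-RA data.

```python
import math


def p_endorse(theta, discrimination, threshold):
    """2PL item response function (an assumed, simplified model).

    Returns the probability that a respondent with latent disability
    `theta` reports difficulty on an item.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - threshold)))


# Hypothetical respondent with the same latent disability in two countries.
theta = 0.5

# Equivalent model: both countries share one set of item parameters.
p_country_a = p_endorse(theta, discrimination=1.5, threshold=0.5)

# Non-equivalent model: country B needs a higher level of disability
# before the item is endorsed (threshold shifted, discrimination equal).
p_country_b = p_endorse(theta, discrimination=1.5, threshold=1.0)

print(f"Country A: {p_country_a:.3f}, Country B: {p_country_b:.3f}")
```

Under these assumed values, two respondents with identical disability have different probabilities of endorsing the item, which is the kind of item-level bias that the likelihood ratio tests are designed to detect.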
Results All versions of the HAQ exhibited good psychometric properties, with a single latent disability variable explaining around two-thirds of the variance in item responses. Compared to other versions, the HAQ-II exhibited greater precision (i.e. reliability) in differentiating between people with low levels of disability. For each version of the HAQ, the non-equivalent model provided a significantly better fit than the equivalent model (all p<0.001), indicating that comparisons of scores across countries are biased. Further analysis indicated that this non-equivalence across countries was generally due to differences in the level of disability required to respond positively to an item, rather than variation in an item's ability to discriminate between levels of disability. Examining the aligned means for each country using the limits of agreement method (Figure) indicated that the overall impact of non-equivalence was likely to be minimal for most countries and that scores for the MHAQ were least affected. Scores for Russia, Serbia and Morocco were overestimated, and scores for Egypt were underestimated.
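The limits of agreement method can be sketched as follows. This is a minimal illustration, assuming that each country's mean from the equivalent model is compared with its aligned mean and that the conventional bias ± 1.96 SD limits are used; the abstract does not give these details. The country means below are invented for illustration only.

```python
import statistics


def limits_of_agreement(unadjusted, aligned):
    """Return (lower limit, bias, upper limit) for paired country means."""
    diffs = [u - a for u, a in zip(unadjusted, aligned)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical latent means for five countries (not study data).
unadjusted_means = [0.10, -0.20, 0.35, 0.00, -0.15]
aligned_means = [0.05, -0.18, 0.20, 0.02, -0.10]

lower, bias, upper = limits_of_agreement(unadjusted_means, aligned_means)

# Countries whose difference falls outside the limits would be flagged
# as over- or underestimated.
flagged = [
    i
    for i, (u, a) in enumerate(zip(unadjusted_means, aligned_means))
    if not lower <= (u - a) <= upper
]
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f}), flagged={flagged}")
```

A positive difference outside the upper limit would correspond to an overestimated country score, and a negative difference outside the lower limit to an underestimated one.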
Conclusions All versions of the HAQ have good psychometric properties. Because item interpretation differs across translations, caution should be taken when drawing inferences about disability levels across countries. For most countries such inferences are likely to be valid, because individual items introduce a relatively low level of bias. Although measurement properties at the individual level are more favourable for the HAQ-II, it is more prone to bias in cross-national comparisons, for which the MHAQ may be preferable.
Disclosure of Interest None declared