Background The SF-36® 10-item Physical Functioning domain (PF10) and the Health Assessment Questionnaire (HAQ) are widely used to evaluate rheumatoid arthritis (RA) treatments. However, there are competing theories about the dimensionality of the overall latent Physical Functioning (PF) construct when PF10 and HAQ items are combined.
Objectives To explore the dimensionality of the PF construct when PF10 and HAQ items are combined into a single instrument and then evaluate the psychometric properties of the combined instrument.
Methods Item Response Theory (IRT) was used to analyze data from a phase 3, randomized, placebo-controlled study in patients with RA. Dimensionality of the combined instrument was explored by exploratory and confirmatory factor analyses. A sequence of unidimensional Rasch and generalized partial credit (GPC)[1,2] models was fitted to address the multidimensionality identified through the factor analyses. Model comparisons and diagnostics were performed to assess the adequacy of model fit and to detect any violation of the underlying assumptions. In addition to an IRT model for each individual factor, another IRT model that includes all 30 items was built to derive a total score. Pearson correlation was used to assess the validity of the IRT-based health status scores, whereas analysis of covariance was used to assess sensitivity to response change (defined by the American College of Rheumatology [ACR] 20% improvement criteria).
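As an illustration of the two model families named above, the sketch below computes category response probabilities for a single polytomous item under the GPC model, with the Rasch-type (partial credit) model obtained by fixing the discrimination at 1. The function names, the parameterization by step thresholds, and the numeric values are assumptions made for this example; they are not taken from the study.

```python
import math


def gpc_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the generalized partial credit model.

    theta          -- latent physical functioning score of one respondent
    discrimination -- item slope (a); illustrative value, not a study estimate
    thresholds     -- step parameters b_1..b_m for an item with m + 1 categories
    """
    # Category 0 has a cumulative logit of 0; category k adds a * (theta - b_v) for each step v <= k.
    cumulative = [0.0]
    for step in thresholds:
        cumulative.append(cumulative[-1] + discrimination * (theta - step))
    # Subtract the largest logit before exponentiating to avoid overflow.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


def pcm_probabilities(theta, thresholds):
    """Rasch-type partial credit model: the GPC model with discrimination fixed at 1."""
    return gpc_probabilities(theta, 1.0, thresholds)


if __name__ == "__main__":
    # Hypothetical three-category item with made-up thresholds.
    steps = [-1.0, 1.0]
    print("Rasch-type:", pcm_probabilities(0.0, steps))
    print("GPC, a = 2:", gpc_probabilities(0.0, 2.0, steps))
```

Allowing the discrimination to differ by item is what distinguishes the GPC model from its Rasch counterpart, so a comparison of the two fits indicates whether items differ in how sharply they separate respondents.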
Results The factor analysis suggested three underlying factors in the combined PF10 and HAQ. These factors are best characterized as hygiene (including 1 item from PF10 and 4 from HAQ), lower body activity (including 6 items from PF10 and 8 from HAQ), and upper body activity (including 3 items from PF10 and 7 from HAQ). In all models considered, the GPC model significantly outperformed its Rasch counterpart (all p-values < 0.0001), implying that discrimination differs among items. In addition to the factor analysis-based IRT scales, the IRT-based scale derived from all 30 combined items also showed strong convergent validity (all p-values < 0.0001) and strong sensitivity (all p-values < 0.0001) when compared across ACR groups.
Conclusions The pooled PF10 and HAQ items represent multiple domains of physical functioning. The superior performance of the GPC model over the Rasch model suggested that discrimination differs among items. Strong convergent validity and sensitivity were demonstrated for all IRT-based scales.
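The abstract does not state which test produced the model-comparison p-values. One common choice for nested IRT models is a likelihood-ratio test, and the sketch below assumes that approach. The log-likelihood values and the number of extra parameters are hypothetical placeholders, not results from the study, and `chi2_sf` is a small helper written for this example so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def likelihood_ratio_test(loglik_restricted, loglik_general, extra_parameters):
    """Compare a restricted model (e.g. Rasch-type) with a more general nested model (e.g. GPC)."""
    statistic = 2.0 * (loglik_general - loglik_restricted)
    return statistic, chi2_sf(statistic, extra_parameters)


if __name__ == "__main__":
    # Hypothetical log-likelihoods; the extra parameters are the freed item discriminations.
    statistic, p_value = likelihood_ratio_test(-5230.4, -5180.9, extra_parameters=13)
    print(f"LR statistic = {statistic:.1f}, p = {p_value:.3g}")
```

A small p-value from such a test favors the model with item-specific discriminations, which is the pattern reported here.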
1. Edelen, MO and Reeve, BB (2007) Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual. Life Res. 16 Suppl 1: 5-18.
2. Li, Y and Baser, R (2012) Using R and WinBUGS to fit a generalized partial credit model for developing and evaluating patient-reported outcomes assessments. Statist. Med. 31: 2010-2026.
3. Felson, D and American College of Rheumatology Committee to Reevaluate Improvement Criteria (2007) A proposed revision to the ACR20: the hybrid measure of American College of Rheumatology response. Arthritis Rheum. 57: 193-202.
Disclosure of Interest: None declared