
Extended report
Development of a patient-reported outcome measure of tophus burden: the Tophus Impact Questionnaire (TIQ-20)
  1. Opetaia Aati1,
  2. William J Taylor2,
  3. Richard J Siegert3,
  4. Anne Horne1,
  5. Meaghan E House1,
  6. Paul Tan1,
  7. Jill Drake4,
  8. Lisa K Stamp4,
  9. Nicola Dalbeth1
  1. 1Department of Medicine, University of Auckland, Auckland, New Zealand
  2. 2Department of Medicine, University of Otago, Wellington, New Zealand
  3. 3Department of Psychology, Auckland University of Technology, Auckland, New Zealand
  4. 4Department of Medicine, University of Otago, Christchurch, New Zealand
  1. Correspondence to Dr Nicola Dalbeth, Bone and Joint Research Group, Department of Medicine, Faculty of Medical and Health Sciences, University of Auckland, 85 Park Rd, Grafton, Auckland 1023, New Zealand; n.dalbeth{at}


Background Tophus burden is currently measured using physical examination and imaging methods. The aim of this study was to develop a patient-reported outcome (PRO) tool to assess tophus burden in people with gout.

Methods The responses from interviews with 25 people with tophaceous gout were used to generate items for a preliminary PRO tool. Following cognitive testing of each item, a preliminary 34-item questionnaire was administered to 103 people with tophaceous gout. Rasch analysis generated a 20-item Tophus Impact Questionnaire (TIQ-20). Test-retest reproducibility and construct validity of the TIQ-20 were assessed.

Results The TIQ-20 responses fit the Rasch model and demonstrated unidimensionality, adequate precision, absence of differential item functioning and adequate person separation index. The TIQ-20 included items related to pain, activity limitation, footwear modification, participation, psychological impact and healthcare use due to tophi. In the 103 patients with tophaceous gout, floor effects were observed in 4.9% and ceiling effects in 1%. The TIQ-20 test-retest intraclass correlation coefficient was 0.76 (95% CI 0.61 to 0.85). All predicted correlations for construct validity testing were observed, including weak correlation with serum urate concentrations (r<0.30), moderate correlation with subcutaneous tophus count and dual energy CT urate volume (r=0.30–0.50), and stronger correlation with Health Assessment Questionnaire scores (r>0.50).

Conclusions We have developed a tophus-specific PRO in patients with tophaceous gout. The TIQ-20 demonstrates acceptable psychometric properties. Initial results show internal, face and construct validity, reproducibility and feasibility. Further research is required to determine responsiveness to change.

  • Gout
  • Outcomes Research
  • Patient Perspective

The tophus represents a chronic foreign body granulomatous response to monosodium urate (MSU) crystals.1 Tophi typically develop in people with longstanding gout, following prolonged periods of hyperuricaemia,2 although early presentation of tophaceous disease occasionally occurs.3 Tophi have been associated with activity limitation, joint damage and mortality in people with gout.4–6 At present, tophus burden is assessed in gout studies using physical examination and advanced imaging methods.7 Although these techniques measure the physical size, location and volume of tophus burden, they do not quantify the impact and experience of tophi for patients. While both generic and gout-specific patient-reported outcome (PRO) measures have been used in studies of gout,8,9 there is currently no tophus-specific PRO tool. Participants at a recent Outcome Measures in Rheumatology (OMERACT) meeting strongly endorsed the development of a tophus-specific PRO for use in studies of gout.10 The aim of this study was to develop a PRO tool for assessment of tophus burden in people with gout.

Patients and methods

All aspects of this study received ethical approval from the Northern X Ethics Committee (NTX/08/06/050). All participants in the study provided written informed consent. All patients were aged over 18 years, were able to complete written forms in English, were confirmed to have gout according to the 1977 American Rheumatism Association preliminary classification criteria11 and had at least one subcutaneous tophus.

Generation of the items for the preliminary questionnaire

Items were generated using the methodological framework for item selection proposed by the Patient-Reported Outcomes Measurement Information System.12 The responses from interviews with 25 people with tophaceous gout from Auckland, New Zealand, were used to generate items for a preliminary questionnaire. This work is described in a separate publication.13 Briefly, 25 people with tophaceous gout (88% men, median age 66 years, median gout disease duration 26 years, median number of tophi 10, median serum urate 0.39 mmol/L) participated in semistructured recorded interviews that explored their experiences and perceptions of tophi. The specific aim of this qualitative study was to identify items for inclusion in the preliminary questionnaire. Three major inter-related themes arose from the interviews. The first theme was functional impact affecting body structures and functions (causing pain, restricted joint range of motion and deformity, and complications). This theme also encompassed activity limitation and participation restriction (affecting day-to-day activities, leisure activities, employment participation, and family participation). The second theme was psychological impact. The third theme was the lack of impact in some participants. The study coordinator (OA) and a rheumatologist (ND) compiled a list of questionnaire items from an item pool derived from the participant interviews. The items were reviewed by another rheumatologist (WJT) for validation. This process led to the development of a preliminary instrument comprising 35 items.

Cognitive testing was performed to ensure that each question was understandable and meaningful to the target population (people with tophi). Five healthcare professionals (three doctors and two nurses) and five patients with tophaceous gout (all male, three European, two Samoan, median age 45 years, median gout disease duration 26 years) participated in semistructured recorded interviews that examined the items in the preliminary instrument, to ensure readability and clarity of the questions. Modification of wording related to some items was made as a result of this testing to improve understanding. One item was removed (‘My tophi get irritated when they are rubbed against’) as all participants reported that the meaning of this statement was unclear. Following the cognitive interviews, a 34-item preliminary questionnaire was generated for further testing (shown in online supplementary table S1). Each item was constructed as a statement to which the respondent could agree or disagree (dichotomous scoring).

Clinical assessment and administration of the 34-item preliminary questionnaire

The preliminary 34-item questionnaire was administered to 103 people with tophaceous gout. Participants were consecutively recruited from clinical research units in Auckland and Christchurch, New Zealand, between February 2012 and June 2013. Eligible patients were identified through databases of clinical trials of antiosteoclast therapy or urate-lowering therapy. The 34-item questionnaire was completed at a study visit, and relevant clinical details about gout, including disease duration, flare frequency, serum urate concentration, Health Assessment Questionnaire (HAQ)-II scores,14 and subcutaneous tophus count were obtained at the same visit. Where other gout measures were obtained for the clinical trial protocol, these data were also included for analysis, including days off work in the previous month for those in paid employment (n=41), grip strength (n=48)15 and dual energy CT (DECT) MSU crystal volumes (n=79).16 All patients were asked to complete and return by post a repeat 34-item questionnaire approximately 2 weeks after the study visit (returned by 53 participants).

Development of the 20-item Tophus Impact Questionnaire (TIQ-20) using Rasch analysis

Fit to the dichotomous Rasch model was assessed using the RUMM2030 software (RUMM Laboratory) and all available responses to the 34-item questionnaire (n=156). The technical approach followed the recommendations by Tennant and Conaghan.17 Briefly, the Rasch model consists of a theoretical relationship between the likelihood of a person answering any item in a particular way, and the amount of the concept being measured that is possessed by the person. Questionnaires that demonstrate this theoretical relationship possess superior measurement properties compared with questionnaires that do not demonstrate this relationship. The task of the analysis is, therefore, to find a parsimonious set of items that fit the Rasch model and to describe the properties of the resulting questionnaire.
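As an illustration of the theoretical relationship described above, the dichotomous Rasch model is commonly written as a logistic function of the difference between the person location and the item location, both in logits. The following is a minimal sketch under that standard formulation; the function name and the example values are hypothetical and are not taken from the study data or from the RUMM2030 software.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of agreeing with an item under the dichotomous Rasch model.

    theta: person location (amount of the trait, e.g. tophus burden), in logits.
    difficulty: item location, in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical values, for illustration only.
p_matched = rasch_probability(theta=0.0, difficulty=0.0)  # person and item at the same location
p_low = rasch_probability(theta=-1.0, difficulty=1.0)     # person located 2 logits below the item

print(f"matched: {p_matched:.2f}, person below item: {p_low:.2f}")
```

When the person and item locations coincide, the modelled probability of agreement is 0.5; it falls as the person location drops below the item location.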

An iterative approach was taken to identify a set of items that showed acceptable fit. Items with the largest positive and negative item fit residuals were removed one at a time to improve the overall item-trait interaction χ2 statistic. Person fit-residuals were also inspected to establish whether there were any significant outliers. In addition, evidence for differential item functioning (DIF) by age, disease duration and ethnicity was sought and corrected where necessary. The scale was examined to identify items that covered the greatest range and items that had very similar scale location were removed to create a parsimonious instrument with adequate scale coverage. Uniform and non-uniform DIF was tested by 2-way analysis of variance on each person-factor and class intervals (trait locations). Response dependency was checked using the correlation matrix of residuals. Overall item-trait χ2 statistic and item fit χ2 statistic, fit residuals and item difficulty are reported. Within each analysis, the conventional statistical significance of 5% was adjusted using the Bonferroni method because of multiple comparisons.
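The Bonferroni adjustment mentioned above divides the conventional significance level by the number of comparisons made within an analysis. The sketch below shows the arithmetic only; the number of comparisons (20) is a hypothetical value chosen for illustration, as the number of tests in each analysis is not stated here.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical example: 20 comparisons at an overall 5% significance level.
threshold = bonferroni_threshold(alpha=0.05, n_tests=20)
print(f"adjusted threshold: {threshold:.4f}")
```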

We report the person-separation index as a measure of the internal consistency reliability of the final instrument. Unidimensionality was established using the independent t test for the difference in trait estimates between two groups defined by items that either positively loaded or negatively loaded on a principal components factor analysis of residuals as described by Smith.18 The targeting of the final instrument is described by plotting the distribution of the item-difficulty parameter alongside the person-location (trait) estimates. The information function (which represents the reciprocal of the square of the SE of estimation) is also plotted to show the precision of the trait estimates at different parts of the scale.
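The link between the information function and the SE can be sketched as follows. For a dichotomous Rasch model, the information contributed by each item at a given trait location is usually taken as p(1 − p), where p is the modelled probability of agreement, and the SE is the reciprocal of the square root of the summed information. The item locations used here are hypothetical and are not the TIQ-20 item parameters.

```python
import math


def item_probability(theta: float, difficulty: float) -> float:
    """Modelled probability of agreement under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def scale_information(theta: float, difficulties: list[float]) -> float:
    """Sum of item information p * (1 - p) at trait location theta."""
    total = 0.0
    for b in difficulties:
        p = item_probability(theta, b)
        total += p * (1.0 - p)
    return total


def standard_error(theta: float, difficulties: list[float]) -> float:
    """SE of the trait estimate: reciprocal of the square root of the information."""
    return 1.0 / math.sqrt(scale_information(theta, difficulties))


# Hypothetical item locations in logits, for illustration only.
example_items = [-1.0, 0.0, 1.0, 2.0]
se_near_items = standard_error(0.5, example_items)  # trait location within the item range
se_far_below = standard_error(-4.0, example_items)  # trait location well below all items

print(f"SE near items: {se_near_items:.2f}, SE far below items: {se_far_below:.2f}")
```

The SE is larger for trait locations far from the item locations, which is why an instrument is less precise for people whose trait level lies outside the range covered by its items.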

Further validation of the TIQ-20

The 20-item Tophus Impact Questionnaire (TIQ-20) data were analysed for further validation and test-retest reliability using SPSS (v21, SPSS, Chicago, Illinois, USA). The Rasch model estimates of person-location were used for these analyses. Test-retest reproducibility over the 2-week interval was assessed using the intraclass correlation coefficient19 and Bland and Altman limits of agreement analysis.20 Frequency distributions were calculated, and floor and ceiling effects were analysed. Construct validity was assessed by analysing Spearman's correlations between TIQ-20 scores and other measures of gout severity. Calculated Spearman's correlations were compared with predicted correlations generated by consensus of three rheumatologists (ND, WJT, LKS), who were blinded to the calculated correlations.


Results

Patient characteristics

The clinical characteristics of the 103 participants with tophaceous gout, and the 53 participants who provided retest data, are shown in table 1. Patients were predominantly middle-aged men, with a wide range of gout disease duration. The number of tophi and DECT urate volume also varied widely within the group. In the 103 participants, there were 803 clinically apparent tophi present in total. Tophi were most frequently observed on the hands and wrists (480, 59.8%), followed by the feet and ankles (194, 24.2%), the elbows (82, 10.2%), the ears (34, 4.2%) and the knees (13, 1.6%).
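The site percentages above can be checked directly from the reported counts. The short sketch below recomputes the total and each percentage; the variable names are illustrative.

```python
# Tophus counts by site, as reported in the text.
tophus_counts = {
    "hands and wrists": 480,
    "feet and ankles": 194,
    "elbows": 82,
    "ears": 34,
    "knees": 13,
}

total_tophi = sum(tophus_counts.values())
site_percentages = {
    site: round(100.0 * count / total_tophi, 1)
    for site, count in tophus_counts.items()
}

print(total_tophi)
for site, pct in site_percentages.items():
    print(f"{site}: {pct}%")
```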

Table 1

Clinical features of participants (n=103)

The 53 participants providing retest data were representative of the entire study group (with no significant difference between the groups with respect to age or HAQ). Similarly, those participants with DECT data and grip strength data were representative of the entire group. Those reporting days off work were younger (mean age 51 years), but otherwise did not differ from the entire study group in the patient characteristics shown in table 1.

Generation of the TIQ-20 by Rasch analysis

Rasch analysis generated a TIQ-20 (maximum score 20). There were 15 steps taken to derive the final instrument that involved removal of 12 items because of high-fit residuals, one item because of DIF by age, and one item because of duplicating scale location (see online supplementary table S2). No persons were removed from the analysis because of outlier status.

The t test for unidimensionality showed that no t test comparisons were statistically significant at the 5% level, confirming unidimensionality of the TIQ-20 (all items are measuring, essentially, the same construct). The overall fit to the dichotomous model was good (χ2 44.41, p=0.29), mean (SD) item location was 0 (1.28), mean (SD) item fit residual was −0.46 (0.95), mean person fit residual was −0.28 (0.84), mean person-location was −1.12 (1.78). The satisfactory fit statistics mean that the instrument conforms to the theoretical Rasch model, thereby conferring superior measurement characteristics. The non-significant χ2 test means that the observed responses were not significantly different from the responses predicted by the model. The fit residuals give another index of how close the observed responses were to the model-predicted values (residuals of zero indicate perfect model fit). The person separation index was 0.84.

Item parameters are shown in table 2. There were two residual correlations of 0.40 or above that potentially indicated response dependency: item7 (‘Writing is difficult because of my tophi’) by item8 (‘I have difficulty feeding myself because of my tophi’) (r=0.40) and item30 (‘My tophi have become infected’) by item32 (‘I have had surgery for my tophi’) (r=0.45). Since removal of any of these items failed to improve the overall item-trait interaction χ2, we decided to leave these items in the final instrument. The excluded items are shown in online supplementary table S2. There was no significant uniform or non-uniform DIF by age, disease duration or ethnicity (data not shown).

Table 2

Item parameters from the dichotomous Rasch model for the 20-Item Tophus Impact Questionnaire (TIQ-20)

Figure 1 shows the distribution of the trait in this sample compared with the distribution of the item-location. The trait distribution was strongly skewed towards lower values (most people having relatively less tophus burden) with a median location of −1.39 (IQR −3.63 to 0.85), whereas the instrument's precision was mainly in the range −1.5 to 2.5. This suggests that the instrument is more targeted to a population with relatively worse tophus burden than was observed in this study sample. This is analogous to saying that there is somewhat of a floor effect of the instrument: it will give a less precise estimate of tophus burden in people with few or minimal concerns about their tophi. A floor effect is the most extreme instance of this problem, which is where no precision in the estimate is possible (because the person scored zero). In this sample, such extreme floor effects were observed in 4.9% of patients.

Figure 1

Trait distribution and item location. The top graph shows the frequency of the person-location estimates (the amount of tophus burden experienced by each person), measured in logits. A more negative score indicates relatively less tophus burden. The bottom graph shows the distribution of the 20 items, in terms of what level of tophus burden an affirmative response means. The superimposed bell-shaped curve is the information function which is equivalent to the reciprocal of the squared SE of the estimate and shows the relative precision of the tophus burden measure. Overall, the graphic shows that the 20-Item Tophus Impact Questionnaire items tap into the construct of tophus burden within range of moderate to high tophus burden, whereas, the sample of people in this study had mainly low levels of tophus burden. The ideal instrument would have items that cover the whole scale.

The TIQ-20 included items related to pain, activity limitation, footwear modification, participation, psychological impact and healthcare use due to tophi. The final questionnaire is shown in table 3. The raw scores (range 0–20) are transformed to the Rasch-modelled score (TIQ-20 RM score, nominal range 0–10; in this dataset, the minimum score was 0.89, maximum score was 9.06) to give interval level trait estimates using the mean person-location for each raw score value (scoring and conversions shown in online supplementary table S3).

Table 3

The 20-Item Tophus Impact Questionnaire (TIQ-20)

Further validation of the TIQ-20

In the 103 patients with tophaceous gout, the median TIQ-20 raw score at the study visit was 6 (range 0–20) and the median TIQ-20 RM score was 3.92 (range 0.89–9.06). Mean (SD) raw TIQ-20 scores were 7.1 (5.2) and TIQ-20 RM score was 4.05 (1.75). The distribution of both raw and TIQ-20 RM scores is shown in figure 2. Floor effects were observed in 4.9% and ceiling effects were observed in 1%.
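Floor and ceiling effects are conventionally reported as the percentage of respondents at the minimum and maximum possible score. The sketch below shows that calculation for the raw TIQ-20 range of 0–20. The example scores are hypothetical: they are constructed only so that 5 of 103 respondents sit at the minimum and 1 of 103 at the maximum, counts that are inferred here from the reported percentages rather than stated in the text.

```python
def floor_ceiling_percentages(raw_scores, minimum=0, maximum=20):
    """Return (floor %, ceiling %) for a list of raw questionnaire scores."""
    n = len(raw_scores)
    n_floor = sum(1 for score in raw_scores if score == minimum)
    n_ceiling = sum(1 for score in raw_scores if score == maximum)
    return 100.0 * n_floor / n, 100.0 * n_ceiling / n


# Hypothetical scores, not the study data: 5 at the minimum, 1 at the maximum,
# and the remaining 97 at an arbitrary mid-range value.
example_scores = [0] * 5 + [20] * 1 + [6] * 97
floor_pct, ceiling_pct = floor_ceiling_percentages(example_scores)

print(f"floor {floor_pct:.1f}%, ceiling {ceiling_pct:.1f}%")  # floor 4.9%, ceiling 1.0%
```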

Figure 2

Distribution of 20-Item Tophus Impact Questionnaire (TIQ-20) scores. (A) Raw TIQ-20 scores. (B) Rasch-modelled TIQ-20 scores. X axis shows the score and Y axis shows the numerical frequency for each score.

For the 53 participants with TIQ-20 scores available at baseline and after 2 weeks, the test-retest intraclass correlation coefficient was 0.76 (95% CI 0.61 to 0.85). The mean (SD) difference between the test and retest values for the TIQ-20 RM score was 0.32 (1.20) with 95% limits of agreement of −2.08 to 2.72. The Bland and Altman plot is shown in online supplementary figure S1.
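The limits of agreement follow from the mean and SD of the test-retest differences. The sketch below assumes the limits were calculated as the mean difference plus or minus two SDs, which reproduces the reported values; a multiplier of 1.96 is also commonly used and would give slightly narrower limits. The function name and the `multiplier` parameter are illustrative.

```python
def limits_of_agreement(mean_diff: float, sd_diff: float, multiplier: float = 2.0):
    """Bland and Altman limits of agreement: mean difference +/- multiplier * SD."""
    half_width = multiplier * sd_diff
    return mean_diff - half_width, mean_diff + half_width


# Reported mean (SD) test-retest difference for the TIQ-20 RM score.
lower, upper = limits_of_agreement(mean_diff=0.32, sd_diff=1.20)
print(f"{lower:.2f} to {upper:.2f}")  # -2.08 to 2.72
```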

The predicted and calculated correlations for construct validity testing are shown in table 4. TIQ-20 scores weakly correlated with serum urate concentrations (r=0.25, p=0.01). TIQ-20 scores moderately correlated with gout flare frequency (r=0.33), subcutaneous tophus count (r=0.39) and dual-energy CT urate volume (r=0.44) and more strongly with HAQ-II scores (r=0.52), p<0.001 for all. All predicted strengths of correlations were observed.
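The correlation bands used for these predictions (weak r<0.30, moderate r=0.30–0.50, stronger r>0.50) can be applied to the reported coefficients as a simple check. This is a minimal sketch; the function name and band labels are illustrative, and only the coefficients quoted in the text are used.

```python
def correlation_band(r: float) -> str:
    """Classify a correlation coefficient using the bands stated in the text."""
    magnitude = abs(r)
    if magnitude < 0.30:
        return "weak"
    if magnitude <= 0.50:
        return "moderate"
    return "strong"


# Spearman's correlations with TIQ-20 scores, as reported in the text.
reported_correlations = {
    "serum urate": 0.25,
    "gout flare frequency": 0.33,
    "subcutaneous tophus count": 0.39,
    "DECT urate volume": 0.44,
    "HAQ-II": 0.52,
}

bands = {name: correlation_band(r) for name, r in reported_correlations.items()}
for name, band in bands.items():
    print(f"{name}: {band}")
```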

Table 4

Construct validity analysis for TIQ-20


Discussion

We have developed a tophus-specific PRO in patients with tophaceous gout. The TIQ-20 is a short questionnaire that is simple to use and is publicly available for studies of patients with tophaceous gout. We recommend that Rasch modelled scores are used for the TIQ-20 (scoring instructions and conversion table are shown in online supplementary table S3). The cognitive testing process has ensured that the items within the questionnaire are understandable to people with tophaceous gout. The questionnaire demonstrated acceptable psychometric properties, and initial results show internal consistency, face, content and construct validity, reproducibility and feasibility. The TIQ-20 captures many aspects of the experience of living with tophaceous gout that were identified through qualitative research, including functional impact, psychological impact and lack of impact in some individuals.13

There are several other PROs in current use in studies of chronic gout. The HAQ and Medical Outcomes Study Short Form 36 (SF-36) have been endorsed by OMERACT as outcome measures for activity limitation and health-related quality of life, respectively, for chronic gout studies.8,9 These questionnaires are not specific to gout, and may not capture all aspects of the experience of tophaceous gout. The only available gout-specific PROs are the Gout Assessment Questionnaire (GAQ) and GAQ2.0.21 At present, these questionnaires are not endorsed by OMERACT, due to concerns about construct validity (truth) and lack of internal consistency for some of the scales.8,22 The gout impact scale of the GAQ2.0 captures many aspects of the experience of gout, but is heavily weighted to the impact of gout flares and does not specifically examine the impact of gouty tophi. It is possible that the TIQ-20 will provide additional useful information to the GAQ2.0, given its high construct validity and ability to specifically capture aspects of disease that are not covered within the GAQ2.0.

A variety of methods of tophus measurement have been used in studies of gout.7 These include physical measurement of subcutaneous tophi (counting the number of tophi, tape measurement of tophus area, Vernier callipers of longest tophus diameter),23–25 digital photography26 and advanced imaging methods (ultrasonography, MRI, conventional CT, and dual-energy CT).27–30 In our study, as predicted, the TIQ-20 correlated with the number of subcutaneous tophi and the DECT urate volume to a moderate degree. These findings indicate that factors other than amount of tophus, or urate deposition, contribute to the impact of tophi on the individual. As a quantitative measure of the patient experience of tophus impact, the TIQ-20 may allow examination of how tophus-specific factors, such as site or location, size, consistency, composition and complications contribute to the patient's experience of tophaceous gout.

This study has some limitations. The items were derived from a relatively small qualitative study of 25 participants, and may not have captured the entire experience of tophaceous gout. In particular, almost all participants were male, consistent with the higher prevalence of gout in men. The experience of tophi may be different in women than men,31 and further studies to examine the properties of this questionnaire in women are warranted (particularly in elderly women with tophi associated with Heberden node formation). Almost half the study participants were of Māori or Pacific ancestry, consistent with the high prevalence of gout observed in these populations in Aotearoa New Zealand.32,33 Cross-cultural validation and further testing in other languages are required. Measurement precision influences sample size requirements to detect change and size of detectable differences. Our findings indicate that the TIQ-20 is most suited to study populations with higher tophus burden.

Our analysis, to date, has shown that the TIQ-20 fulfils many aspects of the OMERACT filter.34 The questionnaire is feasible; it can be completed and scored within a few minutes. It is free, understandable and acceptable to patients. Minimal training is required for scoring, no special equipment is required, and raw data can be easily maintained within study documents. The truth aspects of the filter are also fulfilled; the process for development has led to face and content validity, with the central aspects of the patient experience of tophi captured within the questionnaire. Construct validity has been tested and is acceptable. The psychometric properties of the questionnaire are also acceptable with low ceiling and floor effects. The test-retest reproducibility is similar to that reported for the HAQ in gout studies.35 A key aspect of the OMERACT filter is sensitivity to change. Future work will focus on whether TIQ-20 scores change in response to treatment, particularly urate-lowering therapy over time.




  • Handling editor Tore K Kvien

  • Contributors ND (the guarantor) accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. ND and WJT conceived of the study. OA coordinated the study. WJT and RJS did the Rasch analysis. AH, MEH, PT and JD recruited participants and coordinated study visits. LKS assisted with patient recruitment and construct validity analysis. OA, WJT and ND drafted the first version of the manuscript. All authors contributed to manuscript revisions and approved the final manuscript.

  • Funding This project was funded by the Health Research Council of New Zealand (12/111). OA was supported by a New Zealand Ministry of Health Pacific Health and Workforce Award (2013-PHDWA-01).

  • Competing interests None.

  • Patient consent Obtained.

  • Ethics approval Northern X Ethics Committee (NTX/08/06/050).

  • Provenance and peer review Not commissioned; externally peer reviewed.
