Objective To identify methods of tophus measurement for gout studies, summarise the properties of these methods and compile a detailed pictorial reference guide to demonstrate the methods.
Methods A systematic search strategy for methods of tophus measurement was formulated. For each method, papers were assessed by two reviewers to summarise information according to the specific components of the Outcomes Measures in Rheumatology (OMERACT) filter: feasibility, truth and discrimination. Detailed images were obtained to construct the reference guide.
Results Eight methods of tophus measurement were identified: counting the total number of tophi, physical measurement using tape measure, physical measurement using Vernier callipers, digital photography, ultrasonography (US), MRI, CT and dual energy CT. Feasibility aspects of the methods are well documented. Physical measurement techniques are more feasible than advanced imaging methods, but do not allow for assessment of intra-articular tophi or for data storage and central reading. The truth aspect of the filter has been documented for many methods, particularly Vernier callipers, US, MRI and CT. Reliability of most methods has been reported as very good or excellent. Sensitivity to change has been reported for all methods except MRI and CT.
Conclusion A variety of methods of tophus assessment have been described for use in clinical trials of chronic gout. Physical measurement techniques (particularly the Vernier calliper method) and US measurement of tophus size appear to meet most aspects of the OMERACT filter.
The tophus is a pathognomonic feature of chronic gout. Tophi are chronic granulomatous lesions surrounding a core of monosodium urate monohydrate (MSU) crystals, encased by dense connective tissue.1 Tophi are usually non-tender but may lead to cosmetic problems, mechanical obstruction of joint movement, joint damage and musculoskeletal disability.2 Gout is considered a curable disease3; MSU crystal deposition is reversible and tophi regress and ultimately disappear when urate-lowering therapy (ULT) is effective.4–9 In a recent Delphi exercise regarding outcome measures for clinical trials, tophus regression was considered mandatory for studies of patients with chronic tophaceous gout,10 yet the optimal method for measuring this regression remains uncertain.11,12
Using a systematic review of the literature, the aim of this work was to identify methods of tophus measurement and summarise the properties of these methods for use in studies of chronic gout, using the framework of the Outcomes Measures in Rheumatology (OMERACT) filter. In addition, we wished to compile a guide of various methods of tophus measurement to provide a reference for researchers undertaking clinical research in chronic gout.
A systematic search strategy was formulated to provide a written summary of the evidence for each method of tophus measurement. Searches were performed in the following electronic databases: PubMed, Medline, Web of Science, Cochrane Central Register of Controlled Trials (The Cochrane Library), Excerpta Medica Database (EMBASE), European League Against Rheumatism (EULAR) meeting abstract archive and American College of Rheumatology Annual Scientific Meeting abstract archive. Google Scholar was also used, as well as a thorough scan of bibliographical references of individual publications. Data sources were English publications from these databases, hand searches and guidelines. No date restrictions were used. Specific searches related to each method of measurement identified were also performed using the following search strategy: method (eg, MRI)*tophus/tophi*measure*gout. A total of 1411 papers were generated by the search, with 1324 deemed to be not appropriate based on reviewing the abstract as they did not address any aspect of the OMERACT filter. A total of 21 papers were finally selected as being relevant to tophus measurement and were included in the review (supplementary table S1).
For each method, papers were assessed by two reviewers (CS and ND) to summarise detailed information describing the measurement protocol, equipment and personnel required, reporting of the measure and detailed images demonstrating the method. Further information regarding measurement and confirmation of the protocol were obtained from principal investigators involved in developing or validating each method. Detailed images were obtained to construct the reference guide. Patients provided written informed consent prior to image acquisition.
For each method, papers were also assessed by the two reviewers to summarise information according to the specific components of the OMERACT filter: feasibility, truth and discrimination.13 Further details and definitions regarding the aspects of the filter that were considered are outlined in supplementary methods S1.
Eight methods of tophus measurement were identified: counting the total number of tophi, physical measurement using tape measure, physical measurement using Vernier callipers, digital photography, ultrasonography (US), MRI, CT and dual energy CT (DECT). Tables 1–3 summarise the results of the assessment of feasibility, truth and discrimination components of the OMERACT filter for each method. Detailed descriptions of each method are provided in supplementary results S1.
Counting the total number of subcutaneous tophi
Recording the total number of visible tophi on physical examination is a rapid and inexpensive method that does not require specific equipment (figure 1A). In a 52-week clinical trial comparing febuxostat and allopurinol there was little change in the total number of tophi, and no difference among the treatment groups in the percentage reduction in the number of tophi.4 However a 58.5% reduction in the total number of tophi was reported after 40 months of effective ULT, with an effect size of 0.47.5 Between-group discrimination has been demonstrated; a significantly larger decrease in the total number of tophi was reported in the patients taking 120 mg febuxostat daily (−1.2) compared with placebo (−0.3) after 28 weeks of treatment, p<0.05.6
No intraobserver, interobserver and test–retest reproducibility data are currently available for this method of assessment. Furthermore, data storage to allow rechecking and central reading is not possible using this method.
Tape measurement of subcutaneous tophus size
This method involves the use of a standard tape measure to determine the distance between two pen marks that have been drawn on a predefined length and width axis (perpendicular to one another). The two-dimensional area is then calculated by multiplying these two measurements (figure 1B).14–15 A multicentre study assessed intraobserver reproducibility of subcutaneous tophus measurement using tape measure.15 This was determined by measuring the area of the same tophus at two visits separated by 5–10 days. The mean (SD) difference in tophus areas between visits was −0.2 (835) mm² and the intraclass correlation coefficient (ICC) was 0.92. Interobserver reliability was assessed at two sites; mean (SD) difference was 7 (925) mm² and −150 (982) mm², and ICCs ranged from 0.85 to 0.92. There were no statistically significant differences in area measurements between visits. Reproducibility (intraobserver and interobserver) depended on anatomical location, size and observer experience with largest variations in measurement noted for elbow tophi, and least for well demarcated tophi on the hands. Measurement differences were smallest for tophi <500 mm² but increased with tophus size >500 mm².
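The area calculation described above can be illustrated with a minimal Python sketch. The function names and example measurements are hypothetical and are not taken from the cited studies; the sketch only shows the product of the two perpendicular axes and the percentage change in area between two visits.

```python
def tophus_area_mm2(length_mm: float, width_mm: float) -> float:
    """Two-dimensional tophus area: product of two perpendicular axes (mm^2)."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("axis measurements must be non-negative")
    return length_mm * width_mm


def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change from baseline; negative values indicate regression."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical index tophus measured at baseline and at follow-up.
baseline_area = tophus_area_mm2(25.0, 18.0)   # 450.0 mm^2
follow_up_area = tophus_area_mm2(20.0, 15.0)  # 300.0 mm^2
change = percent_change(baseline_area, follow_up_area)
print(f"Area change: {change:.1f}%")
```

Because the area is a product of two measurements, a small error on either axis is scaled by the length of the other axis, which is consistent with the larger measurement differences reported for larger tophi.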
In a randomised clinical trial, there was no significant difference in tophus regression between febuxostat and allopurinol treated groups after 1 year of treatment.4 However, post hoc analysis did show a trend to greater regression of an index tophus in those with mean post baseline serum urate <6 mg/dl at week 52 (75% reduction in index tophus size for those patients achieving serum urate <6 mg/dl vs 50% reduction in those who did not, p=0.06). In a long-term extension study of 1086 patients treated with febuxostat or allopurinol, index tophus size reduced by 59% following 3 years of effective ULT using this method of measurement, with an effect size of 0.48.5 In studies comparing febuxostat and allopurinol, discrimination of tophus size measurement between the treatment groups has not been demonstrated.4–5
Vernier callipers for measurement of subcutaneous tophus size
Measurement of the longest tophus diameter with Vernier callipers has been used as a method of subcutaneous tophus assessment (figure 1C).8 The properties of this assessment technique have been analysed in a study comparing physical measurement and CT measurement of tophus size in the hands.16 The longest tophus diameter was measured by two independent observers on the same day using 150 mm digital Vernier callipers. Of 20 patients in total, 5 underwent repeat examinations within 1 week. In this study, the ICC for intraobserver reproducibility was 0.996 and for interobserver reproducibility was 0.985. There was strong correlation between CT and physical tophus measurement (r=0.91, p<0.0001), and physical measurement had similar reliability to CT measurement.
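The text does not state which form of the ICC was calculated in the cited study. As an illustration only, the following Python sketch computes a one-way random-effects ICC, often written ICC(1,1), from repeated diameter measurements. The function name and the measurements are hypothetical.

```python
from typing import Sequence


def icc_one_way(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1).

    `ratings` holds one row per tophus and one column per repeated
    measurement (for example, two observers or two visits).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 tophi with the same number (>=2) of measurements")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean square between tophi and mean square within tophi.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical longest diameters (mm) recorded by two observers.
diameters = [[10.0, 11.0], [20.0, 19.0], [30.0, 31.0]]
print(f"ICC(1,1) = {icc_one_way(diameters):.3f}")
```

Other ICC forms (for example, two-way models that separate observer effects) can give different values from the same data, so a reported ICC is best interpreted together with the model used.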
Change sensitivity and between-group discrimination have been demonstrated in a 5-year longitudinal prospective study of patients treated with ULT.8 In this study, the velocity of tophus reduction was measured by analysing the time from baseline to complete resolution of the index tophus (reported as mm/month), with an effect size of 1.83. This measure strongly correlated with intensity of urate lowering (r=−0.62, p<0.05). Compared with allopurinol alone, benzbromarone alone or combination treatment of benzbromarone and allopurinol also led to greater velocity of tophus reduction, p<0.01.8
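The text gives only the unit of the velocity measure (mm/month). Assuming that velocity is the baseline longest diameter divided by the time to complete resolution of the index tophus, the calculation might be sketched in Python as follows. The function name and the example values are hypothetical, and the cited study may have defined the measure differently.

```python
def tophus_reduction_velocity(baseline_diameter_mm: float,
                              months_to_resolution: float) -> float:
    """Velocity of tophus reduction in mm/month.

    Assumption: the index tophus resolves completely, so the total
    reduction equals the baseline longest diameter.
    """
    if baseline_diameter_mm <= 0 or months_to_resolution <= 0:
        raise ValueError("diameter and time must both be positive")
    return baseline_diameter_mm / months_to_resolution


# Hypothetical patient: 18 mm index tophus that resolved after 12 months.
print(tophus_reduction_velocity(18.0, 12.0), "mm/month")
```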
Digital photography for measurement of subcutaneous tophus size
A method of computer-assisted digital photographic assessment of subcutaneous tophus size has recently been described (supplementary figure S1).7 This method has shown sensitivity to change and between-group discrimination7; following treatment with pegloticase for 6 months, 37% of those who consistently achieved serum urate of <6 mg/dl achieved complete resolution of tophi using this method, compared with 13% of those with higher serum urate concentrations (p=0.0002). Pegloticase treatment (8 mg every 2 weeks) was also associated with higher rates of complete resolution than placebo (p=0.006). Effect size has not been reported.
To date, intraobserver, interobserver and test–retest reproducibility have not been reported. Furthermore, some limitations of this method have been suggested; assessment of a three-dimensional lesion using two-dimensional images may not be ideal, and some difficulty identifying borders of resolving lesions has also been described.7
US for measurement of tophus size
Tophi are identified on US as hypoechoic to hyperechoic inhomogeneous material surrounded by a small anechoic rim (figure 2).17 Tophus size can be reported as longest diameter and total volume.9 The properties of US as a tool for tophus measurement have been reported.9 US detected 37/41 (90%) of tophi detected by MRI. There was a strong relationship between longest tophus diameter measured by US and MRI (r²=0.65). Puncture of nodules suspected to be tophi on US recovered MSU crystals in 83% of procedures. Intraobserver ICCs were >0.90 for diameters and volume, and 0.71 to 0.83 in comparisons between observers.
In a 12-month prospective observational study of ULT, US assessment of tophus size was shown to be sensitive to change.9 A strong correlation was reported between average serum urate concentration and change in maximal diameter and volume of tophi during ULT. Guyatt's effect size was 1.7 for maximal diameter change and 1.93 for volume change. The smallest detectable difference (SDD) was 5.5 mm for longest diameter and 1.27 cm³ for volume. Patients with a reduction in maximal diameter greater than the SDD had a mean (SD) serum urate of 5.04 (0.79) mg/dl compared with 6.03 (0.62) mg/dl in patients whose tophi showed no reduction (p<0.01).9 Between-group discrimination has not been demonstrated.
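To illustrate how these two quantities are typically used, the following Python sketch applies the reported SDD for longest diameter (5.5 mm) as a threshold for true change, and computes an effect size in the form commonly attributed to Guyatt: mean change in treated patients divided by the SD of change in stable patients. The exact formulas used in the cited study are not given in the text, so this definition is an assumption, and the function names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence

SDD_DIAMETER_MM = 5.5  # smallest detectable difference reported in the text


def exceeds_sdd(baseline_mm: float, follow_up_mm: float,
                sdd_mm: float = SDD_DIAMETER_MM) -> bool:
    """True if the reduction in longest diameter is larger than the SDD."""
    return (baseline_mm - follow_up_mm) > sdd_mm


def guyatt_effect_size(reductions_treated: Sequence[float],
                       reductions_stable: Sequence[float]) -> float:
    """Assumed form: mean reduction in treated patients divided by the
    sample SD of the reduction observed in clinically stable patients."""
    return mean(reductions_treated) / stdev(reductions_stable)


# Hypothetical reductions in longest diameter (mm, baseline minus follow-up).
treated = [6.0, 8.0, 10.0]
stable = [-1.0, 0.0, 1.0]
print(exceeds_sdd(20.0, 12.0))              # 8 mm reduction, above the SDD
print(exceeds_sdd(20.0, 16.0))              # 4 mm reduction, within measurement error
print(guyatt_effect_size(treated, stable))
```

A reduction smaller than the SDD cannot be distinguished from measurement error, which is why the comparison of serum urate in the text is restricted to patients whose reduction exceeded it.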
Unlike the physical measurement techniques, US and other imaging methods are able to assess subcutaneous and intra-articular tophi. However, these methods do require expensive and specialised equipment, advanced training of operators and readers, and longer times for analysis. US is particularly operator dependent, which may limit its use in multicentre clinical trials. A further limitation of US may be the limited ability to store raw data for crosschecking and central reading.
MRI for measurement of tophus size
On MRI, tophi are identified as structures with an intermediate signal intensity appearance on T1, but more variability on T2-weighted images.18,19 A total volume measurement using MRI can be obtained on unenhanced consecutive spin echo images by manually tracing the margins of the tophus using consecutive images (supplementary figure S2).20 Contrast is not necessary for quantification of tophus volume.20 Longest tophus diameter has also been reported using MRI.9 Subcutaneous and intra-articular tophi can be assessed using this method.
Concurrent validity has been assessed by comparing US and MRI measurements of tophus size.9 MRI confirmed 37/46 (81%) of the nodules reported by US, and MRI measurement of longest diameter correlated well with US measurements of maximal diameter (r²=0.65).
A study of tophus volume measurement using MRI showed that intrareader reproducibility was excellent, with no statistically significant difference in mean tophus volume between visits.20 There was a small, but statistically significant, difference between readers. Intraobserver and interobserver differences were independent of tophus size.
To date, no data have been reported regarding sensitivity to change and between-group discrimination. MRI may have specific issues related to feasibility, particularly the long scanning time and positioning that may be uncomfortable for patients with joint disease. The high cost of MRI scanning may further limit its use as an outcome measure.
Conventional CT for measurement of tophus size
Conventional CT can be used to detect the presence of tophi in patients with gout.21–23 Tophus volume and longest diameter can be assessed using standard software, which allows semiautomated quantitative assessment of predefined tissues (supplementary figure S3).16,24 Subcutaneous and intra-articular tophi can be assessed using this method.
A study has assessed reliability of CT measurement of subcutaneous tophus volume, and compared reliability of CT with Vernier calliper measurement of tophus size.16 CT assessment of tophus volume was found to have good reliability for analysis of tophus size, with intraobserver and interobserver reliability ICCs >0.98. There was no difference in reliability between CT and physical measurement and there was excellent correlation between the measurements.
To date, there have been no data reported regarding sensitivity to change and between-group discrimination. CT does have benefits of raw data storage, excellent resolution of tophi and short scanning time. However, the use of ionising radiation may reduce patient acceptability, noting that positioning away from core structures significantly reduces the risks related to radiation exposure.
DECT for measurement of tophus volume
Tophi can be identified by DECT, using a specific display algorithm that assigns different colours to materials of different chemical composition (figure 3).25–27 Using dedicated automated volume assessment software, tophus volumes can be measured in the hands, wrists, elbows, feet, ankles and knees. These volumes are then summed to obtain a total tophus volume load.26
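The summation step can be shown with a short Python sketch. The site names follow the list above; the function name, the choice of cm³ as the unit and the example volumes are assumptions made for illustration and do not reproduce the output of any DECT software.

```python
from typing import Mapping

# Anatomical regions listed in the text for DECT volume assessment.
SITES = ("hands", "wrists", "elbows", "feet", "ankles", "knees")


def total_tophus_volume(volumes_cm3: Mapping[str, float]) -> float:
    """Sum per-site tophus volumes into a single total volume load.

    Sites missing from the mapping are treated as having no urate deposits.
    """
    unknown = set(volumes_cm3) - set(SITES)
    if unknown:
        raise ValueError(f"unexpected site(s): {sorted(unknown)}")
    if any(v < 0 for v in volumes_cm3.values()):
        raise ValueError("volumes must be non-negative")
    return sum(volumes_cm3.get(site, 0.0) for site in SITES)


# Hypothetical per-site volumes (cm^3) for one patient.
patient = {"hands": 1.2, "feet": 3.4, "knees": 0.4}
print(f"Total tophus volume load: {total_tophus_volume(patient):.1f} cm^3")
```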
The utility of DECT was assessed in 20 patients with tophaceous gout and 10 control participants with other forms of arthritis.26 In all patients with gout, red colour-coded urate deposits were present. None of the scans from control participants had these deposits on their DECT scans. DECT scans were able to show 440 areas of urate deposition in 20 patients, compared with only 111 areas found by physical examination. In a small clinical study, all responders to ULT (n=10) had a reduction in tophus volume using DECT with a median reduction of 64% over 19 months (p=0.002), effect size 0.37.28
To date, no information has been reported on intraobserver or interobserver reproducibility or test–retest reproducibility. Further data regarding change sensitivity and between-group discrimination are also required. As with conventional CT, the use of ionising radiation may reduce patient acceptability. A further consideration is that, unlike conventional CT, DECT is not universally available, which may further limit the feasibility of this modality in multicentre studies.
This review has identified a number of different methods that have been used to assess tophus size. These modalities range from simple methods of physical measurement to complex and expensive methods requiring advanced imaging tools. To date, there has been little direct comparison of the properties of the various methods. However, the OMERACT filter does provide a framework to assess available data concerning the properties of the various methods, and to allow some comments about the advantages and disadvantages of each method.
The physical measurement techniques are generally more feasible than advanced imaging methods, as they are simple to perform, cost effective, non-invasive and acceptable to patients. However, physical measurement methods do not allow for storage of images, crosschecking of raw data or central reading. Despite the additional costs, patient inconvenience and the prolonged image acquisition and/or analysis time, advanced imaging methods do offer a number of advantages. For CT, DECT and MRI, raw data storage allows standardisation of assessment using central readers and optimal data management. The digital photography method may have particular advantages due to relatively low costs, excellent data storage qualities and high patient acceptability.
The truth aspect of the filter has been documented for many methods, particularly Vernier callipers, US, MRI and CT. All methods have high face validity. The physical measurement techniques and digital photography do not allow assessment of intra-articular tophi, which may be present in the absence of subcutaneous tophi.29 The advanced imaging methods all allow for simultaneous assessment of subcutaneous and intra-articular tophi. Acceptable construct and criterion validity have been reported only for Vernier calliper, US, MRI and CT methods.
For all methods except for counting of tophi, digital photography and DECT, reliability has been reported. In general, intraobserver and interobserver reliability are excellent, with the possible exception of volume assessment using MRI. All methods except CT and MRI have demonstrated sensitivity to change. Comparison between different methods is difficult when assessing sensitivity to change, as studies have used different interventions, patient groups and time periods. For example, a method used in a study of a highly potent ULT such as pegloticase is more likely to show sensitivity to change and between-group discrimination over a short time period, compared with another method used in a study of a less potent ULT. Between-group discrimination data are lacking for most methods except counting the total number, Vernier callipers and digital photography.
A number of uncertainties remain about measurement of tophus size. A particular issue is the reporting of change in index tophus size as the measure of global tophus regression. This reporting assumes that all tophi regress at a similar rate in response to effective treatment. However, the internal structure of tophi may vary in different sites or patients. Local factors that may influence the rate of tophus regression in response to the same ULT efficacy include the surrounding granulomatous tissue, the intensity of the accompanying inflammation, the presence of calcification, the density of the vascular net surrounding them and the characteristics of the tissues on which tophi formed (lax tissues such as the olecranon bursa may facilitate large tophi while in tighter areas of hands or feet tophi may be multiple and smaller).
In summary, this review has demonstrated that a variety of tophus measurement tools are currently in use in studies of chronic gout. Physical measurement techniques and US measurement of tophus size fulfil the major aspects of the OMERACT filter, and may be most widely adopted due to their relatively low cost and high patient acceptability. The Vernier calliper method has been shown to fulfil all aspects of the filter but has not been used in a randomised controlled trial. A further objective of this document was to provide a useful pictorial reference guide for researchers studying patients with tophi. We hope that this guide will enhance the standardisation of tophus measurement in studies of chronic gout, and improve the ability to compare the results of different studies through the use of consistent methodology. The ongoing research agenda includes assessing the components of the OMERACT filter that have not yet been addressed, comparative studies to determine the relative advantages of each method, the validity of reporting change of an index tophus as an outcome measure, and calculation of the minimum important clinical difference for each method.
CS was the recipient of a University of Auckland summer studentship. PM is an employee of Takeda Global Research & Development. SH is an employee of Savient Pharmaceuticals.
Provenance and peer review Not commissioned; externally peer reviewed.