Gout is the most common inflammatory arthritis in men and is increasing in prevalence.1,2 Most gout is managed in primary care where the diagnosis seldom relies upon identification of monosodium urate (MSU) crystals.2 Therefore, classification criteria that do not require MSU crystal identification would be useful for clinical research in primary care settings. Six classification criteria for gout have been developed but the most widely used is the 1977 American Rheumatism Association (ARA) criteria.3,4
Current classification criteria have been tested in populations with average disease duration of 7–10 years3,5 or where disease duration was not reported.6–9 However, identification of patients with early disease is important to test questions related to early treatment of gout or in order to study the natural history of gout in inception cohorts.
The Study for Updated Gout ClAssification CRiteria (SUGAR) was undertaken as part of an American College of Rheumatology (ACR) and European League Against Rheumatism project to update gout classification criteria4 and allows a direct comparison of existing criteria in early disease compared with established disease.
Between January 2013 and April 2014, consecutive patients attending a rheumatology clinic with joint swelling or a subcutaneous nodule within the previous 2 weeks that was judged by a clinical investigator to be conceivably due to gout were enrolled into this cross-sectional study. The need for a recently swollen joint or subcutaneous nodule was to improve the likelihood of obtaining material for microscopy. Patients were generally referred from primary care for treatment. All clinical investigators were rheumatologists with an interest in gout and were the treating physician. Clinical manifestations and a clinical diagnosis (according to the treating physician) were recorded prior to synovial fluid (SF)/tissue polarised light microscopy. Clinical data including all items within published classification criteria were collected at the index visit using a pro forma with clear item definitions supported by a data collection interview schedule and web-based training. Each centre received Ethics Committee Approval or Institutional Review Board approval according to local requirements.
Gold standard for gout
The gold standard for classification was presence of MSU crystals in SF or tophus aspirate as observed by a competent observer. This gold standard was chosen on the basis of pathophysiology, clinical practice and recommendations for diagnosis by European League Against Rheumatism.10 While it is possible that some patients with gout will not show MSU crystals at a moment in time because of excellent treatment or sampling reasons, this was unlikely in this study since all patients had symptomatic disease and joint or tissue aspiration was performed by rheumatologists with an interest in gout.
All patients underwent arthrocentesis or tissue aspiration for polarising microscopy to identify MSU crystals. The choice of aspiration site(s) was at the discretion of the clinical investigator. Microscopy was undertaken by observers who had passed a two-stage MSU-identification certification procedure, which consisted of a web-based crystal recognition test followed by examination of five to eight vials of SF from the laboratories of Eliseo Pascual (European centres) or H. Ralph Schumacher (rest of the world). The web-based test was strict and had a high non-pass rate.11 Each SF sample in the second stage needed to be correctly identified as demonstrating MSU crystals or not to achieve certification.
Gout was defined as presence of MSU crystals identified by a certified observer. Non-gout was defined as absence of MSU crystals, irrespective of the clinical diagnosis. SF/tissue microscopy by a certified observer was performed within 1 month of the index visit and was blinded to the collection of classification items. The 1 month grace period was to allow scheduling of ultrasound-assisted arthrocentesis if deemed appropriate by the clinical investigator. Note that microscopy was performed according to the practice of the certified observer following arthrocentesis (immediately in the majority of cases).
Disease duration was defined by patient self-report of the time since onset of first symptoms. Early disease was defined as symptom onset of no more than 2 years; established disease was defined as symptom duration of more than 2 years.
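The duration rule above can be written as a small function. This is only an illustrative sketch: the function name and the use of fractional years are assumptions, and the only detail taken from the study is the 2-year threshold, with exactly 2 years counted as early disease.

```python
EARLY_DISEASE_MAX_YEARS = 2.0  # threshold stated in the text


def disease_stage(years_since_first_symptoms: float) -> str:
    """Label self-reported symptom duration as 'early' or 'established'.

    Hypothetical helper: 'early' means onset no more than 2 years ago,
    'established' means more than 2 years ago.
    """
    if years_since_first_symptoms < 0:
        raise ValueError("duration cannot be negative")
    if years_since_first_symptoms <= EARLY_DISEASE_MAX_YEARS:
        return "early"
    return "established"
```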
The comparison criteria sets from published studies were the 1977 ARA preliminary criteria (survey and complete format),3 an abbreviated form of the ARA criteria (Mexico),12 a criteria set developed in primary care (Netherlands),6 the Rome13 and New York14 criteria and modified versions of the Mexico, Rome and New York criteria that excluded SF/tissue microscopy. The details of these criteria are shown in online supplementary table S1.
The sensitivity and specificity of each criteria set were calculated in early and established disease separately. In addition, a sensitivity analysis that excluded the non-gout patients who had a clinical diagnosis of gout but were MSU crystal-negative was performed to check that specificity estimates were not underestimated by contamination of the control sample with gout cases. It should be noted that specificity is likely to be overestimated in this analysis.
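As a minimal sketch of these calculations, the code below computes sensitivity among MSU crystal-positive patients and specificity among MSU crystal-negative patients, with an option that mirrors the sensitivity analysis by dropping crystal-negative patients who carried a clinical diagnosis of gout. The record layout and the toy data are hypothetical and are not taken from the SUGAR data set.

```python
from typing import List, NamedTuple


class Patient(NamedTuple):
    msu_positive: bool   # gold standard: MSU crystals seen by a certified observer
    criteria_met: bool   # result of the classification criteria set under test
    clinical_gout: bool  # treating physician's clinical diagnosis


def sensitivity(patients: List[Patient]) -> float:
    """Proportion of MSU-positive patients who fulfil the criteria."""
    cases = [p for p in patients if p.msu_positive]
    return sum(p.criteria_met for p in cases) / len(cases)


def specificity(patients: List[Patient], exclude_clinical_gout: bool = False) -> float:
    """Proportion of MSU-negative patients who do not fulfil the criteria.

    With exclude_clinical_gout=True, MSU-negative patients with a clinical
    diagnosis of gout are removed from the control sample first.
    """
    controls = [
        p for p in patients
        if not p.msu_positive and not (exclude_clinical_gout and p.clinical_gout)
    ]
    return sum(not p.criteria_met for p in controls) / len(controls)


# Hypothetical toy data: 4 cases and 5 controls.
toy = [
    Patient(True, True, True),
    Patient(True, True, True),
    Patient(True, True, False),
    Patient(True, False, True),
    Patient(False, True, True),    # MSU-negative, clinical gout, criteria met
    Patient(False, True, False),
    Patient(False, False, False),
    Patient(False, False, False),
    Patient(False, False, False),
]
```

In this toy example, removing the single crystal-negative patient with a clinical diagnosis of gout raises specificity, which is the direction of change expected from the sensitivity analysis.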
Statistical comparison of differences in sensitivity or specificity was done using logistic regression in gout (for sensitivity) and non-gout (for specificity). The ARA (survey) criteria were the reference category for criteria so that the quoted ORs are relative to the sensitivity/specificity of the ARA (survey) criteria. Separate models were used for early and established disease to compare sensitivity/specificity by disease duration and a full regression model that included disease duration as a categorical covariate and as an interaction term was also calculated to assess the overall effect of disease duration on sensitivity and specificity across all criteria sets. For empty cells, 0.5 was added to permit estimation.
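For a single binary comparison, the OR from a logistic regression with one categorical predictor equals the cross-product ratio of the corresponding 2x2 table, so the idea can be sketched without a regression package. The code below is a simplified stand-in rather than the models used in the study: it assumes independent observations, uses a Wald (Woolf) confidence interval, and assumes that the 0.5 correction is added to all four cells whenever any cell is empty. The text does not specify how the correction was applied.

```python
import math
from typing import Tuple


def odds_ratio(a: float, b: float, c: float, d: float,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: criteria met / not met in the comparison group
    c, d: criteria met / not met in the reference group
    Assumption: if any cell is zero, 0.5 is added to every cell.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, not study data.
example = odds_ratio(20, 10, 10, 20)
example_with_empty_cell = odds_ratio(10, 0, 5, 5)
```

Because every patient in the study was assessed against every criteria set, the real comparisons involve repeated measures on the same patients. The sketch ignores that correlation.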
Receiver operating characteristic points were plotted for each criteria set at its published threshold. Each point plots the true positive rate (proportion of gout cases classified as cases) against the false positive rate (proportion of non-gout patients classified as cases).
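A single receiver operating characteristic point can be derived from the four cells of a 2x2 table, as in the sketch below. The function name and the counts are hypothetical and serve only to show how the two coordinates are obtained.

```python
from typing import Tuple


def roc_point(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) for one criteria set.

    tp, fn: gout cases that do / do not fulfil the criteria
    fp, tn: non-gout patients that do / do not fulfil the criteria
    """
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    true_positive_rate = tp / (tp + fn)   # equals sensitivity
    return false_positive_rate, true_positive_rate


# Hypothetical counts, not study data.
fpr, tpr = roc_point(tp=90, fn=10, fp=30, tn=70)
```

Points closer to the top left corner of the plot (low false positive rate, high true positive rate) indicate better overall discrimination at the published threshold.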
Twenty-five centres in 16 countries collected data from 983 patients (509 cases, 474 non-cases), of whom 702 (71.4%) were male (table 1). Early disease (2 years or less) was observed in fewer gout cases (144, 28.5%) than non-gout cases (228, 48.5%). Non-gout cases had various clinical diagnoses shown in table 1. There was some variation across centres/countries in relation to proportion of gout cases recruited, duration of current episode, total duration of disease, age and gender but not in relation to proportion of patients with tophi (see online supplementary table S2). The distribution of current joint involvement is shown in online supplementary table S3 and 74% of gout cases had first metatarsophalangeal (MTP1) involvement at any time during their disease course.
Across all criteria sets, later disease was associated with better sensitivity (95.3%) than early disease (84.1%) (OR 4.4, 95% CI 2.5 to 7.8, p<0.001). Conversely, early disease was associated with better specificity (79.9% vs 52.5%) (OR 4.7, 95% CI 2.8 to 7.7, p<0.001). There was no significant interaction between disease duration and particular criteria in respect of sensitivity or specificity.
The sensitivity and specificity for each classification criteria set by disease duration are shown in table 2a and b. The point estimates for sensitivity and specificity are also plotted on a receiver operating characteristic plot in figure 1, and the performance of each criteria set in the whole data set is shown in online supplementary table S4. Note that criteria which include MSU crystals in SF/tissue alone as sufficient for classification will show 100% sensitivity by definition, since case-ness in the SUGAR data set only required demonstration of MSU crystals. Analysis of the sensitivity of these criteria is therefore not possible. Exclusion of MSU crystal examination as a criterion for classification generally led to a marked reduction in sensitivity. Table 2 separates the criteria that incorporate MSU crystal examination from those that do not.
Excluding non-gout cases (MSU negatives) who had a clinical diagnosis of gout, specificity estimates improved somewhat to 69.3–90.4% (early disease) and 39.3–77.1% (later disease) (see online supplementary table S5). The criteria with the best specificity in this analysis were the New York criteria (90.4% in early disease and 77.1% in later disease).
All clinical criteria had adequate sensitivity in later disease but in early disease the clinical versions of Rome, New York and ARA criteria had low sensitivity. The Netherlands criteria and the clinical version of the Mexico criteria demonstrated adequate sensitivity even in early disease. Conversely, specificity was generally less satisfactory and worse in later disease, particularly for the Mexico and Netherlands criteria.
Early disease is more challenging to classify since not all characteristic features will present early. This analysis of the SUGAR data set has shown that newer clinical-only classification criteria (Mexico and Netherlands) have fairly good sensitivity even for early disease but very poor specificity. Specificity in early disease for all criteria is better than in established disease, while in established disease specificity is problematic with values of less than 70%.
The context of the study is important when deciding on the optimal sensitivity and specificity of classification criteria. For early phase studies of new treatments of unknown toxicity, criteria with very high specificity are probably necessary, whereas epidemiological or outcomes researchers may wish to be more inclusive and value sensitivity over specificity. The same will be true of studies in early disease.
A previous study showed the specificity of the clinical versions of the Rome, New York and ARA criteria to be 78.8–88.5%, which is somewhat higher than we observed.8 The difference may be due to the selection of patients in that study, who were recruited because they had undergone SF aspiration at any time,8 possibly leading to selection bias. In another study of the ARA criteria in primary care with possible gout, the specificity was only 64%,7 which is closer to what we observed. Possibly, the large number of controls with calcium pyrophosphate deposition disease (23% of control group) and inclusion of patients with a clinical diagnosis of gout but negative for MSU crystals as control patients (10%) contributed to the lower specificity estimates observed in our study.
The strengths of this study include the rigorous gold standard diagnostic test being available in all cases and non-gout cases, the large numbers of participants from multiple geographical sites, pertinent comparator diseases, and the comprehensive data collection that allowed classification by multiple criteria sets. In addition, we confirmed specificity estimates in patients without clinical diagnoses of gout.
Although demonstration of MSU crystals in SF or tophus tissue is considered to be the best way to diagnose gout in clinical practice, it is theoretically possible that some cases defined as gout or non-gout using MSU crystal identification were wrongly diagnosed. MSU crystals might not signify gout in the hypothetical situation of inflammatory arthritis and the small percentage of people with asymptomatic hyperuricaemia who have MSU crystals in SF.15 As far as we know, studies of MSU crystals in asymptomatic patients with hyperuricaemia have specifically excluded patients with other inflammatory arthritis so technically the frequency of MSU crystals in patients with non-gout inflammatory arthritis is unknown. While it is not possible to exclude this possibility in any of the gout cases, such a combination of factors seems unlikely. Only 15 MSU crystal positive cases were clinically diagnosed with a non-gout diagnosis (ie, <3%), so the impact on criteria performance is likely to be very small. It is not clear what the effect of any such misclassification by the gold standard would be on the performance of criteria. Increases or decreases in sensitivity and specificity are conceivable, so overall the effect on the observed results would tend to be tiny.
Some non-gout cases might actually have had gout but MSU crystals were absent because of excellent treatment or other factors. Again, this possibility cannot be disproved but is unlikely to represent a large number of subjects. Since all individuals were symptomatic and only 42 (9%) control patients were on uric acid lowering treatment, it was unlikely that they had depleted all MSU crystal deposition. There were 47 patients who were MSU crystal negative but received a clinical diagnosis of gout. Exclusion of these patients did increase the specificity of the criteria (see online supplementary table S5), but such estimates are likely to be overestimates and will affect all criteria to a similar extent. So, while the absolute value of the observed specificity estimates may be lower than the true values, the relative values across criteria are likely to be similar, permitting a valid comparison. All the criteria under consideration in this study enabled classification as gout in the absence of MSU crystals. The effect of not having MSU crystals was neutral with respect to meeting or not meeting criteria (there was no ‘penalty’ or negative effect), which meant that it was possible to define non-gout cases as absence of MSU crystals and still be able to compare the performance of classification criteria in non-gout, when defined in this way.
The main limitation is the recruitment of patients from specialist care, which confers unavoidable spectrum bias (likelihood of more severe disease than is seen in primary care). The effect of severity bias will be to inflate sensitivity estimates (since more severe gout cases are more likely to fulfil classification criteria) and to deflate specificity estimates. So the absolute values of the sensitivity and specificity estimates are not generalisable to other (less severely affected) populations. Nevertheless, the relative difference in performance between early compared with established disease and the relative performance across different criteria is unlikely to be significantly affected by severity spectrum bias.
It is possible that patients with large joint disease were preferentially selected because of the need for joint aspiration. However, this is unlikely to have been a major factor since 74% of gout cases had MTP1 involvement at some time in their disease course, which is within the range of what is observed in other gout cohorts15 and furthermore 128/1004 (13%) joint aspirations were from the MTP1 joint. There were 34% of gout cases with currently tender MTP1. MTP1 involvement at any time was used in the evaluation of the different criteria sets.
Also, the definition of early disease (2 years or less) is fairly arbitrary and the accuracy of the duration of self-reported symptoms may not be optimal nor similar between cases and controls. Finally, there was variation across countries in respect of some patient characteristics, which might reflect different populations, patient selection or local practice. Overall, such variation is unlikely to reflect systematic bias given the large number of centres and patients and probably just contributes to random error.
It was the intent of the SUGAR study to enrol acute and chronic disease without reference to stage of disease, so that the results will ultimately inform the development of criteria that classify patients with symptomatic gout irrespective of disease stage. Since recruitment required a recently swollen joint or nodule to improve the likelihood of available SF/tissue for polarising microscopy, it may seem likely that the sample was biased towards acute disease. However, it was not specified how long a joint must have been swollen, so patients with persistently swollen joints, or patients with flares on a background of chronic disease, were able to be recruited. The median duration of disease from first symptoms was 8 years in the gout group, indicating that chronic disease was common. Although the intent of some classification criteria (particularly the ARA criteria) is for classification of acute gouty arthritis, in fact such criteria can be applicable to patients with chronic disease too, since (for example) the presence of tophi or radiographic changes, which are features of chronic disease, contributes to fulfilling these criteria. Staging will likely require additional criteria to be applied, for example criteria that define acute flare in people known to have gout.16 The focus of the present study is to classify gout as a whole, rather than to classify specific stages of gout, which will require additional work.
The results of this study suggest that the major problem for existing classification criteria is not so much inadequate sensitivity in early disease but rather low specificity in early and (even more so) in later disease, particularly for criteria that do not require MSU crystal examination. Criteria with better specificity are required, especially for well-established disease.
The authors gratefully acknowledge the help of Joung-Liang Lan, Chien-Chung Huang, Po-Hao Huang, Hui-Ju Lin and Su-Ting Chang (China Medical University Hospital, Taiwan), Anne Madigan (Dublin, Ireland), Yi-hsing Chen (Taichung, Taiwan), Alain Sanchez-Rodríguez and Eduardo Aranda-Arreola (Mexico City, Mexico), Viktoria Fana (Copenhagen, Denmark), Panomkorn Lhakum and Kanon Jatuworapruk (Chiang Mai, Thailand), Dianne Berendsen and Femke Lamers-Karnebeek (Nijmegen, Netherlands), Olivier Peyr (Paris, France), Ana Beatriz Vargas dos Santos (Rio de Janeiro, Brasil), Fatima Kudaeva (Moscow, Russia), Angelo Gaffo (Birmingham AL), Douglas White (Hamilton, New Zealand), Giovanni Cagnotto (Pavia, Italy) and Juris Lazovskis (Sydney, Canada) with data collection, crystal examination or patient referral. The authors are grateful to Eliseo Pascual (Alicante, Spain) for help with MSU observer certification. The authors particularly acknowledge Victoria Barskova (Moscow, Russia) who died during the course of this study and wish to dedicate this manuscript to her memory.