Objective: To evaluate the ability of the widely used ACR set of criteria (both list and tree format) to diagnose RA compared with expert opinion according to disease duration.
Methods: A systematic literature review was conducted in PubMed and Embase databases. All articles reporting the prevalence of RA according to ACR criteria and expert opinion in cohorts of early (<1 year duration) or established (>1 year) arthritis were analysed to calculate the sensitivity and specificity of ACR 1987 criteria against the “gold standard” (expert opinion). A meta-analysis using a summary receiver operating characteristic (SROC) curve was performed and pooled sensitivity and specificity were calculated with confidence intervals.
Results: Of 138 publications initially identified, 19 were analysable (total 7438 patients, 3883 RA). In early arthritis, pooled sensitivity and specificity of the ACR set of criteria were 77% (68% to 84%) and 77% (68% to 84%) in the list format versus 80% (72% to 88%) and 33% (24% to 43%) in the tree format. In established arthritis, sensitivity and specificity were respectively 79% (71% to 85%) and 90% (84% to 94%) versus 80% (71% to 85%) and 93% (86% to 97%). The SROC meta-analysis confirmed the statistically significant differences, suggesting that diagnostic performances of ACR list criteria are better in established arthritis.
Conclusion: The specificity of ACR 1987 criteria in early RA is low, and these criteria should not be used as diagnostic tools. Sensitivity and specificity in established RA are higher, which reflects their use as classification criteria gold standard.
Rheumatoid arthritis (RA) is a systemic autoimmune disease with a prevalence of around 1% of the population. It is characterised by chronic inflammation of the synovial joints which leads to progressive joint erosions and eventually to disability and loss of quality of life. This poor prognosis has led to an emphasis on rapid introduction of aggressive treatment by disease-modifying antirheumatic drugs.1 For this purpose, it is important to have diagnostic criteria which could, at an early stage of the disease, determine the diagnosis of RA, thus allowing rapid introduction of these drugs.
Compared with classification criteria, which at a late stage of disease can diagnose a disease with great specificity at the group level, diagnostic criteria should be able to diagnose a disease at an early stage with greater sensitivity.2 3 Classification criteria are developed for clinical research in the field of rheumatology (where a patient with rheumatic disease must be optimally classified from patients in a rheumatology outpatient clinic), or epidemiological studies (where a rheumatic patient must be classified from among a large population comprising healthy subjects). Diagnostic criteria are developed for a diagnostic situation (ie, to help the clinician reach a diagnosis when confronted with a given patient in an outpatient clinic). The objectives but also the development process of these types of criteria are thus very different.2
To date, no such diagnostic criteria have been widely validated in RA. A few sets of criteria constructed for diagnosis have been proposed,4 5 but they have not been externally validated and are not widely used. The main criteria used in RA are the American College of Rheumatology (ACR; formerly the American Rheumatism Association) 1987 revised criteria,6 published by Arnett et al. Although these criteria were developed as classification criteria, they are widely used for diagnosis. The ACR 1987 criteria for RA can be applied in list format (patients are required to satisfy at least four of the seven components of the criteria list), as reported in table 1, or in a decision-tree format, as reported in fig 1. These criteria are described in the initial article as simple, sensitive and specific, and were found to be as good as, or better than, earlier criteria. However, the ACR 1987 classification criteria are not well suited to diagnosing RA at an early stage, for several reasons: in particular, they were not developed for diagnostic purposes, and some of the criteria are rarely fulfilled in the first year after the onset of RA and may therefore lack sensitivity in early RA.7 8 Notwithstanding these theoretical limitations, the ACR criteria are, in practice, widely used for diagnosis, although data regarding their diagnostic capacity are limited.7 8 Furthermore, the situation in early RA clearly differs from that in established RA.
The objective of this study was to assess, through a systematic literature review, the diagnostic capacity (sensitivity and specificity) of the ACR 1987 RA classification criteria for the diagnosis of RA according to disease duration, and to perform a meta-analysis of these diagnostic capacities using summary receiver operating characteristic curves.
A systematic review of the published literature following the Cochrane Collaboration recommendations was performed.9
All articles reporting either sensitivity or specificity of ACR 1987 criteria in RA, or reporting data allowing the calculation of sensitivity or specificity, against a “gold standard” of expert opinion, were included. The analysis was restricted to adults; studies of juvenile arthritis were not taken into account. ACR 1987 criteria were assessed both in list format and in tree format, as available.
The search was conducted using electronic databases (Medline, PubMed and Embase), and abstracts from 2005, 2006 and 2007 international congresses including ACR and European League against Rheumatism congresses, with no limitations by type of publication. The Medline search was last updated on 31 December 2007 and the Embase search was performed on 10 April 2007. Literature retrieval was restricted to studies of adult humans published in English, French or Spanish between 1988 (initial publication of ACR 1987 criteria) and the date of the electronic database search. A free text search was conducted using the following key word combination: (“ACR rheumatoid arthritis criteria” or “ACR 1987” or “ARA 1987”) and “sensitivity”. In addition, references of the papers initially detected were hand searched to identify additional relevant reports. Articles were selected based on the abstract, then on the full text, and were analysed if they reported data allowing the calculation of sensitivity and/or specificity of the criteria for the diagnosis of RA versus expert opinion, used as gold standard (see below).6 7 10–26 Figure 2 reports the results of the article selection process.
One reader (FB) analysed each manuscript using a predetermined, standardised list. Because there is no recognised gold standard for the diagnosis of RA, and because the studies reported diagnostic capacities of the ACR criteria against expert opinion, the gold standard used to determine the sensitivity and specificity of the ACR 1987 criteria was expert opinion (one or more experts). Control populations were composed of patients with other rheumatic diseases. The following data were collected in all studies: definition of RA (gold standard used), number of patients with RA and controls, diagnosis of control patients, distribution of patients between patients with RA (by gold standard) and controls, according to ACR 1987 criteria in list format and tree format, patient characteristics (rheumatoid factor positivity, specific x-ray changes) and disease duration. Studies involving early arthritis (arbitrarily defined as <1 year of disease) were distinguished from those involving established RA (>1 year). The analysis was based on cross-sectional assessments in order to standardise data, because data for longitudinal prediction of RA were scarce and expert diagnosis was not always available at follow-up.
Descriptive data of patients are reported as means and standard deviation (SD) or percentage. Sensitivity and specificity of ACR criteria for the diagnosis of RA were calculated against the gold standard of expert opinion in all analysed studies.
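As an illustration of this calculation, the following minimal sketch derives sensitivity and specificity from a study's 2×2 table, with expert opinion as the reference. The class name, field names and example counts are hypothetical and are not taken from any of the analysed studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one study; the reference standard is expert opinion."""
    tp: int  # ACR criteria fulfilled, expert diagnosis RA
    fn: int  # ACR criteria not fulfilled, expert diagnosis RA
    fp: int  # ACR criteria fulfilled, expert diagnosis not RA (control)
    tn: int  # ACR criteria not fulfilled, expert diagnosis not RA (control)


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of expert-diagnosed RA patients who fulfil the criteria."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Proportion of controls who do not fulfil the criteria."""
    return t.tn / (t.tn + t.fp)


# Hypothetical study: 100 patients with RA and 100 controls.
example = TwoByTwo(tp=80, fn=20, fp=10, tn=90)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```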
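A meta-analysis of diagnostic performances was performed by assessing the summary receiver operating characteristic (SROC) curves.27–30 The SROC curve is derived from the regression model D = α+βS where, in the usual formulation of this model, D = logit(sensitivity) − logit(1 − specificity) is the log diagnostic odds ratio of a study and S = logit(sensitivity) + logit(1 − specificity) is a measure of its diagnostic threshold.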
If β is zero, the diagnostic odds ratio is constant among the studies: there is no heterogeneity. In this case, the SROC curve can be summarised by the diagnostic odds ratio. Covariates can be introduced in the regression model. The effect of a covariate is analysed by a classical test of the nullity of the corresponding regression parameter. In this study, the duration of the disease was added in the model as a binary variable. The considered model was D = α+βS+γ duration. The parameters were assessed employing weights to reflect interstudy heterogeneity.27 Goodness-of-fit of the regression model was measured by R2. Testing the difference between the log odds ratios for short and long duration is equivalent to testing H0:γ = 0. The indicator Q* was also assessed because it provides a good summary of the SROC curve: it is the value of the sensitivity at the point of the SROC for which the sensitivity and the specificity are equal. This point is at the intersection of the SROC curve and the line joining the points (0,1) and (1,0). The closer the curve to the top left corner (perfect sensitivity and specificity), the better the accuracy.
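The regression described above can be sketched as follows. This is only an illustration, and it rests on several assumptions. D and S are taken in their usual form for this type of model (the difference and the sum of the logits of the true-positive and false-positive rates). The weights are left to the caller and default to 1, because the weighting scheme is not detailed here. All function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def d_and_s(sens, spec):
    """Return (D, S) for one study: log diagnostic odds ratio and threshold proxy."""
    tpr, fpr = sens, 1.0 - spec
    return logit(tpr) - logit(fpr), logit(tpr) + logit(fpr)


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_sroc(studies, weights=None):
    """Weighted least squares fit of D = alpha + beta*S + gamma*duration.

    studies: list of (sensitivity, specificity, established), where
    established is 1 for established disease and 0 for early disease.
    Returns [alpha, beta, gamma].
    """
    weights = weights or [1.0] * len(studies)
    rows, ys = [], []
    for sens, spec, established in studies:
        d, s = d_and_s(sens, spec)
        rows.append([1.0, s, float(established)])
        ys.append(d)
    # Normal equations: (X' W X) coef = X' W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(3)] for i in range(3)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, ys))
            for i in range(3)]
    return solve(xtwx, xtwy)
```

With this parameterisation, a fitted γ greater than zero means a higher log diagnostic odds ratio in established disease at a given threshold, which is the comparison tested by H0:γ = 0.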
Pooled sensitivity and specificity were separately assessed by using the arcsin transformed proportions.31 Heterogeneity was tested with the Cochran Q statistic, and in cases of heterogeneity a random effect model was used.32 Ninety-five per cent confidence intervals (CIs) are given as (lower limit to upper limit). A funnel plot was performed to give visual indications for bias in the studies selected for the meta-analysis.33
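A minimal sketch of this kind of pooling is given below. It makes assumptions that are not confirmed by the text: the transformation is the arcsine of the square root of the proportion, with approximate variance 1/(4n), and the random effect model uses a DerSimonian-Laird between-study variance. The function name and inputs are hypothetical.

```python
import math


def pool_arcsine(events, totals):
    """Pool proportions on the arcsine square-root scale.

    events[i] / totals[i] is the proportion in study i (for example, true
    positives over all patients with RA when pooling sensitivity).
    Returns (pooled, lower, upper, q, tau2), with a 95% confidence interval.
    """
    x = [math.asin(math.sqrt(e / n)) for e, n in zip(events, totals)]
    v = [1.0 / (4.0 * n) for n in totals]  # approximate variance of asin(sqrt(p))
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    # Cochran Q statistic for heterogeneity (k - 1 degrees of freedom).
    q = sum(wi * (xi - fixed) ** 2 for wi, xi in zip(w, x))
    k = len(x)
    # DerSimonian-Laird between-study variance (assumed estimator).
    tau2 = 0.0
    if k > 1:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]
    mean = sum(wi * xi for wi, xi in zip(w_re, x)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def back(t):
        # Back-transform to a proportion, clamping to the valid range.
        return math.sin(min(max(t, 0.0), math.pi / 2)) ** 2

    return back(mean), back(mean - 1.96 * se), back(mean + 1.96 * se), q, tau2
```

When the estimated between-study variance is zero, the random effect weights reduce to the fixed effect weights, so a single function covers both situations.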
Of the 138 publications identified, 19 articles reported interpretable data and were included in the analysis—that is, a total of 7438 patients (fig 2).
The total number of patients with RA was 3883 (546 early RA, 3337 established RA). In total, 3555 patients were included as controls in order to assess the set of ACR 1987 criteria. The control group included the following diagnoses: mechanical pathology, for example osteoarthritis (1226 patients, 34.5%); systemic autoimmune disease (794 patients, 22.3%); undifferentiated arthritis (489 patients, 13.8%); spondyloarthritis (460 patients, 12.9%); crystal-induced arthropathies (162 patients, 4.6%); other diagnoses such as fibromyalgia (155 patients, 4.4%); septic or viral arthritis (9 patients, 0.3%); and unspecified diagnoses (260 patients, 7.3%). In patients with RA, rheumatoid factor was positive in 70 (17)% (mean (SD), 3043 available data) and specific radiographic changes were present in 69 (24)% (mean (SD), 1846 available data).
Meta-analysis of diagnostic properties (SROC curves)
Heterogeneity was not detected: the parameter β did not differ significantly from zero (β = −0.24, SD = 0.26, p = 0.36). Disease duration was significant in the regression model (γ = 2.17, SD = 0.60, p = 0.003), indicating that diagnostic performances were better in established disease than in early disease. The log odds ratio was lower for early disease than for established disease: 1.86 (95% CI 0.91 to 2.81) versus 4.03 (3.31 to 4.75). The goodness-of-fit of this model was acceptable (R2 = 0.53), and was better than for the model without disease duration (R2 = 0.10, results not shown). The indicator Q* was 0.88 (0.85 to 0.92) for established disease and 0.72 (0.62 to 0.81) for early disease (p = 0.002). Figure 3 shows SROC curves taking into account disease duration for list criteria. Diagnostic performances of ACR list criteria were better in established disease than in early disease (comparison of log odds ratios, p = 0.003).
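The link between these figures can be checked with a short calculation. At the point Q*, sensitivity equals specificity, so S = 0 and the log diagnostic odds ratio there is D = 2 logit(Q*), which gives Q* = 1/(1 + exp(−D/2)). The sketch below applies this relation to the reported log odds ratios, assuming they are the values of D at S = 0. The function name is hypothetical.

```python
import math


def q_star(log_dor):
    """Q* of an SROC curve whose log diagnostic odds ratio at S = 0 is log_dor."""
    return 1.0 / (1.0 + math.exp(-log_dor / 2.0))


# Log odds ratios reported in the text for early and established disease.
early, established = 1.86, 4.03

print(round(q_star(early), 2), round(q_star(established), 2))
print(round(established - early, 2))
```

Under that assumption, the calculation gives 0.72 and 0.88, matching the reported Q* values, and the difference between the two log odds ratios is 2.17, the reported value of γ.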
Determination of sensitivity and specificity
Table 2 gives the final results for sensitivities and specificities of ACR criteria for each study and pooled results according to the format of ACR 1987 criteria and the disease duration. Heterogeneity was detected (Cochran Q test; p<0.001) except for ACR 1987 tree format in early disease (Cochran Q test, p = 0.17 for sensitivity and p = 0.21 for specificity), probably because of the small number of studies (n = 2). Consequently, a random effect model was used. Pooled sensitivity and specificity of ACR 1987 criteria in list format in early RA were 76.5% (68.0% to 84.0%) and 76.5% (68.0% to 84.0%), respectively, versus 80.4% (71.7% to 87.8%) and 33.1% (24.3% to 42.6%), respectively, in tree format. Sensitivity and specificity of ACR 1987 criteria in established RA were 78.6% (71.3% to 85.0%) and 89.5% (84.1% to 93.8%), respectively, in list format and 80.2% (71.3% to 85.0%) and 92.6% (86.3% to 97.0%), respectively, in tree format. Figure 4 represents graphically the meta-analysis of sensitivities and specificities.
Robustness of results
A funnel plot was performed for studies assessing the ACR criteria in list format (fig 5). This plot showed a lack of symmetry. The three studies with a large sample size (more than 400 subjects) were all in the far right part of the plot, indicating they had higher diagnostic performances. In the study by Harrison et al,12 the diagnostic performances were especially low, leading to an asymmetry in the lower part of the funnel plot. Overall, the shape of the funnel plot suggested a bias related to study size. However, the results appear valid and robust: two sensitivity analyses for the ACR 1987 criteria in list format were performed, dropping the studies responsible for the asymmetry of the funnel plot (one analysis dropping the three studies with the largest sample sizes, and another dropping the study with the lowest diagnostic performances). Removing the studies by Arnett et al,6 Bernelot Moens et al20 and Kobayashi et al25 did not modify the results (data not shown). Harrison’s study12 had great influence on the SROC curve because its sensitivity and specificity values were far from those of the other studies and its sample size was large (n = 289). When this study was removed from the meta-analysis, the SROC curves for established versus early disease were no longer significantly different (p = 0.055, data not shown).
This systematic review of the diagnostic properties of the 1987 ACR criteria for RA indicates that the performances are better for established disease than for early disease. Sensitivity and specificity of these criteria are moderate in early RA (pooled results, respectively 77–80% and 33–77%); they are, however, better in established RA (respectively 79–80% and 90–93%), which reflects the use of the ACR criteria as gold standard in established RA. These results suggest these criteria should not be used as a diagnostic tool in disease of short duration. This analysis also suggests that ACR criteria in tree format may be preferable to list format, as they have almost the same sensitivity (80% vs 79%) but higher specificity (93% vs 90%) in established RA. However, diagnostic capacities of the criteria in tree format in early RA should be interpreted with caution owing to the paucity of data (two studies, 302 patients). In established RA, the higher specificity and higher log odds ratio of the tree format clearly put it at an advantage over the list format. However, the traditional list format is easier to use in clinical practice.
Recently a new paradigm of aggressive treatment of early RA has been proposed, using methotrexate and biological agents, which can slow the progression of the disease and even induce remission if started very early in the disease course.34 But these treatments are not devoid of serious side effects. Because of this “window of opportunity”, it is essential to re-evaluate existing diagnostic tools to better identify subjects with RA for early treatment and for inclusion in clinical trials.
On the whole, this study confirms the report of Saraux et al7: ACR 1987 criteria were not developed as diagnostic criteria for five reasons. First, making a diagnosis was not the goal of the evaluation from which the criteria were derived. Second, some of the patients in that evaluation had longstanding disease. Third, the predictive value of each criterion could not be assessed as the number of patients with RA and controls was predefined. Fourth, the controls had a variety of disorders, some of which were readily distinguishable from RA such as osteoarthritis or fibromyalgia, which may have led to overestimation of the specificity of the criteria. In addition, criteria 5, 6 and 7 of the ACR 1987 list criteria (table 1) are not often fulfilled in the first year after the onset of RA and may therefore lack sensitivity in early RA.
Some modifications to the ACR 1987 criteria may be indicated. Several authors have called for the addition of anti-cyclic citrullinated peptide antibodies (anti-CCP) to the revised classification criteria for RA.35 36 This addition may improve specificity because anti-CCP predict development of RA with a high probability.37 Bilateral compression pain in the metatarsophalangeal joints may increase the probability of diagnosing RA, because it was more strongly associated with erosive arthritis in the Leiden study.4 5 Additionally, new imaging techniques such as sonography may improve early detection of bone erosions, as conventional radiography is usually normal in the earliest months.38 Adding exclusion criteria may produce a further improvement, as proposed by Saraux et al,7 though this has not been assessed. Synovial fluid analysis is a very important tool to distinguish RA from crystal-induced arthritis such as gout. Chondrocalcinosis has been reported to coexist with RA,39 especially in elderly patients, but gout very rarely coexists with RA and should be considered as an exclusion criterion except in unusual circumstances.40 Moreover, applying ACR 1987 criteria “cumulatively” (each criterion satisfied if “ever” positive) rather than “cross sectionally” may improve sensitivity. For example, Jacobsson et al reported in their study a 28% sensitivity of “ACR criteria but NOT Rome criteria” if applied “cross sectionally” versus 64.6% if applied “cumulatively”.18 Finally, a better selection of the control groups may also improve the assessment of specificity, as suggested by Levin et al13: using only patients with arthritis represents a fair test, since patients in this group are those whose diagnosis is most likely to be confused with RA.
Recently, the Leiden group developed a new set of diagnostic criteria for early arthritis, characterised by an ability superior to that of the ACR 1987 criteria to discriminate, at the first visit, between self-limiting, persistent non-erosive, and persistent erosive arthritis.4 5 The set consists of seven criteria: symptom duration at first visit, morning stiffness of at least 1 h, arthritis in three or more joints, bilateral compression pain in the metatarsophalangeal joints, IgM rheumatoid factor positivity, anti-CCP positivity and erosions on radiographs of the hands or feet. However, these diagnostic criteria have not been externally validated and are not widely used in clinical practice.
This study has strengths: the total number of patients was high (7438 patients) and the distribution between the two groups was balanced: 52% with RA versus 48% controls. Several continents were represented, which supports the representativeness of the sample. The systematic literature review was performed according to Cochrane Collaboration recommendations. Searching in three languages made the search more exhaustive, even though all the studies finally included were published in English.
However, some drawbacks were apparent: as usual in a systematic literature review, reports were heterogeneous. For example, important differences were seen in the characteristics of patients in the control groups: undifferentiated arthritis or peripheral spondyloarthritis, sometimes difficult to distinguish from RA, were mixed with fibromyalgia or osteoarthritis, which are usually readily distinguishable from RA. The distinction between early RA and established RA, as defined arbitrarily here, was not always strictly respected. For example, Bernelot Moens20 included respectively 32% and 35% of early RA in his established RA study and Harrison12 included 16% of established RA in his early RA study. Furthermore, the gold standard of expert opinion was based on the diagnosis of one or several doctors. A panel of doctors might have improved the uniformity of the final diagnosis retained.7 The choice of a gold standard is often a delicate point in diagnostic meta-analyses. We chose to use doctors’ opinion as the gold standard because this was the standard most often reported in RA diagnostic studies. The doctors are expected to diagnose RA based on clinical, biological and radiological examinations; there may of course be some interference with the ACR criteria as these are widely used and widely known. The use of doctors as the gold standard may perhaps explain the low sensitivity and specificity of the 1987 ACR criteria: it gives the advantage to the rheumatologist(s), who have more information for accurate diagnosis, such as family history, presence of Felty syndrome, positive squeeze test, etc. However, this gold standard poses the problem of the variability of expert opinion. Finally, some articles were analysed even when their aim was not to evaluate sensitivity or specificity of ACR 1987 criteria. For example, Pedersen et al10 evaluated the validity of RA diagnoses in the Danish National Patient Registry.
All these differences may explain the wide range of sensitivity and specificity reported, as shown in fig 3. In particular, Harrison’s study12 appears to be an “outlier” in terms of sensitivity and specificity, although this study was of good methodological quality. It should be noted that the sensitivity analysis excluding Harrison’s study indicated similar results, although the difference between performances in early and late disease then became non-significant.
A strength of this study is the statistical analysis performed. SROC curves are a validated method to perform meta-analyses of diagnostic properties,29 and a separate meta-analysis was performed to obtain pooled sensitivity and specificity values. Although the results were heterogeneous, the heterogeneity of the gold standard may have explained some of this. Bias was searched for using a funnel plot. Because the funnel plot indicated possible bias, several sensitivity analyses were performed, which suggested that our results are valid and robust.
Two study designs are possible to assess diagnostic properties of scores: cross sectional (as in this study) or longitudinal (ie, the criteria are assessed at baseline and the final diagnosis, used as gold standard, at follow-up). Although a longitudinal study may appear theoretically a better choice, a cross-sectional analysis was the only possibility in our case, in order to standardise data, as data on longitudinal prediction of RA were scarce and expert diagnosis was not always available at follow-up. To determine the diagnostic value, a cohort of patients with early arthritis should be created and followed up for at least 2 years to determine the final diagnosis, without predefining the number of control patients or the ratio of controls to patients with RA. This should allow the diagnostic value of each clinical, radiological, biological and exclusion criterion to be assessed.
It may be time to revise the 1987 ACR classification criteria for RA to take into account changes which have appeared during the past 20 years, and to develop more sensitive criteria for diagnosis of early RA.
In conclusion, this systematic review allowed us to determine the diagnostic capacity of the gold standard classification criteria in RA. These results may be used as a basis when assessing new diagnostic criteria in RA.
Competing interests: None.