Objective To summarise the evidence on the performance of the Assessment of SpondyloArthritis international Society (ASAS) classification criteria for axial spondyloarthritis (axSpA) (also imaging and clinical arm separately), peripheral (p)SpA and the entire set, when tested against the rheumatologist's diagnosis (‘reference standard’).
Methods A systematic literature review was performed to identify eligible studies. Raw data on SpA diagnosis and classification were extracted or, if necessary, obtained from the authors of the selected publications. A meta-analysis was performed to obtain pooled estimates for sensitivity, specificity, positive and negative likelihood ratios, by fitting random effects models.
Results Nine papers fulfilled the inclusion criteria (N=5739 patients). The entire set of the ASAS SpA criteria yielded a high pooled sensitivity (73%) and specificity (88%). Similarly, good results were found for the axSpA criteria (sensitivity: 82%; specificity: 88%). Splitting the axSpA criteria in ‘imaging arm only’ and ‘clinical arm only’ resulted in much lower sensitivity (30% and 23% respectively), but very high specificity was retained (97% and 94% respectively). The pSpA criteria were less often tested than the axSpA criteria and showed a similarly high pooled specificity (87%) but lower sensitivity (63%).
Conclusions Accumulated evidence from studies with more than 5500 patients confirms the good performance of the various ASAS SpA criteria as tested against the rheumatologist's diagnosis.
- Outcomes research
The Assessment of SpondyloArthritis international Society (ASAS) has developed and validated criteria (ASAS cohort) for spondyloarthritis (SpA), as well as for its subsets, axial (axSpA) and peripheral SpA (pSpA).1 ,2 As in other rheumatic diseases,3 in the absence of a ‘true’ gold standard, expert opinion has been used as an external ‘anchor’ to develop and test the SpA classification criteria. In the original validation studies, the ASAS criteria outperformed other classification criteria.
Since their publication, the performance of the ASAS SpA criteria has been tested all over the world in different cohorts using the same approach. Some of these cohorts are, as expected, similar to the ASAS cohort, while others differ (eg, in setting, inclusion criteria, disease duration). Appropriately pooling the data and exploring relevant between-study differences can yield insights into the performance and applicability of the criteria in a broad population of patients.
The aim of this systematic literature review is to summarise the published data pertaining to the performance of the ASAS classification criteria for axSpA (also ‘imaging arm’ and ‘clinical arm’ separately), pSpA and the entire SpA set when tested against the rheumatologist's diagnosis.
The scope of the literature search was defined according to the PICO format (patients, intervention, comparator, outcomes; online supplementary table S1).4 MEDLINE and EMBASE databases were searched without language restriction. Eligible studies were observational cohorts assessing the performance of the ASAS SpA criteria against the rheumatologist's diagnosis, published from March 2009 (date of the axSpA ASAS criteria release) up to August 2016. Studies in which the primary aim was not assessing the performance of the ASAS criteria but still provided enough data to allow such an analysis were also included. In order to retrieve additional references, abstracts from the American College of Rheumatology and European League Against Rheumatism annual conferences (2014 and 2015) were searched. Only studies with full text available were included, since abstracts neither provide appropriate detail for risk of bias (RoB) assessment nor appropriate data for analysis. Details on the search strategy are provided in online supplementary text 1.
Study selection, data extraction and assessment of risk of bias
Two reviewers (AS and RR) independently screened all titles and abstracts to identify eligible studies fulfilling the inclusion criteria, followed by full-text review if appropriate (excluded articles and the reasons for exclusion are listed in online supplementary table S2). Both reviewers independently extracted data on the studies' main characteristics, patient characteristics and disease characteristics, and criteria performance (ie, sensitivity, specificity, likelihood ratios of the ASAS criteria against the rheumatologist's diagnosis). Authors of the selected publications were contacted to obtain raw data (2×2 tables necessary for meta-analysis) on criteria performance, when this information was not available in the publication. The same two reviewers independently assessed the RoB of each study using the Quality Assessment of Diagnostic Accuracy Studies 2 tool.5 Disagreements were resolved by consensus, and a third review author (DvdH) was involved when necessary.
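As an illustration of what a 2×2 table provides, the following minimal Python sketch derives sensitivity, specificity and both likelihood ratios from the four cell counts. The class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Criteria result cross-tabulated against the rheumatologist's diagnosis."""
    tp: int  # criteria fulfilled, diagnosed as SpA
    fp: int  # criteria fulfilled, diagnosed as no SpA
    fn: int  # criteria not fulfilled, diagnosed as SpA
    tn: int  # criteria not fulfilled, diagnosed as no SpA

    @property
    def sensitivity(self) -> float:
        # Proportion of diagnosed patients who fulfil the criteria.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of non-diagnosed patients who do not fulfil the criteria.
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=80, fp=10, fn=20, tn=90)
print(f"sensitivity={example.sensitivity:.2f}, specificity={example.specificity:.2f}")
print(f"LR+={example.lr_positive:.1f}, LR-={example.lr_negative:.2f}")
```

The sketch assumes that no row or column total is zero; a real analysis would need to handle empty cells.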
Pooled sensitivity and specificity were estimated by random effects bivariate generalised linear mixed models. Parameter estimates from each model were used to derive the positive likelihood ratio (LR+) and negative LR (LR−) and 95% CIs. Where data were limited, two univariate random effects models were used instead, assuming no correlation between sensitivity and specificity.6 Separate models were fitted for the axSpA criteria, the pSpA criteria and the SpA criteria. The ‘imaging arm’ and the ‘clinical arm’ of the axSpA criteria were analysed separately using two approaches: (i) considering all patients who fulfil each arm irrespective of fulfilment of the other and (ii) considering patients who fulfil one arm exclusively.
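To give a feel for what a univariate random effects model does, the sketch below pools study-level proportions (for example, sensitivities) on the logit scale. It is a simplification under stated assumptions: it uses the DerSimonian-Laird moment estimator and a 0.5 continuity correction, neither of which is stated in the text, and it is not the bivariate generalised linear mixed model fitted in Stata. The function name and the example counts are hypothetical.

```python
import math
from typing import Sequence


def pool_proportions(events: Sequence[int], totals: Sequence[int]) -> float:
    """Random effects pooled proportion on the logit scale.

    Assumption: DerSimonian-Laird estimator, chosen here for brevity only.
    """
    # Per-study logit and its approximate variance, with a 0.5
    # continuity correction so that zero cells do not break the log.
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))
        v.append(1 / a + 1 / b)

    # Inverse-variance (fixed effect) weights and Cochran's Q.
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Moment estimate of the between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random effects weights, pooled logit, back-transformed to a proportion.
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1 / (1 + math.exp(-y_re))


# Hypothetical data: patients fulfilling the criteria among those diagnosed.
pooled = pool_proportions(events=[80, 150, 60], totals=[100, 200, 90])
print(f"pooled sensitivity = {pooled:.3f}")
```

Running the same function on the non-diagnosed patients (true negatives over all negatives) gives a pooled specificity; treating the two separately corresponds to the assumption of no correlation between sensitivity and specificity.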
A series of sensitivity analyses was performed (whenever possible and appropriate) to assess the effect of the following on the criteria performance: (i) target population (original validation study inclusion criteria vs different inclusion criteria); (ii) risk of bias (low vs high RoB); (iii) study's main aim (criteria performance assessment vs other); (iv) setting (hospital vs community) and (v) symptom duration (<2 years vs ≥2 years).
All analyses were performed in Stata V.12.1. The Cochrane Collaboration's Review Manager Software V.5.3 was used to build forest plots.
Of 1486 screened articles (after deduplication), 9 fulfilled the inclusion criteria (table 1).1 ,2 ,7–13 All but one study was considered to be at low RoB (see online supplementary table S3). In total, 5739 patients (range: 157–1210) had been included, and 2936 (51.2%; range: 25.2%–69.4%) had been diagnosed by the rheumatologist as SpA.
This literature review included the original studies in which the axSpA criteria and the pSpA criteria (also the entire set) were validated.1 ,2 In addition, five studies assessed the ASAS axSpA criteria,8–10 ,12 ,13 one study assessed the pSpA criteria7 and one study the SpA criteria (providing separate data also for the axSpA and pSpA criteria).11 Raw data on the criteria performance were obtained from all, except two studies.12 ,13
In table 1, main patient characteristics and disease characteristics per study are shown. The majority of the studies assessing the axSpA criteria had similar inclusion criteria compared with the original validation study.8–10 ,12 ,13 However, in one study, inflammatory back pain was required, or otherwise patients had to have one additional SpA feature.11
Two studies assessing the pSpA criteria used different inclusion criteria as compared with the ASAS cohort. In one study, only patients with peripheral arthritis were included (excluding those with only enthesitis or dactylitis),7 while in another study patients had to have typical SpA arthritis (asymmetrical, and predominantly in lower limbs) or arthralgia associated with one additional SpA feature (not including enthesitis and dactylitis).11
Performance of the ASAS SpA classification criteria
The sensitivity and specificity of the various criteria for each individual study are shown in figure 1, and the results of the meta-analysis in table 2. The ASAS SpA criteria were assessed in two studies (N=1750) yielding a high pooled sensitivity and specificity (73%; 88%).2 ,11
Three studies (N=749) assessed the ASAS pSpA criteria.2 ,7 ,11 Although specificity was consistently high (82%–90%; pooled: 87%), sensitivity was much lower in the two studies whose inclusion criteria differed from those of the original validation study (49%–56% vs 78%; pooled: 62%).
Seven studies, with 4990 patients in total, together generated a very high pooled sensitivity and specificity (82% and 87% respectively) for the axSpA criteria, with little variation across studies.1 ,8–13 The pooled sensitivity of the ‘imaging arm’±‘clinical arm’ and ‘clinical arm’±‘imaging arm’ was 57% and 49%, respectively (26% and 23% when considering patients fulfilling each arm exclusively). High estimates of pooled specificity were found for both ‘arms’, irrespective of the definition (range: 92%–97%). However, the LR+ of the ‘imaging arm’ only was higher as compared with the ‘clinical arm’ only (9.6 vs 3.6).
The ASAS axSpA criteria performed similarly well irrespective of the population in which they were applied, the setting, symptom duration, RoB and study's main aim (sensitivity (range): 78%–85%, specificity (range): 80%–93%; online supplementary table S4). Due to a scarcity of data, sensitivity analyses for the ‘imaging arm’ and ‘clinical arm’ of the axSpA criteria, the pSpA criteria and the SpA criteria could not be performed.
Pooled data from eight cohorts (including more than 5500 patients) confirm the good performance of the various ASAS SpA classification criteria as tested against the rheumatologist's diagnosis. This review confirms that splitting the ‘arms’ of the axSpA criteria results in losing sensitivity while retaining specificity, which indicates that the full set of axSpA criteria is the preferred set.
While the pooled specificity for both the axSpA criteria and pSpA criteria was similarly high (87% for both), the pooled sensitivity for the pSpA criteria was much lower than that for the axSpA criteria (62% vs 82%). This difference may be explained by restrictive inclusion criteria. Unlike the ASAS cohort, the Early Arthritis Clinic cohort only included patients with arthritis, and not those with dactylitis only or enthesitis only.7 Similar ‘restrictions’ were seen in the ESPERANZA cohort.11 The low sensitivity found in these studies suggests that both enthesitis and dactylitis are considered by the rheumatologists as fitting the pattern of pSpA, which adds to the credibility of the ASAS pSpA criteria (that include these presentations).
Sensitivity analyses have shown the ‘robustness’ of the axSpA criteria when applied in different settings (hospital and community), in patients with short (<2 years) and long (≥2 years) symptom duration and in different populations.
Not surprisingly, splitting the axSpA criteria into two ‘arms’ compromised sensitivity but retained (very high) specificity, both when patients who fulfil each ‘arm’ irrespective of fulfilment of the other were considered and when those who fulfil one ‘arm’ exclusively were analysed. The larger LR+ for the ‘imaging arm’ as compared with the ‘clinical arm’ reflects the rheumatologist's reliance on positive imaging findings. The prospective validation of the ASAS criteria against the rheumatologist's diagnosis after >4 years of follow-up in the ASAS cohort has shown that both ‘arms’ still properly discriminate between axSpA and no-axSpA.14 Another prospective study has also suggested the arms' low specificity when tested against radiographic sacroiliitis (modified New York criteria) after 8 years of follow-up (‘imaging arm’: 22%; ‘clinical arm’: 56%), but the setting in this study was a prognostic rather than a diagnostic setting, and figures are difficult to interpret.15
In conclusion, the ASAS axSpA and pSpA criteria have been shown to perform well, as assessed against the rheumatologist's diagnosis, in patients included in several cohorts all over the world. This review does not resolve the question of the applicability of the ASAS classification criteria in primary care, since such a setting has not been tested. It is important to realise that the criteria's performance depends on the prevalence of SpA in the underlying population (pretest likelihood).
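One way to read the point about pretest likelihood is through the post-test probability that a patient who fulfils the criteria actually has SpA. The sketch below is illustrative only: it takes the pooled axSpA sensitivity and specificity reported in the Results text (82% and 87%), derives an approximate LR+ from these rounded values, and applies it to two pretest probabilities (5% and 50%) that are hypothetical and do not come from the review.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled axSpA estimates as reported in the Results text (rounded values).
SENSITIVITY = 0.82
SPECIFICITY = 0.87
lr_pos = SENSITIVITY / (1 - SPECIFICITY)

# Hypothetical pretest probabilities: a low-prevalence and a
# high-prevalence setting, chosen for illustration only.
for pretest in (0.05, 0.50):
    post = post_test_probability(pretest, lr_pos)
    print(f"pretest {pretest:.0%} -> post-test {post:.0%}")
```

Under these assumptions, the same positive classification corresponds to a post-test probability of roughly 25% at a 5% pretest probability and roughly 86% at a 50% pretest probability.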
The authors thank the authors of the included papers for providing raw data. They also thank Yemisi Takwoingi, co-convenor for the Cochrane Screening and Diagnostic Test Methods, for the support in statistical analysis.
Handling editor Tore K Kvien
Contributors Study concept and design: AS, SR, RL and DvdH. Data collection: AS and RR. Statistical analysis and data interpretation: AS, SR, RR, RL and DvdH. All authors revised the manuscript critically for important intellectual content and gave final approval of the version to be published. AS prepared the first version of the manuscript.
Funding AS received a research grant from Fundação para a Ciência e Tecnologia (grant number: SFRH/BD/108246/2015).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.