Objective To establish the predictive validity of the Assessment of SpondyloArthritis international Society (ASAS) spondyloarthritis (SpA) classification criteria.
Methods 22 centres (N=909 patients) from the initial 29 ASAS centres (N=975) participated in the ASAS-cohort follow-up study. Patients had either chronic (>3 months) back pain of unknown origin and age of onset below 45 years (N=658) or peripheral arthritis and/or enthesitis and/or dactylitis (N=251). At follow-up, information was obtained at a clinic visit or by telephone. The positive predictive value (PPV) of the baseline classification by the ASAS criteria was calculated using rheumatologist's diagnosis at follow-up as external standard.
Results In total, 564 patients were assessed at follow-up (345 visits; 219 telephone) with a mean follow-up of 4.4 years (range: 1.9–6.8) and 70.2% received a SpA diagnosis by the rheumatologist. 335 patients fulfilled the axial SpA (axSpA) or peripheral SpA (pSpA) criteria at baseline and of these, 309 were diagnosed with SpA after follow-up (PPV SpA criteria: 92.2%). The PPV of the axSpA and pSpA criteria was 93.3% and 89.5%, respectively. The PPV for the ‘clinical arm only’ was 88.0% and for the ‘clinical arm’±‘imaging arm’ 96.0%, for the ‘imaging arm only’ 86.2% and for the ‘imaging arm’±‘clinical arm’ 94.7%. A series of sensitivity analyses yielded similar results (range: 85.1–98.2%).
Conclusions The PPV of the axSpA and pSpA criteria to forecast an expert's diagnosis of ‘SpA’ after more than 4 years is excellent. The ‘imaging arm’ and ‘clinical arm’ of the axSpA criteria have similar predictive validity and are truly complementary.
The term spondyloarthritis (SpA) encompasses a group of chronic rheumatic diseases sharing common clinical, genetic and imaging features. Patients with SpA can be divided (with some overlap) according to their clinical presentation into axial SpA (axSpA), for those with predominantly axial symptoms, and peripheral SpA (pSpA) if peripheral manifestations dominate the clinical picture.
It has become evident that the requirement for the presence of radiographic sacroiliitis, as defined by the modified New York (mNY) criteria,1 leads to a delayed diagnosis of axSpA.2 ,3 Magnetic resonance imaging (MRI) has been proven to detect inflammation in the sacroiliac joints early in the disease course, far before structural changes are seen in radiographs.4 ,5 These findings have initiated the aggregation of patients with non-radiographic (nr-axSpA) and radiographic axial SpA (r-axSpA—also known as ankylosing spondylitis), under one ‘umbrella’ term being axSpA. The Assessment of SpondyloArthritis international Society (ASAS) has published criteria for axSpA and pSpA.6–8
Since their release, the ASAS criteria have been implemented worldwide. In the original validation studies,7 ,8 the new ASAS criteria proved to reflect the current perception of what ‘SpA looks like’ (‘gestalt’) better than the European Spondyloarthropathy Study Group9 and Amor10 criteria when tested against the expert's diagnosis. After that, the ASAS axSpA criteria,11–13 the pSpA14 criteria and the entire set15 ,16 have consistently shown good criterion and construct validity.
However, it has been argued that the ASAS axSpA criteria are too loose and include patients without SpA (mislabelling):17 Patients with nr-axSpA are more often women and have lower C reactive protein (CRP) levels when compared with patients with r-axSpA.18–20 Recent studies have suggested that the ‘clinical arm’ could drive such differences.11 ,21 However, the same studies have also shown that patients classified by the ‘imaging arm’ and ‘clinical arm’ are similar regarding the presence of SpA features and burden of clinical symptoms. Moreover, it has been hypothesised that the male gender is a risk factor for the development of radiographic damage,2 and it has been shown that the elevated CRP drives progression to r-axSpA,22 thereby explaining, at least partially, these differences in the nr-axSpA subpopulation.
While previous validation studies have shown high specificity of the ASAS criteria, mostly in cross-sectional analyses (except for one follow-up study in a Chinese population13), these studies do not resolve the question of predictive validity: will patients with a classification of axSpA still be considered as having a diagnosis of SpA after some years?
A similar question pertains to the pSpA criteria. Some claim that an entry symptom of arthritis may easily include patients with other forms of early arthritis,23 and that the entry symptom of ‘enthesitis’ may evoke confusion with non-inflammatory diseases.24
Hence, it had been decided upfront that patients from the validation cohort would be reassessed after 5 years. The aim of this study was therefore to establish the predictive validity of an ASAS classification, either as axSpA (also split by imaging and clinical arm) or pSpA, by comparing such a classification with the final diagnosis after follow-up in the original ASAS cohort.
The ASAS cohort is an international, multicentre, prospective study. From November 2005 to January 2009, rheumatologists from 29 ASAS centres worldwide included 975 consecutive patients who first presented for diagnostic work-up. To be included, eligible patients had to fulfil one of two criteria: (1) ‘axial population’: chronic (>3 months) back pain of unknown origin (no definite diagnosis) with an age of onset below 45 years, with or without peripheral symptoms; and (2) ‘peripheral population’: patients with peripheral arthritis and/or enthesitis and/or dactylitis and the absence of current back pain with suspicion of SpA but no definitive diagnosis.7 ,8
All patients were assessed at baseline and after a mean follow-up of 4.4 years (range: 1.9–6.8). Of the 29 original ASAS centres, 22 participated in the follow-up corresponding to 909 of the original 975 patients. At follow-up, these patients were contacted to assess their willingness to attend the follow-up visit. A total of 345/909 physically attended the follow-up visit and 219 provided only information via telephone (figure 1). Of the 22 participating centres, 10 had ≥75% patients with follow-up data available (N=291), while 12 had <75% (N=273).
The current Good Clinical Practice guidelines were followed, and the study has been approved by the local ethics committees. All patients provided written informed consent at the baseline visit that also included the follow-up visit.
Clinical, laboratory and imaging data were collected for all patients at baseline. The same assessments (except for HLA-B27 typing) were also performed at follow-up for patients attending the follow-up visit. For these patients, the rheumatologist provided a diagnosis at both time points (not necessarily the same clinician). Patients assessed by telephone at follow-up had also received a diagnosis by the rheumatologist at baseline, while the follow-up diagnosis was self-reported: Patients were asked whether during follow-up they had received a diagnosis that was different from the diagnosis based on the first study visit. Details on the methods used for data collection were previously published and were similar for both the ‘axial population’ and ‘peripheral population’.7 ,8 A summary of these methods is provided in the online supplementary appendix 1.
All patients with follow-up data available were considered in the analysis (N=564). The rheumatologist's diagnosis (SpA vs no-SpA) at follow-up was used as external reference (combining the follow-up visit and telephone diagnosis), against which the baseline ASAS classification was tested. The rheumatologists did not have access to the patients’ baseline classification status according to the ASAS criteria. Missing values for baseline SpA features were interpreted as being absent. For patients assessed at follow-up, the level of confidence about the diagnosis was recorded on a numerical rating scale from 0 (not confident at all) to 10 (very confident).
The predictive validity of the baseline ASAS classification for axSpA and pSpA was analysed in terms of positive predictive value (PPV) and negative predictive value (NPV). Similarly, the entire set was assessed combining the axSpA criteria (applied in patients with predominant back pain with/without peripheral manifestations) with the pSpA criteria (applied in patients with currently exclusive peripheral manifestations). The ‘imaging arm’ and the ‘clinical arm’ of the axSpA criteria were analysed separately using two approaches: (1) considering all patients who fulfil each arm irrespective of fulfilment of the other and (2) considering patients who fulfil one arm exclusively.
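The two approaches to analysing the arms can be illustrated with a minimal Python sketch. The `Patient` record, its field names and the six toy patients below are hypothetical and are not taken from the ASAS dataset; the sketch only shows how the ‘irrespective of the other arm’ and ‘one arm exclusively’ subsets differ before a PPV is computed for each.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record: baseline arm fulfilment and follow-up diagnosis
    imaging_arm: bool
    clinical_arm: bool
    spa_at_follow_up: bool


def ppv(patients):
    """Proportion of criteria-positive patients diagnosed as SpA at follow-up."""
    if not patients:
        return None
    return sum(p.spa_at_follow_up for p in patients) / len(patients)


def arm_subsets(patients):
    """Build the four subsets: each arm irrespective of the other, and each arm exclusively."""
    return {
        "imaging arm (± clinical arm)": [p for p in patients if p.imaging_arm],
        "imaging arm only": [p for p in patients if p.imaging_arm and not p.clinical_arm],
        "clinical arm (± imaging arm)": [p for p in patients if p.clinical_arm],
        "clinical arm only": [p for p in patients if p.clinical_arm and not p.imaging_arm],
    }


# Toy data, invented purely for illustration
toy_cohort = [
    Patient(True, True, True),
    Patient(True, False, True),
    Patient(True, False, False),
    Patient(False, True, True),
    Patient(False, True, True),
    Patient(False, False, False),
]

results = {name: ppv(group) for name, group in arm_subsets(toy_cohort).items()}
for name, value in results.items():
    print(f"{name}: PPV = {value:.1%}")
```

Note that a patient fulfilling both arms contributes to both ‘irrespective’ subsets but to neither ‘only’ subset, which is why the two approaches can yield different PPVs.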
In addition, the ASAS criteria predictive validity was assessed separately for countries with a low versus high background prevalence of HLA-B27 (median prevalence used as cut-off).
Three sensitivity analyses were performed to assess the possible effects of the following on the predictive validity results: (1) missing baseline data, (2) telephone versus physical visit and (3) completeness of reassessed patients per centre. First, an analysis was performed on patients with complete data on all SpA features at baseline (N=345). Second, an analysis was performed only on patients who physically attended the follow-up visit (N=345); by chance, these two analyses included the same number of patients, but not the same patients. Finally, a ‘≥75% complete follow-up’ analysis was performed, including only patients from centres with high levels of follow-up participation (N=291).
Data analysis was performed using STATA V.12.1.
Table 1 describes the baseline characteristics comparing patients with/without follow-up data available and comparing patients assessed at the follow-up visit or by telephone. These groups were globally comparable.
At the end of follow-up, 396 (70.2%) patients were diagnosed as SpA (257 (64.9%) in the follow-up visit group and 139 (35.1%) in the telephone group), while 168 (29.8%) received either another diagnosis or no diagnosis at all. Among the ‘axial population’, 280 (71.1%) were diagnosed as axSpA, while among the ‘peripheral population’ 116 (68.2%) got a diagnosis of pSpA. Table 2 shows the baseline characteristics of all patients with SpA and split for axSpA and pSpA. Additional information on baseline characteristics is provided in online supplementary tables S1 and S2.
Change in diagnosis and symptoms from baseline to follow-up
Among the 394 patients from the ‘axial population’, the baseline diagnosis was changed in 37 (30/246 (12.2%) in the follow-up visit group and 7/148 (4.7%) in the telephone group). Of these 394 patients, 246 were assessed at the follow-up visit (figure 1), providing information on the predominance of manifestations. The majority (185; 75.2%) maintained the same symptomatic pattern they had at baseline (ie, back pain+/-peripheral manifestations), with few presenting with only peripheral symptoms (15; 6.1%) and 46 (18.7%) becoming asymptomatic. The majority of these asymptomatic patients were treated during follow-up (41; 89.1%) and half (23; 50.0%) were still receiving medication at the follow-up visit (NSAIDs: 10 (43.5%); methotrexate: 2 (8.7%); tumour necrosis factor inhibitors (TNFi): 6 (26.1%); and 5 (21.7%) different combinations).
Of the 170 patients from the ‘peripheral population’, 19 (11.1%) had their diagnosis changed between baseline and follow-up (18/99 (18.2%) in the follow-up visit group and 1/71 (1.4%) in the telephone group). Of these 170 patients, 99 were assessed at the follow-up visit and only 31 (31.3%) maintained exclusive peripheral symptoms at follow-up, while 37 (37.4%) developed back pain and 31 (31.3%) became asymptomatic. As in the ‘axial population’, the majority of asymptomatic patients (22; 71.0%) were treated during follow-up, and 16 (51.6%) still needed treatment at the follow-up visit (NSAIDs: 7 (43.8%); methotrexate: 1 (6.3%); TNFi: 3 (18.8%); and 5 (31.3%) different combinations).
In total, 77 (22.3%) patients were asymptomatic at the follow-up visit. On the other hand, 109 (31.6%) patients developed at least 1 new SpA feature compared with baseline.
Predictive validity of the ASAS SpA classification criteria
The predictive validity of the ASAS SpA classification criteria is presented in table 3 and figure 2. Of the 564 patients with follow-up assessment, 335 had fulfilled the axSpA or pSpA criteria at baseline and 229 had not. Of these 335 patients, 309 were diagnosed as SpA at follow-up (PPV: 92.2%). Of the 229 patients not fulfilling ASAS criteria at baseline, 142 were indeed considered having no or another diagnosis than SpA (NPV: 62.0%), but 87 received a diagnosis of SpA at follow-up. The PPV of the axSpA and pSpA criteria was 93.3% and 89.5%, respectively.
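The overall PPV and NPV can be reproduced from the counts given in this paragraph. The following is a minimal Python sketch, not the authors' analysis code (the study used STATA); the function and variable names are chosen here for illustration.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) as fractions from the four cells of a 2x2 table."""
    ppv = tp / (tp + fp)  # criteria-positive patients who have SpA at follow-up
    npv = tn / (tn + fn)  # criteria-negative patients who do not have SpA at follow-up
    return ppv, npv


# Counts reported in the text
criteria_pos = 335         # fulfilled axSpA or pSpA criteria at baseline
criteria_pos_spa = 309     # of these, diagnosed as SpA at follow-up
criteria_neg = 229         # did not fulfil the criteria at baseline
criteria_neg_no_spa = 142  # of these, no SpA diagnosis at follow-up

tp = criteria_pos_spa
fp = criteria_pos - criteria_pos_spa     # 26
tn = criteria_neg_no_spa
fn = criteria_neg - criteria_neg_no_spa  # 87

ppv, npv = predictive_values(tp, fp, fn, tn)
print(f"PPV: {ppv:.1%}")  # PPV: 92.2%
print(f"NPV: {npv:.1%}")  # NPV: 62.0%
```

The four cells sum to 564, and the SpA total (309+87=396) matches the 70.2% of patients diagnosed as SpA at follow-up.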
The PPV of the ASAS SpA criteria did not differ when applied in patients from countries with high versus low background HLA-B27 prevalence (91.2% and 92.7%, respectively; online supplementary appendix 3).
The sensitivity analyses yielded a PPV of the ASAS SpA (range: 92.6–95.1%), axSpA (range: 93.4–95.1%) and pSpA (range: 87.9–95.7%) criteria similar to the main analysis (table 4). Comparable results were found for the ‘imaging arm’ (range: 94.5–96.5%) and ‘clinical arm’ (range: 96.4–98.2%); and also considering those fulfilling the ‘imaging arm’ only (range: 85.1–86.7%) and ‘clinical arm’ only (range: 87.9–92.9%) (see online supplementary appendix 4).
Imaging arm of the axSpA criteria
Among the 240 patients classified positive according to the axSpA criteria at baseline, 190 (79.2%) had sacroiliitis on imaging (radiograph and/or MRI), hence fulfilling the ‘imaging arm’ (irrespective of fulfilment of the ‘clinical arm’). Remarkably, when imaging was positive, almost all patients were classified positive (190/193: 98.4%) by the axSpA criteria at baseline and almost all received a SpA diagnosis at follow-up (PPV: 94.7%). The PPV was similarly high comparing patients with only radiographic sacroiliitis (n=42; PPV: 97.6%), only sacroiliitis on MRI (n=117; PPV: 94.9%) and with both (n=31; PPV: 90.3%).
Similarly, patients fulfilling the ‘imaging arm’ only (thus excluding patients who also fulfil the ‘clinical arm’) had a high probability (PPV: 86.2%) of being diagnosed axSpA after more than 4 years (mean (SD) level of confidence: 8.6 (1.5)).
Clinical arm of the axSpA criteria
The PPV of the ‘clinical arm’ (±‘imaging arm’) was 96% and the majority of the 50 patients fulfilling the ‘clinical arm’ only at baseline were diagnosed as SpA at follow-up (PPV: 88.0%). Similar to the ‘imaging arm’ only, the follow-up diagnosis for these 50 patients was established with high confidence (mean: 8.5 (SD: 1.5)) and was consistent with baseline diagnosis: of the 44 patients diagnosed as axSpA at follow-up, 38 (86.4%) had also received the same diagnosis at baseline.
Patients fulfilling the ‘clinical arm’ only had a mean of 3.4 (SD: 1.1) SpA features at baseline, and inflammatory back pain (43; 86.0%) was most prevalent, followed by good response to NSAIDs (34; 68.0%), peripheral arthritis (23; 46.0%) and elevated CRP (20; 40%). The large majority (36; 72.0%) of these patients still had either axial or peripheral symptoms at the end of follow-up.
The long-term follow-up of the original ASAS cohort provided an excellent predictive validity for the ASAS axSpA and pSpA classification criteria and for the combined set. In addition, patients fulfilling the ‘clinical arm’ had disease characteristics in accordance with the rheumatologists’ perception of what ‘SpA looks like’ (‘gestalt’) resulting in a good predictive validity similar to that of the ‘imaging arm’.
A previous report on the ASAS axSpA criteria predictive validity has shown similarly good results (PPV: 87.9%).13 However, this study was limited to Chinese patients and had a short follow-up (2 years). Moreover, patients with r-axSpA and with predominantly peripheral manifestations were excluded limiting the study's external validity.
The current study is the first to prospectively test the entire set of the ASAS SpA criteria against the rheumatologist's diagnosis in a worldwide population more than 4 years later. In fact, most previous studies tested the concurrent validity of the ASAS criteria, where both the criteria and the ‘external reference’ (rheumatologist's diagnosis) were determined simultaneously. In the current study, the time lag between the criteria application (baseline) and the rheumatologist's diagnosis (follow-up) allowed assessment of the criteria accuracy for predicting a diagnosis of SpA taking into account the disease course (predictive validity).
Several metrics are generally used to describe criteria performance, among which sensitivity and specificity are the most often reported. However, since these metrics are defined on the basis of subjects with or without the disease, they do not inform about the probability of having SpA once the criteria are applied (post-test probability).25 This probability is given by the predictive values (both positive and negative), which, as stated above, are particularly informative when derived from longitudinal studies, such as the ASAS cohort.
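The distinction between these metrics can be made concrete with a short Python sketch based on Bayes' theorem. The sensitivity, specificity and prevalence values below are hypothetical round numbers chosen for illustration, not estimates from the ASAS cohort; the point is only that the same sensitivity and specificity give different post-test probabilities in settings with different disease prevalence.

```python
def ppv_from_accuracy(sensitivity, specificity, prevalence):
    """Post-test probability of disease given a positive classification."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_from_accuracy(sensitivity, specificity, prevalence):
    """Post-test probability of no disease given a negative classification."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test characteristics (assumptions, not study results)
sensitivity, specificity = 0.80, 0.85

for prevalence in (0.05, 0.30, 0.70):
    ppv = ppv_from_accuracy(sensitivity, specificity, prevalence)
    npv = npv_from_accuracy(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With fixed sensitivity and specificity, the PPV rises and the NPV falls as prevalence increases, which is one reason predictive values are best interpreted in settings similar to the one in which they were estimated.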
The somewhat low NPV should be interpreted cautiously in the context of a longitudinal study, particularly in SpA, which often exhibits an evolving character with an increasing number of manifestations over time. Indeed, during follow-up approximately one third of the patients developed at least one additional SpA feature, which may explain why some patients not captured by the ASAS criteria at baseline were regarded as SpA by the rheumatologist at follow-up. Thus, the NPV may reflect the number of patients with SpA that, at baseline, are not captured by the criteria and also the natural course of the disease.
It has been argued that when applied in clinical practice, the ‘clinical arm only’ carries the risk of misclassification.17 ,24 In that sense, it is a common belief that the ‘clinical arm’ adds sensitivity to the axSpA criteria, while compromising specificity. Our findings do not support these claims. On the contrary, we found similarly high PPVs for both arms of the axSpA criteria. Moreover, the additional patients captured by the ‘clinical arm’ showed a ‘SpA-like’ phenotype, which persisted over time, possibly explaining the consistency and the high level of confidence for the diagnosis of this subgroup. These data support the view that the ‘clinical arm’ comprises a group of patients who belong to the SpA spectrum as much as those fulfilling the ‘imaging arm’. Thus, the ‘clinical arm’ is truly complementary and may be of particular use when imaging is not available.
A noteworthy finding in this study is the dominant place that sacroiliitis on MRI holds in the ASAS axSpA criteria. Remarkably, almost all patients who had sacroiliitis on imaging were classified ‘positive’ and most patients fulfilling the ‘imaging arm’ had only sacroiliitis on MRI (without radiographic sacroiliitis). The fact that most of these were indeed diagnosed as axSpA at follow-up (PPV: 94.9%) demonstrates how well the axSpA criteria reflect the rheumatologists’ expectations on the ability of sacroiliitis on MRI to discriminate between patients with and without axSpA. However, it is important to highlight that sacroiliitis on MRI was at the basis of the nr-axSpA concept18 and instigated the development of the ASAS axSpA criteria.2 Hence, circularity in reasoning cannot be excluded, but is not necessarily detrimental as long as sacroiliitis on MRI truly reflects the disease consequences closely linked to their risk factors and pathophysiology as it is currently believed. More research is needed to clarify this issue.
The HLA-B27 prevalence in patients with pSpA was expectedly lower (48.3%) than in axSpA, but similar to what is known for pSpA and also found in another recent cohort (47.5%; early arthritis clinic: EAC).14 Despite this, the prevalence of pSpA in that cohort was much lower (3.8%) when compared with the current study (68%). Importantly, the pSpA criteria discriminated well between pSpA and no-SpA (PPV: 89.5%), even with similar proportions of peripheral arthritis in both groups (91.4% vs 90.7%). However, there was a significant difference in the proportion of enthesitis (60.3% vs 25.9%), which was infrequent in the EAC cohort (17.1%), possibly reflecting different inclusion criteria. This may, at least in part, explain the pSpA prevalence disparity between the two cohorts and stresses the central role of enthesitis in the disease. Thus, the allowance of enthesitis as an entry feature yields more pSpA cases without increased risk of mislabelling, as previously suggested.
This study has a number of limitations. The most relevant one is the high number of patients without follow-up data. Attrition unfortunately is common in long-term follow-up studies, especially if there is no regular protocol with assessments between the baseline and follow-up visit. Understandably, patients who complied with a follow-up visit had more active sacroiliitis on MRI at baseline, deemed to be associated with ‘worse prognosis’. Hence, it could be expected that, if ‘good prognosis’ patients have preferentially dropped out, the performance of the criteria in centres with high participation rates (≥75% complete data) would be worse than in centres with low participation rates. However, this was not the case and argues against ‘channelling bias’ causing a spuriously high PPV. Finally, patients with less definite (‘equivocal’) diagnoses at baseline were not more likely to be lost to follow-up either since the level of diagnostic confidence was almost identical in patients with follow-up (mean (SD): 8.3 (1.5)) compared with those lost to follow-up (8.2 (1.5)).
Missing data on MRI is another potential limitation. However, missing data are common in observational cohorts, as they reflect clinical practice, where clinicians must make decisions (on diagnosis) even without complete information. It is plausible to assume that in such a scenario, missing information can best be considered negative. Nevertheless, it is always possible that patients diagnosed as no-SpA at baseline are more likely to have missing data, which would decrease their likelihood of fulfilling the criteria. Under that scenario, an analysis of patients with complete information only would yield worse PPVs, but that was not what we found.
Another limitation of this study is the self-reported diagnosis in some patients. However, the predictive values of the ASAS criteria in all patients versus patients who presented physically at a follow-up visit were similar, which adds to the credibility of the self-reported diagnosis provided by telephone.
In conclusion, and keeping in mind how the above-mentioned constraints were handled in the analysis, the ASAS SpA criteria have proven to accurately discriminate between patients with and without the disease when applied in patients with similar symptoms. Therefore, the ASAS criteria are valid for selecting patients for clinical and therapeutic trials and, especially when applied in settings similar to the ASAS cohort, they may guide rheumatologists in establishing a proper diagnosis.
Handling editor Tore K Kvien
Contributors Study concept and design: MR, JS, RL and DvdH. Statistical analysis and data interpretation: AS, RL, MR, JS and DvdH. Data collection: MR, JS, RL, DvdH, NA, JB, JB, EC-E, MD, OF, FH, JG, YK, WPM, HM-O, IO, SO, ER, SS, IJS, RV-O, FVdB, IvdH-B, UW and JW. All authors collaborated on further data interpretation, revising the manuscript critically for important intellectual content and gave final approval of the version to be published. ARS prepared the first version of the manuscript.
Funding This study was supported financially by ASAS. AS received a research grant from ASAS for a fellowship, during which the study analysis was performed.
Competing interests None declared.
Patient consent Obtained.
Ethics approval Local Ethics Committees.
Provenance and peer review Not commissioned; externally peer reviewed.