Ankylosing spondylitis (AS) is a chronic inflammatory rheumatic disease with a wide variety of spinal and extraspinal signs and symptoms. The concept of disease activity, a reflection of the underlying inflammation, encompasses a wide range of domains and measures.1 Currently used single component measures and indices have limitations: they measure only one aspect of the disease, are fully patient or physician orientated, or lack face and/or construct validity. The Assessment of SpondyloArthritis international Society (ASAS) has therefore developed a disease activity score (DAS) for use in AS, the ASDAS.2 The ASDAS was statistically derived in analogy with the development of the DAS in rheumatoid arthritis. The development process resulted in four candidate ASDAS scores (table 1). Assessment of back pain, duration of morning stiffness and at least one acute phase reactant are included in all four ASDAS scores. Two versions of the ASDAS include both erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP), and the other two versions of the ASDAS include either ESR or CRP. Patient global assessment of disease activity, pain and swelling in peripheral joints and fatigue are part of one or more ASDAS versions. A first validation was performed in the Outcome in Ankylosing Spondylitis International Study (OASIS) database, and all ASDAS scores performed equally well and at least as well as (but frequently better than) the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI).3 4 Since patients in OASIS have on average a rather low level of disease activity and treatment efficacy could not be assessed, further validation in other independent databases was needed before all aspects of validity of the ASDAS in comparison with available measures could appropriately be interpreted.
In particular, the discriminatory validity against external constructs (such as high and low disease activity, change over time upon effective treatment and discrimination between treatments with a known differential treatment effect) is important. In the present validation study we were able to address these aspects of truth and discrimination in two independent datasets: (1) the Norwegian register NOR-DMARD which collects data on all patients with inflammatory arthropathies including AS starting with a conventional disease-modifying antirheumatic drug (DMARD) or a tumour necrosis factor (TNF) blocker; and (2) a dataset of patients participating in double-blind placebo controlled randomised clinical trials (RCTs) with TNF blockers. We also took the opportunity to compare the validity of the ASDAS versions with not only the BASDAI but all individual components of the BASDAI, the ASDAS scores and the patient and physician global assessment.
NOR-DMARD is a patient registry which includes all consecutive patients from five Norwegian rheumatology departments who had AS according to the treating physician and were starting a new DMARD regimen from December 2000 onwards.5 Measures of disease activity and health status were assessed at baseline, 3 months, 6 months, 12 months and yearly thereafter. For the present analysis, all patients with baseline data were used for cross-sectional analysis (n = 618). For the longitudinal analyses all patients with at least one follow-up visit at either the 3-month or 6-month time points were included (n = 297). Since the BASDAI was added to the list of assessments only from 2006 onwards, assessments including (parts of) BASDAI could be based only on the patients entering the database after 2006 (n = 217 for cross-sectional analysis and n = 54 for longitudinal analysis). Due to the varying patient numbers in the various analyses, it is not useful to present baseline characteristics of these patients in detail, but the patients are an appropriate representation of patients with AS in general seen by rheumatologists in Norway. For example, of all the patients included in the registry, 70% were male, 91% were positive for HLA-B27, the mean (SD) age was 42.5 (10.5) years and mean (SD) disease duration since diagnosis was 12.0 (10.2) years.
The second dataset consists of all patients with AS according to the modified New York criteria participating in RCTs comparing TNF blockers and placebo in the following centres: Maastricht University Medical Center, Charité Medical University Berlin, Rheumazentrum Ruhrgebiet Herne and Ghent University Hospital. A total of 177 unselected patients could be included with baseline and 3-month follow-up data (29 patients equally divided between active treatment and placebo had only 6-week follow-up data as this was the end of the placebo period). The patients were participating in trials evaluating either adalimumab 40 mg every other week (n = 44) or etanercept 50 mg weekly (n = 52) or infliximab 3–5 mg/kg every 6–8 weeks (n = 81), with 86 patients in the anti-TNF groups and 91 patients in the placebo groups. Of the patients included in the RCTs, 64% were male, 82% were positive for HLA-B27, the mean (SD) age was 39.8 (10.5) years and mean (SD) disease duration since diagnosis was 12.2 (9.8) years.
The following disease activity assessments were available in both datasets: patient global assessment of disease activity, physician global assessment of disease activity and the six individual questions of the BASDAI (BASDAI 1, fatigue; BASDAI 2, total back pain; BASDAI 3, pain and swelling of joints; BASDAI 4, pain at enthesis locations; BASDAI 5, severity of morning stiffness; BASDAI 6, duration of morning stiffness (10 representing a duration of 2 h or longer)). All scores were obtained on a 0–10 visual analogue scale (VAS), with 0 representing the normal situation and 10 the most extreme situation. In addition, ESR (mm/h) and CRP (mg/l) levels were obtained. With these assessments, the four versions of the ASDAS as described in table 1 and the BASDAI could be calculated.
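As an illustration of how the six questions combine into one index, the sketch below computes a BASDAI score. The text above does not spell out the scoring rule, so the sketch assumes the conventional BASDAI scoring, in which the two morning stiffness questions are averaged and the result is averaged with the other four questions. The function name `basdai` is hypothetical. The ASDAS versions are not sketched because their weights are given in table 1, which is not reproduced here.

```python
def basdai(q1, q2, q3, q4, q5, q6):
    """Conventional BASDAI (assumed scoring rule) from six 0-10 items.

    q1 fatigue, q2 total back pain, q3 pain and swelling of joints,
    q4 pain at enthesis locations, q5 severity of morning stiffness,
    q6 duration of morning stiffness.
    """
    items = (q1, q2, q3, q4, q5, q6)
    for value in items:
        if not 0 <= value <= 10:
            raise ValueError("each item must lie on the 0-10 scale")
    # The two morning stiffness items count together as one component.
    morning_stiffness = (q5 + q6) / 2
    return (q1 + q2 + q3 + q4 + morning_stiffness) / 5
```

The result stays on the same 0–10 scale as the individual questions, which is what allows the index and its components to be compared directly.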
As there is no gold standard to assess disease activity in AS, several constructs that are compatible with an external standard representing high and low disease activity states were created. In the NOR-DMARD database, patients were grouped into high and low disease activity according to the physician at baseline (>6 vs <4; patients with intermediate values were left out) and after 6 months of treatment (⩾4 vs <4; since there were too few patients in the >6 category). The patients answered the following question to assess whether they were in an acceptable symptom state (PASS): “Is your current condition satisfactory considering your general level of functioning and pain?” (yes/no), and this was also used to group the patients into high and low disease activity based on their judgement. Cut-off points for clinical improvement were explored by the following question: “Did you experience considerable improvement since the start of your treatment?” (yes/no) after 3 and 6 months of treatment. Moreover, the assumption was made that patients treated with TNF blockers would have a greater response than those treated with DMARDs. In the RCTs, discrimination and sensitivity to change were based on a differential change in the patients treated with TNF blockers compared with patients treated with placebo. As it is important to know if the various disease activity measures perform equally well in different subgroups, patients in the RCTs were divided into those with raised versus normal baseline CRP levels and those with versus those without peripheral arthritis at baseline.
Pearson correlations between the patient global assessment and the physician global assessment and all the individual and combined scores were calculated. To assess the discriminatory capacity of the indices and measures with respect to patient subgroups with high and low disease activity, the standardised mean difference (SMD) was calculated (difference of the group means divided by the pooled SD of the two groups). This SMD is unitless and can be used to compare the discriminatory ability across the various measures: the higher the value, the greater the discriminatory capacity. Confidence intervals around SMDs were calculated using the method described by Nakagawa and Cuthill,6 and SMDs were statistically tested using standard errors around the SMDs.
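The SMD described above can be sketched as follows. The point estimate follows the definition in the text. The confidence interval uses a common large-sample approximation to the standard error of a standardised mean difference; this is an assumption made for illustration and is not necessarily the exact expression from the cited method. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent groups."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def smd(group_a, group_b):
    """Difference of the group means divided by the pooled SD."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


def smd_ci(group_a, group_b, z=1.96):
    """Approximate 95% CI for the SMD.

    Assumption: large-sample standard error
    sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    d = smd(group_a, group_b)
    n1, n2 = len(group_a), len(group_b)
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se
```

Because the SMD divides by the pooled SD, measures recorded in different units (a 0–10 scale, mm/h, mg/l) end up on a common scale, which is what makes the comparison across measures possible.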
The t score of a two-sided independent sample t test is presented as an additional statistic to compare the various measures. Again, the higher the t score, the greater the discriminatory capacity.
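A minimal sketch of the t score for two independent groups is given below. It assumes the equal-variance (Student) form of the test; the text does not state whether an unequal-variance correction was applied. The function name `t_score` is hypothetical.

```python
import math
from statistics import mean, variance


def t_score(group_a, group_b):
    """Independent-samples t statistic, assuming equal variances."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / standard_error
```

Under this equal-variance form the t score equals the SMD multiplied by sqrt(n1·n2/(n1 + n2)), so t scores depend on group sizes and are comparable across measures only when the same patients contribute to every measure.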
In the RCTs, Guyatt’s effect size (ES) could be calculated as a measure of sensitivity to change as the RCTs have a placebo group that provides the variability of the measure under the untreated condition. Guyatt’s ES is the mean change in the anti-TNF group divided by the SD of the change in the placebo group. Higher values indicate a better effect/noise ratio. Values of >0.8 are considered compatible with good sensitivity to change.
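Guyatt's ES as defined above can be sketched in a few lines. The inputs are per-patient change scores (follow-up minus baseline); the function name `guyatt_es` and the example values in the comments are hypothetical.

```python
from statistics import mean, stdev


def guyatt_es(treated_changes, placebo_changes):
    """Mean change in the treated group divided by the SD of change
    in the placebo group.

    The sign follows the change scores: for measures where lower is
    better, an effective treatment gives a negative value, so the
    magnitude is what is compared against the 0.8 threshold.
    """
    return mean(treated_changes) / stdev(placebo_changes)


# Hypothetical change scores, for illustration only.
treated = [-2.0, -1.0, -3.0]
placebo = [0.0, 1.0, -1.0]
effect = guyatt_es(treated, placebo)
```

Using the placebo group's variability in the denominator expresses the treatment signal relative to the fluctuation seen without effective treatment.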
Table 2 presents the correlation between the four versions of the ASDAS, the BASDAI and its individual components, and both patient global assessment and physician global assessment at baseline. Notably, the correlation between patient and physician global assessment was only 0.30. The BASDAI correlated better with the patient global assessment (0.77) but worse with the physician global assessment (0.29). A similar pattern was found for all the individual components of the BASDAI. Remarkably, the correlation between the total back pain score and the patient global assessment (0.66) was almost as high as that between the entire BASDAI and the patient global assessment (0.77). In contrast, the acute phase reactants were more closely associated with the physician global assessment (0.38 for CRP and 0.36 for ESR) than with the patient global assessment (0.18 and 0.15). The four ASDAS versions correlated acceptably with both patient global assessment (0.58–0.74) and physician global assessment (0.44–0.54).
Table 3 shows the data of the NOR-DMARD database stratified for various subgroups. The average physician and patient global values per subgroup confirm the selection we have made. An SMD for either the patient or physician global was not calculated because the selection was artificial. The means of the ASDAS versions were between 1.4 and 1.8 for patients who considered themselves as being in a satisfactory condition, and these values corresponded with a physician global value of 1.2. The means of the ASDAS versions were between 4.2 and 4.9 for patients with high disease activity according to the physician, which corresponds with a patient global value of 7.0. With “high versus low disease activity according to the physician global” as an external construct (both at baseline and after 6 months), the SMD was highest for the four ASDAS versions. At baseline, the performance of the ASDAS versions was closely followed by the CRP level and, to a lesser extent, by the ESR. However, BASDAI (SMD 0.63 (95% CI 0.33 to 0.92)), patient global value (SMD 0.76 (95% CI 0.45 to 1.09)) and also individual patient-reported components discriminated significantly worse between high and low disease activity than the four versions of the ASDAS (ASDAS 1: SMD 1.47 (95% CI 1.21 to 1.72); ASDAS 2: SMD 1.55 (95% CI 1.28 to 1.79); ASDAS 3: SMD 1.33 (95% CI 1.08 to 1.57); ASDAS 4: SMD 1.37 (95% CI 1.12 to 1.62)). At 6 months, again the four ASDAS versions (best ASDAS: SMD 2.00 (95% CI 1.62 to 2.37)) outperformed the other assessments, but the BASDAI (SMD 1.42 (95% CI 1.06 to 1.78)) and patient global values (SMD 1.43 (95% CI 1.07 to 1.79)) were now second best. The difference in SMD between ASDAS 4 and the BASDAI and between ASDAS 4 and the patient global assessment was statistically significant (p<0.001 for both comparisons).
Overall, the SMDs for all assessments were higher at the 6-month time point than at the baseline time point, even with a discrimination of ⩾4 versus <4 at the 6-month time point and <4 versus >6 at baseline. A possible explanation is the larger variation in the scores between the patients after 6 months. All patients entered in the database had a sufficiently high disease activity to warrant starting a DMARD or biological agent, so at baseline all patients had a relatively high level of disease activity. When the analysis was repeated with patient perception as an external construct (being in a satisfactory condition versus not being in a satisfactory condition), the picture was somewhat different. As expected, the BASDAI performed best, but the ASDAS versions still performed well, similar to the individual patient components and significantly better than the physician global assessment.
Table 4 shows data based on change scores. Only patients with complete scores for all variables were included to be able to compare t scores across variables. In the first analysis the patients’ judgement of a considerable improvement in health (yes/no) after 3 months of treatment was the external construct. Remarkably, the ASDAS versions discriminated best between the two states of this patient-reported outcome. The second best measures were back pain and morning stiffness which performed slightly better than the entire BASDAI.
The efficacy of TNF blockers was assumed to be better than that of conventional DMARDs (most patients received sulfasalazine or methotrexate), and the ASDAS versions were analysed again for their capacity to discriminate between DMARD use and anti-TNF use. Again the discriminatory power of the ASDAS versions proved to be best, but the ASDAS versions were closely followed by morning stiffness, back pain and BASDAI. The differences in SMD between the ASDAS versions and the patient-reported outcomes were not statistically significant.
The last set of analyses was performed on the data of the RCTs comparing the change in patients treated with TNF blockers and those treated with placebo (table 5). The highest SMDs were obtained by the ASDAS versions (ASDAS 1: SMD 1.59 (95% CI 1.25 to 1.92); ASDAS 2: SMD 1.51 (95% CI 1.17 to 1.83); ASDAS 3: SMD 1.50 (95% CI 1.16 to 1.83); ASDAS 4: SMD 1.56 (95% CI 1.21 to 1.88)), followed by the physician global assessment (SMD 1.24 (95% CI 0.91 to 1.56)), ESR (SMD 1.17 (95% CI 0.85 to 1.48)) and BASDAI (SMD 1.09 (95% CI 0.77 to 1.40)). The SMDs of all ASDAS versions were significantly higher than the SMD of BASDAI (p<0.001). Based on these data, it can be calculated that the sample size of a clinical trial can be reduced by approximately 40% if one of the ASDAS versions is used as the primary outcome measure instead of the BASDAI. In particular, when looking at the Guyatt ES, it is remarkable that the entire BASDAI score performs similarly to several of its individual components (back pain, morning stiffness).
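The link between a higher SMD and a smaller trial can be illustrated with the standard normal-approximation formula for comparing two group means, under which the required number of patients per group is proportional to 1/SMD². This is a sketch under that assumption only; the calculation behind the figure quoted above is not described in the text, and this simple approximation need not reproduce it exactly. Function names and default values (two-sided alpha of 0.05, power of 0.80) are hypothetical choices.

```python
import math
from statistics import NormalDist


def n_per_group(smd, alpha=0.05, power=0.80):
    """Approximate patients per group to detect a given SMD
    (two-sided test, equal group sizes, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / smd ** 2)


def relative_sample_size(smd_reference, smd_new):
    """Sample size with the new measure as a fraction of the sample
    size with the reference measure, since n is proportional to 1/SMD**2."""
    return (smd_reference / smd_new) ** 2


# SMDs reported above: BASDAI 1.09, ASDAS 3 (lowest ASDAS value) 1.50.
fraction = relative_sample_size(1.09, 1.50)
reduction = 1 - fraction
```

With the SMDs reported above, this approximation yields reductions in the region of one half, somewhat larger than the approximately 40% quoted, which is consistent in direction but depends on the assumptions made.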
Table 6 shows the data stratified according to various subgroups. When the patients were stratified into those with a normal baseline CRP level and those with a raised CRP level, the ASDAS versions performed best in both situations. The patient-reported outcomes performed relatively better in patients with a normal CRP level at baseline. The physician global assessment performed equally well in patients with normal or raised CRP levels. The proportion of patients in the RCTs with a raised CRP level was 80% compared with 69% of patients in the NOR-DMARD database.
If the distinction is made on the basis of the presence or absence of peripheral arthritis, the ASDAS versions again outperformed the other measurements in both groups. Moreover, the presence of peripheral arthritis did not seem to influence significantly the discriminatory power of any of the measures. The percentage of patients with peripheral arthritis in the RCTs was 40% compared with 43% in the NOR-DMARD database. In addition, when we tested for a possible influence by gender, the ASDAS again had the best discriminatory ability in both men and women (data not shown).
This validation study shows that all four ASDAS versions fulfil important aspects of the truth criterion of the OMERACT filter as they reflect disease activity from both the patient and the physician perspective, which are known to be inherently different.4 7 In addition, they are highly discriminatory in differentiating patients with different levels of disease activity and in differentiating those with different levels of change. This latter aspect is very important in the assessment of treatment efficacy in clinical trials.
The consistency of the results is remarkable. The ASDAS versions performed best in all settings: patient- or physician-based, reflecting status or change, normal or raised CRP levels, and in the presence or absence of peripheral arthritis. The only situation in which the conventional BASDAI outperformed the ASDAS versions was the distinction between being in a satisfactory condition according to the patient versus not being in a satisfactory condition, which can be explained by the fact that BASDAI is entirely patient-reported while ASDAS includes CRP and/or ESR levels in addition to patient-reported assessments.
ASDAS is the first validated disease activity index in AS which combines patient-reported assessments and acute phase reactants. It is obvious that physicians pay more attention to acute phase reactants than patients, and the ASDAS concept therefore meets the criticism that the assessment of disease activity in AS is dominated by patient-reported domains. ASDAS therefore may have better face validity.
When the performance of the entire BASDAI was compared with its single components, it was found that different measures performed best in different settings. This underscores the importance of using an index, as a single component of the BASDAI reflects only part of the entire construct of disease activity. Another major finding was that the BASDAI frequently performed very much like one of its single components. In particular, back pain and the two questions on morning stiffness (severity, duration), as well as the average of these two, performed very well. This may indicate a significant level of redundancy in the BASDAI: aggregated information is captured by only one question. This is especially the case if change is assessed (tables 4–6). Two of the best performing single components (total back pain and duration of morning stiffness) are both included in the ASDAS. In the studies used for this validation, the questions in the BASDAI were obtained on a VAS. However, we showed in another study that the BASDAI assessed on a VAS and numerical rating scale (NRS) gave very similar results.8 As the use of an NRS has several advantages over the use of a VAS, we encourage the use of an NRS when assessing the questions for the ASDAS.
The ASDAS is a continuous measure and, as such, is comparable to the BASDAI or the DAS in rheumatoid arthritis.9 It can be used to discriminate between groups of patients or over time after an intervention. The great advantage in comparison with a response measure such as the ASAS20 is that the ASDAS not only can provide information about improvement, but also about the actual disease activity state that has been reached.10 This is relevant in monitoring patients over time, where it is useful to follow the disease activity state actually achieved rather than being informed about improvement with reference to the baseline (which can be more than 2 years earlier).11 A possible further step is the definition of what can be considered an important improvement in an individual patient and what ASDAS value can be considered as remission, low or high disease activity level. Based on the present analyses, we can get an impression of the meaning of the various ASDAS levels; values around 4.5 are comparable to a physician global value of at least 6 and values around 3.5 to a score of at least 4. In contrast, values of 1.5 and 1.9, respectively, are compatible with patients considering themselves in a good condition and having low disease activity according to the physician. With respect to change, a change of 0.25 is observed in patients treated with a DMARD which is in general considered not to be very effective, and a change of 0.4 is seen in patients who do not consider themselves to be considerably improved. A change of this magnitude is therefore probably compatible with normal variation. In contrast, patients treated with TNF blockers (responders and non-responders combined) showed an improvement of 1.55 and patients who showed a considerable improvement had a change of 1.85. Although these figures cannot be used as cut-off values before further validation is performed, they give an impression of the significance of observed values.
After presenting the validation results to the members of the ASAS in a face-to-face meeting, it was concluded that there is little difference in performance between the four ASDAS versions. The selection of the preferred ASDAS version should therefore be based on feasibility issues.7 The focus of the discussion was mostly on the importance and feasibility of the acute phase reactants and included the following elements: (1) including both ESR and CRP has the advantage of including two “non-patient-reported” measures; (2) they provide additive information, especially in patients in whom only one of the two is raised; (3) a disadvantage is the extra cost of having two laboratory measures; (4) ESR measurement can be performed more easily than CRP measurement but it is more difficult to obtain in a standardised way in multicentre studies; (5) the test needs to be carried out relatively fast after drawing the blood; (6) CRP has the advantage of being a standardised assessment which can be measured in a central laboratory, also after storing serum. A formal vote of the ASAS members present at the meeting was taken to decide first whether an ASDAS with two or only one acute phase reactant was preferred followed by a selection for one of the two remaining ASDAS versions. Of all the ASAS members present at the meeting, 72% voted against having both acute phase reactants in the ASDAS and 56% preferred the ASDAS version with CRP only. Finally, 77% of the ASAS members present were in favour of keeping the ASDAS score with ESR only as an option in case CRP was not available.
In conclusion, we developed and validated an ASAS-endorsed ASDAS consisting of total back pain, duration of morning stiffness, the BASDAI question on peripheral joints, patient global assessment of disease activity and CRP. As an alternative, the ASDAS version with ESR—which consists of the same variables apart from the acute phase reactant, but with slightly different weighting—can be used if CRP is not available. However, it should be clearly understood that these ASDAS versions with CRP or with ESR are not interchangeable. One version should be used consistently within patients or within a study. The ASDAS is a highly discriminatory tool, clearly showing the advantage of the use of a well-balanced index covering the same underlying construct without too much redundancy. The use of the measure in RCTs enables a reduction in the sample size while preserving the same statistical power to detect a treatment effect.
Competing interests Hans Bijlsma was the Handling Editor for this article.