Background: The Bath Ankylosing Spondylitis Metrology Index (BASMI) characterises the spinal mobility of patients with ankylosing spondylitis. Two versions have been published using categorical scores with either scores 0–2 for each of the five assessments, or scores 0–10. For metric purposes, we recently defined a BASMI version with linear score definitions.
Aim: To evaluate agreement between the three BASMI definitions and to test their sensitivity to change.
Patients and methods: The performance of the BASMI2 (based on the 2-step definition), BASMI10 (based on the 10-step definition) and BASMIlin (based on the linear definition) was compared in 598 status assessments and 222 follow-up assessments, made at a 24-week interval after an intervention with either placebo or a tumour necrosis factor (TNF) blocker, from various cohorts of patients with ankylosing spondylitis (AS). Descriptive statistics and Bland–Altman plots were used to compare the pairwise agreement of the three definitions. To assess sensitivity to change, the Guyatt effect size was calculated using change data from the placebo-treated and actively treated patients.
Results: Bland–Altman analysis showed that the differences between BASMI2 scores and scores obtained by either of the two other definitions were highly dependent on the magnitude of the measurement. Guyatt effect sizes were 0.66 for the BASMI2, 0.95 for the BASMI10, and 1.04 for the BASMIlin, respectively, demonstrating best sensitivity to change for the newly-developed BASMIlin.
Conclusions: The BASMI10 and BASMIlin have clear metric advantages over the BASMI2, including superior sensitivity to change and, for the BASMIlin, ease of computer evaluation. The BASMI10 and BASMI2 are not interchangeable.
The Bath Ankylosing Spondylitis Metrology Index (BASMI) is a combined index comprising five assessments of spinal mobility in patients with ankylosing spondylitis (AS): lateral lumbar flexion, tragus-to-wall distance, lumbar flexion, intermalleolar distance, and cervical rotation.1 2 These measurements were found to be the most reliable and clinically useful for reflecting axial status, based on an extensive literature review by a group of rheumatologists, physiotherapists and research associates with a special interest in AS. In their initial validation, this group showed that a combination of these five measures is a good reflection of the information obtained by 20 separate clinical measurements.1 The BASMI is a useful measure for characterising the outcome of AS, comprising the effect of radiological spinal changes and soft tissue involvement.3 4 The Assessment in Ankylosing Spondylitis International Working Group (ASAS) adopted the BASMI as part of its core set for spinal mobility measurements in AS.5
In the definition of the BASMI published in 1994,1 each continuous assessment was converted into a nominal score of 0, 1 or 2, based on the conversion table shown in table 1. The sum of these scores (BASMI2) is also nominal and can adopt whole numbers only, ranging from 0 to 10.
In 1995, a second definition was published,2 in which each continuous assessment was converted into a nominal score of 0 to 10 based on a separate conversion table shown in table 2. The sum of these scores divided by 5 (the BASMI10) is also nominal in nature, thus creating a similar construct with a range of 0–10, which can adopt whole multiples of 0.2. The similar range of BASMI10 and BASMI2 suggests that these constructs would be interchangeable.
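The aggregation step of the two nominal definitions can be sketched as follows. This is a minimal illustration that assumes the five component scores have already been read off the conversion tables (tables 1 and 2, which are not reproduced in the code); the function names `basmi2` and `basmi10` are hypothetical.

```python
from typing import Sequence


def basmi2(scores: Sequence[int]) -> int:
    """Sum of five component scores, each 0, 1 or 2 (1994 definition)."""
    if len(scores) != 5 or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("BASMI2 needs five scores, each 0, 1 or 2")
    # Whole numbers only, range 0..10.
    return sum(scores)


def basmi10(scores: Sequence[int]) -> float:
    """Mean of five component scores, each an integer 0..10 (1995 definition)."""
    if len(scores) != 5 or any(s not in range(11) for s in scores):
        raise ValueError("BASMI10 needs five integer scores in 0..10")
    # Sum divided by 5: range 0..10 in steps of 0.2.
    return sum(scores) / 5
```

Both functions return values on a 0–10 range, which is why the two constructs look interchangeable even though their component scores come from different conversion tables.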
A disadvantage of both nominal definitions is that the continuous scale of the assessments is converted into a nominal one: a small change in an assessment may either be suppressed entirely or lead to a large jump in the resulting BASMI (in the worst case for BASMI2, theoretically up to half the scale range). In intervention trials where small improvements in spinal mobility are of interest, this disadvantage can be critical. Furthermore, computer evaluation with the discontinuous conversion is rather complicated.
Moreover, the scale range of the various spinal mobility assessments differs between the BASMI2 and BASMI10. To our knowledge, the effect of this difference on the performance of the BASMI2 and BASMI10 has never been formally examined. Although most publications use the BASMI2, it is often not clear to the reader of an article which of the definitions was used.
We propose here a BASMI definition that is based on continuous data, with a linear assessment-to-score conversion in the range 0–10. We describe its application for computer use in the analysis of clinical trials, and also its application for use without a computer “at the bedside”. We compare the performance of our definition with the existing BASMI2 and BASMI10 definitions with respect to comparability, test–retest reliability and sensitivity to change by applying them in observational data and in data from patients treated with tumour necrosis factor (TNF) blockers in various clinical trials.
PATIENTS AND METHODS
Proposed linear BASMI definition
The assessments A of the five components are converted into the scores S using the equations given in table 3. The factors in the equations were chosen so as to establish agreement between the BASMI10 and our definition at the centre of each field of the BASMI10 conversion table. Like the BASMI10, the BASMIlin is the average of the five scores. If the assessment results are to be evaluated by hand without a computer and the resulting BASMIlin scores are to be compared with former results “at the bedside”, it is simpler and less time-consuming to enter the assessments into the form shown in fig 1 and to perform the conversion graphically with the help of the double scales presented in this figure.
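The shape of such a linear conversion can be sketched as below. The coefficients of table 3 are not reproduced here, so each component is described by two hypothetical anchor values: the assessment that maps to a score of 0 and the assessment that maps to a score of 10. Clamping the score to the range 0–10 is an assumption of this sketch, as are the names `linear_score` and `basmi_lin`.

```python
from statistics import mean
from typing import Mapping, Tuple


def linear_score(a: float, a_at_0: float, a_at_10: float) -> float:
    """Map an assessment A linearly onto a score S in 0..10.

    a_at_0 and a_at_10 are hypothetical anchors (the assessment values that
    give S = 0 and S = 10); they stand in for the table 3 coefficients.
    a_at_0 may be larger than a_at_10 when a larger assessment means better
    mobility.
    """
    if a_at_0 == a_at_10:
        raise ValueError("anchors must differ")
    s = 10.0 * (a - a_at_0) / (a_at_10 - a_at_0)
    # Assumption of this sketch: scores are limited to the 0..10 range.
    return min(10.0, max(0.0, s))


def basmi_lin(assessments: Mapping[str, float],
              anchors: Mapping[str, Tuple[float, float]]) -> float:
    """Average of the linear component scores (five in the real index)."""
    return mean(linear_score(assessments[name], *anchors[name])
                for name in anchors)
```

In practice the `anchors` mapping would hold one entry per BASMI component, filled in from the equations in table 3. Because the conversion is continuous, a small change in any assessment produces a proportionally small change in the index.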
Patients and assessments
For the comparison of the performance of the three BASMI versions (BASMI2, BASMI10, and BASMIlin) we used existing assessments from various cohorts. All spinal mobility assessments necessary to calculate the BASMI were made in 187 patients from the OASIS (Outcome in Ankylosing Spondylitis International Study) cohort, which was used for several previous studies.6–10 The assessments were performed as described in fig 1. For the cervical rotation, the goniometer method was applied. In 154 of the 187 patients two assessments were made with a 2-year interval, and in the remaining 33 patients a single assessment was made (in total 341 assessments).
In addition, baseline data from 257 patients who took part in three clinical trials assessing anti-TNF therapy were used. In total, 598 assessments could be used for comparison of status scores according to the different BASMI definitions. Moreover, 63 patients treated with placebo and 159 patients treated with anti-TNF therapy had a baseline and a 24-week follow-up assessment available, and could be used to compare the sensitivity to change of the different BASMI definitions.
Statistical analysis
Descriptive statistics and Bland–Altman analysis (showing the difference between two BASMI definitions per patient (y-axis) plotted against the mean of the two BASMI definitions per patient (x-axis)) were used to describe the absolute agreement across different BASMI definitions. Apart from showing the degree of agreement, Bland–Altman plots give insight into whether both BASMI definitions behave similarly over the entire range of measurement or not.
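The quantities behind a Bland–Altman plot can be sketched as follows. The per-patient means and differences follow the description above. The bias, the 1.96 SD limits of agreement and the least-squares slope of difference on mean are conventional additions to this kind of analysis and are assumptions of this sketch, not figures reported in this study. The function name `bland_altman` is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(x: Sequence[float], y: Sequence[float]) -> Dict[str, object]:
    """Per-patient mean (x-axis) and difference (y-axis) of two definitions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 patients")
    means = [(a + b) / 2 for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1), an assumption of this sketch
    # Least-squares slope of difference on mean: a slope far from 0 suggests
    # that the disagreement depends on the magnitude of the measurement.
    m_bar = mean(means)
    den = sum((m - m_bar) ** 2 for m in means)
    slope = (sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs)) / den
             if den > 0 else float("nan"))
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "loa_low": bias - 1.96 * sd,
        "loa_high": bias + 1.96 * sd,
        "slope": slope,
    }
```

Two definitions that agree over the whole range give differences scattered around zero with a slope close to zero. A clearly non-zero slope corresponds to one definition scoring higher at one end of the range and lower at the other.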
The effect size according to Guyatt (mean change of TNF-treated patients divided by the standard deviation of the change of the placebo-treated patients) per BASMI definition was calculated for the patients in the trials to investigate differences in sensitivity to change.
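As a minimal sketch, the Guyatt effect size described above can be computed as below. The sketch assumes that change is expressed so that improvement is positive (for example baseline minus follow-up) and that the sample standard deviation (n − 1) is used; neither convention is stated in the text, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def guyatt_effect_size(treated_changes: Sequence[float],
                       placebo_changes: Sequence[float]) -> float:
    """Mean change in treated patients / SD of change in placebo patients."""
    if len(treated_changes) < 1 or len(placebo_changes) < 2:
        raise ValueError("need at least 1 treated and 2 placebo changes")
    sd_placebo = stdev(placebo_changes)  # sample SD, an assumption here
    if sd_placebo == 0:
        raise ValueError("placebo changes have zero variance")
    return mean(treated_changes) / sd_placebo
```

The function would be applied once per BASMI definition to the same patients, so that differences between the resulting effect sizes reflect the definition rather than the sample.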
RESULTS
Table 4 presents summary statistics for the various groups of patients for the BASMI2, BASMI10 and BASMIlin. The entire range of possible scores is well represented in the various cohorts. As expected, the summary statistics of the BASMI10 closely resemble those of the BASMIlin. By contrast, the mean status scores according to the BASMI10 and BASMIlin definitions are substantially higher than the mean BASMI2 score.
Figure 2 shows Bland–Altman plots comparing the scores of the three BASMI definitions for individual patients. If two BASMI versions yielded identical results, all data points would lie on the x-axis (a difference of zero) over the entire range of measurement. Comparing the BASMI10 and BASMIlin, there are only small differences between the two scores and no clear pattern is discernible along the range of measurement. By contrast, when the BASMI2 is compared with either the BASMI10 or the BASMIlin, the BASMI10 and BASMIlin yield higher scores than the BASMI2 in the lower part of the range and lower scores than the BASMI2 in the higher part of the range (a heteroscedastic pattern).
Calculated Guyatt effect sizes were 0.66 for the BASMI2, 0.95 for the BASMI10, and 1.04 for the BASMIlin respectively, demonstrating best sensitivity to change for the newly-developed BASMIlin definition.
DISCUSSION
The most important conclusion of this method comparison is that the previously defined BASMI versions yield different BASMI scores when applied to the same patients. In particular, the BASMI2 performs fundamentally differently, which becomes clear from the Bland–Altman plots. Summary statistics do not clearly show the problem, as the mean BASMI2 is only slightly lower as compared to the BASMI10 and BASMIlin definitions. The Bland–Altman plots comparing the BASMI2 with both other BASMI definitions give an immediate explanation: the classic BASMI2 score is lower than the scores of the others if the BASMI is in the lower range, but higher if the BASMI is in the higher range, so that the two effects cancel each other out and the net effect on the grand mean is negligible. Such an effect can only be demonstrated in Bland–Altman plots, as suggested by Bland and Altman, who argued that in method comparison one is interested in whether the difference between the measurements by the two methods is related to the magnitude of the measurement.11
The explanation for this clear discrepancy is that the BASMI2 does not behave as a linear measure, while the BASMI10 and the BASMIlin do. The BASMI2 is built on vaguely defined categories that may have a clinical implication but that also render the instrument inappropriate as a monitoring tool. The fundamental objection here is that a change of 1 unit in the BASMI2 has different meanings when the baseline BASMI value is 8 as compared to, eg, 5. The BASMI10 and the BASMIlin in particular show fewer floor effects, ie, they are able to show changes in patients with limited restrictions. This can best be illustrated by the lateral lumbar flexion. A lateral lumbar flexion of 10 cm or more is always scored as 0 in the BASMI2, whereas the BASMI10 and the BASMIlin still have five grades to differentiate between measurements over 10 cm. Similar effects are seen for the modified Schober, where measurements above 4 cm always result in a score of 0 in the BASMI2 while there are four grades available in the BASMI10 and the BASMIlin.
The results of this method comparison show that the proposed BASMIlin definition, which is truly linear, closely resembles BASMI10 scores. The small differences shown in fig 2C can be attributed to rounding errors associated with the categories in the BASMI10 definition. This finding underpins the fact that BASMI10 does not suffer so much from the metric limitations inherent to BASMI2, and this makes BASMI10 a more reliable instrument to use in monitoring spinal mobility over time than BASMI2. Furthermore, the proposed BASMIlin has a slightly better sensitivity to change as compared to the BASMI10 version, which is undoubtedly due to the elimination of the categorical character of the latter. More importantly, with the linear equations presented in table 3 a computer evaluation in clinical trials, and putatively in clinical practice, is far easier to perform as compared to the conversion tables of the other two BASMI versions. Also, an evaluation by hand “at the bedside” is easy to perform if the form presented in fig 1 is used.
The BASMI10 and BASMIlin have clear and relevant advantages as compared to the BASMI2. The main advantage of the new BASMIlin over the BASMI10 is its practicability in computer evaluations in clinical trials. It is important to realise that the BASMI10 and BASMI2 are not interchangeable. Which of the BASMI versions was applied should therefore be defined in every publication in which spinal mobility is measured by using an index instead of separate measurements.
Competing interests: None.