## Abstract

**Background:** The Bath Ankylosing Spondylitis Metrology Index (BASMI) characterises the spinal mobility of patients with ankylosing spondylitis. Two versions have been published using categorical scores with either scores 0–2 for each of the five assessments, or scores 0–10. For metric purposes, we recently defined a BASMI version with linear score definitions.

**Aim:** To evaluate agreement between the three BASMI definitions and to compare their sensitivity to change.

**Patients and methods:** The performance of the BASMI_{2} (based on the 2-step definition), BASMI_{10} (based on the 10-step definition) and BASMI_{lin} (based on the linear definition) was compared in 598 status assessments and 222 follow-up assessments from various cohorts of patients with ankylosing spondylitis (AS). The follow-up assessments were made 24 weeks after the start of an intervention with either placebo or a tumour necrosis factor (TNF) blocker. Descriptive statistics and Bland–Altman plots were used to compare the pairwise agreement of the three definitions. Sensitivity to change was assessed with the Guyatt effect size, calculated from the change data of the placebo-treated and actively treated patients.

**Results:** Bland–Altman analysis showed that the differences between BASMI_{2} scores and scores obtained by either of the two other definitions were highly dependent on the magnitude of the measurement. Guyatt effect sizes were 0.66 for the BASMI_{2}, 0.95 for the BASMI_{10}, and 1.04 for the BASMI_{lin}, respectively, demonstrating best sensitivity to change for the newly-developed BASMI_{lin}.

**Conclusions:** The BASMI_{10} and BASMI_{lin} have clear metric advantages as compared to BASMI_{2}, among which are their superior sensitivity to change and feasibility of BASMI_{lin} in computer evaluations. The BASMI_{10} and BASMI_{2} are not interchangeable.

## INTRODUCTION

The Bath Ankylosing Spondylitis Metrology Index (BASMI) is a combined index comprising five assessments of spinal mobility in patients with ankylosing spondylitis (AS): lateral lumbar flexion, tragus-to-wall distance, lumbar flexion, intermalleolar distance, and cervical rotation.^{1,2} These measurements were found to be the most reliable and clinically useful for reflecting axial status, based on an extensive literature review by a group of rheumatologists, physiotherapists and research associates with a special interest in AS. In the initial validation, this group showed that a combination of these five measures is a good reflection of the information obtained by 20 separate clinical measurements.^{1} The BASMI is a useful measure for characterising the outcome of AS, comprising the effect of radiological spinal changes and soft tissue involvement.^{3,4} The Assessment in Ankylosing Spondylitis International Working Group (ASAS) adopted the BASMI as part of their core set for spinal mobility measurements in AS.^{5}

In the definition of the BASMI published in 1994,^{1} each continuous assessment was converted into a nominal score of 0, 1 or 2, based on the conversion table shown in table 1. The sum of these scores (BASMI_{2}) is also nominal and can take whole numbers only, with a range of 0–10.

In 1995, a second definition was published^{2} in which each continuous assessment was converted into a nominal score of 0 to 10, based on a separate conversion table shown in table 2. The sum of these scores divided by 5 (the BASMI_{10}) is also nominal in nature, thus creating a similar construct with a range of 0–10 that can take whole multiples of 0.2. The similar range of BASMI_{10} and BASMI_{2} suggests that these constructs would be interchangeable.
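The two aggregation rules described above can be sketched as follows. This is a minimal illustration that assumes the five component scores have already been obtained from the conversion tables (tables 1 and 2, not reproduced here); the function names `basmi_2` and `basmi_10` are our own and not part of either published definition.

```python
from typing import Sequence


def basmi_2(scores: Sequence[int]) -> int:
    """Sum of five component scores, each 0, 1 or 2 (total range 0-10)."""
    if len(scores) != 5 or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("BASMI_2 needs five component scores of 0, 1 or 2")
    return sum(scores)


def basmi_10(scores: Sequence[int]) -> float:
    """Mean of five component scores, each 0..10 (range 0-10, steps of 0.2)."""
    if len(scores) != 5 or any(s not in range(11) for s in scores):
        raise ValueError("BASMI_10 needs five whole-number scores from 0 to 10")
    return sum(scores) / 5
```

Both totals run from 0 to 10, which is why the two constructs look interchangeable at first sight even though their component scales differ.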

A disadvantage of both nominal definitions is that the continuous scale of the assessments is converted into a nominal one: a small change in an assessment may be suppressed entirely if it stays within one category, or may produce a large jump in the resulting BASMI if it crosses a category boundary (theoretically up to half the scale range in the worst case for BASMI_{2}). In intervention trials where small improvements in spinal mobility are of interest, this disadvantage can be decisive. Furthermore, computer evaluation with the discontinuous conversion is rather complicated.
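The suppression and jump effects can be illustrated with a small step-function sketch. The category bounds below are invented purely for illustration and are not the values of table 1; the direction of scoring (higher assessment, higher score) is likewise an arbitrary assumption.

```python
from bisect import bisect_right
from typing import Sequence

# HYPOTHETICAL category bounds for one assessment; NOT taken from table 1.
HYPOTHETICAL_CUTS = [5.0, 10.0]


def step_score(assessment: float, cuts: Sequence[float]) -> int:
    """Categorical score: the number of bounds at or below the assessment."""
    return bisect_right(cuts, assessment)


# A large change inside one category is suppressed completely.
suppressed = step_score(9.9, HYPOTHETICAL_CUTS) - step_score(5.1, HYPOTHETICAL_CUTS)

# A tiny change across a bound moves the component score by a full point.
jump = step_score(5.1, HYPOTHETICAL_CUTS) - step_score(4.9, HYPOTHETICAL_CUTS)

# If all five components cross a bound together, a BASMI_2-like sum of
# 0/1/2 scores shifts by 5 points, i.e. half of its 0-10 range.
worst_case_shift = 5 * jump
```

A lookup of this kind is needed for every component and every definition, which is one reason why the discontinuous conversion is cumbersome to implement.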

Moreover, the scale range of the various spinal mobility assessments differs between the BASMI_{2} and BASMI_{10}. To our knowledge, the effect of this difference on the performance of the BASMI_{2} and BASMI_{10} has never been formally evaluated. Although most publications involve the BASMI_{2}, it is often not clear to the reader of an article which of the definitions was used.

We propose here a BASMI definition that is based on continuous data, with a linear assessment-to-score conversion in the range 0–10. We describe its application for computer use in the analysis of clinical trials, and also its application for use without a computer “at the bedside”. We compare the performance of our definition with the existing BASMI_{2} and BASMI_{10} definitions with respect to comparability, test–retest reliability and sensitivity to change by applying them in observational data and in data from patients treated with tumour necrosis factor (TNF) blockers in various clinical trials.

## PATIENTS AND METHODS

### Proposed linear BASMI definition

The assessments *A* of the five components are converted into the scores *S* using the equations given in table 3. The factors in the equations were chosen so that the BASMI_{10} and our definition agree at the centre of each field of the BASMI_{10} conversion table. Like the BASMI_{10}, the BASMI_{lin} is the average of the five scores. If the assessment results are to be evaluated by hand without a computer and the resulting BASMI_{lin} scores are to be compared with former results “at the bedside”, it is simpler and less time-consuming to enter the scores into the form shown in fig 1 and to perform the conversion graphically with the help of the double scales presented in this figure.
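A computer evaluation of a linear definition of this kind might look like the sketch below. It assumes that each equation has the generic form S = (A − zero point) / step and that scores are restricted to the stated 0–10 range; both are our assumptions. The coefficients in `EXAMPLE_COEFFS` are invented placeholders, not the factors of table 3, which must be substituted before any real use.

```python
from typing import Dict, Tuple

# HYPOTHETICAL (zero_point, step) pairs per component; NOT the table 3 values.
# A negative step means that a larger assessment gives a lower score.
EXAMPLE_COEFFS: Dict[str, Tuple[float, float]] = {
    "lateral_lumbar_flexion": (20.0, -2.0),
    "tragus_to_wall": (10.0, 3.0),
    "lumbar_flexion": (7.0, -1.0),
    "intermalleolar_distance": (120.0, -10.0),
    "cervical_rotation": (90.0, -9.0),
}


def linear_score(assessment: float, zero_point: float, step: float) -> float:
    """Linear assessment-to-score conversion, restricted to the range 0-10."""
    score = (assessment - zero_point) / step
    return min(10.0, max(0.0, score))


def basmi_lin(assessments: Dict[str, float],
              coeffs: Dict[str, Tuple[float, float]]) -> float:
    """Average of the five linear component scores."""
    if len(coeffs) != 5 or set(assessments) != set(coeffs):
        raise ValueError("exactly the five configured components are required")
    scores = [linear_score(assessments[name], *coeffs[name]) for name in coeffs]
    return sum(scores) / 5
```

Because the conversion is a single arithmetic expression per component, no lookup table is needed and small changes in an assessment carry through to the index.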

### Patients and assessments

For the comparison of the performance of the three BASMI versions (BASMI_{2}, BASMI_{10}, and BASMI_{lin}) we used existing assessments from various cohorts. All spinal mobility assessments necessary to calculate the BASMI were made in 187 patients from the OASIS (Outcome in Ankylosing Spondylitis International Study) cohort, which was used for several previous studies.^{6–10} The assessments were performed as described in fig 1. For the cervical rotation, the goniometer method was applied. In 154 of the 187 patients two assessments were made with a 2-year interval, and in the remaining 33 patients a single assessment was made (in total 341 assessments).

In addition, baseline data from 257 patients that took part in three clinical trials to assess anti-TNF therapy were used. In total, 598 assessments could be used for comparison of status scores according to different BASMI definitions. Moreover, 63 patients treated with placebo and 159 patients treated with anti-TNF therapy had a baseline and 24-week follow-up assessment available, and could be used to compare sensitivity to change of different BASMI definitions.

### Analysis

Descriptive statistics and Bland–Altman analysis (the difference between two BASMI definitions per patient on the y-axis, plotted against the mean of the two BASMI definitions per patient on the x-axis) were used to describe the absolute agreement across different BASMI definitions. Apart from showing the degree of agreement, Bland–Altman plots give insight into whether two BASMI definitions behave similarly over the entire range of measurement.
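The quantities behind a Bland–Altman plot can be computed as in the following sketch. The per-patient differences and means follow the description above; the limits of agreement (mean difference ± 1.96 SD) are the conventional Bland–Altman summary, and the least-squares slope of difference on mean is an addition of ours to quantify dependence on the magnitude of the measurement. The function name and the returned keys are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Dict[str, object]:
    """Per-patient differences and means for two scores, plus summaries."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]        # y-axis of the plot
    means = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)                            # sample standard deviation
    # Least-squares slope of difference on mean: a value near 0 indicates
    # that the disagreement does not depend on the magnitude of the score.
    m_bar = mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx if sxx > 0 else 0.0
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
        "slope": slope,
    }
```

If two definitions agreed perfectly, every difference would be zero and all points would lie on the x-axis; a clearly non-zero slope signals that the two definitions diverge systematically along the range.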

The effect size according to Guyatt (mean change of TNF-treated patients divided by the standard deviation of the change of the placebo-treated patients) per BASMI definition was calculated for the patients in the trials to investigate differences in sensitivity to change.
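The Guyatt effect size as defined above reduces to a one-line calculation. The sketch below assumes that the sample standard deviation (n − 1 denominator) is intended and that changes are signed so that improvement is positive; neither detail is specified in the text, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def guyatt_effect_size(treated_changes: Sequence[float],
                       placebo_changes: Sequence[float]) -> float:
    """Mean change in treated patients divided by the SD of placebo changes."""
    if len(treated_changes) < 1 or len(placebo_changes) < 2:
        raise ValueError("need treated changes and at least two placebo changes")
    return mean(treated_changes) / stdev(placebo_changes)
```

Applying the function once per BASMI definition to the same patients gives directly comparable values: a larger effect size means that the treatment effect stands out more clearly from the background variability seen under placebo.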

## RESULTS

Table 4 presents summary statistics for the various groups of patients for the BASMI_{2}, BASMI_{10} and BASMI_{lin}. The entire range of possible scores is well represented in the various cohorts. As expected, the summary statistics of BASMI_{10} closely resemble those of BASMI_{lin}. By contrast, the mean status scores according to the BASMI_{10} and BASMI_{lin} definitions are substantially higher than the mean BASMI_{2} score.

Figure 2 shows Bland–Altman plots comparing the scores of the three BASMI definitions for individual patients. If two BASMI versions yielded identical results, all data would lie along the x-axis over the entire range of measurement. Comparing BASMI_{10} and BASMI_{lin}, there are only small differences between the two scores and there is no clear pattern discernible along the range of measurement. By contrast, if the BASMI_{2} is compared with either the BASMI_{10} or the BASMI_{lin}, the BASMI_{10} and BASMI_{lin} scores exceed the BASMI_{2} score in the lower part of the range and fall below it in the higher part of the range (a magnitude-dependent bias).

Calculated Guyatt effect sizes were 0.66 for the BASMI_{2}, 0.95 for the BASMI_{10}, and 1.04 for the BASMI_{lin} respectively, demonstrating best sensitivity to change for the newly-developed BASMI_{lin} definition.

## DISCUSSION

The most important conclusion of this method comparison is that the different BASMI versions that have been defined previously yield different BASMI scores when applied to the same patients. In particular, the BASMI_{2} performs fundamentally differently, which becomes clear when one interprets the Bland–Altman plots. Summary statistics do not clearly show the problem, as the mean BASMI_{2} is only slightly lower than the means of the BASMI_{10} and BASMI_{lin} definitions. The Bland–Altman plots comparing the BASMI_{2} with both other BASMI definitions give an immediate explanation: the classic BASMI_{2} score is lower than the scores of the others if the BASMI is in the lower range, but higher if the BASMI is in the higher range, so that the two differential effects cancel each other out and the net effect on the grand mean is negligible. Such an effect can only be demonstrated in Bland–Altman plots, as suggested by Bland and Altman, who argued that in method comparison one is interested in whether the difference between the measurements by the two methods is related to the magnitude of the measurement.^{11}

The explanation for this clear discrepancy is that the BASMI_{2} does not behave as a linear measure, whereas the BASMI_{10} and the BASMI_{lin} do. The BASMI_{2} is built on vaguely defined categories that may have a clinical implication but that also render the instrument inappropriate as a monitoring tool. The fundamental objection here is that a change of 1 unit in BASMI_{2} has a different meaning when the baseline BASMI value is 8 than when it is, eg, 5. The BASMI_{10} and the BASMI_{lin} in particular show fewer floor effects, ie, they are able to show changes in patients with limited restrictions. This can best be illustrated by the lateral lumbar flexion. If the lateral lumbar flexion is 10 cm or more, this is always scored as 0 in the BASMI_{2}, whereas there are still five grades to differentiate between measurements over 10 cm in the BASMI_{10} and the BASMI_{lin}. Similar effects are seen for the modified Schober, where measurements above 4 cm always result in a score of 0 in the BASMI_{2} while there are four grades available in the BASMI_{10} and the BASMI_{lin}.

The results of this method comparison show that the proposed BASMI_{lin} definition, which is truly linear, yields scores that closely resemble BASMI_{10} scores. The small differences shown in fig 2C can be attributed to rounding errors associated with the categories in the BASMI_{10} definition. This finding supports the view that the BASMI_{10} suffers far less from the metric limitations inherent to the BASMI_{2}, which makes the BASMI_{10} a more reliable instrument than the BASMI_{2} for monitoring spinal mobility over time. Furthermore, the proposed BASMI_{lin} has a slightly better sensitivity to change than the BASMI_{10} version, which is most likely due to the elimination of the categorical character of the latter. More importantly, with the linear equations presented in table 3, a computer evaluation in clinical trials, and potentially in clinical practice, is far easier to perform than with the conversion tables of the other two BASMI versions. Also, an evaluation by hand “at the bedside” is easy to perform if the form presented in fig 1 is used.

## CONCLUSIONS

The BASMI_{10} and BASMI_{lin} have clear and relevant advantages over the BASMI_{2}. The main advantage of the new BASMI_{lin} over the BASMI_{10} is its practicability in computer evaluations in clinical trials. It is important to realise that the BASMI_{10} and BASMI_{2} are not interchangeable. Every publication in which spinal mobility is measured with an index instead of separate measurements should therefore state which of the BASMI versions was applied.

## REFERENCES

## Footnotes

**Competing interests:** None.