- AS, ankylosing spondylitis
- Gd-DTPA, gadolinium-diethylenetriamine-pentaacetic acid
- ICC, intraclass correlation coefficient
- MRI, magnetic resonance imaging
- SDD, smallest detectable difference
- STIR, short τ inversion recovery
- VU, vertebral unit
Ankylosing spondylitis (AS) is a chronic inflammatory rheumatic disease that mainly affects the spine. Conventional plain radiography of the spine and pelvis is the current standard for imaging in AS, as it can visualise chronic changes such as syndesmophytes.1 Magnetic resonance imaging (MRI) has been shown to detect acute spinal lesions and to assess the change of such lesions over time in patients treated with the anti-tumour necrosis factor antibody, infliximab.2
Active spinal lesions can be assessed by MRI using T1 weighted, fat saturated sequences after application of contrast agents such as gadolinium-diethylenetriamine-pentaacetic acid (Gd-DTPA). Enhancement after application of the contrast agent is believed to indicate continuing inflammation. It is unclear whether another MRI technique, the short τ inversion recovery (STIR) sequence, which is known to visualise normal bone marrow and bone marrow oedema,3 performs similarly well in this regard. The first scoring system for evaluation of MRI sequences in AS, the ASspiMRI-a, which has recently been proposed and evaluated by our group,2 includes both techniques. STIR is easier and faster to perform, and less costly, than techniques that depend on contrast agents, but the Gd-DTPA technique is believed to be more specific in depicting inflammatory spinal lesions. The relative performance of the two techniques is therefore clinically relevant.
The primary aim of this study was to compare the performances of T1 weighted, fat saturated post-Gd-DTPA and STIR MRI sequences by using the recently proposed scoring method to assess spinal inflammation in patients with AS.
PATIENTS AND METHODS
Thirty eight patients with AS, all of whom fulfilled the modified New York classification criteria for AS,4 were randomly selected. Twenty five (66%) of the 38 patients with AS were male, with a mean age of 40.9 years (range 32–54), and 35 (92%) of the patients were HLA-B27 positive. The mean (SD) C reactive protein was 222 (219) mg/l and the mean (SD) erythrocyte sedimentation rate 31.2 (23.0) mm/1st h. The patients had active disease with a mean (SD) Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) of 6.4 (1.4) and a mean (SD) Bath Ankylosing Spondylitis Functional Index (BASFI) of 5.5 (2.1). Conventional radiographs of the pelvis were available in all patients and, in some, also of the spine.
Magnetic resonance imaging
MRI investigations were performed with a 1.5 T unit (Magnetom Vision, Siemens, Erlangen, Germany), using a spine array coil or a body array coil, or both. The MRI techniques applied to assess spinal inflammation in patients with AS have been described previously.5 Sagittal section orientation was chosen and the following sequences were used:
- T1 weighted spin echo sequences (repetition time/echo time 500/12 ms, slice thickness 3 mm, four acquisitions, field of view 20 cm×40 cm, matrix 128×512 pixels) before application of the contrast agent, and
- the same sequence with fat saturation after application of Gd-DTPA (Schering AG, Berlin, Germany) at 0.1 mmol/kg body weight.
No dynamic imaging was performed. C2 and L5 were taken as orientation points and the spine was examined in two parts, always starting with the upper part. After rapid adjustment of the table into the appropriate position the lower part of the spine was examined.
Similarly, fat saturated STIR sequences (repetition time/inversion time/echo time 4000/150/60 ms, slice thickness 3 mm, five acquisitions, field of view 25 cm×40 cm, matrix 121×256 pixels) were performed.
T2 weighted images were also available and were taken into account in doubtful cases of differentiation between chronic and acute lesions.
Scoring of the MRI sequences
After all MR images had been blinded for patient identity, an independent person randomly selected the order of the films, which were then evaluated twice by two readers (JB, WG). Each evaluation included first the STIR and, secondly, the Gd-DTPA MR images of each patient. Thus, each image was evaluated four times. The recently proposed MRI scoring system, ASspiMRI-a, which has been evaluated for assessment of acute inflammatory and possibly simultaneous erosive spinal lesions2,6 was used to analyse the MR images of both sequences on the basis of a vertebral unit (VU), which was defined as the region between two virtual lines drawn through the middle of each vertebral body.2
Separate scores were used to test the intra- and interrater variability. Definite involvement of a VU by inflammation was defined as a score >1 as proposed elsewhere.7 All scorings were done on the basis of single VUs. Concordance and discordance rates were calculated by identifying positive definite inflammatory involvement in both MRI sequences and by always comparing the same VUs in each MRI sequence.
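The concordance calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name `classify_vu_pairs`, the `cutoff` parameter, and the example scores are hypothetical. The cut-off is left as a parameter because the text gives it as a score >1 here and as a score ⩾1 in the concordance results.

```python
from collections import Counter
from typing import Dict, Sequence


def classify_vu_pairs(
    stir_scores: Sequence[int],
    gd_scores: Sequence[int],
    cutoff: int = 1,  # assumption: a VU is "involved" when score >= cutoff
) -> Dict[str, float]:
    """Compare the same VUs in both sequences and return percentages.

    stir_scores[i] and gd_scores[i] must refer to the same vertebral unit.
    """
    if len(stir_scores) != len(gd_scores):
        raise ValueError("both sequences must score the same VUs")
    if not stir_scores:
        raise ValueError("at least one VU is required")

    counts: Counter = Counter()
    for stir, gd in zip(stir_scores, gd_scores):
        stir_pos = stir >= cutoff
        gd_pos = gd >= cutoff
        if stir_pos == gd_pos:
            counts["concordant"] += 1   # both positive or both negative
        elif stir_pos:
            counts["stir_only"] += 1    # STIR positive, Gd-DTPA negative
        else:
            counts["gd_only"] += 1      # Gd-DTPA positive, STIR negative

    n = len(stir_scores)
    return {key: 100.0 * counts[key] / n
            for key in ("concordant", "stir_only", "gd_only")}


# Hypothetical scores for six VUs of one patient, one reader, one reading.
example = classify_vu_pairs([0, 2, 1, 0, 3, 0], [0, 1, 0, 0, 2, 1])
```

Calling the function once per reader and reading gives the separate rates that the study reports; passing `cutoff=2` corresponds to the stricter definition (score >1).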
The reliability of the entire score was evaluated by estimating the variability between the two readers, as well as the variability within the readers. A nested variance analysis approach was used for the calculation of both types of variance: the interrater variance and the intrarater variance. The intrarater variance was estimated in an analysis of variance type I model with the patients as the first factor and the reader as the second random factor. Similarly, the interrater variance was estimated in a nested model with patients as first factor and readings as second factor.
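To illustrate how variance components translate into a reliability coefficient, the sketch below uses a one-way random effects analysis of variance. This is a deliberately simpler stand-in for the nested models described above, not a reproduction of them; the function name and the example scores are hypothetical.

```python
from typing import Dict, Sequence


def one_way_variance_components(scores: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Estimate between-patient and within-patient variance and the ICC.

    scores[i][j] is the total score of patient i at reading j; every
    patient must have the same number of readings (balanced design).
    """
    if not scores:
        raise ValueError("no data")
    n = len(scores)          # number of patients
    k = len(scores[0])       # readings per patient
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 patients and >= 2 readings, balanced")

    patient_means = [sum(row) / k for row in scores]
    grand_mean = sum(patient_means) / n

    # Mean squares of the one-way ANOVA table.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, patient_means) for x in row
    ) / (n * (k - 1))

    # Method-of-moments variance components (negative estimates set to 0).
    var_within = ms_within
    var_between = max((ms_between - ms_within) / k, 0.0)
    total = var_between + var_within
    icc = var_between / total if total > 0 else float("nan")
    return {"between_patients": var_between, "within_patients": var_within, "icc": icc}


# Hypothetical total scores of three patients, two readings each.
components = one_way_variance_components([[9, 10], [5, 6], [1, 2]])
```

When nearly all variance lies between patients, as in this made-up example, the ICC approaches 1; reader-related variance pulls it down.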
The intrarater variance was used to calculate the smallest detectable difference (SDD) between two readings of one reader for one patient. By means of the normal approximation, the SDD was calculated as 1.812 times the square root of the intrarater variance. With this factor, a difference between two readings that is due to measurement error alone stays below the SDD with a probability of 80%.
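The factor 1.812 is consistent with the product of the two-sided 80% quantile of the standard normal distribution (about 1.2816) and √2, where √2 arises because the difference between two readings carries twice the variance of a single reading. That reading of the formula is an inference from the numbers, not something the text states explicitly. A minimal sketch, with hypothetical function names:

```python
import math
from statistics import NormalDist


def sdd_factor(coverage: float = 0.80) -> float:
    """Multiplier applied to the intrarater SD (normal approximation)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    # Two-sided quantile: 80% coverage leaves 10% in each tail.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    # sqrt(2): a difference of two readings has twice the single-reading variance.
    return z * math.sqrt(2.0)


def smallest_detectable_difference(intrarater_variance: float,
                                   coverage: float = 0.80) -> float:
    """SDD = factor * sqrt(intrarater variance)."""
    if intrarater_variance < 0:
        raise ValueError("variance cannot be negative")
    return sdd_factor(coverage) * math.sqrt(intrarater_variance)


factor = sdd_factor()  # rounds to 1.812 at the 80% level
```

Raising `coverage` to 0.95 gives the more familiar factor of about 2.77, which shows how much the chosen probability level affects the size of the SDD.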
RESULTS
Inter- and intrareader reliability of the ASspiMRI-a for both MRI sequences
Table 1 shows that the intra- and interreader variance was low (<10% of all variance) for both sequences. By far the greatest proportion of variance could be attributed to true variance among patients, which resulted in high intraclass correlation coefficient (ICC) values for scoring the entire spine by both sequences. The ICC values were clearly lower when the three spinal segments were analysed separately, with consistently lower values for the STIR sequence than for the Gd-DTPA sequence (table 1). The analysis of variance results stratified for the spinal site showed that the thoracic spine, as compared with the lumbar and cervical spine, generated by far the highest amount of between-patient (true) variance, and also the highest amount of intra- and interreader variance. In the stratified analyses, interrater variances were in general, and as expected, somewhat higher than intrarater variances. As a consequence of the low levels of intrarater variance, low SDDs were found: 4.7 for the Gd-DTPA sequence and 5.6 for the STIR sequence (table 1).
Comparison of the Gd-DTPA and the STIR sequence using the MRI activity score ASspiMRI-a
The overall correlation between the ASspiMRI-a scores obtained with the two MRI techniques, Gd-DTPA and STIR, was rather good (r = 0.84; p = 0.01). The distribution of the scorings showed a clear preponderance for the lower scorings (0 and 1–3) rather than the higher scorings (4–6), but no important differences between the two MRI sequences. Scorings of 1–3 were found in 20.3% of the VUs in the Gd-DTPA sequence and in 25% of the VUs in the STIR sequence, while scorings of 4–6 were found in 3.9% in the Gd-DTPA sequence and in 2.3% in the STIR sequence (fig 1).
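The grouping of per-VU scores into the three categories used above (0, 1–3, and 4–6) can be sketched as follows. The function name and the example scores are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter
from typing import Dict, Iterable


def score_distribution(vu_scores: Iterable[int]) -> Dict[str, float]:
    """Return the percentage of VUs in the categories 0, 1-3 and 4-6."""
    bins: Counter = Counter()
    n = 0
    for score in vu_scores:
        if not 0 <= score <= 6:
            raise ValueError(f"score outside the 0-6 range: {score}")
        if score == 0:
            bins["0"] += 1
        elif score <= 3:
            bins["1-3"] += 1
        else:
            bins["4-6"] += 1
        n += 1
    if n == 0:
        raise ValueError("no scores given")
    return {label: 100.0 * bins[label] / n for label in ("0", "1-3", "4-6")}


# Hypothetical scores for eight VUs read with one sequence.
distribution = score_distribution([0, 0, 1, 3, 4, 6, 0, 2])
```

Running the function once per sequence gives the two distributions that are compared in the text.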
Overall, the level of involvement was high, with a range of 23.2%–35.7% of all VUs, and with 81.6%–92.1% of all patients showing at least one inflammatory lesion (table 2). The percentage of involvement was similar among different readers and among different readings.
Concordance of the scorings, defined as agreement between the two MRI sequences on whether a VU showed spinal inflammation (score ⩾1) or no inflammation (score 0), was assessed for both readers and readings separately and was found in 83% of the VUs scored (fig 2), with minor variation across the three segments (80.5% concordance in the VUs of the cervical spine, 83.3% in the VUs of the thoracic spine, and 87.7% in the VUs of the lumbar spine). Discordance was more often due to a positive finding with STIR than with Gd-DTPA: STIR showed inflammation in 10.1% of the VUs that were found to be normal with Gd-DTPA, whereas STIR showed no inflammatory lesions in 6.4% of the VUs in which Gd-DTPA identified inflammation. Table 3 shows an analysis of concordant VU pairs for each reader and for each reading separately, both at the patient level and at the level of single VUs. The percentage of concordant observations was similar for both readers, but appeared to increase slightly from the first to the second reading in both readers.
More inflammatory spinal lesions were seen by the STIR sequence than by the Gd-DTPA sequence: inflammation was present in 30.6% of the VUs, as assessed by STIR, compared with 26.8% of the same VUs when assessed by Gd-DTPA for the entire spine (p = 0.001). A detailed evaluation of the three spinal segments showed inflammation in 20.7% and 16% of the VUs in the cervical spine for the STIR and the Gd-DTPA sequence, respectively (p<0.05), in 38.7% and 34.5% for the thoracic spine (p<0.001), and in 23% and 20.3% for the lumbar spine for the STIR and Gd-DTPA sequences, respectively (p<0.05).
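The text does not say which test produced these p values. For paired positive/negative findings on the same VUs, one common choice is McNemar's test, which uses only the discordant pairs. The sketch below is written under that assumption; the function name and the counts are hypothetical, and the test treats VUs as independent, which VUs within one patient are not.

```python
import math
from typing import Tuple


def mcnemar_test(stir_only: int, gd_only: int) -> Tuple[float, float]:
    """McNemar chi-square test (1 df, no continuity correction).

    stir_only: VUs positive on STIR but negative on Gd-DTPA.
    gd_only:   VUs positive on Gd-DTPA but negative on STIR.
    Returns (chi-square statistic, two-sided p value).
    """
    if stir_only < 0 or gd_only < 0:
        raise ValueError("counts cannot be negative")
    discordant = stir_only + gd_only
    if discordant == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    chi2 = (stir_only - gd_only) ** 2 / discordant
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts of discordant VUs, not taken from the study.
statistic, p = mcnemar_test(stir_only=20, gd_only=5)
```

Concordant VUs do not enter the statistic at all, which is why a modest difference in overall percentages can still be significant when the discordant pairs fall mostly on one side.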
DISCUSSION
MRI techniques are rapidly gaining importance in the evaluation of acute spinal inflammation in AS. In this study we evaluated the two most important techniques and their prominent findings: enhancement of Gd-DTPA seen in T1 weighted sequences with fat saturation after application of the contrast agent, and bone marrow oedema seen by the STIR technique with its intrinsic fat suppression. For evaluation of these MRI changes we used the recently proposed scoring system ASspiMRI-a, which was developed by our group.2 For both sequences we compared the intra- and interreader reliability and the sensitivity of the sequences to detect inflammatory lesions in the entire spine.
The results of our study confirmed a high level of intra- and interreader reliability for both sequences, when evaluated by the ASspiMRI-a. The ICCs obtained for scores of the whole spine were good; those of the separate parts of the spine were poorer, except for the thoracic spine. The latter observation is due to the fact that this site is the source of a higher level of variability in scorings, and emphasises the importance of scoring the entire spine instead of only part of it. This is also in agreement with the concept that a combination of a higher number of components leads to higher reliability. Although intra- and interreader reliability of STIR appeared to be somewhat worse than that of Gd-DTPA, the differences in ICC were small, and probably not of considerable importance. This study also shows that the two MRI techniques analysed by using the ASspiMRI-a scoring system have face validity.8 MRI is the most reasonable way to assess spinal inflammation. Furthermore, the correlation between the ASspiMRI-a and C reactive protein suggests that the criterion validity is good (data not shown).
Active lesions in the spine of patients with AS can be detected by both STIR and T1 Gd-DTPA MRI sequences. Both readers consistently saw more active lesions with STIR than with Gd-DTPA, which suggests that STIR is somewhat more sensitive to signals representing inflammatory activity than Gd-DTPA. Obviously, it is not known whether this higher sensitivity reflects a true or false signal. The analysis of concordant and discordant observations showed that both techniques provide complementary information: some “STIR negative” patients/VUs appeared to be “Gd-DTPA positive”, and vice versa. However, STIR positive observations were more often Gd-DTPA negative than the Gd-DTPA positive observations were STIR negative. The latter observation suggests—but does not prove—that there may be some overreporting when using the STIR technique, and that the Gd-DTPA technique is more selective and specific. An additional explanation for this phenomenon may also be that STIR was read first, and Gd-DTPA afterwards, which may indicate a potential recall bias. In clinical practice Gd-DTPA is often used to confirm abnormalities detected by STIR.
An interesting finding that only partially relates to the topic of this manuscript is the observation that concordance rates increased in both readers in their second reading. This may point to a training effect, and adds to the conclusion that STIR and Gd-DTPA do not really differ in their ability to pick up a true inflammatory signal.
Finally, we should mention that the content of the ASspiMRI-a was not the topic of this study. The inclusion of erosions as an additional outcome measure in an activity score has been debated at OMERACT 2004. The prevalence of a score >3 (which indicates inflammation plus erosion) has been in the range of 10% in the studies performed so far (unpublished data). We are currently analysing how much information and sensitivity to change is lost when the ASspiMRI-a is performed without counting erosions.
In summary, both STIR and Gd-DTPA sequences can detect spinal inflammation in patients with AS. It needs to be emphasised that the conclusions of this methodological paper apply to the situation in which the ASspiMRI-a scoring method should be used: clinical trials and observational studies (groups of patients). Clearly, picking up signals in an individual patient (for diagnostic purposes and for differential diagnosis) is a different issue. Thus, our conclusion is that, for the purpose of clinical studies and group comparisons, STIR and Gd-DTPA perform equally well, and that feasibility should determine whether both or just one technique should be used. At present, it cannot be finally decided whether one technique provides “truer” findings of spinal inflammation than the other because there is no “gold standard”. Possibly, some information is gained in individual patients when both sequences are available. In clinical practice, the STIR technique is likely to be preferred for feasibility reasons (costs, time).
Published Online First 13 January 2005