Objectives: To develop an ultrasound enthesis score and to assess its validity in the diagnostic classification of the spondyloarthropathies (SpAs).
Methods: Twenty-five patients with SpA and 29 healthy controls participated in a blinded, gender-matched, cross-sectional study involving ultrasound assessment. The following entheses were explored bilaterally: proximal plantar fascia, distal Achilles tendon, distal and proximal patellar ligament, distal quadriceps and brachial triceps tendons. The ultrasound score evaluated enthesis thickness, structure, calcifications, erosions, bursae and power Doppler signal. The value of each elemental lesion was calculated using a three-model analysis. Validity was analysed by receiver operating characteristic (ROC) curves. Inter-reader and interexplorer intraclass correlation coefficients (ICCs) were calculated.
Results: The logistic regression model overestimated the score of three elemental lesions: calcification (0–3), Doppler (0 or 3) and erosion (0 or 3), while scoring tendon structure, tendon thickness and bursa as 0 or 1. ROC curves established an ultrasound score of ⩾18 as the best cut-off point for differentiation between cases and controls. This cut-off point was exceeded by 5/29 controls (17%) and by 21/25 patients with SpA (84%). The sensitivity and specificity were 83.3% and 82.8%, and the positive and negative likelihood ratios (LR+, LR−) were 4.8 and 0.2, respectively. The inter-reader and interexplorer ICCs were 0.60 and 0.86, respectively.
Conclusion: The findings suggest that the ultrasound enthesis score could be a valid tool in the diagnosis of SpA.
The diagnosis of spondyloarthropathy (SpA) is often made several years after the disease begins. The Rome, New York and modified New York criteria for the classification of ankylosing spondylitis (AS) have high specificities, but their sensitivities are low.1 2 The absence of both radiographic sacroiliitis and impaired spinal mobility at early stages of the disease contributes to the long delay (5–10 years) in the diagnosis of AS in many patients.3
On the other hand, inflammatory involvement of the enthesis, a characteristic feature of the SpAs, is regarded as the primary lesion in this disease.4 5 A tenderness enthesitis index at 66 entheseal insertions in SpA has been developed. This index correlates with pain and stiffness scores, yet is time consuming and has poor interobserver reliability.6 At the present time, only clinical evidence of heel enthesitis is included in the European Spondylarthropathy Study Group (ESSG) Preliminary Classification Criteria for the diagnosis of SpA.7 However, ultrasound (US) detection of enthesitis is more sensitive and specific than clinical examination and furthermore, it is reproducible.8–10 The aim of this study is to explore the validity of enthesis ultrasonography for the diagnostic classification of SpA.
PATIENTS AND METHODS
A blinded, gender-matched, cross-sectional study was performed on 25 consecutive non-selected patients with SpA and 29 healthy controls. All patients satisfied the ESSG preliminary classification criteria for the diagnosis of SpA.7 The sample included 19 cases of AS (modified New York criteria), two cases of undifferentiated SpA, one case of juvenile SpA, two cases of psoriatic arthritis and one case of reactive arthritis. The mean disease evolution time was 15 years (range 4–34). Exclusion criteria included a history of knee, ankle, or elbow surgery, peripheral neuropathy, or corticosteroid injection within the previous 6 weeks at any of the sites to be examined. Twenty-nine healthy controls (friends of hospital workers or patients) without any known inflammatory or mechanical musculoskeletal disease were invited to participate. For ethical reasons, no analytical or radiological studies were performed on the control group; only a brief anamnesis was taken. The study was approved by the hospital ethics committee and both patients and controls gave their informed consent.
Ultrasonography was performed by an experienced rheumatologist, using a GE Logiq 5 Pro ultrasound system (General Electric Healthcare, Kyunnggi-do, Korea), with a 7–12 MHz linear array transducer. The sonographer was blinded to patients and controls and subjects were asked not to communicate with the US examiner. The US study bilaterally explored entheses at six sites: proximal plantar fascia, distal Achilles tendon, distal and proximal patellar tendon insertion, distal quadriceps tendon and distal brachial triceps tendon. Each tendon was scanned in both the longitudinal and transverse planes. Knee enthesis examination was performed with the patient in the supine position and the knee flexed at 70°. The Achilles tendon and the plantar aponeurosis were examined with the patient lying prone and the feet hanging over the edge of the examination table at 90° of flexion. The triceps insertion was examined with the arm flexed at 90°.
The US exploration evaluated the following elemental lesions of enthesis at each site: calcifications, bursae, erosions, power Doppler signal in bursa or enthesis full tendon (cortical bone profile, intratendon and paratendon on the enthesis insertion) and thickness and structure (figs 1 and 2).
Calcifications were examined at the area of the enthesis insertion and scored as 0 if absent and as 1 if a small calcification or ossification with an irregularity of the enthesis cortical bone profile was seen. They were given a score of 2 if enthesophytes were clearly present (hyperechoic spurs forming at a tendon insertion into bone, growing in the direction of the natural pull of the tendon involved) or if medium-sized calcifications or ossifications were seen, and a score of 3 if large calcifications or ossifications were present. To simplify matters, ossifications and enthesophytes at the enthesis were also counted as calcifications.
Bony erosion was defined as a cortical breakage with a step-down bone contour defect in the longitudinal and transverse axes.
Power Doppler settings were standardised with a pulse repetition frequency of 400 Hz, a gain of 20 dB and a low wall filter.
Fascia and tendon thickness were measured at the point of maximal thickness on the bony insertion. The following criteria were used for abnormal structure thickness: quadriceps tendon thickness >6.1 mm, proximal and distal patellar tendon >4 mm, Achilles tendon >5.29 mm and plantar aponeurosis >4.4 mm.8 The normal US features and thickness of the structures examined have been previously described.8 To reduce subjectivity, the threshold of abnormal thickness was set 0.1 mm above the reported standard deviation of each site in the normal population.8 We used >4.3 mm as the measure for abnormal structure thickness of the triceps insertion; this value was based on our own 29 controls (mean (SD) 3.66 (0.54) mm), taking the mean plus one standard deviation (4.20 mm) and adding 0.1 mm. Structure was defined as pathological if loss of fibrillar pattern, hypoechoic aspect, or fusiform thickening of the enthesis occurred.
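The triceps cut-off can be reproduced from the control statistics quoted above. The following is a minimal sketch, assuming the threshold is the control mean plus one SD plus a 0.1 mm margin, which is the reading consistent with the reported 4.3 mm. The function name and the `margin_mm` parameter are illustrative and do not come from the study.

```python
def abnormal_thickness_threshold(mean_mm, sd_mm, margin_mm=0.1):
    """Assumed rule: control mean + 1 SD + a fixed margin (all values in mm)."""
    return mean_mm + sd_mm + margin_mm


# Triceps insertion in the 29 controls: mean 3.66 mm, SD 0.54 mm
triceps_cutoff = abnormal_thickness_threshold(3.66, 0.54)
print(round(triceps_cutoff, 2))  # 4.3
```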
For reliability, an inter-reader and interexplorer analysis was carried out. The sonographic images for each subject were stored. Inter-reader agreement was measured for six rheumatologists at five hospitals in the area of Madrid. All were experts in musculoskeletal ultrasonography, although only two had previous experience with the enthesis US score (readers 1 (EdM) and 2 (TC) worked in the same department and were directly involved in the development of the index). As a result of its geographical origin, the score was called the MAdrid Sonographic Enthesis Index (MASEI). An agreement on ultrasonographic definitions was reached before the inter-reader assessment.
The first 17 patients and 19 controls were used in the inter-reader study. Cases and controls in this reliability study were masked in individual blocks and had a bilateral enthesis exploration as previously described. In total, 1363 digital images were read by each of the experts.
In addition, an interexplorer study of 22 subjects (cases and controls) was conducted. For this purpose, each subject was scanned consecutively and independently, under blinded conditions, by two of the rheumatologists involved in the study (TC and EdM). The enthesis lesions were quantified using the procedure described above.
To determine the value of every elemental lesion, a three-model analysis was used: logistic regression, principal components and latent class model. Receiver operating characteristic (ROC) curves were used to calculate the predictive capacity of the score for every US reader and for the final score (table 1). ROC curves and model analysis were determined with the Stata program.
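As an illustration of how a ROC analysis yields an area under the curve and a candidate cut-off, a minimal sketch follows. The scores are hypothetical, not study data, and the use of Youden's J to pick the cut-off is an assumption: the text does not state which criterion was applied.

```python
def roc_points(scores_cases, scores_controls):
    """Return (cutoff, sensitivity, specificity) for each candidate cut-off,
    calling a subject positive when score >= cutoff."""
    cutoffs = sorted(set(scores_cases) | set(scores_controls))
    points = []
    for c in cutoffs:
        sens = sum(s >= c for s in scores_cases) / len(scores_cases)
        spec = sum(s < c for s in scores_controls) / len(scores_controls)
        points.append((c, sens, spec))
    return points


def auc(scores_cases, scores_controls):
    """Area under the ROC curve, computed as the probability that a random
    case outscores a random control (ties count one half)."""
    wins = 0.0
    for a in scores_cases:
        for b in scores_controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


def best_cutoff(points):
    """Cut-off maximising Youden's J = sens + spec - 1 (assumed criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]


# Hypothetical scores for illustration only
cases = [30, 25, 22, 19, 14]
controls = [20, 12, 10, 9, 5]

print(auc(cases, controls))                       # 0.92
print(best_cutoff(roc_points(cases, controls)))   # 14
```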
For the reliability analysis, the two-way, mixed-effect model (absolute agreement) and single-measure intraclass correlation coefficients (ICCs) were used. They were determined by SPSS (version 9). Values of p<0.05 were considered to be significant.
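For readers unfamiliar with this coefficient, the sketch below computes a single-measure, absolute-agreement ICC from the two-way ANOVA mean squares. It is an illustrative pure-Python version, not the SPSS procedure used in the study, and the ratings table is a small made-up example rather than study data.

```python
def icc_absolute_single(ratings):
    """Two-way model, absolute agreement, single-measure ICC.

    `ratings` is a list of rows (one per subject), each row holding one
    score per rater.
    """
    n = len(ratings)          # subjects
    k = len(ratings[0])       # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares for subjects (rows), raters (columns) and residual error
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_absolute_single(example)
print(round(icc, 2))  # 0.29
```

Because the absolute-agreement form penalises systematic offsets between raters, a reader who consistently scores lower than the others reduces the coefficient even if the ranking of subjects is preserved.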
RESULTS
The study sample included 29 healthy subjects (19 male, 10 female) with a mean (SD) age of 46.1 (13.4) years (range 22–64) and 25 patients with SpA (16 male, 9 female) with a mean (SD) age of 43.3 (15.6) years (range 17–70). Both age and gender ratio were comparable between the groups.
Value of enthesis elemental lesions
The number and score of elemental lesions in patients with SpA were significantly greater than in controls (fig 3). After determination of the number and score of elemental lesions, the appropriate value of each lesion was investigated. Three different statistical models were used to estimate the value of every elemental lesion. The chosen logistic regression model overestimated three elemental lesions and established that the best predictive value was reached when calcifications were scored on a semiquantitative scale of 0–3, Doppler and erosions were scored as 0 or 3 points and scores for tendon structure, tendon thickness and bursa were either 0 or 1 (table 2). Principal components and latent class models showed similar results with minor differences and a smaller area under the curve. US abnormalities are readily seen in healthy controls; fig 3 shows that the extent of abnormalities, rather than their mere presence, discriminates between SpA and controls.
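The weighting described above can be sketched as a simple sum. This is an illustration only: it assumes that all six elemental lesions are recorded at every enthesis (the text does not say whether, for example, bursae are scored at every site), and the function and field names are hypothetical.

```python
CUT_OFF = 18  # best cut-off reported from the ROC analysis


def enthesis_score(calcification, doppler, erosion, structure, thickness, bursa):
    """Score one enthesis: calcification graded 0-3, Doppler and erosion
    worth 0 or 3, structure, thickness and bursa worth 0 or 1."""
    if calcification not in (0, 1, 2, 3):
        raise ValueError("calcification must be graded 0, 1, 2 or 3")
    return (
        calcification
        + 3 * bool(doppler)
        + 3 * bool(erosion)
        + bool(structure)
        + bool(thickness)
        + bool(bursa)
    )


def total_score(findings):
    """Sum the per-enthesis scores over all examined entheses."""
    return sum(enthesis_score(**f) for f in findings)


# Hypothetical findings for two of the twelve entheses
findings = [
    dict(calcification=2, doppler=True, erosion=False,
         structure=True, thickness=True, bursa=False),   # 2 + 3 + 1 + 1 = 7
    dict(calcification=1, doppler=False, erosion=False,
         structure=False, thickness=False, bursa=False),  # 1
]
score = total_score(findings)
print(score, score >= CUT_OFF)  # 8 False
```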
Validity of the sonography enthesitis index in SpA
To explore the concurrent criterion validity, ROC curves were determined from the values assigned to each elemental lesion by the logistic regression model. As a result, a value of ⩾18 was established as the best cut-off point to differentiate between cases and controls (table 1). The ROC area under the curve was 0.89 (95% CI 0.80 to 0.98). This cut-off point was exceeded by 5/29 controls (17%) and by 21/25 patients with SpA (84%). The mean (SD) value of the MASEI score was 12.96 (7.84) in the control group and 25.44 (7.92) in the patients with SpA (p<0.001). Healthy women had a lower US enthesis index (9.50 (4.14)) than women with SpA (25.25 (6.84)); p<0.05. Healthy men also had a lower US score than men with SpA, 14.79 (8.77) vs 25.78 (10.02); however, this difference was not significant. No differences were found between men and women in the SpA group (men 25.78 (10.02) vs women 25.25 (6.84)). In comparison with the “gold standard”, the MASEI achieved a sensitivity of 83.3%, specificity of 82.8%, positive likelihood ratio (LR+) of 4.8 and negative likelihood ratio (LR−) of 0.2.
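The relationship between the 2×2 counts and these indices can be sketched as follows, using the counts given above (21 of 25 patients and 5 of 29 controls at or above the cut-off). Computed this way, the counts give a sensitivity of 84% and an LR+ of about 4.87, close to but not identical with the quoted 83.3% and 4.8; the text does not explain the difference. The function name is illustrative.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),   # LR+ = sens / (1 - spec)
        "lr_neg": (1 - sens) / spec,   # LR- = (1 - sens) / spec
    }


# 21/25 patients and 5/29 controls scored at or above the cut-off
m = diagnostic_metrics(tp=21, fn=4, fp=5, tn=24)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Likelihood ratios are ratios of probabilities rather than percentages, which is why they can exceed 1.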
The inter-reader agreement for cases and controls among the six readers yielded an ICC of 0.60 (95% CI 0.42 to 0.76; p<0.001). Readers 1 and 2, who had obtained the images and developed the index, had an ICC of 0.83 (95% CI 0.64 to 0.92; p<0.001). Table 3 shows the sensitivity, specificity and percentage of subjects correctly assigned (cases and controls) for each reader, and the ICC of every reader pair. The US interexplorer reliability assessed by readers 1 and 2 yielded an ICC of 0.86 (95% CI 0.70 to 0.94; p<0.001).
DISCUSSION
To the best of our knowledge, this is the first study to investigate the validity of enthesis ultrasonography for diagnostic classification in the SpAs. These results are encouraging and open new perspectives for the use of enthesis US in SpA.
Previous studies in this field have provided important and preliminary data about the relevance of enthesis US in SpAs.8 9 13–16 As in other studies, our results demonstrated a high prevalence of abnormal peripheral enthesis in patients compared with controls (fig 3).8 9
To use an enthesis US score for diagnostic purposes, a strict methodological process that includes face validity, content validity, criterion validity, construct validity, discriminant validity and feasibility must be followed. In accordance with this approach, our study included additional dimensions of validity which have not been previously assessed in enthesis US.
The first objective was to develop a sum score (table 2) with a degree of sensitivity and specificity sufficient to differentiate SpA from controls. Balint et al developed the GUESS (Glasgow Ultrasound Enthesitis Scoring System), a quantitative US score of lower limb enthesis and showed that most of the enthesopathies that were demonstrated with US were not detected when clinical examination was performed.8
The GUESS index has face and content validity because it measures what it is theoretically supposed to measure and it covers different aspects of the enthesis such as calcifications, thickness, erosions and bursae in multiple entheses. To improve the face and content validity, our score added both structural aspects of the enthesis (loss of fibrillar pattern, hypoechoic aspect and fusiform thickening) and the power Doppler signal, as a useful tool to evaluate enthesis and bursa blood flow.9 16
Another aspect of face validity is the determination of which entheses should be scanned. For this purpose, the most representative and most commonly affected entheses reported in previous studies were chosen.8 9 16 In addition, an enthesis of the upper limb was desirable, based on the hypothesis that at this level there will be less influence of possible mechanical problems than in the lower limb. The olecranon was explored because in previous experience it was affected in 60% of the patients with SpA. The epicondyles were not examined, as they were in other studies,9 because that would have increased the number of entheses to explore and could potentially introduce mechanical issues. Furthermore, the mathematical calculations improved the accuracy and validity of the value of every elemental lesion in the final index, as shown in the ROC curve results. This statistical approach has not previously been used for calculation of the values assigned to build SpA US scores.8 9 16
The concurrent criterion validity (the degree to which a measure or test reflects a “gold standard” applied to the same subject) was previously determined by D’Agostino et al.9 Their research clearly showed differences in US enthesis findings between patients with SpA, rheumatoid arthritis and mechanical back pain.9 Our study went one step further in exploring the usefulness of US as a diagnostic tool. This test showed a good sensitivity, specificity, LR+ and LR−. The validity of the score depends on the 12 entheses, because a smaller number or unilateral exploration reduces the area under the curve of the score. It was the number of elemental lesions rather than the presence or absence of the lesions which discriminated between SpA and controls. A possible bias might arise if the US rheumatologist recognised that patients had the disease. However, owing to the blinding of the study this was only possible on a few occasions, and therefore we think that this bias did not occur.
The gender-matched subanalysis of this study showed that control women had significantly fewer lesions in enthesis than men. As a result of this observation, future studies including a control population might be gender matched. On the other hand, the US score in patients with SpA is as sensitive in women as in men and the MASEI score can be used in both. This finding has relevant clinical implications in women, in whom the diagnosis of SpA with classical methods is difficult, owing to both less extensive decrease in spinal mobility and less radiological damage.
Power Doppler ultrasonography has demonstrated increased vascularity, which is related to the inflammation that occurs in enthesitis,9 16 as well as sensitivity to change and discriminant validity, another aspect of construct validity.17 18 These properties make power Doppler US a relevant component of a US score. In this study, power Doppler had a good sensitivity and specificity in the global score. Furthermore, most patients in this cross-sectional study were not clinically active, but 60% had a Doppler signal in at least one enthesis. Occasionally, a power Doppler signal was also seen in controls, but to a lesser extent than in patients. This study used only a cortical power Doppler signal with good results.9 This measure is probably more specific, but the sensitivity may be lower, which might decrease its strength when used for diagnostic applications. Differences among US machines might also account for these disparities.
Another dimension to consider in the validation process is reproducibility. Previous reports have studied the intraobserver and interobserver reliability of US in enthesis.8 9 16 This study investigated inter-reader and interexplorer reliability. The inter-reader study was distinctive in having six readers from five different centres, and this most probably increased the strength of the score, because reliability is always greater among investigators working in the same centre. The results showed that MASEI can be implemented by other investigators. Nonetheless, experience and training are crucial to improve the results. Table 3 shows that readers who were familiar with the development of the index (readers 1 and 2) had better sensitivity and a better percentage of correctly assigned cases and controls (ICC = 0.83) than the other readers. However, readers 3, 4 and 5, who had not participated in the score development, reached ICC agreements comparable to those of readers 1 and 2. Only reader 6 tended to assign few points while scoring. In conclusion, training can improve the reliability of the index and we recommend practising on a sufficient number of patients and controls before clinical application.
To our knowledge, this is the first report of a systematic analysis of enthesis interexplorer acquisition with power Doppler (interobserver reliability of examination) and it shows good reliability, with an ICC of 0.86 (95% CI 0.70 to 0.94). A previous study without Doppler10 showed an ICC of 0.72 (95% CI 0.56 to 0.83).
The MASEI score required <20 min to perform. Therefore, it is feasible and efficient in comparison with other diagnostic procedures.
Finally, the value of the index is not that it satisfies traditional SpA classification criteria and diagnoses patients with a sensitivity of 83.3% and a specificity of 83%; rather, its greatest importance is in developing a US score that can be used in early SpA, which is difficult to diagnose. Before accomplishing this, the ability of the score to classify patients and controls correctly must be studied. The preliminary data applying the MASEI in patients with early SpA show that the proposed index has a similar utility in the preradiographic stage of this disease to that in established disease.19 The clinical New York criteria have shown a good specificity but a low sensitivity, which contributes to a delay in diagnosis of months or years.3 The sensitivity and specificity of 84% and 82.8%, respectively, reached a satisfactory level that makes it possible to include the MASEI score in a core set of diagnostic criteria for SpA. On the other hand, the LR+ of 4.83 can be used in an algorithm based on Bayes’ theorem to allow calculation of the probability of disease in any individual patient with probable SpA.20 As an example, a pretest probability of 5% is increased by a positive MASEI only to 20% and decreased to 1% if the MASEI is negative. However, used in the clinical setting together with other signs or symptoms, it can be decisive.
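The Bayes calculation in the example above can be sketched by converting the pretest probability to odds, multiplying by the likelihood ratio and converting back. The function name is illustrative; the inputs are the 5% pretest probability, the LR+ of 4.83 and the LR− of 0.2 quoted in the text.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


positive = post_test_probability(0.05, 4.83)  # positive MASEI
negative = post_test_probability(0.05, 0.2)   # negative MASEI
print(f"{positive:.1%}")  # 20.3%
print(f"{negative:.1%}")  # 1.0%
```

The same function shows why the test becomes more decisive when other findings have already raised the pretest probability: the higher the starting odds, the larger the absolute gain from a positive result.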
In summary, this US enthesis study demonstrated that the proposed US enthesis score is reliable. In addition, it showed that enthesis US can be a valid diagnostic tool for SpA in both men and women. Although more data are needed, these results are promising and encourage further research.
We thank Dr Loreto Carmona and the Rheumatology Spanish Foundation for statistical advice.
Competing interests: None.
Funding: This study was supported by a grant from Wyeth Pharma Spain.
Ethics approval: Approved by the hospital ethics committee.