Objective: To assess the intra and interobserver reproducibility of musculoskeletal ultrasonography (US) among rheumatologists in detecting destructive and inflammatory shoulder abnormalities in patients with rheumatoid arthritis (RA) and to determine the overall agreement between US and MRI.
Methods: A total of 14 observers examined 5 patients in 2 rounds, independently and blinded to each other's findings. US results were compared with MRI. Overall agreement for all findings, agreement for positive findings on MRI, and intra- and interobserver reliabilities were calculated.
Results: Overall agreement between US and MRI was seen in 79% with regard to humeral head erosions (HHE), in 64% with regard to posterior recess synovitis (PRS), in 31% with regard to axillary recess synovitis (ARS), in 64% with regard to bursitis, in 50% with regard to biceps tenosynovitis (BT), and in 84% for complete cuff tear (CCT). Intraobserver and interobserver κ was 0.69 and 0.43 for HHE, 0.29 and 0.49 for PRS, 0.57 and 1.00 for ARS, −0.17 and 0.51 for bursitis, 0.17 and 0.46 for BT and 0.52 and 0.6 for CCT, respectively. The intraobserver and interobserver κ for power Doppler (PD) was 0.90 and 0.70 for glenohumeral signals and 0.60 and 0.51 for bursal signals, respectively.
Conclusions: US is a reliable imaging technique for most shoulder pathology in RA especially with regard to PD. Standardisation of scanning technique and definitions of particular lesions may further enhance the reliability of US investigation of the shoulder.
Shoulder involvement is a challenging issue in rheumatoid arthritis (RA), bringing about a deleterious impact on the quality of life in many of those affected by the disease.1–5 In RA, ongoing synovial inflammation may lead not only to erosive shoulder disease, but also to rotator cuff rupture. Taking the severity of morbidity and the serious complications of shoulder pathology for patients with RA into account, detecting synovitis at an early stage is a key issue for prevention of irreversible damage.
Ultrasonography (US) is an imaging modality that is now widely accepted in rheumatology research and clinical practice to visualise joints and soft tissues in patients with various rheumatic diseases. US is able to not only image the damage to cartilage and bone, but also to identify tendon pathology and synovial inflammation. Patients are likely to have shoulder tendon disease if US is abnormal.6–8 Despite the increasing use of US, the technique is regarded as examiner-dependent. Furthermore, notwithstanding increasing data on the reliability of US in the evaluation of small joints of the hand and the feet, there is a clear paucity of studies regarding reliability of US for other joints.9–12
In the light of these limitations, we undertook a first step in investigating these issues for patients with established RA and shoulder disease. We addressed the agreement between US and MRI, as well as the intra and inter-reader variability among rheumatologists with experience in musculoskeletal US.
PATIENTS AND METHODS
Five patients with symptomatic shoulder disease and RA were selected from the Barcelona University Rheumatology Unit. There were two men and three women, with a median age of 64.8 (55 to 76) years and median disease duration of 6.6 (1 to 10) years. All had established RA according to the American College of Rheumatology (ACR) 1987 criteria for RA. All patients were investigated twice (ie, the procedure was repeated during the afternoon session), with the patients rearranged in a different order and at a different location.
Observers consisted of a group of 14 rheumatologists from 9 countries with variable expertise (median experience 10 years, range 3 to 16 years) in musculoskeletal US. All were members of the Outcome Measures in Rheumatology (OMERACT) US group. The observers met for 1 day to perform the investigation. The sonographers performed the US investigation independently from each other. The observers were blinded to the clinical details and MRI results. All investigators met for a brief training session before the exercise, to review the scoring method and for initial training of observers not familiar with some aspects of the scoring system or the US machine. A statistician was on hand to receive the filled score sheets. The score sheets from the morning session were sealed in envelopes until the second session was concluded.
All scans were performed using a Siemens Acuson Antares (Siemens, Erlangen, Germany) machine with a 7.5 to 15 MHz linear array transducer. The shoulder scoring system assessed elements of inflammation, as well as structural tendinous and bony damage. Rotator cuff tendons were investigated for the presence of total or partial tears in a longitudinal and a transverse plane, in static and dynamic positions. The synovial structures of the shoulder, including the subacromial-subdeltoid bursa, the sheath of the long biceps tendon and the axillary and posterior recesses of the glenohumeral joint, were examined for the presence of effusions and synovial hypertrophy. The humeral head was examined for the presence of erosions. Power Doppler assessment of selected synovial sites, including the biceps sheath, the subacromial-subdeltoid bursa and the axillary and posterior recesses, was carried out with settings standardised to a pulse repetition frequency of 400 to 500 Hz and low wall filters. The power Doppler gain was adjusted to a level just below the disappearance of colour signals under the bony cortex, as recommended by Rubin et al.13 OMERACT definitions for joint effusion, synovial hypertrophy, tenosynovitis and bone erosions were adhered to.14 The following definitions for the classification of ultrasonographic findings were used: cortical irregularities >2 mm were considered erosions; a hypoechoic area of at least 3 mm around the long head of the biceps tendon was considered tenosynovitis of the long biceps tendon; bursal thickness >3 mm or effusion was considered effusion/synovial hypertrophy of the subacromial–subdeltoid bursa; >3 mm effusion/synovial hypertrophy at the posterior recess superior to the glenoid labrum was considered synovitis; and >3 mm effusion/synovial hypertrophy at the axillary recess was considered synovitis. No ultrasonographic distinction was made between effusions and synovial hypertrophy and these abnormalities were taken together for the analyses.
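The size thresholds listed above can be written down as a simple decision rule. The following Python sketch is illustrative only: the class, field and key names are hypothetical and do not come from the study's score sheets, and it assumes that each site yields a single measurement in millimetres.

```python
from dataclasses import dataclass


@dataclass
class ShoulderMeasurements:
    """Hypothetical container for one US examination (names are illustrative)."""
    cortical_irregularity_mm: float   # largest cortical defect of the humeral head
    biceps_hypoechoic_area_mm: float  # hypoechoic area around the long biceps tendon
    bursal_thickness_mm: float        # subacromial-subdeltoid bursa
    bursal_effusion: bool             # effusion seen in the bursa
    posterior_recess_mm: float        # effusion/synovial hypertrophy, posterior recess
    axillary_recess_mm: float         # effusion/synovial hypertrophy, axillary recess


def classify(m: ShoulderMeasurements) -> dict:
    """Apply the thresholds stated in the text; True means the finding is scored present."""
    return {
        "humeral_head_erosion": m.cortical_irregularity_mm > 2.0,
        "biceps_tenosynovitis": m.biceps_hypoechoic_area_mm >= 3.0,  # "at least 3 mm"
        "bursitis": m.bursal_thickness_mm > 3.0 or m.bursal_effusion,
        "posterior_recess_synovitis": m.posterior_recess_mm > 3.0,
        "axillary_recess_synovitis": m.axillary_recess_mm > 3.0,
    }
```

Note that the biceps criterion is inclusive ("at least 3 mm"), whereas the other criteria are strict inequalities, so a measurement of exactly 3 mm is positive only for biceps tenosynovitis.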
Assessment of the affected shoulder by MRI took place within 5 working days prior to the US investigation in all patients. MR imaging was performed with a 1.5-T unit (Signa Excite, General Electric, Milwaukee, Wisconsin, USA) using a flexible wrap-around coil. The following sequences were used: T1-weighted spin-echo sequence (repetition time (TR) of 500 ms, echo time (TE) of 13.6 ms, slice thickness (SL) 4 mm and fields of view (FOV) of 140–160 mm) in an axial, transverse and oblique coronal slice orientation parallel to the course of the tendon of the supraspinatus; T2-weighted fat suppressed images in a coronal and a sagittal plane with a TR of 3300 ms and a TE of 71 ms.
After selection of a suitable slice on which abnormal changes were visualised, a dynamic contrast-enhanced study with gadolinium diethylenetriaminepentaacetic acid (Gd-DTPA; 0.1 mmol/kg of body weight) was performed using axial T1-weighted (TR of 500 ms, TE of 13.6 ms and FOV of 140–160 mm), axial contrast-enhanced fat suppressed T1-weighted and coronal oblique contrast-enhanced fat suppressed T1-weighted sequences.
The MRI scans were evaluated by two radiologists who were in consensus and had no knowledge of the results of the ultrasonography. The MRI scans were analysed for the presence or absence of the same structures that were visualised by ultrasonography (table 1). The MRI criterion for effusion was an intra-articular or intrabursal area with a high signal on T2-weighted sequences without contrast enhancement on fat-suppressed T1-weighted sequences. The criterion for synovitis was enhancing material seen on the fat-suppressed T1-weighted sequences.15
Overall agreement between US and MRI was calculated for each observer. Averaged overall agreement and kappa index (κ) are shown. Since Cohen's κ is artificially low in cases of very high or very low prevalence, we used κ adjusted for prevalence and bias instead of the standard κ.16 17 Furthermore, the mean positive and negative percentages of agreement were calculated.
Intraobserver reliability is presented as overall agreement between the first and second round for each scan and κ adjusted by prevalence and bias. Agreement indexes were interpreted as follows: 0.81–1.00 excellent agreement; 0.61–0.80, good agreement; 0.41–0.60, moderate agreement; 0.21–0.40, fair agreement; 0.00–0.20, slight agreement and <0.00, poor agreement.
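As an illustration of the intraobserver calculation, the sketch below assumes binary (present/absent) scores and the commonly used two-category form of the prevalence- and bias-adjusted κ, namely 2 × (observed agreement) − 1. The exact computation used in the study is not given in the text, so the function names and this formula are assumptions. The second function maps a κ value onto the interpretation bands listed above.

```python
def pabak(round1, round2):
    """Prevalence- and bias-adjusted kappa for two sets of binary scores.

    Assumes two categories, for which the adjusted kappa reduces to 2 * Po - 1,
    where Po is the observed proportion of agreement.
    """
    if len(round1) != len(round2) or len(round1) == 0:
        raise ValueError("rounds must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(round1, round2)) / len(round1)
    return 2 * observed - 1


def interpret(kappa):
    """Label a kappa value using the bands stated in the text."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"
```

Under this form, an observer who gives the same score in both rounds for four of five patients has an observed agreement of 0.8 and an adjusted κ of 0.6, which falls in the moderate band; any observed agreement below 50% yields a negative κ.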
Interobserver reliability was studied by calculating the generalised Scott π, also known as majority κ. The Scott π is the accepted standard for interobserver reliability for nominal data in communication studies, when there are more than two observers. Comparable to κ, π discounts the level of “observed agreement” by the level of “expected agreement” due to chance in the following way: π = (observed agreement–expected agreement)/(1–expected agreement). Because there were 14 observers, agreement was declared (observed agreement) when at least 10 out of 14 observers assigned the same score to a given US scan. The probability of having at least 10 equal scores by chance was the expected agreement used to compute majority κ.
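The majority κ described above can be sketched as follows. The text does not state the chance model used for the expected agreement, so this example assumes binary scores and that, by chance, each observer independently scores a finding as present with probability `p_positive` (0.5 by default). Both the function names and that chance model are assumptions made for illustration.

```python
from math import comb


def chance_majority(n_observers=14, threshold=10, p_positive=0.5):
    """Probability that at least `threshold` of `n_observers` give the same
    binary score purely by chance (assumed independent observers)."""
    def pmf(k):
        # Binomial probability of exactly k positive scores
        return comb(n_observers, k) * p_positive**k * (1 - p_positive)**(n_observers - k)

    # The majority may be positive (k >= threshold) or negative (n - k >= threshold)
    return sum(
        pmf(k)
        for k in range(n_observers + 1)
        if k >= threshold or n_observers - k >= threshold
    )


def majority_kappa(scores_per_scan, threshold=10, p_positive=0.5):
    """pi = (observed - expected) / (1 - expected), where agreement on a scan
    means at least `threshold` observers assigned the same 0/1 score."""
    n_observers = len(scores_per_scan[0])
    agreed = sum(
        max(sum(scores), n_observers - sum(scores)) >= threshold
        for scores in scores_per_scan
    )
    observed = agreed / len(scores_per_scan)
    expected = chance_majority(n_observers, threshold, p_positive)
    return (observed - expected) / (1 - expected)
```

With 14 observers, a threshold of 10 and `p_positive = 0.5`, the expected agreement under this model is 2942/16384, or about 0.18. A different assumed chance probability would change the expected agreement and therefore the resulting κ.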
RESULTS
MRI investigation reported tenosynovitis of the long biceps tendon, glenohumeral synovitis, subacromial-subdeltoid bursitis and humeral head erosions in four out of five patients with RA. Rotator cuff tears, either partial or complete, were seen on MRI in fewer patients. No total cuff tear of the infraspinatus tendon was found with MRI (table 2).
Table 3 lists the mean overall agreement (for either the presence or absence of pathological findings, ie, the accuracy) between the US observations and the MRI findings, as well as the positive agreement (ie, sensitivity, when the pathological finding was present on the MRI) and the negative agreement (ie, specificity, when the pathological finding was not present on the MRI). Table 3 demonstrates that good agreement between US and MRI was found for the presence/absence of erosions of the humeral head. Presence or absence of glenohumeral synovitis was found with moderate to good agreement. Regarding posterior recess synovitis, the mean agreement between MRI and US assessment was much better (64%) than for axillary recess synovitis (31%). Excellent agreement was found between US and MRI for the presence or absence of a complete rotator cuff tear, whereas with regard to partial cuff tears, US demonstrated a lower sensitivity.
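The three agreement measures defined above can be computed for one observer and one finding as in the following sketch, which treats MRI as the reference. The function name and the input layout (one boolean per patient) are assumptions made for illustration.

```python
def agreement_with_reference(us, mri):
    """Overall, positive and negative agreement of US against MRI.

    `us` and `mri` are equal-length lists of booleans, one entry per patient,
    where True means the finding was scored as present.
    """
    if len(us) != len(mri) or len(us) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(us, mri))
    both_positive = sum(1 for u, m in pairs if u and m)
    both_negative = sum(1 for u, m in pairs if not u and not m)
    mri_positive = sum(1 for _, m in pairs if m)
    mri_negative = len(pairs) - mri_positive
    return {
        # Accuracy: US and MRI give the same result
        "overall": (both_positive + both_negative) / len(pairs),
        # Sensitivity: US positive among MRI-positive patients
        "positive": both_positive / mri_positive if mri_positive else None,
        # Specificity: US negative among MRI-negative patients
        "negative": both_negative / mri_negative if mri_negative else None,
    }
```

When MRI shows a finding in every patient, the negative agreement is undefined and is returned as `None`. With only five patients, each patient shifts an observer's overall agreement by 20 percentage points.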
Table 4 lists the mean intraobserver agreement and the corresponding mean κ values. The mean overall agreements for intraobserver reproducibility ranged from moderate to excellent. Excellent intraobserver agreement was observed for humeral head erosions, complete and partial tears of the subscapularis and infraspinatus tendons and power Doppler signals regarding the glenohumeral joint and bursa. The mean κ value for intraobserver reproducibility for humeral head erosions was good (0.69). The mean κ values for synovitis were moderate (0.57) for the axillary recess and fair (0.29) for the posterior recess, whereas the κ value for bursitis was poor (−0.17). The mean κ values for synovial power Doppler flow in the joint recesses were excellent, whereas that for the power Doppler flow in the bursa was good. A poor κ value was found for long biceps tendon tenosynovitis. According to κ values, the intraobserver reproducibility for partial cuff tear ranged from slight to moderate, and for total cuff tear from poor to moderate.
Table 5 lists the majority κ values for interobserver agreement. Increasing κ values signify better agreement between the 14 observers. The mean interobserver κ value for bony erosions was moderate (0.43). The κ value for tenosynovitis of the long biceps tendon was moderate (0.46), and a poor κ value for the power Doppler signal within the tendon sheath was found. Mean κ values for glenohumeral joint synovitis ranged from moderate (0.49) to excellent (1.0), with excellent interobserver mean majority κ values for the presence of power Doppler signal for either the axillary joint recess or the posterior recess. Interobserver κ for bursitis was moderate (0.51), with a negative mean κ value for PD signal within the bursa. The κ for partial cuff tear scored a good agreement, as well as the interobserver agreement for complete cuff tear.
DISCUSSION
This is the first study undertaken to date that focuses on validation of ultrasonography-detected shoulder abnormalities in patients with established RA. We used MRI as the reference imaging technique. The results show that US can reliably assess evidence of joint destruction (ie, rotator cuff tears and erosions of the humeral head). In addition, our study shows that US can reliably assess signs of joint inflammation (eg, synovitis of the posterior recess and Doppler signals). This is also the first study to examine the US reproducibility of erosions and synovitis in the shoulder among rheumatologist-sonographers; however, the results indicate that for a limited number of shoulder changes, US is still an imaging technique with a wide spectrum of intra- and interobserver variability.
Various studies on patients with shoulder disease in RA have demonstrated that US is comparable to MRI in being more sensitive than radiography in detecting bone erosion.18–21 One study showed that erosions were reported by US in 30 patients and by MRI in 39 patients, vs 26 by conventional radiography.18 19 The studies of Scheel et al20 and Naredo et al21 demonstrated that an excellent agreement between US and MRI existed for humeral head erosions (84.5 and 100%, respectively), with a similar or lower detection rate of synovitis or effusion of the shoulder (88.5 and 50%, respectively). In our study, although radiography was not used as a comparator, we confirmed the high agreement level between US and MRI for detecting erosions of the humeral head. As to humeral head erosions detected by US, our mean κ values showed a good intraobserver and a moderate interobserver reliability.
We considered partial or complete cuff tear as a second outcome parameter of chronic damage of the shoulder in RA. We found an excellent agreement between US and MRI for complete cuff tear and, as expected, lesser agreement for partial cuff tear. The intra- and interobserver agreements for partial and complete cuff tears varied from poor to excellent. Again, these results are consistent with earlier studies, which reported values ranging from 60% to 94%.6 7 22 23
Furthermore, we compared the agreement of US and MRI in detecting inflammatory changes (ie, synovitis, bursitis and long biceps tendon tenosynovitis). The detection of early inflammation of a joint is of key importance for the optimal management of patients with RA. Since the visualisation of synovial hypertrophy of the shoulder joint is extremely difficult with US due to its deep anatomic site, we did not actually diagnose synovitis by the detection of synovial hypertrophy, but by the presence of effusion and by the presence of power Doppler flow. As most, but not all, synovitis is accompanied by effusion, this probably introduces a small methodological error. All inflammatory conditions were found more frequently by MRI than by US. Our study was able to detect synovitis of the posterior recess in over 60% of patients, indicating a high sensitivity, whereas the agreement for synovitis of the axillary recess was much lower. If the conclusion is justified that the posterior recess is more sensitive to synovitis, it would significantly shorten the US examination time.24 With regard to bursitis, there was a high agreement between US and MRI, but a negative mean intraobserver κ value. Moreover, the mean intraobserver κ value for biceps tenosynovitis was poor. The poor and negative mean κ values mainly have technical causes (ie, because of the high prevalence of the abnormality on the MRI, the chance that the US examination will find the abnormality is relatively high). Because the adjusted κ corrects for this high chance, the κ value becomes negative when the observed US findings are fewer than the expected US findings.
These findings may lead to the conclusion that in cases of RA, signs of shoulder synovitis should be looked for at the posterior recess and not in the axillary recess. Furthermore, the intra- and interobserver agreement for the power Doppler signal in the posterior and the axillary recesses of the shoulder was excellent, suggesting that the power Doppler signal may be used in multicentre studies as a parameter for active shoulder synovitis.
The lowest sensitivities of US were found for assessing axillary recess synovitis, long biceps tenosynovitis and partial cuff tears. It cannot be excluded that MRI overdiagnosed these findings (eg, some cases of tendonitis might have been interpreted as partial cuff tears). Although US is a dynamic investigation, which improves its sensitivity to detect, for example, small quantities of fluid or small cuff tears, it was noted that not all observers fully used the dynamic approach. Moreover, some investigators were not familiar with the equipment and the scanner settings. The level of experience also differed between sonographers, but we did not examine whether there was a correlation between an investigator's experience and their US performance. Perhaps a 10 min investigation, chosen because this amount of time reflects a busy clinical practice, was too short for some investigators to perform a thorough examination.
A limitation of our study is the small number of patients. Since there were only 5 available machines and 14 sonographers, the experiment was set up in such a way that as much information as possible would be obtained in a single working day. A larger sample including healthy persons and patients with various degrees of shoulder disease could yield more accurate information regarding the index of agreement between observers and test-retest reliability, but this would require a longer experiment.
In summary, this study shows that US is a reliable method for detection of erosions and complete cuff tear, and is also reliable in detecting synovitis of the posterior recess of the glenohumeral joint and subdeltoid–subacromial bursitis. The 14 ultrasonographers/rheumatologists were able to detect these changes with a moderate to good interobserver reproducibility and similar intraobserver reproducibility. More studies are warranted that focus on improving the US diagnosis of biceps tenosynovitis, partial cuff tears and axillary synovitis in particular.
The authors would like to thank Roche Pharmaceuticals Espana for funding the study. The authors would also like to thank Siemens for providing five US machines.
Competing interests: None declared.
Funding: Roche Pharmaceuticals Espana funded this study.
Ethics approval: Ethics approval was obtained.