Concise report
Examination of intra and interrater reliability with a new ultrasonographic reference atlas for scoring of synovitis in patients with rheumatoid arthritis
Hilde Berner Hammer (1), Pernille Bolton-King (1), Vivi Bakkeheim (2), Torill Helene Berg (1), Elisabeth Sundt (1), Anne Katrine Kongtorp (1), Espen A Haavardsholm (1)

(1) Department of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway
(2) Department of Rheumatology, St Olavs Hospital, Trondheim, Norway

Correspondence to Dr Hilde Berner Hammer, Department of Rheumatology, Diakonhjemmet Hospital, Box 23, Vinderen, N-0319 Oslo, Norway; hbham{at}


Abstract

Objective Synovitis in patients with rheumatoid arthritis (RA) may be scored semiquantitatively (0–3) for B-mode (BM) and power Doppler (PD) ultrasonography. The objective was to assess the reliability of BM and PD examinations with a novel ultrasonographic atlas as reference.

Methods Representative ultrasound images (including scores 0–3) of BM and PD from 24 different joints were collected to develop an ultrasonographic atlas. Ten RA patients were assessed twice by five rheumatologists performing BM and PD scoring (0–3) of 16 joints bilaterally (metacarpophalangeal 1–5, wrist (radiocarpal, intercarpal, radioulnar), elbow, knee, talocrural and metatarsophalangeal 1–5), with the novel ultrasonographic atlas as a reference.

Results The median (range) percentages of exact agreement for BM/PD assessments were 73.1 (70.3–80.6)/83.7 (76.7–87.6) and of close agreement 98.1 (96.2–99.7)/98.0 (96.8–98.4), with weighted κ values of median (range) 0.77 (0.70–0.83) for BM and 0.83 (0.73–0.86) for PD. The intrarater intraclass correlation coefficients (ICC) for BM/PD scores were 0.95 (0.93–0.99)/0.97 (0.95–0.99) and the interrater ICC were 0.95 (0.86–0.99)/0.97 (0.94–1.00). Scoring of 32 joints was completed in a median of 15 min (range 12–20).

Conclusion With an ultrasonographic atlas as reference, high intra and interrater reliability was found for BM and PD scoring. This novel atlas may be a useful resource in clinical practice and research.

Ultrasonography is increasingly being used to assess synovitis in patients with rheumatoid arthritis (RA). The definitions of synovitis (grey scale or B-mode (BM)) and vascularisation (assessed by the use of power Doppler (PD)) have been approved by OMERACT1 and the degree of BM synovitis and PD activity is usually scored on a four-point semiquantitative scale (0–3).2–5

Ultrasonography is an operator-dependent method, but several studies have shown fair to good reliability of the semiquantitative scoring.6–9 A recent review of the reliability of ultrasound scoring in RA patients found high intra and interrater reliability of still images, whereas only a few studies had assessed the reliability of the acquisition of ultrasound.10 The introduction of an MRI atlas11 and formal training and calibration of readers have previously been shown to improve reliability in MRI studies,12 and a similar approach would potentially be useful also for ultrasound.

The objective of the present study was to assess the intra and interrater reliability of ultrasound scoring performed by five rheumatologists with different levels of ultrasound experience. A novel ultrasonographic atlas was used as a reference.

Patients and methods


Patients

Ten RA patients13 were included, with a median age of 63 years (range 34–75) and a disease duration of 7 years (range 3–12); 70% were women. The patients received methotrexate monotherapy (n=3), triple therapy (methotrexate, sulfasalazine and hydroxychloroquine) (n=1), methotrexate and biological medication (rituximab (n=2), etanercept (n=1), infliximab (n=1)), etanercept monotherapy (n=1) or prednisolone monotherapy (n=1).

Ultrasonographic atlas

A novel ultrasonographic atlas was developed by HBH including representative images of 24 peripheral joints with standard scanning14 (proximal interphalangeal 1–5, metacarpophalangeal 1–5, wrist (radiocarpal, intercarpal and radioulnar), elbow, shoulder (glenohumeral and acromioclavicular), hip, knee, talocrural and metatarsophalangeal 1–5). For each joint, the atlas contains a photo showing the position of the probe together with ultrasound images of normal and inflamed joints. The atlas includes semiquantitative scores for the presence of BM (combined score for synovitis and joint fluid) and PD with 0 = none, 1 = minor, 2 = moderate or 3 = major presence of ultrasonographic pathology. The atlas is composed of a total of 768 characteristic ultrasound images (24 joints with four different score levels, four examples of each score level and both BM and PD of each score) (figure 1 and supplementary figure S1, available online only, including the joints presently assessed).
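The image count stated above can be checked by enumerating the combinations the atlas covers. The following is only an illustrative sketch: the joint labels and the `atlas_index` layout are assumptions made for the example and do not describe how the atlas itself is organised.

```python
from itertools import product

# The 24 joint sites listed in the atlas description (labels are illustrative).
JOINTS = (
    [f"PIP {i}" for i in range(1, 6)]
    + [f"MCP {i}" for i in range(1, 6)]
    + ["radiocarpal", "intercarpal", "radioulnar"]
    + ["elbow", "glenohumeral", "acromioclavicular", "hip", "knee", "talocrural"]
    + [f"MTP {i}" for i in range(1, 6)]
)
MODES = ("BM", "PD")      # B-mode and power Doppler
SCORES = (0, 1, 2, 3)     # semiquantitative score levels
EXAMPLES_PER_SCORE = 4    # example images per score level

# One entry per reference image: (joint, mode, score, example number).
atlas_index = list(
    product(JOINTS, MODES, SCORES, range(1, EXAMPLES_PER_SCORE + 1))
)

print(len(JOINTS))       # 24
print(len(atlas_index))  # 768
```

The product 24 × 2 × 4 × 4 reproduces the 768 images reported for the atlas.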

Figure 1

Ultrasonographic images of the metacarpophalangeal (MCP) 2 (A) and metatarsophalangeal (MTP) 2 (B) joints (from the ultrasonographic atlas, see supplementary figure S1, available online only).

Sonographers and calibration of ultrasound scoring

Five sonographers (rheumatologists) with 3–8 years of ultrasound experience participated in the reliability exercise. HBH had extensive practice in joint scoring, whereas the others were introduced to scoring and to the use of the ultrasonographic atlas in connection with this study. The joints selected for assessment were metacarpophalangeal 1–5, wrist (radiocarpal, intercarpal and radioulnar), elbow, knee, ankle (talocrural) and metatarsophalangeal 1–5 joints bilaterally (a total of 32 joints). Four weeks before the reliability exercise the sonographers met for 2 days of ultrasound training. The new ultrasonographic atlas was used as a reference standard and the meeting included calibration sessions with discussions of several joint images, as well as practical sessions with dynamic joint scoring. Then a training set including 120 images of relevant joints with different degrees of disease activity was sent to each of the sonographers to score individually. The results were discussed the day before the final part of the study to reach consensus on the scoring method using the ultrasonographic atlas as reference.

The reliability study was conducted on one day. All 10 RA patients were assessed twice by each assessor using five similar Siemens Antares Sonoline machines (Siemens Medical Solutions, Mountain View, CA, USA) with linear probes (5–13 MHz and setting at 11.4 MHz) and identical settings optimised for PD in superficial joints (PRF 391 Hz, low wall filter and frequency 7.3 MHz).15 The patients were assessed in a random order for the first and second ultrasound examination (with at least 1.5 h between the two examinations). The sonographers had both digital and printed versions of the ultrasonographic atlas available during the examination, and they were allowed a maximum of 20 min to perform the BM and PD scoring (0–3). For elbow, knee and ankle joints there was only BM scoring, whereas both BM and PD scoring was performed in metacarpophalangeal 1–5, each of the three wrist joints and metatarsophalangeal 1–5 joints bilaterally. Details of the scanning are given in table 1. The time consumed and the scores of each patient were recorded (images were not stored), and score sheets were stored separately after finishing each ultrasound examination. The assessors did not have access to these results later. The calculations on the scoring results were executed by EAH who did not perform ultrasonography.
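The scoring protocol described above fixes how many BM and PD assessments one examination can contain. A small sketch of that arithmetic follows; the variable names are assumptions introduced for illustration only.

```python
# Joint sites scored on each side of the body, as described in the protocol.
BM_AND_PD = (
    [f"MCP {i}" for i in range(1, 6)]
    + ["radiocarpal", "intercarpal", "radioulnar"]
    + [f"MTP {i}" for i in range(1, 6)]
)
BM_ONLY = ["elbow", "knee", "talocrural"]

SIDES = 2          # joints are assessed bilaterally
N_PATIENTS = 10
MAX_SCORE = 3      # top of the 0-3 semiquantitative scale

bm_per_patient = SIDES * (len(BM_AND_PD) + len(BM_ONLY))
pd_per_patient = SIDES * len(BM_AND_PD)

print(bm_per_patient, pd_per_patient)                            # 32 26
print(bm_per_patient * N_PATIENTS, pd_per_patient * N_PATIENTS)  # 320 260
print(bm_per_patient * MAX_SCORE, pd_per_patient * MAX_SCORE)    # 96 78
```

Each examination therefore comprises at most 32 BM and 26 PD joint scores, giving upper bounds of 320 and 260 joints across the 10 patients and maximum possible total scores of 96 (BM) and 78 (PD) per patient.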

Table 1

The joints presently assessed and the scans used for the semiquantitative scoring (0–3) of B-mode and power Doppler

Laboratory and clinical examinations

C-reactive protein and the erythrocyte sedimentation rate were assessed (with routine inhouse measurements). A trained study nurse (AKK) performed the assessment of tender and swollen joints. The patient's evaluation of disease activity was assessed on a visual analogue scale and the disease activity score in 28 joints was calculated.


Statistics

Intra and interrater reliabilities were evaluated using a two-way mixed effects model with a consistency definition, in which the between-measure variance is excluded from the denominator variance, and both single measure and average measure intraclass correlation coefficients (ICC) were calculated for total scores of both BM and PD. The average measure ICC corrects for the number of readers and was calculated for the interreader reliability. In addition, weighted κ values were calculated on a joint-by-joint level for both BM and PD scores. ICC values and κ values are comparable; scores above 0.60 are considered good and scores above 0.80 are very good. The percentage exact agreement and percentage close agreement were also calculated on a joint-by-joint level for both BM and PD scores. The percentage close agreement was defined as scores within ±1 interval for both BM and PD.
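The joint-by-joint measures defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, the example scores are invented, and linear weights are assumed for κ because the weighting scheme is not stated in the text.

```python
def percent_agreement(first, second, tolerance=0):
    """Percentage of joints whose two scores differ by at most `tolerance`.

    tolerance=0 gives exact agreement, tolerance=1 gives close agreement.
    """
    pairs = list(zip(first, second))
    hits = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return 100.0 * hits / len(pairs)


def weighted_kappa(first, second, categories=(0, 1, 2, 3)):
    """Cohen's kappa with linear weights (an assumption, see lead-in)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Observed joint proportions for each pair of score levels.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]

    if disagree_exp == 0.0:
        return float("nan")  # undefined when both sessions use a single score
    return 1.0 - disagree_obs / disagree_exp


# Invented scores for eight joints from two sessions by the same reader.
session_1 = [0, 1, 2, 3, 1, 0, 2, 2]
session_2 = [0, 1, 1, 3, 2, 0, 2, 3]

print(percent_agreement(session_1, session_2))               # 62.5
print(percent_agreement(session_1, session_2, tolerance=1))  # 100.0
print(round(weighted_kappa(session_1, session_2), 2))
```

Weighting penalises a disagreement of two score levels more than a disagreement of one, which suits an ordinal 0–3 scale better than unweighted κ.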


Results

The median (range) of the laboratory and clinical variables of the patients were: C-reactive protein 15 mg/l (1–96), erythrocyte sedimentation rate 27 mm/h (10–107), number of swollen and tender joints (of 28) six (1–17) and three (0–19), respectively, and the disease activity score in 28 joints 4.4 (2.9–8.2).

The ultrasound examinations of 32 joints took a median (range) of 15 min (12–20). A BM score of 1 or greater was present in a median (range) of 148 joints (104–177) (of 317 applicable joints) and 84 joints (69–94) had a PD score of 1 or greater (of 257 joints assessed by PD). The median (range) was 28.0 (14–63) for the BM total score and 18.4 (7–56) for the PD total score.

For BM and PD the median (range) percentages of intrareader exact agreements were 73.1 (70.3–80.6) and 83.7 (76.7–87.6), respectively, and of close agreements 98.1 (96.2–99.7) and 98.0 (96.8–98.4), respectively. The weighted κ values were median (range) 0.77 (0.70–0.83) for BM and 0.83 (0.73–0.86) for PD (table 2).

Table 2

Intrarater reliability for the five rheumatologists of B-mode and power Doppler examinations on a joint-by-joint level with weighted κ, percentage exact agreement and percentage close agreement

The intrarater ICC were median (range) 0.95 (0.93–0.99) for BM scores and 0.97 (0.95–0.99) for PD scores and interrater ICC were median (range) 0.95 (0.86–0.99) for BM scores and 0.97 (0.94–1.00) for PD scores.


Discussion

High intra and interrater reliability was demonstrated in this study. A novel ultrasonographic atlas was used as a reference standard, and an extensive training and calibration of the scoring system was performed before the scoring exercise.

The five sonographers did not have any major differences in their intra and interreader reliability, even though only one of them had previous experience in ultrasound scoring systems. This encouraging observation indicates that scoring of synovitis is possible to learn in a short time (as has been shown for ultrasonography in general).16–18 The reference atlas (supplementary figure S1, available online only) provides readers with a new tool for the standardised assessment of RA joints, making it possible to score ultrasound-assessed joints according to the best possible match with reference images.

There is currently no consensus regarding which joints should be assessed by ultrasound during the follow-up of RA patients on medical treatment. Previous studies have explored seven19 to 78 joints.5 In addition, tenosynovitis is common in RA patients, and a recent study indicated that ultrasound scoring of a few tendons is highly responsive to biological treatment.20 The optimal number and scanning of joints and tendons for ultrasound assessments should be explored further.

When the variance between patients is high, eg, the joint inflammatory activity in a mixed RA population as in this study, the ICC are likely to be higher than in a homogeneous population of, for example, patients in remission. The RA patients in this study had a broad range of disease activity as well as disease duration and were assumed to be representative of established RA patients.
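The dependence of the ICC on between-patient variance can be illustrated with a small sketch of the two-way consistency ICC computed from ANOVA mean squares. The total scores below are invented for illustration and are not study data; the function name `icc_consistency` is likewise an assumption. Both hypothetical populations receive exactly the same reader disagreement, and only the spread of the patients' underlying scores differs.

```python
def icc_consistency(scores):
    """Single and average measure ICC, two-way model, consistency definition.

    `scores` is a list of rows, one per patient, each holding one total
    score per reader (or per session for intrarater reliability).
    """
    n = len(scores)       # patients
    k = len(scores[0])    # readers or sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols  # reader variance is excluded

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


# Identical reader disagreement pattern applied to both populations.
NOISE = [[1, -1, 0], [-1, 0, 1], [0, 1, -1], [1, 0, -1], [-1, 1, 0]]


def simulate(true_scores):
    return [[t + e for e in noise] for t, noise in zip(true_scores, NOISE)]


mixed = simulate([10, 20, 30, 40, 50])      # wide range of disease activity
uniform = simulate([28, 29, 30, 31, 32])    # nearly homogeneous population

print(icc_consistency(mixed))
print(icc_consistency(uniform))
```

With the same measurement error, the single measure ICC is close to 1 for the mixed population and markedly lower for the homogeneous one, which is the effect described above.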

A limitation of the present study was that only 10 patients were assessed. However, a similar number of patients was used in another reliability study.12 A strength of our study was that the five assessors had different levels of experience, which supports the external validity of the findings. Another strength is that this reliability study included the acquisition of ultrasound images, which has been emphasised to be greatly needed.10

The present study showed that rheumatologists, after training and calibration, had a high reliability in their performance of ultrasound assessments of 32 joints in RA patients. An ultrasonographic atlas with reference images for each of the possible scores was used, and we suggest that this tool (figure 1 and supplementary figure S1, available online only) may be useful both in clinical practice and in clinical trials.



  • Funding The study was supported by an unrestricted grant from Abbott Norway.

  • Competing interests None.

  • Ethics approval This study was conducted with the approval of the regional committee for medical and health research ethics (REK), south-east.

  • Provenance and peer review Not commissioned; externally peer reviewed.