Concise report
A 78-joints ultrasonographic assessment is associated with clinical assessments and is highly responsive to improvement in a longitudinal study of patients with rheumatoid arthritis starting adalimumab treatment
  1. Hilde Berner Hammer,
  2. Margareth Sveinsson,
  3. Anne Katrine Kongtorp,
  4. Tore K Kvien
  1. Department of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway
  1. Correspondence to Dr Hilde Berner Hammer, Department of Rheumatology, Diakonhjemmet Hospital, Box 23, Vinderen, N-0319 Oslo, Norway; hbham{at}


Objectives To examine associations between ultrasonography (US) assessments (B-mode (BM) and power Doppler (PD)) of a large number of joints and traditional assessments of disease activity, and to examine the sensitivity to change of the US scores and clinical measures in patients with rheumatoid arthritis (RA).

Methods Twenty patients with RA initiating adalimumab treatment were examined at baseline and after 1, 3, 6 and 12 months with US (BM and PD) using an Outcome Measures in Rheumatoid Arthritis Clinical Trial semiquantitative scoring (0–3) of 78 joints as well as assessment of clinical and laboratory variables with calculation of composite indexes.

Results The US scores were associated with composite scores as well as clinical and laboratory variables (r=0.41–0.84, p<0.05–0.001 at the 12-month follow-up). Compared with clinical assessments, US detected higher numbers of inflamed joints. The US scores decreased after 1 and 3 months (p<0.005) and PD showed the highest percentage improvement. Both BM and PD had high standardised response means throughout the study (−0.83 to −1.27), of similar magnitude to composite indexes, but higher than the clinical and laboratory variables.

Conclusions The comprehensive US assessments were associated with clinical and laboratory variables of disease activity and were highly sensitive to change during treatment with biological agents.


A valid and sensitive assessment of disease activity is necessary for optimal treatment of patients with rheumatoid arthritis (RA). Ultrasonography (US) has been shown to be both valid and reliable for assessing joint inflammation1–4 and to be sensitive to change during treatment with biological agents.4–11 Several methods for quantifying joint synovitis (B-mode (BM)) and vascularisation (power Doppler (PD)) in RA have been used, most of them as described by the Outcome Measures in Rheumatoid Arthritis Clinical Trial (OMERACT) with a semiquantitative scale from 0 to 3.2–4 6–11

Several studies have reported significant associations between US findings in a limited number of joints and clinical and laboratory markers of inflammation,4 12 and both BM and PD scores of a limited number of joints have been reported to decrease during treatment with biological agents.4–13 One of the objectives of this study was to examine whether US assessment of a large number of joints would produce even higher associations with clinical and laboratory examinations than has been found previously with the use of a smaller number of joints. The other, and most important, objective was to examine sensitivity to change of a comprehensive US assessment as well as traditional clinical variables during treatment with a biological agent.

Patients and methods

A total of 20 patients (median (range) age 53 (21–78) years, disease duration 7.5 (1–26) years, 15 women, 70% rheumatoid factor positive) with RA14 were included on the same day that they started treatment with adalimumab (40 mg every second week) as the first biological agent. All patients used methotrexate as a basic disease-modifying antirheumatic drug, 14 patients were additionally receiving prednisolone (median (range) dose 7.5 (3.75–15) mg) and two patients used daily non-steroidal anti-inflammatory drugs.

US examinations were performed by one experienced sonographer (HBH) at baseline (the day of inclusion) and after 1, 3, 6 and 12 months with a 5–13 MHz probe and fixed settings optimal for PD signals (Siemens Antares, Sonoline; Siemens Medical Solutions, California, USA). To ensure standardisation, the same US machine and the same PD setting, optimised for more superficial structures (most of the joints assessed), were used throughout the study, even though this setting was not optimal for deep structures such as the hip. No software upgrades were performed during follow-up.

The following joints were assessed bilaterally using standard projections (in parentheses)15: proximal interphalangeal (PIP) 1–5 (dorsal), metacarpophalangeal (MCP) 1–5 (dorsal), carpometacarpal 1–5 (dorsal), wrist (radiocarpal, intercarpal and radioulnar joints) (dorsal), elbow (anterior and posterior), shoulder (glenohumeral and acromioclavicular joints) (anterior, posterior and upper), hip (anterior), knee (anterior and lateral), ankle (talocrural joint) (anterior), four major foot joints (talonavicular, subtalar, calcaneocuboidal and cuneonavicular) (anterior and lateral), tarsometatarsal (TMT) 1–5 (dorsal), metatarsophalangeal (MTP) 1–5 (dorsal) and the interphalangeal (dorsal) joint of the first toe (a total of 78 joints). All joints were scored according to OMERACT criteria for BM (presence of synovitis and joint fluid) and PD (presence of vascularisation): 0 = none, 1 = minor, 2 = moderate or 3 = major presence (giving a possible range of 0–234 for each of the BM and PD sum scores). All US examinations were performed in one room in the morning, after at least half an hour of acclimatisation to the room temperature, and probe pressure was kept as low as possible to obtain optimal PD signals. The hands were assessed while resting on a small table and lower limbs were examined with the patients lying on a bench. The US examiner was blinded to previous US results as well as to the results of the same-day clinical joint assessments and laboratory tests. Images were stored from all the examinations, and US scoring reliability was examined at the end of the study by assessing the BM and PD scores of the 78 joints from one of the visits for six randomly selected patients.
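The joint total and the sum score range can be checked with simple arithmetic. The following is a minimal sketch, assuming each named joint counts once per side (a reading of the list above, not something the authors state explicitly); the names `JOINTS_PER_SIDE`, `MAX_GRADE` and `sum_score` are illustrative only.

```python
# Joints per side of the body, read off the list in the text.
# Assumption: each named joint counts once per side, regardless of
# how many projections were used to image it.
JOINTS_PER_SIDE = {
    "PIP 1-5": 5,
    "MCP 1-5": 5,
    "carpometacarpal 1-5": 5,
    "wrist (radiocarpal, intercarpal, radioulnar)": 3,
    "elbow": 1,
    "shoulder (glenohumeral, acromioclavicular)": 2,
    "hip": 1,
    "knee": 1,
    "ankle (talocrural)": 1,
    "foot (talonavicular, subtalar, calcaneocuboidal, cuneonavicular)": 4,
    "TMT 1-5": 5,
    "MTP 1-5": 5,
    "interphalangeal joint of first toe": 1,
}
MAX_GRADE = 3  # semiquantitative grades run from 0 to 3

# Bilateral assessment doubles the per-side count.
total_joints = 2 * sum(JOINTS_PER_SIDE.values())
max_sum_score = total_joints * MAX_GRADE


def sum_score(grades):
    """Sum the 0-3 grades of one modality (BM or PD) over all joints."""
    if len(grades) != total_joints:
        raise ValueError(f"expected {total_joints} grades, got {len(grades)}")
    if any(g not in (0, 1, 2, 3) for g in grades):
        raise ValueError("grades must be integers from 0 to 3")
    return sum(grades)


print(total_joints, max_sum_score)
```

BM and PD are summed separately, so each modality has its own score on the same scale.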

One of two study nurses (MS, AKK), both with more than 5 years' experience with joint counts in clinical studies, assessed 40 joints for tenderness and swelling (PIP 1–5, MCP 1–5, wrist, elbow, shoulder, knee, ankle and MTP 1–5). They were blinded to the results of the US examinations. For comparisons between clinical and US examinations, a BM score ≥1 was used to define a joint as inflamed. The patients as well as the study nurse (assessor) evaluated the global disease activity on a visual analogue scale. The laboratory tests included erythrocyte sedimentation rate and C-reactive protein (in-house standard methodology). The Disease Activity Score 28 based on erythrocyte sedimentation rate,16 Simplified Disease Activity Index17 and Clinical Disease Activity Index18 were computed.
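The article cites the composite indexes without restating them. As a rough sketch, the commonly published definitions can be written as below; these formulas come from the general literature rather than from this article, and the function names, parameter names and units (global assessments in mm for DAS28 and in cm for SDAI and CDAI, CRP in mg/dl) are assumptions made for illustration.

```python
import math


def das28_esr(tjc28, sjc28, esr_mm_h, global_health_mm):
    """DAS28 based on ESR, using the commonly published weights.

    tjc28 / sjc28: tender and swollen counts over 28 joints.
    esr_mm_h: erythrocyte sedimentation rate in mm/h (must be > 0).
    global_health_mm: patient global assessment on a 0-100 mm scale.
    """
    return (
        0.56 * math.sqrt(tjc28)
        + 0.28 * math.sqrt(sjc28)
        + 0.70 * math.log(esr_mm_h)
        + 0.014 * global_health_mm
    )


def cdai(tjc28, sjc28, patient_global_cm, assessor_global_cm):
    """CDAI: joint counts plus both global assessments (0-10 cm)."""
    return tjc28 + sjc28 + patient_global_cm + assessor_global_cm


def sdai(tjc28, sjc28, patient_global_cm, assessor_global_cm, crp_mg_dl):
    """SDAI: CDAI plus C-reactive protein in mg/dl."""
    return cdai(tjc28, sjc28, patient_global_cm, assessor_global_cm) + crp_mg_dl


# Hypothetical patient, for illustration only.
print(round(das28_esr(4, 9, 20, 50), 2))
print(cdai(4, 9, 5.0, 4.0), sdai(4, 9, 5.0, 4.0, 1.5))
```

All three indexes use 28-joint counts, a subset of the 40 joints the study nurses assessed.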

The patients gave written consent according to the Declaration of Helsinki, and the study was approved by the local ethics committee (the Regional Committee for Medical and Health Research Ethics (REK), South-East).


Statistics

Correlations were analysed using Spearman's rank correlation. A Wilcoxon signed rank test was used to examine changes in US, clinical or laboratory assessments during follow-up as well as to compare the number of inflamed joints (out of 40) detected by US versus clinical examination at each of the follow-up examinations. A p value <0.05 was considered significant. Responsiveness was explored by calculation of the standardised response mean (SRM) as the mean change divided by the SD of the change. Intraobserver reliability was assessed by intraclass correlation coefficients based on the two-way mixed-effects model and use of single measurements.
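The SRM definition translates directly into a few lines of code. The sketch below is illustrative: the function name and the example values are hypothetical, and the use of the sample SD (n−1 denominator) is an assumption, since the text does not say which SD was used.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """SRM = mean of the paired changes / SD of the paired changes.

    Changes are computed as follow-up minus baseline, so a score that
    falls during treatment gives a negative SRM.
    Assumption: the sample SD (n - 1 denominator) is used.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical sum scores for five patients, for illustration only.
baseline_scores = [40, 55, 30, 62, 48]
month_3_scores = [25, 35, 28, 40, 30]
print(standardised_response_mean(baseline_scores, month_3_scores))
```

Because the change is divided by its own variability, the SRM is unit-free, which allows US sum scores, composite indexes and laboratory values to be compared on one scale.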


Results

Increasing correlation coefficients between the US assessments and the composite indexes as well as the clinical and laboratory variables were found during the study (table 1).

Table 1

Spearman rank correlations between 78-joints sum scores of BM or PD and DAS28, CDAI, SDAI, assessor's global evaluation of disease activity (VAS), number of swollen and tender joints, and the laboratory markers ESR and CRP

Of the 78 joints examined, only three of the TMT joints were not affected in any patient during the study. Sensitivity for detecting inflammation was compared for the 40 joints assessed by both US and clinical examination. Significantly more inflamed joints were detected by US (defined as BM ≥1) than by clinical assessments (swollen joints) at all examinations (median (range) numbers of joints detected by US/clinical assessments were 14 (2–29)/8 (1–19) at baseline (p=0.006), 13 (1–26)/5 (0–22) at 1 month (p=0.003), 10 (1–20)/6 (0–16) at 3 months (p=0.01), 7 (0–25)/4 (0–16) at 6 months (p=0.003) and 7 (0–23)/4 (0–14) at 12-month examination (p=0.001)).

The patients improved across all variables (table 2). During follow-up the sum score PD had higher percentage improvement than the sum score BM at the 3-month examination (p=0.01), while there were no significant differences between these two at the other time points. Table 3 shows the SRMs for the US scores, the composite indexes as well as the clinical and laboratory variables at the different time intervals from baseline, and both BM and PD were found to have large SRMs.

Table 2

Median (range) sum scores of 78 joints for BM (B-mode or grey scale US) and PD (power Doppler US), DAS28, SDAI, CDAI, assessor's global evaluation of disease activity, number of swollen joints, number of tender joints, ESR and CRP at baseline and follow-up examinations

Table 3

SRM from baseline of the sum scores of BM and PD from 78 joints, DAS28, CDAI, SDAI, assessor's global evaluation of disease activity (VAS), number of swollen and tender joints and the laboratory markers ESR and CRP after 1, 3, 6 and 12 months

The intraobserver intraclass correlation coefficients (95% CI) were 0.97 (0.96 to 0.98) for BM scores and 0.98 (0.97 to 0.99) for PD scores.
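For readers unfamiliar with this reliability measure, a single-measurement intraclass correlation coefficient from a two-way model can be sketched as below. The text does not say whether a consistency or an absolute-agreement definition was used; this sketch assumes consistency, and the function name and example data are hypothetical.

```python
def icc_two_way_single_consistency(scores):
    """Single-measurement, two-way ICC (consistency definition).

    scores: list of rows, one row per scored item, each row holding the
    k repeated readings of that item.
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE), where MSR is the
    between-item mean square and MSE the residual mean square.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 items and 2 readings per item")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical first and second readings of six joints (grades 0-3).
readings = [[0, 0], [1, 1], [2, 2], [3, 2], [1, 1], [2, 3]]
print(icc_two_way_single_consistency(readings))
```

Under the consistency definition, a constant offset between the first and second reading does not lower the coefficient; an absolute-agreement definition would penalise it.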


Discussion

To our knowledge, this is the most comprehensive US assessment of joints performed in a longitudinal study. One of the objectives was to assess whether comprehensive US examinations would have higher associations with traditional assessments than reported with US examinations of a smaller number of joints. The US scores were found to have low correlations with the traditional assessments of disease activity at baseline, while at the 12-month follow-up the US assessments had correlation coefficients that were of a higher magnitude than previously described.4 12

Significantly higher numbers of inflamed joints were found by use of US than by clinical assessments at all the examinations. Similar findings have been reported previously. US was found to be more sensitive for detection of arthritis when several joints were examined2 as well as when detection of inflammation in single joints was assessed.1

The most important objective was to analyse the sensitivity to change for the comprehensive US assessments. The sum scores of BM and PD were significantly reduced after 1 month, with additional reductions at the 3-month examination. Other studies with US assessments of smaller numbers of joints have also found significant improvement 3 months after the start of an anti-tumour necrosis factor drug,9 10 and a significant reduction in PD has been found as early as 2 weeks,6 with a plateau of improvement seen at 22 weeks.7 The sum scores of BM and PD had SRMs comparable to the composite indexes, and the US assessments were more sensitive to change than the clinical assessments. The composite indexes include several variables reflecting disease activity and this may be the reason for their high sensitivity for improvement.

This study has several strengths: a longitudinal design with five examinations during 1 year, only one sonographer performing standardised US assessments throughout the study, only two people performing joint counts and all patients starting the same biological treatment with methotrexate as co-medication. In addition, no examinations were missing. However, an obvious weakness of this study was the relatively small number of patients included. For this reason we were not able to use this dataset to identify a US joint count with the best trade-off between feasibility and responsiveness.

The average time spent on each US examination of all joints was about 70 min, including labelling and storing of images. The comprehensive BM and PD scores were associated with clinical measures and were as responsive as clinical composite indexes. Thus US could be used to follow up patients with RA during treatment, but a smaller and optimal number of joints to be examined should be explored to make US a feasible tool in the clinical setting.




  • Funding HBH was partly supported by an unrestricted grant from Abbott.

  • Competing interests Johannes WJ Bijlsma was the handling editor for this article.

  • Ethics approval This study was conducted with the approval of Regional Committee for Medical and Health Research Ethics (REK), South-East.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Patient consent Obtained.
