Extended report
Monitoring anti-TNFα treatment in rheumatoid arthritis: responsiveness of magnetic resonance imaging and ultrasonography of the dominant wrist joint compared with conventional measures of disease activity and structural damage
E A Haavardsholm1,2, M Østergaard3, H B Hammer1, P Bøyesen1,2, A Boonen4, D van der Heijde1,5, T K Kvien1,2

1 Department of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway
2 Faculty of Medicine, University of Oslo, Norway
3 Department of Rheumatology, Copenhagen University Hospitals at Hvidovre and Herlev, Copenhagen, Denmark
4 Department of Rheumatology, University Hospital Maastricht, Maastricht, The Netherlands
5 Leiden University Medical Center, Leiden, The Netherlands

Correspondence to Dr E A Haavardsholm, Department of Rheumatology, Diakonhjemmet Hospital, Box 23 Vinderen, N-0319 Oslo, Norway; e.a.haavardsholm{at}


Objectives: To evaluate the responsiveness of magnetic resonance imaging (MRI) and ultrasonography (US) compared with conventional measures of disease activity and structural damage in patients with rheumatoid arthritis (RA) during the first year of treatment with anti-tumour necrosis factor α (TNFα).

Methods: A cohort of patients with RA (N = 36, median age 53 years, disease duration 7.6 years and disease activity score (DAS28) 5.7) was evaluated by core measures of disease activity, US (one wrist), MRI (one wrist) and conventional radiography (CR, both hands and wrists) at initiation of treatment with anti-TNFα agents and after 3, 6 and 12 months. Responsiveness was assessed by standardised response means (SRM). Accepted thresholds were applied to classify responsiveness as trivial, low, moderate or good.

Results: MRI synovitis (SRM between −0.79 and −0.92) and the MRI total inflammation score comprising synovitis, tenosynovitis and bone marrow oedema (SRM between −1.05 and −1.24) were highly responsive. Moderate to high responsiveness was found for MRI tenosynovitis and bone marrow oedema, all the composite indices (DAS28, simplified disease activity index (SDAI) and clinical disease activity index (CDAI)) and the 28-swollen joint count. US displayed low to moderate responsiveness. The MRI erosion score displayed low responsiveness but was more responsive than CR measures at 3 and 6 months follow-up. MRI and CR measures of annual progression rates of damage performed similarly and were highly responsive.

Conclusions: The most responsive measure of inflammation when evaluating anti-TNFα medication was a composite measure comprising MRI synovitis, tenosynovitis and bone marrow oedema, and this may be a promising outcome measure in clinical studies.


In the past decade the development of biological agents—particularly those which target tumour necrosis factor α (TNFα)—has started a new era in the management of rheumatoid arthritis (RA). Data from recent studies in patients with RA show that these drugs are very effective in improving clinical and functional outcomes, and have demonstrated the ability to arrest—or even reverse—radiographic progression.1 2 Structural damage in RA has traditionally been assessed by conventional radiography (CR) and inflammation/disease activity by individual core set variables as well as different composite indices such as the disease activity score (DAS), simplified disease activity index (SDAI) and clinical disease activity index (CDAI).3 4 5 6

In recent years, magnetic resonance imaging (MRI) and ultrasonography (US) have increasingly been used as outcome measures in clinical trials of RA.7 8 9 10 11 MRI has been shown to be more sensitive than radiography for the detection of destructive (erosive) joint lesions and a sensitive tool for detection of inflammatory lesions.12 13 14 15 16 17 18 Several studies have confirmed the relationship between MRI-detected inflammatory disease (synovitis and/or bone marrow oedema) and subsequent damage (CR and/or MRI erosions).13 16 19 20 21 MRI and US theoretically have many advantages over CR in measuring the response to therapeutic agents, as both modalities can reflect a change in inflammatory activity while at the same time assessing progression of damage.

The Outcome Measures in Rheumatoid Arthritis Clinical Trials (OMERACT) has developed procedures to reach a consensus on which measures to apply in clinical trials.22 The main objective in clinical trials is the measurement of change (eg, monitoring treatment effect) and, in this setting, the concept of responsiveness may be the most important characteristic of an outcome measure when deciding which particular instrument to use in a clinical trial.23 24 A more responsive measure has obvious advantages including reduction of sample sizes for clinical studies.

Several studies have shown that MRI measures of synovitis, tenosynovitis, bone marrow oedema and erosions are valid and reliable, but less is known about the responsiveness of these measures during an intervention with a potent therapeutic agent.14 25 26 27 28 The main objective of this study was to assess inflammatory changes in patients with RA during the first year of anti-TNFα treatment by MRI measures and US, and to evaluate the responsiveness of these measures compared with conventional measures of disease activity. We also compared the responsiveness of MRI and conventional radiographs with regard to structural damage.

Methods

Patient selection

During the period from February 2002 to June 2004 we consecutively enrolled 45 patients fulfilling the American College of Rheumatology (ACR) 1987 classification criteria for RA29 who started receiving anti-TNFα treatment at the Department of Rheumatology, Diakonhjemmet Hospital. Nine patients were excluded from the analyses since all follow-up examinations were missing (one due to surgery, one due to an allergic reaction to infliximab, two due to lack of efficacy and five for unknown reasons).

The patients received treatment according to clinical practice and were referred to the study by the treating rheumatologist when they were prescribed anti-TNFα. Ten patients received etanercept (5 monotherapy/5 concomitant methotrexate), 14 infliximab (3 monotherapy/11 concomitant methotrexate) and 12 received adalimumab (3 monotherapy/9 concomitant methotrexate).

The patients were assessed at baseline and after 3, 6 and 12 months. The same protocol was employed at every examination and is described below.

Clinical examination and questionnaires

A trained research nurse performed 28-swollen joint count (28-SJC) and 28-tender joint count (28-TJC). Perceived pain, patients’ and investigators’ global assessment were recorded on 100 mm visual analogue scales (VAS). Disability was assessed using the Modified Health Assessment Questionnaire.30 The DAS28 was computed using the erythrocyte sedimentation rate (ESR).3 We also computed the SDAI and CDAI composite scores.5 6 31 The drug response was assessed by the EULAR response criteria based on the DAS28.32
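The composite indices named above can be sketched as follows. This is an illustrative sketch only: the coefficients and cut-offs are the commonly published definitions of the DAS28 (ESR version), SDAI, CDAI and the DAS28-based EULAR response criteria, not values taken from this article, and the function and parameter names are hypothetical.

```python
import math


def das28_esr(tjc28, sjc28, esr_mm_h, global_health_mm):
    """DAS28 using ESR; global health is the patient's 0-100 mm VAS."""
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * global_health_mm)


def cdai(tjc28, sjc28, patient_global_mm, assessor_global_mm):
    """CDAI: joint counts plus both global assessments rescaled to 0-10 cm."""
    return tjc28 + sjc28 + patient_global_mm / 10.0 + assessor_global_mm / 10.0


def sdai(tjc28, sjc28, patient_global_mm, assessor_global_mm, crp_mg_l):
    """SDAI: CDAI plus CRP expressed in mg/dL (mg/L divided by 10)."""
    return cdai(tjc28, sjc28, patient_global_mm, assessor_global_mm) + crp_mg_l / 10.0


def eular_response(das28_baseline, das28_followup):
    """Classify response from the DAS28 improvement and the attained DAS28."""
    improvement = das28_baseline - das28_followup
    if improvement > 1.2:
        return "good" if das28_followup <= 3.2 else "moderate"
    if improvement > 0.6:
        return "moderate" if das28_followup <= 5.1 else "none"
    return "none"


# Hypothetical patient, for illustration only (not study data)
print(round(das28_esr(10, 8, 30, 60), 2))
print(eular_response(5.7, 4.8))
```

The VAS values are entered in millimetres, as recorded in the study, and rescaled inside the functions where an index expects centimetres.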

Biochemical and immunological markers

Antibodies to cyclic citrullinated peptide (anti-CCP) were analysed by a second-generation ELISA (INOVA Diagnostics, San Diego, California, USA) and IgM and IgA rheumatoid factors were measured by an in-house ELISA.33 C-reactive protein (CRP) was measured by high sensitivity CRP nephelometry (Dade Behring, Deerfield, Illinois, USA) and ESR was measured by the Westergren method.

Imaging (MRI)

MRI of the dominant wrist was performed using a 1.5 Tesla scanner (Signa, General Electric (GE), Milwaukee, Wisconsin, USA) with a dedicated high-resolution wrist phased array coil. The MRI sequences in this study included the OMERACT recommended MRI core set of sequences (axial and coronal T1 series pre- and post-contrast with Gd-DTPA, and STIR).34 Details of the sequences have been described previously.26

The OMERACT MRI group consensus on MRI definitions and the EULAR OMERACT RA MRI reference image atlas were applied.34 35 Images were scored according to the semi-quantitative RA MRI score (RAMRIS) and a novel tenosynovitis scoring system.34 36 The RAMRIS includes scoring of erosions (15 areas, range 0–10), synovitis (3 areas, range 0–3) and bone marrow oedema (15 areas, range 0–3) with possible maximum scores of 150, 9 and 45, respectively. The maximum tenosynovitis score is 30 (10 areas, range 0–3). The RAMRIS erosion annual progression rate before treatment was calculated as the baseline score divided by disease duration for each individual patient, and the annual progression rate after treatment as the change during follow-up divided by the time of follow-up. We computed the MRI total inflammation score by adding the components of the MRI synovitis, tenosynovitis and bone marrow oedema scores, with a possible maximum score of 84 (9 + 30 + 45).
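The score ranges and progression rates described above can be checked with a short sketch. The area counts and grade ranges are those stated in the text; the function names and the example patient are hypothetical. Summing the three stated inflammation maxima gives 9 + 30 + 45 = 84.

```python
# (number of areas, highest grade per area), as stated in the text
MRI_COMPONENTS = {
    "erosion": (15, 10),
    "synovitis": (3, 3),
    "bone_marrow_oedema": (15, 3),
    "tenosynovitis": (10, 3),
}

INFLAMMATION_COMPONENTS = ("synovitis", "tenosynovitis", "bone_marrow_oedema")


def max_score(component):
    """Highest possible score for one MRI component."""
    areas, top_grade = MRI_COMPONENTS[component]
    return areas * top_grade


def total_inflammation_max():
    """Highest possible MRI total inflammation score."""
    return sum(max_score(name) for name in INFLAMMATION_COMPONENTS)


def progression_rate_before(baseline_score, disease_duration_years):
    """Annual progression before treatment: baseline score / disease duration."""
    return baseline_score / disease_duration_years


def progression_rate_after(baseline_score, followup_score, followup_years):
    """Annual progression after treatment: change during follow-up / follow-up time."""
    return (followup_score - baseline_score) / followup_years


# Hypothetical patient: erosion score 15.2 after 7.6 years of disease,
# rising to 15.7 after one year of treatment
print(progression_rate_before(15.2, 7.6))
print(progression_rate_after(15.2, 15.7, 1.0))
```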

Imaging (ultrasonography)

All the US measurements were performed by an experienced ultrasonographer (HBH) using an 8–16 MHz linear array transducer (Diasus, Dynamic Imaging, Livingstone, UK). Patients were seated with their forearm resting on a small table, and the dominant wrist was assessed with a longitudinal scan of the dorsal part of the radiocarpal joint and transverse scans of the extensor (divided into ulnar, dorsal and radial compartments) and flexor tendons. Each of these five anatomical sites was assessed separately for synovitis/tenosynovitis and effusions using a semi-quantitative score of 0–4 (where 0 = none, 1 = uncertain, 2 = minimal, 3 = medium and 4 = high amount of hypoechoic material). The scores for each site were added and a US total inflammation score was calculated, with a maximum possible score of 40.
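As a minimal sketch, the US total inflammation score can be read as five sites, each graded 0–4 for two features, which reproduces the stated maximum of 40. This reading, the site labels and the function name are assumptions made for illustration and are not specified in this form in the text.

```python
# Site and feature labels are illustrative names, not the authors' terminology
US_SITES = (
    "radiocarpal_joint",
    "extensor_ulnar",
    "extensor_dorsal",
    "extensor_radial",
    "flexor_tendons",
)
US_FEATURES = ("synovitis_or_tenosynovitis", "effusion")
MAX_GRADE = 4


def us_total_inflammation(grades):
    """Sum the 0-4 grades over every (site, feature) pair.

    `grades` maps (site, feature) tuples to integer grades.
    """
    total = 0
    for site in US_SITES:
        for feature in US_FEATURES:
            grade = grades[(site, feature)]
            if not 0 <= grade <= MAX_GRADE:
                raise ValueError(f"grade out of range for {site}/{feature}: {grade}")
            total += grade
    return total


# Hypothetical examination in which every site and feature receives the top grade
worst_case = {(s, f): MAX_GRADE for s in US_SITES for f in US_FEATURES}
print(us_total_inflammation(worst_case))
```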

Imaging (conventional radiographs)

Digital conventional radiographs (CR) of the hands in the posteroanterior view were scored according to the van der Heijde modified Sharp score (vdHSS) by a trained observer (AB).37 Sixteen joint areas were scored in each hand for erosions (score range 0–5 for each area) and 15 areas were scored for joint space narrowing (score range 0–4 for each joint area), giving a possible maximum score of 280 units. The vdHSS erosion and total score annual progression rates before and after treatment were calculated using the same method as described above for MRI erosions.
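The stated maximum of 280 units follows directly from the area counts and grade ranges given above, summed over both hands:

```latex
\underbrace{2 \times 16 \times 5}_{\text{erosions}}
\;+\;
\underbrace{2 \times 15 \times 4}_{\text{joint space narrowing}}
\;=\; 160 + 120 \;=\; 280
```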

Image evaluation and reliability of readers for MR and CR images

The assessors were blinded to the clinical findings. All MRIs and CRs were read on large-screen (21-inch) radiological workstation monitors grouped per patient with unknown chronological order using a standard PACS software program (SECTRA IDS5, Sweden). The MRI and CR images were each read by a single reader (EAH and AB, respectively), both of whom had documented high inter-reader agreement with experienced readers and high intra-reader single measure ICCs.26 38

Statistical methods

All statistical analyses were undertaken using the Statistical Package for the Social Sciences for Windows Version 14 (SPSS, Chicago, Illinois, USA). Group comparisons were performed using Mann-Whitney U tests for continuous variables and χ2 statistics for counts, including exact tests when appropriate; p values <0.05 (two-tailed) were considered significant.

The standardised response mean (SRM) was calculated as the ratio of the mean change in the measure to the standard deviation of the change scores, and 95% confidence intervals for the SRMs were calculated by bootstrapping with 5000 replications. We also computed the relative efficiency (RE) in relation to the 28-TJC for the measures reflecting inflammation and disease activity (the square of the ratio of the t statistics, which corresponds to squaring the ratio of the SRM for the outcome to the SRM for the 28-TJC).39 40 An RE of >1 implies that the outcome is more efficient than the 28-TJC in detecting change. For the measures reflecting damage, the vdHSS erosion score was used as the reference for calculating the RE.
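The following is a minimal sketch of these calculations, not the authors' SPSS procedure. It assumes a percentile bootstrap (the text does not state which bootstrap interval was used), and the change scores are invented for illustration. Because the paired t statistic equals the SRM multiplied by the square root of n, the ratio of two t statistics from the same patients reduces to the ratio of their SRMs.

```python
import random
import statistics


def srm(changes):
    """Standardised response mean: mean change / SD of the change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def bootstrap_ci(changes, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the SRM (assumed method)."""
    rng = random.Random(seed)
    n = len(changes)
    estimates = []
    for _ in range(n_boot):
        sample = [changes[rng.randrange(n)] for _ in range(n)]
        # Skip degenerate resamples whose SD is zero
        if len(set(sample)) > 1:
            estimates.append(srm(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


def relative_efficiency(changes, reference_changes):
    """Squared ratio of the outcome SRM to the reference SRM."""
    return (srm(changes) / srm(reference_changes)) ** 2


# Invented change scores for eight patients (not study data)
outcome_changes = [-2, -1, -3, -2, 0, -4, -1, -3]
reference_changes = [-1, 0, -2, 1, -1, 0, -3, 0]

print(round(srm(outcome_changes), 2))
print(bootstrap_ci(outcome_changes))
print(round(relative_efficiency(outcome_changes, reference_changes), 2))
```

A negative SRM here indicates a fall in the score, which for inflammation measures corresponds to improvement.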

Results

Baseline findings

The demographic characteristics, immunological status, clinical measures of disease activity and imaging measures at baseline are summarised in table 1. The initial median DAS28 value was 5.7. There was a tendency towards differences in some of the baseline measures of disease activity between the excluded and included patients, reaching statistical significance (two-tailed Mann-Whitney U test) only for disease duration (median 14.5 vs 7.6 years, p = 0.01) and MRI tenosynovitis score (median 4.0 vs 10.0, p = 0.01).

Table 1

Baseline demographic characteristics, immunological status, imaging parameters and clinical measures of disease activity

Changes in measures reflecting inflammation and disease activity

The mean changes from baseline at 3, 6 and 12 months are presented in table 2. At the 3-month follow-up there was a marked treatment response in all parameters, and the same tendency was observed at 6 months and 12 months, although the response was less pronounced at 12 months for some of the measures. The proportions with EULAR good/moderate/no response were 0%/60%/40% at 3 months, 6%/47%/47% at 6 months and 14%/29%/57% at 12 months.

Table 2

Mean change with 95% confidence intervals (CI), standardised response means (SRM) with 95% confidence intervals and relative efficiencies (RE) for all measures at 3, 6 and 12 months

The SRMs with 95% confidence intervals are also reported in table 2. MRI synovitis and the MRI total inflammation score were highly responsive with an absolute SRM >0.80, and fig 1 provides an example of a patient with a rapid response with reduction of MRI synovitis and tenosynovitis. Moderate to high responsiveness (absolute SRM >0.50) was found for MRI tenosynovitis and bone marrow oedema, all the composite indices (DAS28, SDAI and CDAI) and the 28-SJC. The remaining outcome measures displayed low to moderate responsiveness.

Figure 1

Pre-gadolinium (top) and post-gadolinium (bottom) T1 axial MR images at baseline (left) and 3 months (right), illustrating a marked reduction of synovitis and tenosynovitis after 3 months of treatment with anti-tumour necrosis factor α.

The REs in relation to the 28-TJC at all time points are provided in table 2 and the 3-month data for the measures reflecting inflammation and disease activity are shown in decreasing order of magnitude in fig 2. We also computed the SRMs and REs of all possible combinations of two out of the three components of the MRI total inflammation score (data not shown). All two-item combinations were less responsive than the three-item MRI total inflammation score, although still quite responsive (absolute SRM >0.90 at all time points).

Figure 2

Relative efficiencies (RE) to 28-tender joint count (28-TJC) of the various outcomes reflecting inflammation and disease activity at 3 months follow-up (28-TJC  =  reference with an RE of 1.00). CDAI, clinical disease activity index; DAS28, disease activity score based on 28 tender and swollen joint count; ESR, erythrocyte sedimentation rate; MHAQ, modified health assessment questionnaire; MRI, magnetic resonance imaging; SDAI, simplified disease activity index; 28-SJC, 28-swollen joint count; US, ultrasonography; VAS, visual analogue scale.

Changes in measures reflecting structural damage

There was a small increase in the RAMRIS erosion score at all time points, whereas the CR measures initially showed a small decrease in scores at 3 months, virtually no change at 6 months and a small increase at 12 months. The RAMRIS erosion score displayed low responsiveness at all time points (SRM 0.23–0.32), whereas the responsiveness of the CR measures was trivial (SRM <0.20) at the 3- and 6-month follow-ups, increasing to low (SRM 0.23–0.33) at the 12-month follow-up. The REs in relation to the vdHSS erosion score are shown in table 2; at 3 and 6 months the RAMRIS erosion score from one wrist performed better than the CR measures from both wrists and hands, while the results were comparable at 12 months.

The annual progression rates after 12 months of treatment for the RAMRIS erosion score and the vdHSS erosion and total scores are shown in table 2, together with the SRM for this change. There was a marked and significant reduction in the annual progression rates of all measures, and the vdHSS erosion score was moderately responsive (SRM −0.61) whereas the RAMRIS erosion score (SRM −0.89) and the vdHSS total score (SRM −0.94) were highly responsive.

Discussion

This study has shown that MRI measures of inflammation provide superior responsiveness to conventional measures of disease activity in patients with RA treated with anti-TNFα medication. For measures of structural damage, MRI of one wrist displayed better responsiveness of absolute scores at 3 and 6 months than CR of both wrists and hands. When evaluating structural damage by change in annual progression rates, MRI (the RAMRIS erosion score) and CR (the vdHSS total score) were both highly responsive at the 12-month follow-up.

A marked reduction in disease activity and in the annual progression rates of erosions on both MRI and CR was observed during this intervention trial with anti-TNFα medication (table 2). During a powerful intervention like this, it is possible to evaluate and compare the responsiveness of different outcome measures, in particular those reflecting inflammation. Measuring change in order to evaluate the efficacy of therapeutic interventions requires the outcome measure to be sensitive to change after the intervention. If an instrument is not sufficiently responsive, even clinically important changes may go undetected. In a clinical trial setting, the greater the responsiveness of the outcome instrument used, the smaller the sample size that is required. Also, the length of trials can be shortened by using more responsive outcome measures.
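The link between responsiveness and sample size can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for detecting a within-group (paired) change; this formula is not given in the article, and a two-arm trial would instead be powered on the between-group difference. The function name and default error rates are assumptions.

```python
import math
from statistics import NormalDist


def paired_sample_size(srm, alpha=0.05, power=0.80):
    """Approximate number of patients needed to detect a within-group
    change of the given SRM with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / abs(srm)) ** 2)


# Halving the SRM roughly quadruples the required sample size
for value in (1.0, 0.5, 0.25):
    print(value, paired_sample_size(value))
```

Because the required number scales with the inverse square of the SRM, even a modest gain in responsiveness can translate into a substantially smaller study.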

The term “responsiveness” denotes the magnitude of change or sensitivity to change over time. To quantify responsiveness, several effect sizes have been proposed as estimates of the amount of change detected with an instrument, resulting in a wide variety of effect size indices.41 42 43 It is not yet known which of these statistics is best for assessing responsiveness, although there is some evidence to suggest that it is better to estimate the magnitude of the change by using the SD of the change score (ie, SRM) in the denominator than the SD at baseline (ie, effect size).44 45 46 We therefore chose to assess responsiveness using the SRM, which is probably also the most widely used responsiveness statistic. There is no universal agreement on how to interpret the magnitude of the SRMs, but in most cases the thresholds introduced by Cohen for effect sizes (ES) are applied: “trivial” (ES <0.20), “small” (0.20 ⩽ ES <0.50), “moderate” (0.50 ⩽ ES <0.80) or “large” (ES ⩾0.80).43
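Applied to the absolute value of an SRM, the Cohen thresholds quoted above amount to a simple lookup. The sketch below is illustrative; the function name is hypothetical and the example values are SRMs reported in the Results.

```python
def classify_responsiveness(srm):
    """Label an SRM using Cohen's effect size thresholds on its absolute value."""
    magnitude = abs(srm)
    if magnitude < 0.20:
        return "trivial"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "moderate"
    return "large"


# SRMs for the annual progression rates and for the RAMRIS erosion score
for value in (-0.61, -0.89, -0.94, 0.23):
    print(value, classify_responsiveness(value))
```

The sign is ignored because a negative SRM only indicates the direction of change (a falling score), not a weaker response.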

Traditional treatment goals for patients with RA include reduction of inflammation (ie, relieving signs and symptoms), improving physical functioning and inhibiting progression of joint damage. Evaluation of disease activity in RA is not easy, and no single marker can reflect all these aspects. Over the last decade, composite disease activity instruments (eg, DAS28, SDAI, CDAI) have significantly improved the ability to evaluate the course of RA. However, these are all surrogate markers of the underlying pathological processes in the disease—namely, the synovitis—which is the cardinal consequence of the inflammatory processes in the RA joint. This synovitis leads to bone and cartilage destruction, causes pain and ultimately leads to joint destruction and physical impairment. Both MRI and US are imaging modalities capable of providing a non-invasive measure of the load of synovitis (and tenosynovitis) in RA, and MRI is the only imaging modality that can visualise bone marrow oedema. Bone marrow oedema has long been recognised as being important in the pathology of RA and has been assumed to represent inflammatory activity within the bone.47 Until recently, bone marrow oedema had no known histopathological correlate, but this has recently been examined in two studies.48 49 Their findings suggest that MRI bone marrow oedema is due to formation of inflammatory infiltrates in the bone marrow of patients with RA, and thus may represent an additional target structure for anti-inflammatory treatment. 

Tenosynovitis is another important pathology in RA and is observed in a large proportion of patients with RA.50 51 The tenosynovium produces pro-inflammatory cytokines and proteolytic enzymes that are important in the tissue degradation seen in RA, and the proliferation of the tenosynovial lining can lead to impaired function due to scarring and adhesions.52 Ongoing tenosynovial inflammation may ultimately lead to tendon rupture and reduced hand function.53 The scoring of MRI tenosynovitis is not included in the RAMRIS, but we recently described a novel tenosynovitis scoring system that demonstrated high reliability and was feasible.36 In this study we found that MRI synovitis was the most responsive single marker and, in combination with tenosynovitis and bone marrow oedema, the MRI total inflammation score provided superior responsiveness.

We also evaluated a US total inflammation score consisting of synovitis, tenosynovitis and joint effusion. The applied scoring system is not validated and did not include evaluation by power Doppler. It still displayed SRMs around 0.50, and it is possible that new validated US scoring systems including power Doppler will lead to increased responsiveness. Furthermore, it is possible that other US scoring systems which assess more joints may be more responsive.54 55 56 The same may apply to MRI approaches which assess more joints than one wrist.15

In general, all the composite indices (DAS28, SDAI, CDAI) were more responsive than single core set measures, as has been found in other studies. This finding is not surprising, as combining measures reduces scatter.57 The responsiveness of most single measures was low to moderate, and the 28-SJC was the most responsive.

Joint damage in RA has traditionally been assessed by CR and, more recently, with MRI. The erosive progression during the therapeutic intervention in this study was minimal. The RAMRIS MRI erosion score was somewhat more responsive than CR measures at 3 and 6 months follow-up, although the responsiveness of the RAMRIS erosion score was still low. For the CR measures the SRMs were trivial to low (table 2). Six-month follow-up may be the most relevant time frame in modern randomised placebo controlled trials and, in such short-term trials, MRI may perform better than the CR vdHSS total score.

Low responsiveness (low SRMs) of measures reflecting structural damage is to be expected during treatments that slow down, or even reverse, progression of damage. Structural damage is a reflection of the cumulative disease activity in the past and, while the absolute score is likely to be less sensitive to change, the annual progression rate might better capture the immediate effect of suppressing inflammation on further progression of damage. We computed the annual progression rates at baseline and after 12 months of treatment for the RAMRIS erosion score, the vdHSS erosion score and total score, and computed the SRMs for these measures. The annual progression rate of erosions after 12 months showed a significant decrease, as expected, judged both by MRI and CR (table 2). MRI erosions of the dominant wrist showed somewhat higher responsiveness than CR erosions of both hands and wrists, but when joint space narrowing was included for the total vdHSS, this measure was also highly responsive. It should be emphasised that in this study we only evaluated radiographs of both hands and wrists, as radiographs of the feet were not available, and CR measures may be more responsive if both hands and feet are included. A similar argument also applies to MRI, as we only evaluated the dominant wrist. Inclusion of the metacarpophalangeal joints would be expected to lead to higher responsiveness for the MRI measures.15

The present study included a rather small number of patients and had no control/placebo group. However, all patients had active disease, and this powerful intervention with a potent drug, combined with a very extensive data collection of outcome measures, allowed comparisons of the responsiveness of these measures in a real life setting. Also the fact that both MRI and CR images were read in unknown chronological sequence strengthens the results, as reading in known sequence is known to increase the sensitivity in detecting differences.38 We chose to read images in unknown sequence to avoid expectation bias and to be certain that we did not overestimate the sensitivity to change of these measures.

Composite indices rely on changes of a group of variables rather than on individual signs and symptoms of RA, and it has been shown that composite indices mirror RA disease activity more effectively than individual variables.57 This seems to be the case also for composite imaging measures. In this study we have shown that composite indices in general are more responsive than single measures, that a combined MRI measure of total inflammation is more responsive than the individual MRI measures, and that the total vdHSS is more responsive than the single CR components alone. This observation has implications when evaluating the effect of treatment because early identification of responders to anti-TNFα therapy is important to optimise treatment and to avoid prolonged exposure to costly treatment with potentially harmful side effects.

In this cohort of patients receiving anti-TNFα treatment, we have shown that the MRI total inflammation score displayed superior responsiveness to conventional measures of disease activity, and may be a promising outcome measure in clinical studies and for clinical practice. However, further validation in larger studies and in placebo controlled clinical studies is needed. The responsiveness of measures of structural damage was trivial to low during this intervention and, for short-term follow-up, MRI of one wrist was more responsive than CR of both hands and wrists. By applying annual progression rates as a measure of structural damage, the RAMRIS erosion score and the total vdHSS were both highly responsive.

Acknowledgements

The authors thank research nurse Margareth Sveinsson for collecting clinical data, research coordinator Tone Omreng for organising the data collection, Halvor Gilboe for practical assistance with case record forms, Mohammad Rizvi for assistance with preparing MRI and CR images, Marianne Ytrelid for technical assistance and Dr Inge Olsen for statistical advice.



  • Funding This study was supported in part by grants from The Research Council of Norway, The Norwegian Rheumatism Association, The Norwegian Women Public Health Association, Grethe Harbitz Legacy and Marie and Else Mustad’s Legacy.

  • Competing interests None.

  • Ethics approval The study was conducted according to the principles of the Declaration of Helsinki. All patients gave written informed consent before participation in the study. The regional ethics committee evaluated the study, the storage of data was licensed from the Norwegian data inspectorate and approval for the collection of biological material was obtained from the Department of Health.