Test–retest reliability of disease activity core set measures and indices in rheumatoid arthritis
T Uhlig1, T K Kvien1,2, T Pincus3
1 Department of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway
2 Faculty of Medicine, University of Oslo, Norway
3 New York University, Hospital for Joint Diseases, New York City, New York, USA
Correspondence to: Dr T Uhlig, National Resource Center for Rehabilitation in Rheumatology, Department of Rheumatology, Diakonhjemmet Hospital, Postbox 23 Vinderen, N-0319 Oslo, Norway; till.uhlig{at}


Aim: To examine the test–retest reliability of the rheumatoid arthritis (RA) core disease activity measures and derived composite indices.

Methods: A total of 28 stable patients with RA had 2 complete assessments within 1 week, which included the 7 RA core disease activity measures and derived disease activity indices (28-joint Disease Activity Score (DAS28), Simplified Disease Activity Index (SDAI), Clinical Disease Activity Index (CDAI), RA Disease Activity Index (RADAI) and Routine Assessment of Patient Index Data (RAPID3)). The intraclass correlations (ICC), the smallest detectable difference (SDD) and minimal detectable change as percentage of the maximum score (MDC%) were estimated as measures of test–retest reliability.

Results: Correlations for the disease activity indices were high. SDDs (MDC%) to detect a true improvement or deterioration with 95% confidence were: DAS28 1.32 (14.4%), SDAI 8.26 (9.6%), CDAI 8.05 (10.6%), RAPID3 1.48 (14.8%) and RADAI 1.49 (14.9%). Thus, SDDs were rather high, and the MDC% values were of a similar magnitude of 10% to 15% for all seven core data set measures.

Conclusions: SDDs of the DAS28, SDAI and CDAI were close to limits to detect important improvement. Clinicians should be aware of measurement error. Nonetheless, RA core data set measures and indices obtained from a health professional, laboratory and patient self-report had similar reliability.

Major advances in quantitative assessment of rheumatoid arthritis (RA) have been seen over the last decades. The American College of Rheumatology (ACR) core data set1 of seven disease activity measures has become a standard in clinical research, and the 28-joint Disease Activity Score (DAS28) of four measures is the most widely used composite index.2 As calculation of the DAS28 requires a computer, a web site or calculator, a Simplified Disease Activity Index (SDAI)3 and a Clinical Disease Activity Index (CDAI)4 to be calculated directly by the clinician without a calculator have been developed. Further, composite indices of only patient self-report measures have been proposed, including the RA Disease Activity Index (RADAI)5 and Routine Assessment of Patient Index Data (RAPID3).6

Few studies have examined the test–retest reliability of individual core data set measures, and of indices derived from these measures, in stable patients in clinical practice. All measures have some inherent variability that may be considered as “measurement error”. The size of this variability determines whether an observed change can be interpreted with confidence as exceeding “measurement error”. The objective of this study was to examine the test–retest reliability of the RA core data set measures and composite indices in stable patients with RA.


Subjects and sampling

The data were from a study designed to compare the performance of patient reported outcomes between completion by paper/pencil or on a personal digital assistant (PDA).7 Patients had been recruited at random from a county RA register if they were between 50 and 70 years of age, had stable disease with no change in drug treatment and no surgical procedures in the previous 4 weeks. Patients were assessed on two occasions (T1 and T2) 5 to 7 days apart. A total of 28 patients had complete assessments with the paper/pencil format, which were analysed in this report.

Outcome measures

The seven RA core data set measures1 were assessed. The three patient-reported outcome measures were physical function on a modified health assessment questionnaire (MHAQ), and pain and patient global estimate of status on visual analogue scales (VAS). A fatigue VAS was also assessed. The three doctor/assessor measures, 28-swollen and tender joint counts and global estimate of status, were assessed on both occasions by the same trained study nurse. Blood samples were drawn for erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP).

Indices calculated for disease activity were DAS28, SDAI, CDAI, RAPID3 and RADAI. DAS28 is computed from 28-swollen and tender joint counts, patient global assessment and ESR (maximum 100 mm/h).2 SDAI is a linear sum of five untransformed, unweighted variables: 28-swollen joint count (SJC), 28-tender joint count (TJC), patient and investigator global assessments of disease activity on a VAS, and CRP.3 CDAI is a modification of SDAI that omits the laboratory evaluation (CRP) to facilitate immediate quantitative clinical assessment.4 RAPID3 includes the three patient self-report core data set measures (physical function, pain and patient global estimate), each normalised to a 0–10 score; the sum (0–30) is adjusted to 0–10.6 RADAI is a self-reported questionnaire with questions on disease activity, joint tenderness, pain, morning stiffness and perceived joint pain in 16 joint areas.5 The scores from the five items are summarised into a disease activity index with a range from 0 to 10.
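The composition of these indices can be made concrete with a short sketch. The DAS28 coefficients and the SDAI/CDAI sums below are the standard published formulas rather than anything stated in this report, and the units (patient global in mm for DAS28, global assessments in cm and CRP in mg/dl for SDAI and CDAI) and all function names are assumptions for illustration. RAPID3 follows the description above. RADAI is omitted because its item scoring is not given here.

```python
import math


def das28_esr(tjc28, sjc28, esr_mm_h, patient_global_mm):
    """DAS28 from 28-joint counts, ESR (mm/h) and patient global VAS (0-100 mm).

    Coefficients are the standard published DAS28-ESR weights (an assumption
    here; the report itself does not list them).
    """
    esr = min(esr_mm_h, 100.0)  # ESR is capped at 100 mm/h, as in the text
    if esr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr)
            + 0.014 * patient_global_mm)


def sdai(tjc28, sjc28, patient_global_cm, assessor_global_cm, crp_mg_dl):
    """SDAI: plain sum of five untransformed, unweighted variables."""
    return tjc28 + sjc28 + patient_global_cm + assessor_global_cm + crp_mg_dl


def cdai(tjc28, sjc28, patient_global_cm, assessor_global_cm):
    """CDAI: SDAI without the laboratory component (CRP)."""
    return tjc28 + sjc28 + patient_global_cm + assessor_global_cm


def rapid3(function_0_10, pain_0_10, patient_global_0_10):
    """RAPID3: three 0-10 self-report scores, sum (0-30) rescaled to 0-10."""
    return (function_0_10 + pain_0_10 + patient_global_0_10) / 3.0


if __name__ == "__main__":
    # Every component at its upper limit reproduces the maximum scores
    # used in this study for MDC% (9.07, 86, 76 and 10).
    print(round(das28_esr(28, 28, 100, 100), 2))
    print(sdai(28, 28, 10, 10, 10))
    print(cdai(28, 28, 10, 10))
    print(rapid3(10, 10, 10))
```

Under these assumptions, setting every component to its upper limit returns the maximum scores of 9.07, 86, 76 and 10 that the study uses as denominators for MDC%.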

Statistical methods

Spearman rank order correlations and intraclass correlation coefficients (ICC), using a two-way mixed effects model, were estimated as measures of test–retest reliability of each measure and index. Mean differences between the test and retest values, standard deviations and 95% CIs around the difference were calculated. Test–retest reliability for individual indices was examined by the Bland–Altman approach,8 to test for random error of each variable. In this approach the differences between the first and second measurement were plotted against their means. The interval defined by the mean difference ±1.96×SD represents the “95% limits of agreement”. The smallest detectable difference (SDD) is the value that a change must exceed for a clinician to be 95% confident that it reflects improvement or deterioration rather than variability or “measurement error”. The minimal detectable change (MDC%) expresses the SDD for the selected outcome as a percentage of the maximum score. The MDC% allows reliability to be compared across outcome measures and represents the percentage change that must be exceeded to exclude “measurement error”. The level of statistical significance was set to p = 0.01. The maximum scores for DAS28, SDAI, CDAI, RAPID3 and RADAI were set to 9.07, 86, 76, 10 and 10, respectively.
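The Bland–Altman quantities and the MDC% described above can be sketched as follows. This is an illustration, not the authors' analysis code: the report does not spell out the SDD formula, so the sketch assumes the SDD is 1.96 times the SD of the paired differences (the half-width of the limits of agreement), and the function names are hypothetical.

```python
from statistics import mean, stdev

Z_95 = 1.96  # normal quantile for 95% limits of agreement


def bland_altman(first, second):
    """Return bias, 95% limits of agreement and SDD for paired scores.

    Assumption: SDD = 1.96 * SD of the differences between the two visits.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [t2 - t1 for t1, t2 in zip(first, second)]
    bias = mean(diffs)          # mean difference between retest and test
    sd_diff = stdev(diffs)      # sample SD of the differences
    half_width = Z_95 * sd_diff
    return {
        "bias": bias,
        "lower": bias - half_width,
        "upper": bias + half_width,
        "sdd": half_width,
    }


def mdc_percent(sdd, max_score):
    """Express an SDD as a percentage of the maximum possible score."""
    return 100.0 * sdd / max_score


if __name__ == "__main__":
    # Hypothetical test and retest scores for four patients (not study data).
    result = bland_altman([1, 2, 3, 4], [2, 2, 4, 4])
    print(result)
    # SDD and maximum score for SDAI as reported in this study.
    print(round(mdc_percent(8.26, 86), 1))
```

With the SDAI values reported in this study (SDD 8.26, maximum 86), `mdc_percent` gives 9.6%, and with the CDAI values (8.05, 76) it gives 10.6%.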


Descriptive characteristics of the patients

A total of 28 patients with stable RA were evaluated. Mean (SD) age was 61.1 (6.2) years and mean (SD) disease duration 16.6 (10.4) years; 64% were females, 77% had erosive disease and 64% were rheumatoid factor positive.

Test–retest reliability

Correlations of individual measures and indices between T1 and T2 were all statistically significant, and in a similar range for the assessor, laboratory and patient self-report measures (table 1). The ICC varied between 0.78 and 0.96 for the seven individual ACR core set measures1 and between 0.85 and 0.92 for the disease activity indices. The SDDs for the measures and indices were also similar, although high, and close to minimal clinically important differences (eg, MHAQ physical function 0.22, DAS28 1.32). For individual measures, MDC% was lowest for CRP and highest for pain intensity. For indices, MDC% values were also similar, ranging from 9.6% (SDAI) to 14.9% (RADAI) (table 1).
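For readers who want to see how an ICC of this kind is obtained from test and retest scores, the sketch below computes a single-measure consistency ICC from a two-way layout (patients by occasions). The report states only that a two-way mixed effects model was used, so the choice of the consistency, single-measure form is an assumption, and the function name is hypothetical.

```python
def icc_consistency(scores):
    """Single-measure consistency ICC from a two-way ANOVA decomposition.

    scores: one row per patient, each row holding that patient's k repeated
    scores (k = 2 for a test-retest design).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients with k >= 2 scores each")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into patients, occasions and error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


if __name__ == "__main__":
    # Hypothetical data: every patient scores exactly 1 point higher at retest.
    print(icc_consistency([[1, 2], [2, 3], [3, 4], [4, 5]]))
```

A constant shift between visits leaves this consistency form at 1.0; an absolute-agreement ICC would penalise the shift, which is one reason the Bland–Altman bias is reported alongside the ICC.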

Table 1 Levels of core set of disease activity measures and derived disease activity indices at both time points and test–retest reliability

Figure 1 presents Bland–Altman plots for each of the five disease activity indices, demonstrating the difference between measurements with limits of agreement according to the baseline value.

Figure 1

A–E. Agreement between scores at T1 and T2 using Bland–Altman plots for each of the five disease activity indices with limits of agreement.

Correlations between all disease activity indices are shown in table 2. At both time points correlation coefficients were high between disease indices and generally lower between indices of disease activity and core measures, whereas no statistically significant correlation was seen with age or disease duration (table 2).

Table 2 Correlations (Spearman r) between indices of disease activity and between disease activity indices and core measures, age and disease duration at both time points


RA core data set measures and their derived indices were assessed twice within 1 week in a test–retest design and the reliability of measures was in the same range, with minimal detectable change at about 10% to 15% of the maximum score.

The test–retest reliability of patient-reported measures was satisfactory when considering the ICCs. However, the magnitude of the measurement errors is also important when clinical decisions are based on changes in scores in individual patients. The 95% SDDs provide clinically useful information, as they represent the cut-off values that must be exceeded for a clinician to be 95% confident that a change reflects a true improvement or deterioration. These values were, for example, 4.8 for tender joint count, 26.2 mm for patient global assessment, 0.22 for physical function, 1.49 for RADAI and 1.32 for DAS28 (table 1). The value for physical function on the MHAQ was close to the change that has been considered clinically important with the original health assessment questionnaire (HAQ).9 A SDD of 1.32 for DAS28 is in the same range as the change required for EULAR moderate or good response,10 and SDD of 8.26 and 8.05 for SDAI and CDAI, respectively, are in the range required for a moderate response.11
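As an illustration of how these cut-offs might be applied to an individual patient, the sketch below compares an observed change with the SDDs reported in this study. The dictionary and function names are hypothetical, and the SDDs come from a sample of 28 patients, so they should be read as estimates rather than fixed thresholds.

```python
# 95% SDDs for the five indices as reported in this study (table 1).
SDD_95 = {
    "DAS28": 1.32,
    "SDAI": 8.26,
    "CDAI": 8.05,
    "RAPID3": 1.48,
    "RADAI": 1.49,
}


def exceeds_sdd(index_name, score_before, score_after):
    """True if the absolute change is larger than the index's 95% SDD."""
    change = abs(score_after - score_before)
    return change > SDD_95[index_name]


if __name__ == "__main__":
    # Hypothetical patient: DAS28 falls from 5.1 to 4.0 (change 1.1).
    print(exceeds_sdd("DAS28", 5.1, 4.0))
    # Hypothetical patient: DAS28 falls from 5.1 to 3.5 (change 1.6).
    print(exceeds_sdd("DAS28", 5.1, 3.5))
```

In the first case the change of 1.1 is below the SDD of 1.32 and could reflect measurement variability alone; in the second the change of 1.6 exceeds it.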

In this study, the test–retest reliability of five indices of disease activity in RA was estimated for the first time. The DAS28 is documented as valuable to monitor RA in order to achieve “tight control” of inflammatory activity.12 CDAI has the advantage that it can be calculated at the time of a patient visit, since it does not require a laboratory test, and RAPID3 does not require a formal joint count. RADAI includes RA core data set measures and patient self-reported painful joints. Our data indicate similar test–retest results for the different disease activity indices, including those based only on patient self-report measures, RAPID3 and RADAI.

In another study, the DAS28 produced high ICC values among and within observers in routine practice,13 in about the same range as in our study. Measurement variability should nevertheless be taken into account when interpreting changes in disease activity measures, both in clinical practice and in clinical trials. More robust estimates may be obtained by repeated measures.14

In several countries a minimum level of disease activity is required for reimbursement of biological therapies, and good therapeutic decisions thus demand reproducible instruments for measuring disease activity in patients. The size of the SDDs in this study indicates caution when interpreting changes in disease activity scores. Changes below the SDDs may simply reflect inherent measurement variability.

In conclusion, clinically important or meaningful change must be differentiated from inevitable variation in patients and measurement error,15 allowing different approaches as outlined by Lassère et al.16 Our approach was purely mathematical, based on estimation of the measurement error. Clinical decisions are made after considering a number of variables, and are better informed with quantitative measures from health professionals, laboratory analyses, or patient self-report.


The authors thank all patients who participated in the study.



  • Competing interests: TKK: Hans Bijlsma was the handling editor for this article.

  • Funding: This project was supported by Grethe Harbitz’s legacy and the Norwegian Rheumatism Association.

  • Ethics approval: The regional ethics committee for Health Region East approved this study.