The rheumatoid arthritis articular damage score: first steps in developing a clinical index of long term damage in RA
T R Zijlstra (1), H J Bernelot Moens (1), M A S Bukhari (2)

(1) Department of Rheumatology, Medisch Spectrum Twente, Enschede, The Netherlands
(2) ARC Epidemiology Unit, The University of Manchester, United Kingdom

Correspondence to: Dr T R Zijlstra, Medisch Spectrum Twente, Secretariaat Reumatologie, Postbus 50000, 7500 KA Enschede, The Netherlands


Objective: To design and validate a clinical method for scoring irreversible long term articular damage in rheumatoid arthritis (RA).

Methods: The rheumatoid arthritis articular damage score (RAAD score) is based on examination of 35 large and small joints. Concise definitions were formulated to score each joint on a three point scale (0, no irreversible damage; 1, partially damaged; 2, severe damage, ankylosis, or prosthesis). The RAAD score was determined for 121 patients with RA with a large range of disease duration. Interobserver agreement was studied in 39 patients scored by three observers. Data on disease duration, Health Assessment Questionnaire, disease activity score, and Larsen score were collected for 121, 78, 47, and 45 patients, respectively.

Results: The RAAD score correlated well with the Larsen score (rs=0.81) and disease duration (rs=0.68) and (as intended) not with disease activity (rs=0.10). Good interobserver agreement was found for total scores and individual joints. The wide range of RAAD scores for patients with the same disease duration suggested good discriminating power, especially after >10 years.

Conclusion: The RAAD score is a quick and feasible method for measuring the long term articular damage in large RA populations. It has good reliability and construct validity and deserves further study to assess its discriminant validity.

Keywords:
  • rheumatoid arthritis
  • outcome
  • joint damage

Abbreviations:
  • DAS, disease activity score
  • HAQ, Health Assessment Questionnaire
  • MCP, metacarpophalangeal
  • MTP, metatarsophalangeal
  • PIP, proximal interphalangeal
  • RA, rheumatoid arthritis
  • RAAD, rheumatoid arthritis articular damage


Disease status in rheumatoid arthritis (RA) can be expressed in terms of inflammatory activity or of damage. Indices of disease activity are reversible, whereas measures of damage should represent the irreversible results of disease activity over time. Although damage in RA can occur in skin or organs by amyloidosis or vasculitis, joint damage is the most prominent feature of disease outcome. Articular damage is generally assessed by radiographs, which may show the destruction of bone and cartilage. Several radiological damage scores have been developed, each having specific characteristics for reproducibility and sensitivity to change.1 Although these scores are useful in studying disease progression, they have some drawbacks. Firstly, radiographs represent mainly osseous changes, whereas part of the articular damage in RA is in the soft tissues surrounding the bones. Secondly, methods for scoring radiographic damage concentrate on the hands and feet, whereas damage in larger joints may be of equal importance for a patient's functional ability. Thirdly, the cost of measuring radiographic damage makes these methods less suitable for studying large numbers of patients or for use in developing countries.

Plant et al found that a rheumatologist could predict the Larsen radiographic score by clinical examination with surprising accuracy in the small hand joints (though less so in the feet).2 Kuper et al showed that radiographic damage in large joints was significantly related to the damage in hands and feet, a physical disability index, and cumulative disease activity.3

Observations like these suggest that it is possible to develop a score for irreversible articular damage, based on clinical examination of large and small joints, which may be useful for measuring long term damage in large patient groups. Such a score would be helpful in comparing the effects of different treatment strategies or the results in different rheumatology centres or in different countries, particularly after longer disease duration.

Attempts to design a clinical damage score in RA have been published before, but until now these have not been widely used. Symmons et al developed the OSRA, a simple measure of overall status.4 In its section on damage, the number of destroyed large joints and the need for splints, a collar, special shoes, or surgery on small joints are used as a measure of articular damage. Recently Cranney et al reported their deformity index,5 a measure of limited joint motion and deformity, adapted from the joint alignment and motion scale6 and the Escola Paulista de Medicina-range of motion scale.7 However, the deformity index was not formally validated.


We set out to design a score that is quick and easy to obtain, using only information from physical examination. It should measure irreversible damage, which implies that the score can only increase over time. Based on our clinical experience and common sense we formulated the rheumatoid arthritis articular damage (RAAD) score. In this method 35 joints or joint groups are scored on a three point scale (0, no irreversible damage; 1, partly damaged; 2, severe damage, ankylosis, or prosthesis). The definitions for scoring each joint are concise in order to make the method accessible to inexperienced assessors (table 1). The only tool needed is a goniometer, although most joints can be assessed without one. The metatarsophalangeal (MTP) joints of each foot are scored as a single joint.
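The arithmetic of the total score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site labels (`site_01` and so on) and the function names are hypothetical, since the actual joint definitions are those of table 1, and the maximum of 70 simply follows from 35 sites graded 0 to 2.

```python
# Sketch of the RAAD total: 35 joints or joint groups, each graded 0, 1 or 2.
VALID_GRADES = (0, 1, 2)
N_SITES = 35  # joints or joint groups, as stated in the text


def raad_total(grades):
    """Sum the per-site grades after checking their number and range."""
    if len(grades) != N_SITES:
        raise ValueError(f"expected {N_SITES} graded sites, got {len(grades)}")
    for site, grade in grades.items():
        if grade not in VALID_GRADES:
            raise ValueError(f"{site}: grade must be 0, 1 or 2, got {grade!r}")
    return sum(grades.values())


def never_decreases(totals):
    """True if successive totals for one patient never go down."""
    return all(later >= earlier for earlier, later in zip(totals, totals[1:]))


# Hypothetical site labels; the real list is defined in table 1.
example = {f"site_{i:02d}": 0 for i in range(1, N_SITES + 1)}
example["site_03"] = 1  # partly damaged
example["site_17"] = 2  # severe damage, ankylosis, or prosthesis
print(raad_total(example))  # 3
```

The `never_decreases` helper reflects the requirement that a score of irreversible damage can only increase over time; a decrease between two visits would point to a scoring inconsistency rather than to improvement.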

Table 1

The RAAD score: definitions for scoring damage in individual joints. Contractures and other deformities should only be scored when they are expected to be irreversible without surgery

Patients fulfilling the 1987 ACR criteria for RA8 were selected from our outpatient clinic and rheumatology ward to obtain a sample with a wide range of disease duration and activity. Forty seven patients (17 male, 30 female) gave informed consent. Their mean age was 63 years (range 27–84), and mean disease duration was 16 years (range 1–48). In these patients, an RAAD score was determined by three observers on the same day: an experienced rheumatologist (HBM), a rheumatology trainee (TZ), and a rheumatology nurse specialist with little experience in joint examination, who had a brief training in using the score. Apart from the original score (RAAD-1), we also computed two alternative RAAD scores. In RAAD-2 only the number of damaged joints was counted. In RAAD-3 all joints were scored 1 or 2, and metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints of each hand were scored as a single joint.
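The two alternative computations can be illustrated with a short Python sketch. The joint labels and function names are hypothetical, and one point is an assumption not stated in the text: when several joints are collapsed into a single joint, the sketch gives the group the highest grade found among its members.

```python
def damaged_joint_count(grades):
    """RAAD-2 as described: count joints with any damage (grade 1 or 2)."""
    return sum(1 for grade in grades.values() if grade >= 1)


def collapse_groups(grades, groups):
    """Replace each listed group of joints by a single entry.

    ASSUMPTION: the group takes the highest grade among its members;
    the text does not say how a grouped joint is graded.
    """
    grouped = {joint for members in groups.values() for joint in members}
    collapsed = {j: g for j, g in grades.items() if j not in grouped}
    for name, members in groups.items():
        collapsed[name] = max(grades[joint] for joint in members)
    return collapsed


# Hypothetical labels for a few joints of one hand and one elbow.
grades = {
    "MCP2_left": 1, "MCP3_left": 2,
    "PIP2_left": 0, "PIP3_left": 0,
    "elbow_left": 1,
}
groups = {
    "MCP_left": ["MCP2_left", "MCP3_left"],
    "PIP_left": ["PIP2_left", "PIP3_left"],
}

print(damaged_joint_count(grades))  # 3 damaged joints
raad3_like = collapse_groups(grades, groups)
print(sum(raad3_like.values()))  # elbow 1 + MCP group 2 + PIP group 0 = 3
```

Collapsing the MCP and PIP joints reduces their combined weight in the total, which is the stated motivation for the RAAD-3 variant.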

On the same day, 28 joint counts for tenderness (T) and swelling (S) were performed, patients completed a visual analogue scale for general wellbeing (G), and the erythrocyte sedimentation rate (ESR) was measured. From these variables the disease activity score (DAS28) was calculated using the formula: DAS28 = 0.56√T + 0.28√S + 0.70 ln(ESR) + 0.014G.9
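As a worked illustration, the DAS28 calculation can be written as a small Python function. The sketch uses the standard published form of the index, in which the two joint counts enter as square roots. The function name, the input checks, and the assumption that G is recorded on a 0 to 100 mm scale are ours and are not taken from the text.

```python
import math


def das28(tender, swollen, esr, general_health):
    """DAS28 from 28 joint counts, ESR (mm/h) and a general health VAS.

    ASSUMPTION: general_health is a visual analogue scale in mm (0-100).
    """
    if not (0 <= tender <= 28 and 0 <= swollen <= 28):
        raise ValueError("joint counts must lie between 0 and 28")
    if esr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    if not 0 <= general_health <= 100:
        raise ValueError("general health VAS must lie between 0 and 100")
    return (
        0.56 * math.sqrt(tender)
        + 0.28 * math.sqrt(swollen)
        + 0.70 * math.log(esr)
        + 0.014 * general_health
    )


# Hypothetical patient: 4 tender joints, 9 swollen joints, ESR 20, VAS 50.
print(round(das28(4, 9, 20, 50), 2))  # about 4.76
```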

All patients completed a Dutch version of the Health Assessment Questionnaire (HAQ).10 For each patient, recently taken radiographs of hands and feet were collected, or a new set of radiographs was obtained. These radiographs were scored using the Larsen method,11 in which the first metatarsal joint was left out and the wrist was scored as one joint and multiplied by five (maximum score 190). All radiographs were scored by two observers, and any initial disagreement was resolved by consensus.

A random selection of 121 patients with RA of known disease duration, visiting our outpatient clinic, were scored by HBM using the RAAD score. In 78 of them, an HAQ score was obtained on the same day. These cross sectional data were used to get an impression of the RAAD scores over time.

For statistical analysis SPSS was used. To assess interobserver variability, RAAD scores of three observers were analysed with Spearman's rank correlation and Friedman's test. Spearman's rank correlation was also used to assess correlation between the RAAD score, DAS28, disease duration, and the Larsen score. κ Statistics12 were used to assess the interobserver agreement of the RAAD score for individual joints.
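For readers without access to SPSS, Spearman's rank correlation can be reproduced with a few lines of plain Python. The sketch below is not the authors' procedure; it only shows the usual definition, the Pearson correlation of the ranks with tied values given their average rank. The function names and the example scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical total scores from two observers for the same five patients.
observer_a = [4, 10, 15, 22, 30]
observer_b = [5, 9, 17, 20, 33]
print(spearman_rs(observer_a, observer_b))  # same ordering, so rs is 1
```

A high rs only shows that two observers order patients similarly; it does not show that they give the same values, which is why a separate test of agreement is needed.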


The RAAD score appeared to be easily applicable. After a short learning period, it took about two minutes for each patient, depending on the amount of damage.

Table 2 shows the mean scores in 47 patients. All patients were assessed by observer 1, 41 by observer 2, and 45 by observer 3. Thirty nine patients were seen by all three observers. In 45 patients radiographs of hands and feet were available.

Table 2

Results in 47 patients, showing means, standard deviations, minimum and maximum values

Figure 1 shows the relation between the RAAD scores of two different observers. Spearman's rs for rheumatologist v rheumatology nurse was 0.89. For rheumatologist v trainee it was 0.90, for trainee v rheumatology nurse 0.95. Different ways of computing the RAAD score (RAAD-2 and -3) showed similar degrees of correlation. Because correlation is not the same as agreement, we also compared the RAAD scores of three observers. Using Friedman's non-parametric test for multiple related samples, we found no statistically significant difference between observers.

Figure 1

Scatterplot showing correlation of RAAD scores by two different observers. The diagonal line indicates perfect agreement. For these observers, Spearman's rs=0.89 (n=45, p<0.01).

Table 3 shows the κ values for interobserver agreement in individual joints. There was moderate to good agreement for most joints, although agreement for MCP 1 and the PIP joints was less favourable. κ Values for the ankle joint could not be computed for observer 2 because he never scored a value of 1 for this joint.

Table 3

Value of weighted κ and percentage of joints in which observers agreed, for separate joints in 39 patients. Left and right side of the body were summed. Agreement is considered poor if κ<0.20, fair if 0.21–0.40, moderate if 0.41–0.60, good if 0.61–0.80, very good if 0.81–1.00
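The calculation behind such a table can be sketched in Python. The text does not state which weighting scheme was used, so the sketch assumes linear disagreement weights; the function names and the example gradings are hypothetical. The interpretation bands are those given in the caption above.

```python
def weighted_kappa(rater_a, rater_b, categories=(0, 1, 2)):
    """Weighted kappa for two raters on an ordered scale.

    ASSUMPTION: linear weights, i.e. disagreement |i - j| / (k - 1).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty gradings of equal length")
    n, k = len(rater_a), len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    obs_dis = exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            obs_dis += weight * observed[i][j]
            exp_dis += weight * row[i] * col[j]
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - obs_dis / exp_dis


def agreement_band(kappa):
    """Verbal label for kappa, using the bands from the table caption."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"


# Hypothetical grades given by two observers to the same six joints.
a = [0, 1, 2, 0, 1, 2]
b = [0, 1, 2, 1, 1, 2]
print(round(weighted_kappa(a, b), 2))  # 0.8
print(agreement_band(0.55))  # moderate
```

With weights, a disagreement between grades 0 and 2 counts twice as heavily as one between adjacent grades, which suits an ordered damage scale.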

In table 4 correlation between the RAAD score and other measures is shown using Spearman's rs. The RAAD score correlated well with the Larsen score. There was no significant correlation between the RAAD score and DAS28, whereas HAQ and DAS28 did correlate (Spearman's rs=0.42, p=0.003).

Table 4

Spearman's rank correlation of RAAD score with disease duration, radiographic damage (Larsen), disease activity (DAS), and functional capacity (HAQ)

Figure 2 shows data on disease duration v RAAD score for 121 patients assessed by observer 1. There was a wide range of RAAD scores for patients with the same disease duration, reflecting the large variation in disease outcome in RA. Spearman's rank correlation for RAAD-1 with disease duration was 0.68, for RAAD-2 was 0.67, and for RAAD-3 was 0.69 (p<0.001). This indicates that alternative methods of computing the score did not influence correlation with disease duration in this group of patients.

Figure 2

Box plot showing results of RAAD score v disease duration for 121 patients scored by observer 1. Boxes with horizontal lines represent interquartile range and median. Outliers and extremes are indicated separately. Note that the boxes cover various ranges of disease duration. The numbers of patients are displayed on the x axis.


Tugwell and Bombardier described the quality of scoring methods in terms of feasibility, reliability, validity, and responsiveness.13 The RAAD score is a cheap and quick method of assessing damage. We think it is feasible for studying long term damage in large groups of patients—for instance, in comparing outcome between hospitals or countries, or long term (>5 years) treatment strategies.

We tested interobserver variability and found little difference between the results of experienced and inexperienced observers. Interobserver variability was low for the total score as well as for most individual joints. The rather low level of agreement for the first MCP joint may have been caused by an inadequate definition: ulnar deviation was proposed for all MCP joints. For MCP 1, impairment of normal extension would be more appropriate as the criterion for an intermediate (grade 1) damage score. In the PIP joints, grade 1 damage may be difficult to assess because of pre-existing osteoarthrosis, and should perhaps be omitted. In the ankle joint, a damage score of 1 or 2 occurred only in a minority of patients. We believe this reflects the actual low occurrence of damage in this joint.

In some joints, particularly the cervical spine, shoulder, and hip, it may be difficult to distinguish irreversible damage from reversible impairment due to inflammation. If, for instance, shoulder movement is impaired, the observer has to decide if this is a fixed impairment or one that might improve after a corticosteroid injection.

Clinical assessment of cervical instability may be difficult. If from earlier radiographs a patient is already known to have significant cervical instability, we do not object to using this information in the RAAD score. However, no new radiographs should be taken for this purpose.

More detailed definitions or more extensive training and consensus meetings may reduce interobserver variability. However, both strategies make the method less easily applicable for wide scale use. Therefore, we prefer to keep definitions as simple as possible.

From the rather high level of interobserver agreement in our sample, we do not expect much intraobserver variability to occur. However, further study should focus on this aspect, especially since variation in swelling and inflammation within a single patient may influence the score in some joints.

Five aspects of the validity of outcome measures in RA can be distinguished: face validity, content validity, construct validity, criterion validity, and discriminant validity. Face validity means credibility. We believe that our clinical definitions are a sensible way of describing articular damage and have good face validity, but we welcome any suggestions for improvement.

Content validity deals with the question of whether a measure covers all aspects of the subject. We think that assessing clinical damage in large and small joints will render better content validity than assessing radiographic changes in small joints only.

To assess construct validity (that is, does this method correspond to theoretical concepts in articular damage?) we studied the correlation of the RAAD score with a number of other variables. As we expected, there is a positive correlation with disease duration and HAQ score (convergent validity) and no correlation with actual disease activity (divergent validity).

Criterion validity (does the score correlate with the gold standard?) is difficult to assess because there is no gold standard for articular damage. We chose the Larsen score as a substitute and found good correlation with our damage score. A simple damaged joint count (RAAD-2) lacks information on severity of damage and offered no substantial time saving. We also studied a more or less weighted score (RAAD-3), because the PIP and MCP joints seem to be overweighted in RAAD-1. In our study group this modification did not change the properties of the score significantly, but we are currently collecting data on larger numbers of patients to study this item properly. For the time being, we recommend the original RAAD score, not its modifications.

We developed the RAAD for measuring long term damage. Assessing its discriminant validity (responsiveness or sensitivity to change) is best done in a prospective design, but this would take at least 10 years. Instead, one could compare RAAD scores from subsets of patients with RA in whom a difference in outcome (for example, rheumatoid factor positive and rheumatoid factor negative patients) is expected.


We developed the RAAD score as a clinical method for scoring long term articular damage in large groups of patients with RA. It is easy to perform and showed good interobserver reliability, even when used by an inexperienced observer. It correlates well with Larsen scores and with disease duration but not with disease activity, demonstrating its criterion and construct validity. Before recommending its use for research or follow up of patients with RA, its inter- and intraobserver variability and discriminant validity need to be assessed.


Part of this study was performed while TR Zijlstra visited the ARC Epidemiology Unit in Manchester. He wishes to thank Professor D Symmons and other staff members for their hospitality and useful advice.

This study was supported by a Novartis Rheumatology Grant.

