Abstract
Plain film radiography is the preferred method for evaluating disease progression in rheumatoid arthritis and for establishing the efficacy of new disease modifying antirheumatic agents. However, the relative efficacy of these agents cannot be determined by comparing radiographic data from different studies, and a standardised system is needed.
- rheumatoid arthritis
- radiography
- disease modifying antirheumatic drugs
- DMARDs, disease modifying antirheumatic drugs
- FDA, Food and Drug Administration
- MTX, methotrexate
- NSAIDs, non-steroidal anti-inflammatory drugs
- RA, rheumatoid arthritis
Several new disease modifying antirheumatic drugs (DMARDs) for the treatment of rheumatoid arthritis (RA) have recently been introduced, including etanercept, infliximab, leflunomide, and anakinra, and more will soon be available.1,2 These agents come at a time when early, aggressive treatment to modify the course of the disease has gained wide acceptance, and the question of whether to use a combination of agents to increase efficacy is increasingly being asked.2,3 Quantitative plain film radiography has been the most common imaging modality used to follow disease progression and to assess drug efficacy.4,5
An ideal method for monitoring the progression of RA has several characteristics. Firstly, the method must have a high sensitivity for detecting early disease, when DMARDs have their greatest potential benefit.6,7 It should also be able to define single point damage, quantify severity, and monitor progression with accuracy, precision, and high reproducibility.8 Other features of an ideal method include widespread availability, cost effectiveness, lack of patient harm, ease of use, and rapid generation of results. Creation of a permanent record that can be easily randomised and blinded is also desirable.5,9 A final important characteristic is correlation with the clinical disease course, which often fluctuates.10–13
The current “gold standard” for monitoring the progression of RA is plain film radiography, which has both advantages and disadvantages.5,11,14 The goals of this manuscript are to examine the application of quantitative radiographic evaluation to DMARD research and to explore the comparability of radiographic data across different trials, with a focus on data from the most recently approved DMARDs.
IMAGING TECHNIQUES IN RA
Plain film radiography is the most commonly used imaging modality for initial clinical evaluation and monitoring of RA. Plain radiographs can confirm a diagnosis of RA or even allow the diagnosis to be made when the clinical and laboratory information is equivocal, contradictory, or non-diagnostic. This modality can also be used to document, quantify, and monitor the amount and location of joint disease.
The structural damage of RA disclosed in plain film radiographic findings usually first occurs in the hands, wrists, and feet. Soft tissue swelling, marginal erosions, juxta-articular osteopenia, and uniform joint space loss are often seen fairly early in the course of RA. In the hand, the changes of RA are primarily seen in the metacarpophalangeal and proximal interphalangeal joints, while the intercarpal, radial, and ulnar styloid joints are most commonly affected in the wrist. In the feet, changes are most commonly seen in the metatarsophalangeal and proximal interphalangeal joints.15
Using the reference characteristics of an ideal method for monitoring RA discussed above, many of the advantages of plain film radiography are clear. Plain radiographs are inexpensive, easy to generate, and widely available and accepted. They also give rapid results and provide a permanent record that can be easily studied in a randomised and blinded fashion. Radiographs are reproducible, allow measurement of severity, and can identify single point damage and progression with fairly high precision and accuracy.16 In addition, the overwhelming majority of reported data related to radiological progression of RA comes from plain film radiography.
However, there are several disadvantages to plain film radiography. Firstly, technical limitations include the need for proper technique and positioning, which can falsely obscure or enhance various findings on an initial film or follow up radiographs.11 Secondly, there are “floor” and “ceiling” effects related to detection and scoring of RA induced disease seen on conventional radiographs.16 The floor effect stems from the fact that the hallmark radiographic findings of bony erosions and joint space narrowing may occur late in the pathophysiology of the disease.17 The ceiling effect refers to the fact that radiographic progression of disease can continue even after the highest damage score has been assigned. Another somewhat controversial issue with plain film radiography is the relationship of x ray findings with clinical disease course. Multiple studies have been performed in an attempt to correlate commonly used clinical indices, such as the Health Assessment Questionnaire, grip strength, and the Ritchie index, with radiographic findings. The results have been variable.16 More recent data from clinical trials, however, indicate a clear relation between disease activity (particularly inflammation) and structural damage (reviewed by van der Heijde).18
Other modalities, such as magnetic resonance imaging, ultrasound, and radionuclide imaging, have been used to assess joint changes in RA.13,19–21 Of these, magnetic resonance imaging holds the most promise for future use, but cost effectiveness standards22 and predictive value have yet to be established.
USING PLAIN FILM RADIOGRAPHY TO EVALUATE RA PROGRESSION
After reviewing 60 published reports and four data sets, Scott and colleagues recently published an analysis of the link between radiographic findings in RA and disability over the course of the disease.23 The authors concluded that a relatively strong, and most likely causal, relationship exists between joint damage and later disability. Both increase over time, as does the correlation between the two. Radiographic damage and disability are not correlated in early RA; once radiological scores exceed 33% of maximum damage, the relationship between damage and disability becomes more linear. In late RA (>8 years), joint damage and disability are most strongly correlated (Pearson r=0.30–0.70). Joint damage accounts for about 25% of disability in established RA.23 Radiographic progression is also associated with income loss and work disability in patients with RA.24 These findings support the notion that increased joint damage results in severe, long term consequences.
Because of the importance of radiographic progression in determining long term outcomes, a standardised, systematic method to evaluate and quantify the amount and progression of radiographic damage caused by RA is desirable. A quantitative approach to characterising joint damage and damage progression offers several advantages over a system that relies on a qualitative evaluation, such as “better,” “the same,” or “slightly worse.”
Data have shown that quantitative, systematic approaches to evaluating the status of RA induced joint damage can result in a high inter- and intraobserver correlation.25 Such methods also allow population means to be created, which enables more accurate comparisons between groups, possibly even across different studies.26
Using empirical reasoning that a systematic approach would be desirable, in 1949 Steinbrocker et al developed the first scoring system to quantify radiographic evaluation of RA.10 This system involved classifying RA induced damage into one of four gross stages. Since that time, many scoring systems have been developed (table 1).10 Most scoring systems look at certain joints within the hands and wrists or selected joints within the hands, wrists, and feet, primarily because these joints are easy to evaluate and a reasonable correlation between disease in these joints and total disease burden in other joints has been demonstrated.10 For clinical trials, investigators have increasingly advocated evaluation of the hands, wrists, and feet.9 Studies have shown that the joints of the feet usually become eroded earlier than the joints of the hands27–29; including feet thus may help improve the sensitivity of joint damage assessment in early RA.
The scoring systems that have been designed to evaluate radiographic changes in RA can be divided into two main groups, global and detailed.5,10 Global scoring systems assign one score to the entire joint, taking into account all the abnormalities seen, whereas detailed systems assign scores on at least two separate variables for each joint evaluated. The most widely used detailed scoring system is the modified Sharp method and its variations, and the most widely used global scoring system is the Scott modification of the Larsen score.16
COMPARISON OF THE MODIFIED SHARP AND LARSEN SCORES
Table 2 highlights the major differences between the two major variations of the Sharp method (modified Sharp method and Genant/Sharp method) and the modified Larsen system for scoring joint damage. Although this section will focus primarily on the differences between these approaches, it is important to note that radiographic scores obtained by the modified Sharp and Larsen methods have been shown to be significantly correlated.30 Nevertheless, some studies have found that compared with the Larsen method and its modifications, the Sharp method and its variations are more sensitive, particularly with respect to change over time.31–33
The original Sharp method assessed 27 joints in each hand and wrist, with each joint being given a separate score for joint space narrowing and erosions.34 Sharp and colleagues subsequently identified 17 areas for erosions and 18 areas for joint space narrowing that resulted in a high degree of intra- and interobserver accuracy.35 Van der Heijde later added feet to the radiographic analyses,36,37 a modification that has also been used by Sharp.38 Because of their similarities, these radiographic scoring systems will be referred to as “modified Sharp methods.” In modified Sharp scoring systems, each joint is given a separate score for joint space narrowing and erosions. Variations of these methods exist, each with subtle differences in the scales used and the criteria for scoring. In one commonly used scheme, 15 sites in each hand and wrist and six joints in each foot are examined for joint space narrowing,37 which is scored on a scale of 0 to 4: 0 indicates no narrowing, 1 represents minimal narrowing, 2 indicates loss of 50% of the joint space, 3 indicates loss of 75% of the joint space, and 4 represents complete loss of the joint space.39 The erosions are counted individually, usually at 16 sites in each hand and wrist and six sites in each foot, with a maximum score of 5 given for a destroyed hand or foot joint. For joints in the feet, the van der Heijde version of the Sharp scoring system has a maximum score of 10 for a destroyed joint.37,40 A total Sharp score is generated based on the sum of the joint space narrowing and erosion scores.
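The arithmetic behind a total modified Sharp score can be sketched in a few lines of Python. This is an illustration only, not a validated scoring tool: the site counts and per-site maxima are those quoted above for one commonly used scheme, the foot erosion maximum of 10 assumes the van der Heijde version, and the names `SHARP_SCHEME`, `max_total_sharp_score`, and `total_sharp_score` are hypothetical.

```python
# Illustrative sketch of the modified Sharp arithmetic described in the text.
# Site counts and maxima come from the scheme quoted above; the foot erosion
# maximum of 10 assumes the van der Heijde version. All names are hypothetical.

SHARP_SCHEME = {
    # component: (sites per side, maximum score per site)
    "jsn_hand_wrist": (15, 4),
    "jsn_foot": (6, 4),
    "erosion_hand_wrist": (16, 5),
    "erosion_foot": (6, 10),
}
SIDES = 2  # left and right


def max_total_sharp_score(scheme=SHARP_SCHEME, sides=SIDES):
    """Highest total score the scheme can produce (every site at its maximum)."""
    return sides * sum(sites * site_max for sites, site_max in scheme.values())


def total_sharp_score(erosion_scores, jsn_scores):
    """Total Sharp score: sum of all erosion scores plus all narrowing scores."""
    return sum(erosion_scores) + sum(jsn_scores)


if __name__ == "__main__":
    print(max_total_sharp_score())  # 448 under the counts assumed above
    # Hypothetical patient with a handful of affected sites:
    print(total_sharp_score(erosion_scores=[1, 0, 2], jsn_scores=[0, 1]))  # 4
```

Because the maximum depends on the number of sites and the per-site scale, variants that examine different joints or use different erosion scales have different score ranges even though they share the "modified Sharp" label.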
The Genant modification of the Sharp method (Genant/Sharp method) focuses on 14 sites for erosions and 13 sites for joint space narrowing. Erosion scores range from 0 to 3.5 for each joint, and joint space narrowing is scored on a scale ranging from 0 to 4. Half grade scores (for example, 3+) are allowed to improve sensitivity. The total erosion score and the total joint space narrowing score are each normalised to a maximum score of 100, and these two normalised scores are added to give a joint total score in which erosions and joint space narrowing are evenly weighted.41
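The normalisation step can be illustrated with a short Python sketch. It assumes that the 14 erosion sites and 13 narrowing sites are counted per hand and that both hands are scored (28 and 26 sites in total); that reading, the function name `genant_sharp_total`, and its parameters are assumptions made for illustration rather than part of the published method.

```python
# Illustrative sketch of Genant/Sharp normalisation. Assumes 14 erosion sites
# and 13 narrowing sites per hand, both hands scored; names are hypothetical.

def genant_sharp_total(erosion_scores, jsn_scores,
                       erosion_sites=28, jsn_sites=26,
                       erosion_max=3.5, jsn_max=4.0):
    """Return (normalised erosion, normalised narrowing, joint total score)."""
    if len(erosion_scores) > erosion_sites or len(jsn_scores) > jsn_sites:
        raise ValueError("more scores supplied than sites in the scheme")
    # Each component is rescaled so that its own maximum equals 100.
    erosion_norm = 100.0 * sum(erosion_scores) / (erosion_sites * erosion_max)
    jsn_norm = 100.0 * sum(jsn_scores) / (jsn_sites * jsn_max)
    # Adding the two rescaled components weights them evenly (range 0 to 200).
    return erosion_norm, jsn_norm, erosion_norm + jsn_norm


if __name__ == "__main__":
    # Hypothetical patient: half grades such as 1.5 are permitted.
    erosion, jsn, total = genant_sharp_total([1.5, 0.5, 2.0], [1.0, 0.5])
    print(round(erosion, 2), round(jsn, 2), round(total, 2))
```

Rescaling each component before adding them is what makes the two evenly weighted; without it, the component with more sites or a larger per-site maximum would dominate the total.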
The original Larsen method assessed all limb joints.42 Although comprehensive, obtaining the needed x rays was not always practical. In the Scott modification of the Larsen method,43,44 multiple joints in the hands, wrists, and feet (in some studies) are evaluated. Each joint is given a grade between 0 and 5, with 0 representing a normal joint. Grade 1 reflects slight, early, or non-specific findings of RA. Periarticular osteopenia/joint swelling must be major, and/or suggested erosions/cysts at two sites in the joint must be smaller than 1 mm. Grade 2 reflects a definite early abnormality; one or more erosions larger than 1 mm must be present, with a break in the cortical margin. Grade 3 reflects medium destructive abnormality; erosions at both sides of the joint must be of significant size with preservation of some joint surface. Grade 4 reflects severe abnormality; subluxation must be present. Grade 5 reflects mutilating abnormalities. The original articular surfaces must have disappeared and gross bone deformation must be present in the weightbearing joints. To assign a grade, the joints are compared with radiographic standards.
In addition to the type of scoring system used, the way in which the films are viewed may have an important impact on the sensitivity and reliability of radiographic scoring. Three basic methods are available: single readings, paired readings that are blinded to sequence, and chronologically ordered paired readings. Studies of these three approaches suggest that single readings may provide less reliable data than paired readings.45–47 A study of one year radiographic data found that the two paired reading methods were comparable, but that the paired readings that are blinded to sequence may have greater power to test treatment effect.46 However, a three year study found that chronologically ordered paired readings provided the most sensitive assessment of damage, and that this advantage increased over time.45
Several issues in developing and selecting an ideal scoring system should be considered.9 Reader disagreement and inter/intraobserver variation are important issues, though they can often be minimised by a training period to ensure familiarity with the scoring method.9,11,48 These problems are compounded by the need to assign a discrete number or score to a continuum of damage.49 Questions about the sensitivity of scoring methods in detecting change over time have also been raised.48 A recent report by an international panel of experts found that with the modified Sharp method, the smallest detectable difference (5.0 units) corresponded closely with the minimal clinically important difference (defined as radiographic progression that makes a rheumatologist change treatment). In contrast, the smallest detectable difference by the Larsen/Scott method was too insensitive to use as the threshold for clinically relevant change. As a result, changes in patients with early RA and patients with late RA and high disease activity in some cases went undetected.33 The ability of scoring systems to assess radiological healing may also be an issue.50 Healing phenomena can be seen in about 6% of joints51; this figure may be increased by the new DMARDs, particularly the biological agents. Accordingly, radiographic scoring systems used in clinical trials should be able to take potential healing into account.
STUDIES OF JOINT DAMAGE/DISEASE PROGRESSION IN RA
Multiple studies using variations of the Sharp or Larsen scoring systems have shown that joint damage occurs early in the course of RA, often within the first two years after presentation.19,28,52 Clinical predictors of radiographic damage and/or disease progression include positive rheumatoid factor titre status, raised erythrocyte sedimentation rate or C reactive protein level, joint swelling, and duration of disease.28,53,54 The reported rate of RA induced radiographic damage over time varies in different studies and in different populations. Some studies have found that the rate of disease progression is most rapid early in the disease, and then gradually decreases to a steady rate.55,56 In one recent prospective study that followed 183 patients with early RA (mean duration of symptoms, 11 months), the rate of disease progression was found to be threefold higher during the first two years after study entry compared with subsequent years.56
The rate of progression becomes a more important variable in light of several studies that have shown that aggressive treatment can delay joint damage.6,39,57 These data have suggested a possible “therapeutic window” early in the course of disease during which medical intervention with DMARDs may have a more significant impact than DMARD treatment given later.58
To understand the significance of a clinical trial that tests the effects of an agent on RA induced radiographic joint damage, one must be aware of the method used to quantify radiographic damage and the context within which it is applied. Comparing the effectiveness of two agents based on anything besides a head to head trial is currently difficult. Comparisons are less likely to be meaningful if trials used different scoring systems, examined films in a chronological versus random order, or had experienced versus naïve readers.26 Also, subtle differences may exist between studies that use the same general method of radiographic scoring, such as the number of joints examined or the total scoring range. Additional factors, particularly differences in patient populations, such as the amount of baseline radiographic damage at trial entry and the amount of disease activity and/or progression at baseline, also prevent accurate comparisons between trials.26 The different ways of reporting data can add to the confusion. Adopting specific standards for reporting radiographic data may help correct this. A round table discussion on this topic involving a panel of experts from academia, the pharmaceutical industry, and the US Food and Drug Administration (FDA) was held in December 2000, and the guidelines for reporting radiographic data developed by this group were recently published.59 Some of the most important of these guidelines include the reporting of absolute numbers for radiographic change along with the maximum score possible, mean change values together with the standard error or standard deviation for group comparisons, and the percentage of patients with disease progression. Radiographs of both hands and feet are preferred, and the smallest detectable difference for the study should be reported in order to provide quality control and to allow the reader to gauge the clinical significance of the data presented.59
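As a rough illustration of what such a report contains, the following Python sketch computes the quantities named in the guidelines for a single treatment group. The patient scores are invented, the function name `summarise_radiographic_change` is hypothetical, and the use of a fixed threshold (for example, the study's smallest detectable difference) to define "progression" is an assumption rather than a rule taken from the guidelines.

```python
# Illustrative summary of radiographic change for one treatment group.
# Data, names, and the progression threshold are hypothetical.
import math
import statistics


def summarise_radiographic_change(baseline, follow_up, max_score,
                                  progression_threshold=0.0):
    """Summarise per-patient change in total radiographic score."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    n = len(changes)
    sd = statistics.stdev(changes)  # sample standard deviation
    progressed = sum(1 for change in changes if change > progression_threshold)
    return {
        "n": n,
        "max_possible_score": max_score,
        "mean_change": statistics.mean(changes),
        "sd_change": sd,
        "se_change": sd / math.sqrt(n),
        "percent_progressed": 100.0 * progressed / n,
    }


if __name__ == "__main__":
    report = summarise_radiographic_change(
        baseline=[10, 20, 30, 40],
        follow_up=[12, 20, 35, 41],
        max_score=448,
    )
    for key, value in report.items():
        print(key, value)
```

Reporting the maximum possible score alongside the absolute change lets a reader see how large a change is relative to the scale used in that particular study.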
KEY FINDINGS FROM RADIOGRAPHIC STUDIES
Effects of early, aggressive treatment
Clinical trial experience with modified Sharp scoring systems has been extensive. An example is a comparison study between “aggressive” early treatment with methotrexate (MTX) or sulfasalazine and conventional stepwise treatment (that is, starting treatment with non-steroidal anti-inflammatory drugs (NSAIDs) and progressing to DMARD treatment only if response to NSAIDs is insufficient).6 This investigation used the modified Sharp scale to monitor radiographic progression. A single observer unaware of the patients’ clinical and laboratory data analysed radiographs taken every six months in sequential order. Progression of radiographic damage was expressed as the difference between the absolute Sharp scores at 6, 12, 18, and 24 months and at study entry. The study found a significant reduction in radiographic progression after six months in the high risk group who received early MTX or sulfasalazine compared with the high risk group who received traditional stepwise treatment. This trial helped to establish the value of starting aggressive DMARD treatment early in the course of RA.
Comparison of monotherapy and combination therapy
A second important trial that used the modified Sharp scoring system was the 80 week European study by Boers et al, which compared sulfasalazine monotherapy with combination therapy with sulfasalazine, MTX, and prednisolone. In the combination therapy group the MTX and prednisolone doses were gradually tapered.60 The mean Sharp score, determined by two trained and blinded observers who viewed the films in chronological order, was used to analyse radiographs of the hands and feet at baseline, 28, 56, and 80 weeks. The study found significantly less radiographic progression in the group treated with combination therapy than in the group that received sulfasalazine monotherapy.60 The results of this study are often used to support the use of combination therapy in RA.
RADIOGRAPHIC STUDIES OF NEW DMARDS
Etanercept, infliximab (in combination with MTX), leflunomide, and anakinra are recently introduced agents that have shown efficacy in the treatment of RA39,57,61,62 and have been approved by the FDA for the treatment of the signs and symptoms of this disease. Additionally, etanercept, infliximab (in combination with MTX), and leflunomide are approved for inhibiting or retarding structural damage associated with RA. Anakinra has not yet received this additional indication.
Attempting to compare these four agents, based on the trials that examined their ability to inhibit radiographic progression, highlights the problems that arise in trying to interpret radiographic data in RA. Here we review data from trials in which these agents were compared with MTX, the conventional gold standard DMARD. For anakinra, the only radiographic data currently available are from a trial comparing anakinra with placebo. Table 3 shows the relevant trial characteristics.
Leflunomide
Two randomised, controlled trials examined radiographic progression in patients treated with leflunomide versus MTX. A 12 month trial reported by Strand et al used a modified Sharp scoring method to determine the effects of these DMARD treatments on radiographic disease progression.63 Radiographs of the hands and feet were obtained at baseline and 12 months (or early exit) for 352 of the 482 patients enrolled in the study. Forty six joints were evaluated for erosions (five point scale) and 48 for joint space narrowing (four point scale). Possible total scores ranged from 0 (no damage) to a maximum score of 422 (severe damage).4 Patients enrolled in the trial had RA for at least six months. Patients could not have previously received MTX treatment; treatment with other DMARDs was discontinued at least 30 days before study entry.63 Compared with placebo, monotherapy with MTX or leflunomide significantly slowed radiographic progression as assessed by total Sharp score, erosion and joint space narrowing scores. In patients receiving leflunomide, total Sharp scores increased by a mean of 0.53 over 12 months, compared with 2.16 (p=0.007) in the placebo group and 0.89 (p=0.05) in the MTX group.4
In the second trial, reported by Emery et al, patients received leflunomide or MTX for one year; at this point, patients could choose to continue to receive double blind treatment for a second year.61 All patients had active RA (at least six swollen and six tender joints) of between four months’ and 10 years’ duration. DMARDs were discontinued at least 28 days before trial enrolment; approximately 66% of patients had failed to respond to at least one DMARD. Radiographs were obtained at baseline and at one and two years, and were assessed by the modified Larsen technique. Forty joints in the hands and feet were scored on a scale of 0 (no damage) to 5 (severe damage); the total score was divided by 40 to give a mean Larsen score for each joint. The number of eroded joints was also counted. Although the publication by Emery et al does not state the proportion of patients for whom radiographs were available,61 a subsequent reanalysis using the Sharp method reported that baseline and one year films were available for 64% of the 999 patients in the study (the two year data were not analysed, and therefore this figure was not reported).4 The leflunomide and MTX groups had comparable overall mean Larsen scores at baseline, and both groups showed a small (0.03) increase in Larsen score during the first year of the trial. During the second year, there was no further increase in joint damage in the leflunomide group and a slight improvement in the MTX group, resulting in a small but statistically significant treatment difference in change in radiographic scores in favour of MTX at two years.61 An analysis of one year radiographic data by the modified Sharp method (same scoring as in the Strand leflunomide study discussed above) also concluded that both leflunomide and MTX slowed disease progression and were comparable during the first year of treatment (6.74 increase in total Sharp score in the leflunomide group v 6.47 increase in the MTX group).
Etanercept
Bathon et al compared etanercept treatment and MTX treatment over a 12 month period.39 The modified Sharp scoring system was used to evaluate radiographic progression of RA every six months in the hands, wrists, and feet. A total of 46 joints were evaluated for erosions on a six point scale (0–5), and 42 joints were examined for joint space narrowing on a 5 point scale (0–4), with a total possible score ranging from 0 (no damage) to 398 (severe joint destruction). Each image was scored by two blinded radiologists or rheumatologists, who were first trained in the scoring method. Interobserver correlation was good (Pearson r=0.85). Patients in the study had been diagnosed with RA for less than three years and had never been treated with MTX. If patients had been receiving other DMARDs, these were discontinued four weeks before the study began.39
Radiographs were available for 583 of the 632 randomised patients. Compared with MTX, etanercept treatment resulted in less radiographic progression over the course of one year, as reflected by both the erosion score and the total Sharp score. A significant difference was found between etanercept 25 mg and MTX in the mean change in erosion scores at six months (0.30 v 0.68; p=0.001) and one year (0.47 v 1.03; p=0.002), while the total Sharp score showed a significant difference at six months (0.57 v 1.06; p=0.001), but not at one year (1.00 v 1.59; p=0.11). The most profound differences between patients receiving these two agents were seen during the first six months, during which the mean erosion score increased by more than twice as much in the MTX group as it did in those treated with etanercept 25 mg.39 No significant difference in joint space narrowing scores was found between these two treatment groups, with both groups showing only minor increases (<0.5 units) in joint space narrowing over the course of one year. This finding may relate to the early RA population enrolled in this study. A prospective study of radiological progression in patients with RA of less than three years’ duration found that erosions progressed more rapidly than joint space narrowing early in the disease.28
Two year data from this study, in which patients were allowed to continue to receive treatment in an open label manner, have recently been reported.64 The differences between the etanercept and MTX groups seen at earlier time points became more evident with a longer study duration. At two years, patients taking etanercept 25 mg showed significantly less radiographic progression than patients in the MTX group, as indicated by the mean change from baseline in the total Sharp score (1.3 v 3.2; p=0.001), and significantly less erosion, as indicated by the mean change from baseline in the erosion score (0.7 v 1.9; p=0.001). Joint space narrowing scores did not show a significant difference between treatment groups.64
Infliximab
A trial comparing MTX alone with MTX plus infliximab also used the modified Sharp scoring system to evaluate radiographic changes. In this study, radiographs of the hands and feet were obtained at baseline, 30, and 54 weeks and were scored by two blinded readers.57 For erosions, 32 joints in the hand (5 point scale) and 12 joints in the feet (10 point scale) were examined. A total of 40 joints were examined for joint space narrowing (4 point scale). Possible total scores ranged from 0 (no damage) to 440 (severe damage). The mean of the two scores was used in the analyses. Patients in this study had active, longstanding disease despite MTX treatment of at least 12.5 mg per week.57
Patients were randomly allocated to receive the same dose of MTX as they had been receiving before the study, plus either placebo or infliximab (3 or 10 mg/kg body weight every four or eight weeks).57 Radiographs were examined for 349 of 428 patients. Compared with placebo plus MTX, infliximab plus MTX resulted in significantly less progression of joint damage as judged by the total radiographic score, erosion score, or joint space narrowing score at 54 weeks. Total Sharp scores increased by a mean of 7.0 in the placebo plus MTX group, compared with 1.3 for the currently recommended infliximab plus MTX regimen (3 mg/kg every eight weeks; p<0.001).
Anakinra
Both the modified Larsen method and the Genant/Sharp method were used to assess data from a trial of anakinra. Patients were randomly allocated to receive placebo or a single daily dose of subcutaneous anakinra at 30, 75, or 150 mg for six months. Radiographs of the hands and wrists were obtained at weeks 0 and 24 and scored by two radiologists according to the Larsen method, with the lower of the two scores chosen to provide a conservative assessment. Fifteen areas in each hand and wrist were scored on a global scale of 0 to 5 by comparison with standardised images, with total scores ranging from 0 (no damage) to 150 (severe damage in all joints).65 Feet joints were not assessed.62 Patients with active, severe RA and a disease duration of less than eight years were enrolled in this study. Treatment with DMARDs was discontinued at least six weeks before enrolment.62
Radiographs for 347 of 472 patients were assessed. The total Larsen score of the pooled anakinra treatment groups was significantly lower than that of the placebo group (placebo 6.4 v anakinra 3.8; p=0.03). Although other clinical parameters, such as swollen and tender joints, showed a clear dose response, the lowest dosage group (30 mg/day) had the best radiographic results and was the only treatment group that by itself showed a significant difference from placebo.62
Radiographs from this study were subsequently analysed by the Genant modification of the Sharp method for 333 of the 472 patients.65 A single observer scored 14 sites for each hand for joint erosions on a scale of 0 to 3.5, with half grades allowed, and 13 sites for each hand for joint space narrowing on a scale of 0 to 4, with half grades allowed.66 Using this scoring system, anakinra was found to reduce progression in the total Genant/Sharp score by about 50% compared with placebo (3.52 v 1.85; p=0.0004) during the six month trial. All active treatment groups showed significantly reduced scores compared with placebo; in this analysis the highest dosage group (150 mg) showed the best results.66 Both joint space narrowing scores and erosion scores were significantly higher in the placebo group than in the active treatment group.
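The "about 50%" figure follows directly from the two reported progression values. A minimal Python sketch of the arithmetic, using a hypothetical helper name, is:

```python
# Relative reduction in progression versus a comparator group.
# The helper name is hypothetical; the inputs are the values quoted above.

def relative_reduction_percent(comparator_change, treated_change):
    """Percentage by which progression is lower than in the comparator group."""
    if comparator_change == 0:
        raise ValueError("comparator change must be non-zero")
    return 100.0 * (comparator_change - treated_change) / comparator_change


if __name__ == "__main__":
    # Placebo 3.52 v anakinra 1.85 (total Genant/Sharp score, six months)
    print(round(relative_reduction_percent(3.52, 1.85), 1))  # 47.4
```

A relative figure of this kind describes the difference within one trial only; it does not carry over to trials with different scoring scales or patient populations.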
Comparing new agents: which is most effective in stopping radiographic progression?
All four of the new agents discussed here have demonstrated DMARD activity and have shown the ability to inhibit the progression of structural damage (although it should be noted that the FDA has requested more data for anakinra before approving this indication). Having new DMARDs for the treatment of RA will certainly be a boon to both clinicians and patients. But which agent is the best?
All clinicians want to help their patients choose the “best” treatment, and it is therefore tempting to use these trial data to determine the most efficacious agent for retarding radiographic progression. However, these trials were designed only to answer questions of efficacy within a particular trial. The differences among the trials are too great to extrapolate their data to answer other questions. For instance, although three of the trials used the modified Sharp method, different possible scoring ranges were reported and, in some cases, different scales were used for scoring erosions. These trial-specific modifications produce non-comparable test results.
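The differing score ranges can be reproduced from the joint counts and per-joint maxima reported for each trial earlier in this article. The Python sketch below does only that arithmetic; the dictionary layout and the names `TRIAL_SCHEMES` and `max_possible_score` are hypothetical, and the anakinra entry uses the Larsen analysis (15 areas in each hand and wrist, both sides).

```python
# Maximum possible total score per trial, from the joint counts and per-joint
# maxima quoted in the trial descriptions. Names and layout are hypothetical.

TRIAL_SCHEMES = {
    # trial: list of (number of joints or areas, maximum score per joint)
    "leflunomide v MTX (Strand et al, modified Sharp)": [(46, 5), (48, 4)],
    "etanercept v MTX (Bathon et al, modified Sharp)": [(46, 5), (42, 4)],
    "infliximab plus MTX (modified Sharp)": [(32, 5), (12, 10), (40, 4)],
    "anakinra v placebo (Larsen)": [(30, 5)],
}


def max_possible_score(components):
    """Sum of (joint count x per-joint maximum) over all scored components."""
    return sum(count * per_joint_max for count, per_joint_max in components)


if __name__ == "__main__":
    for trial, components in TRIAL_SCHEMES.items():
        print(trial, max_possible_score(components))
    # Prints 422, 398, 440, and 150 respectively.
```

The same absolute change therefore represents a different fraction of the scale in each trial, which is one reason why the reported values cannot be compared directly.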
Perhaps the most important differences among the trials were the baseline patient characteristics. In the etanercept trial, average disease duration was roughly one year39; in the anakinra trial, it was 3.7–4.3 years62; in the leflunomide trials, it was about 3.5 years61 and 7.0 years63; and in the infliximab trial, 9 to 12 years.57 This may be particularly important in studies of radiographic progression, as some studies have suggested a more rapid rate of progression in patients with early RA.55,56 Radiographic damage at baseline was also markedly different, consistent with the different duration of disease in enrolled patients. Finally, patients in the infliximab trial showed an inadequate response to MTX at study entry. Accordingly, the MTX arm in this study was more similar to a placebo arm than to an active treatment group. The impressive impact on radiographic progression of infliximab plus MTX relative to MTX must thus be interpreted with caution. In contrast, both etanercept and leflunomide (in one study) were compared with MTX in an MTX-naïve population. Data comparing the effect of anakinra on radiographic progression relative to MTX have not yet been reported.
Many questions about these agents are thus left unanswered. How do infliximab and anakinra compare with MTX in MTX-naïve patients? Will all of these agents be equally effective both in early RA and in later stages of the disease? And how do these agents compare with each other in halting radiographic progression? Until head to head radiographic studies among these agents are performed, the relative merit of these different DMARDs must remain unknown.
CONCLUSION
Although it has limitations, the current “gold standard” for radiological evaluation of disease progression in RA is plain film radiography. This standard has been used in multiple studies, and the World Health Organisation, American College of Rheumatology, and others recommend using joint imaging as one of the standards for classifying and investigating DMARDs.67,68 Radiographic scoring systems provide valuable information on the ability of different treatment regimens to impact joint destruction. However, in many cases, data from one study cannot be directly compared with data from other studies, particularly if different scoring methodologies or different patient characteristics are involved. Creating a standardised system for radiographic scoring in clinical trials will help to improve the comparability of data in some cases, but even such a system may not allow comparison of data from diverse patient populations.
Acknowledgments
This research was supported by an unrestricted educational grant from Wyeth.