We read with interest the report of the WOSERACT trial1 that compared the addition of 7 mg daily prednisolone or placebo to sulfasalazine in early rheumatoid arthritis (RA). A number of important aspects of the trial have been dealt with well: the sample size is adequate; appropriate attention has been paid to confounders; two separate and independent readers scored the radiographs; the 2 year trial was of adequate length; and the completeness of the data is satisfactory. Given these strengths, it is all the more disappointing that the results of the main outcome, radiographic damage, cannot be adequately interpreted as they are reported. Indeed, the validity of these results is open to serious doubt. We feel there is a real possibility of a type II statistical error (missing a true difference between treatment arms). There are two (possibly three) reasons for this.
Firstly, there is an absolute difference between the x ray scores of the two readers of about 40 Sharp points. This raises strong doubts over the proficiency of either or both readers. In early RA Sharp scores are typically very low, with most patients scoring 0 and only a few with higher scores. Scores of 80, let alone 159, after one year of RA are without precedent in the literature, and even the baseline medians of 6 and 8 recorded for the conservative reader are quite high. In contrast with the authors, we cannot be “reassured” by their assertion that “the change in x ray score was consistent between the two readers”: these data are simply not provided in the report. All we have is an unsatisfactory correlation of absolute scores between readers of 0.8 (whereas in most trials the intraclass correlation coefficient (the recommended and more severe test of reliability between readers) exceeds 0.9), and the comparison between readers of differences between the median start and end scores in the two study groups. Unfortunately, the difference between medians at baseline and end point is not the same as the median change.
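The last point above, that a difference between medians is not a median change, can be illustrated with a small worked example. The scores below are invented for three hypothetical patients and are not taken from the trial; the variable names are likewise assumptions made for illustration.

```python
from statistics import median

# Hypothetical Sharp scores for three patients (illustrative only, not trial data).
baseline = [0, 5, 10]
end_point = [10, 5, 11]

# Per-patient change, then its median: the quantity a "change" outcome calls for.
changes = [e - b for b, e in zip(baseline, end_point)]
median_change = median(changes)

# Difference between the group medians at the two time points.
difference_of_medians = median(end_point) - median(baseline)

print(changes)                # [10, 0, 1]
print(median_change)          # 1
print(difference_of_medians)  # 5
```

In this sketch the two summaries differ by a factor of five, so agreement between readers on one of them says little about agreement on the other.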
Secondly, most trials choose two readers who read either with sequence known or unknown (the jury is still out on which is the preferred method), and report the mean of these two readings. This report has two readers, each of whom uses a different one of these options, and this makes it impossible to pool the results. Also, even with sequence unknown, films should be read as sets (all films belonging to one patient assessed simultaneously), not totally at random. Reading totally at random strongly decreases the signal to noise ratio.2 Which method did the “random” reader apply exactly?
A third concern is with the analysis, although this may only be a question of the way in which the data are presented. Although the authors state that the main outcome measure is the change in radiographic damage, they only report medians and ranges of the absolute scores in the groups. From our reading of the report, we fear the analysis has (statistically) compared the distributions of these absolute scores rather than their changes.
This is an important study, and has the potential to add valuable information to our understanding of the best way to treat RA, but in its present form the radiographic results are more likely to cloud the issues than clarify them. We suggest that the radiographs are made available to be re-read by two new, experienced readers, with either the sequence known or unknown to both. We also suggest that the analysis should present the median, range, etc, of the changes in each group, and the test of the difference between these. (If they have a skewed distribution, then either transformation before parametric analysis or the use of non-parametric methods would be the best way to compare the groups.)
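The presentation suggested above might be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented, the group labels are hypothetical, and the Mann-Whitney U statistic is used only as one example of a non-parametric comparison, since no specific test is named here.

```python
from statistics import median

def change_scores(baseline, end_point):
    """Per-patient change: end point score minus baseline score."""
    return [e - b for b, e in zip(baseline, end_point)]

def summarize(changes):
    """Median and range of the change scores in one group."""
    return {"median": median(changes), "min": min(changes), "max": max(changes)}

def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical scores for two groups of four patients (not trial data).
group_a_changes = change_scores([2, 0, 6, 4], [3, 2, 9, 8])
group_b_changes = change_scores([1, 0, 8, 5], [6, 6, 15, 13])

print(summarize(group_a_changes))  # {'median': 2.5, 'min': 1, 'max': 4}
print(summarize(group_b_changes))  # {'median': 6.5, 'min': 5, 'max': 8}
print(mann_whitney_u(group_a_changes, group_b_changes))  # 0.0
```

The key design choice is that both the summary and the test operate on the per-patient changes, not on the absolute scores at either time point. A p value for U would come from published tables or a statistics package.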
There are other difficulties with the study, although these are less important than the essential concerns noted above. For example, we are baffled by the statement in the introduction that the COBRA combination3 “showed radiological advantage over sulfasalazine alone but the study was not powered to detect differences in x ray change”. In fact, the differences in x ray change were among the key findings of the COBRA study, and have since been shown to increase over time.4 So the study was not only adequately powered but also showed an unexpectedly large effect.
The authors diminish the value of the report by inappropriate interpretation of their secondary data, especially on the adverse effects. In the discussion they comment, “While observed toxicity from corticosteroids in terms of hypertension, weight gain, and osteoporosis could be reduced by active assessment and prompt intervention, there is no room for complacency”. However, in their results section they report that, “Low dose aspirin and treatment for ischaemic heart disease remained similar, whereas the use of antihypertensive agents increased in both groups, as did prescription of lipid lowering agents. The use of any treatment for osteoporosis also increased in both groups”. In fact, there was no difference between the groups and thus there was no observed toxicity from glucocorticoids in their study. Further, the authors make no comment on their observation that (many) more patients in the placebo group than in the glucocorticoid group stopped sulfasalazine treatment owing to side effects.
In relation to weight gain, inappropriate attention to within-group changes leads the authors to conclude that body weight “increased significantly” in the glucocorticoid group (median gain 4 kg), with only a “borderline increase” in the placebo group (median gain 3 kg). Body mass index is handled in the same way. However, the only really relevant comparisons, those between groups, do not even show a trend to significance (all p values ⩾0.10). As with the radiographic findings, the presentation of the table suggests end point results were compared rather than change scores.
Our interpretation of the clinical results contradicts that of the investigators, and we conclude that the effects on symptoms are in line with previous reports of limited and temporary advantages for disease activity, blunting of sulfasalazine toxicity, and extremely limited side effects when appropriate caution is applied. It is not possible to assess adequately the main results on x ray progression, which are at variance with several previously published studies,3–8 and we urge the authors to allow a second read of the radiographs so that their important dataset can be added to the existing evidence.
We agree with John Kirwan and Maarten Boers that the assessment of radiographic damage in the WOSERACT study is of importance.1 The method of reading radiographs has evolved since this study was planned in 1995.2 Because we consider that “the jury is out” on the optimal way to read radiographs in studies, the films were read (a) at random by one reader and (b) in sequence by the other reader, and the same conclusion was reached. This strengthens rather than weakens the case for a true result.
The study of Paulus et al,3 in which there was no beneficial effect on radiographic outcome in 197 patients with rheumatoid arthritis (RA), supports the WOSERACT study findings. It was unfortunate that in the Arthritis and Rheumatism Council (ARC) low dose corticosteroid study the two groups were not well matched at the outset, making interpretation of the true effect of prednisolone at 2 years and of the subsequent report difficult.4,5
The COBRA study6 used a high initial corticosteroid dose and the effects contributing to prompt disease control were multifactorial. It is similarly not possible to extrapolate from the study of van Everdingen et al,7 because they used a protocol of steroid without initial disease modifying antirheumatic drug, which is not a practice supported by current guidelines on RA management.
There was considerable discussion among the WOSERACT investigators about the approach to glucocorticoid side effects. It was decided that management of these would be the responsibility of the individual consultant, who remained unaware of the treatment assignment. For this reason there is likely to be a great deal of background noise. This issue was not a primary end point of our study but information is available from other studies.8 We do agree that the blunting of sulfasalazine toxicity in the active group is of interest, although it would be inappropriate to advocate the use of prednisolone for this reason alone.
At a time when multiple treatments are increasingly used in early RA it is vital to be certain what contribution, if any, oral corticosteroids might make. The fact that both the ARC study and ours showed no sustained clinical benefit makes x ray interpretation all the more important.
Thus we suggest that, together with John Kirwan and Maarten Boers, an approach is made to the ARC (the original sponsors of the 1995 study), or to EULAR, for sufficient funding to allow independent readers and statisticians to evaluate all appropriate datasets. This would allow films from relevant studies to be copied and made available as a central repository for future study. The films from the study of Rau et al9 would also be useful for this initiative.