Background: Because an increasing number of clinical trials evaluating disease-modifying antirheumatic drugs in rheumatoid arthritis (RA) emphasise radiographic outcomes as a primary outcome, using a reproducible radiographic measure should be placed at a premium.
Aim: To evaluate the reporting of radiographic methods in randomised trials assessing radiographic outcomes in RA.
Methods: Medline was searched for randomised controlled trials assessing radiographic outcomes published between January 1994 and December 2005 in general medical and specialty journals with a high impact factor. One reader extracted data (radiographic acquisition, assessment and reproducibility) using a standardised form.
Results: A total of 46 reports were included in the analysis. The mean (SD) methodological quality scores on the Jadad scale (range 0–5) and the Delphi list (0–9) were 2.9 (1.2) and 6.4 (1.3), respectively. Use of a standardised procedure for the acquisition of the radiographs was reported in 2 (4.3%) articles. 2 (4.3%) reports indicated that the quality of the radiographs was evaluated. In 65.2% of the reports, ⩾2 radiographic scores were used. Reporting of radiographic assessment was well detailed for number of readers (91.3%), information on readers (71.7%), blinding (91.4%) and how films were viewed (74.0%). The reproducibility of the reading was reported in 39.1% of the articles.
Conclusion: The reporting of results of randomised controlled trials of radiographic outcomes in RA shows great variability in radiographic scores used. Reporting of radiographic methods could be improved upon, especially the acquisition procedure and the reproducibility of the reading.
- RA, rheumatoid arthritis
Rheumatoid arthritis (RA) is the most common chronic inflammatory joint disease and is responsible for symptomatic manifestations (eg, functional status and pain) and structural damage (ie, damage of the articular cartilage and bone).1 The use of disease-modifying antirheumatic drugs has increased for RA.2 Assessing such treatments requires the measurement of structural outcomes in randomised controlled trials to demonstrate a reduction or a retardation of disease progression.
Radiography provides an objective measure of the extent of anatomical joint damage. It can be used to assess the severity of the structural destruction, to follow the course of the disease and to establish effects of treatment.3 This highly accepted technique is widely available and provides a permanent record of the structural image, allowing for comparison over time and re-reading if necessary.4,5 The assessment of radiographic outcomes for evaluating drug efficacy was recommended for the management of RA in controlled trials,6,7 and the radiographic outcome is often used as a primary end point for assessing structural severity.8
The reproducibility (ie, the extent to which repeated measurements on the same subject yield the same results) is one of the prerequisites for a primary outcome.9–11 With radiography, measurements can be biased and their precision compromised by two well-identified sources of variability, image acquisition and assessment, which can be a serious limitation in its use. The image variability due to differences in acquisition processes is a major concern. Because many parameters can affect the appearance of the radiographs (ie, positioning for radiographs, film exposure and resolution, and reproducibility), standardisation of the acquisition procedure is needed.12,13 Similarly, radiographic assessment could be influenced by many known parameters (eg, the scoring method, the number of readers, films grouped by patient or not, and films chronologically scored or not).14
Evaluating the reproducibility of an outcome measure theoretically supposes a detailed reporting of the methods used to measure it. The Consolidated Standards of Reporting Trials statement has recommended the reporting of methods used “to enhance the quality of measurements”, especially when considering the primary outcome,15 including how the outcome was measured and what steps were used to increase the reliability.
The purpose of this study was to evaluate the reporting of radiographic methods in randomised controlled trials assessing radiographic outcomes in RA. We focused on investigating radiographic acquisition, radiographic assessment and how the reproducibility was determined.
MATERIALS AND METHODS
Search strategy for identification of studies
The search method has been detailed elsewhere.16 Briefly, we searched Medline and the Cochrane Central Register of Controlled Trials for randomised controlled trials assessing radiographic outcomes in RA published between January 1994 and December 2005 in general medical and specialty journals with a high impact factor (appendix A). We chose these journals because a high impact factor is a good predictor of the high methodological quality of journal articles17 and because our goal was not to be exhaustive but, rather, to raise awareness of the reporting of radiographic assessment and acquisition. Potentially relevant articles were selected on the basis of the title, abstract and keywords by one reader (GB) using the following criteria: study population of adults aged ⩾18 years, randomised controlled trial and presence of at least one radiographic outcome evaluated by scheduled radiography. When duplicate publications of a trial existed, only the main publication was included (ie, when a report of the same trial appeared twice or more, the report of the principal analysis planned by the protocol was selected). Reports of extended follow-up trials, analyses of multiple trials and subgroup analyses were excluded. If the abstract indicated that the article might be relevant, the entire paper was included.
One author (GB) extracted all the data using a standardised form. From each article, data were obtained for the following:
Characteristics of the included articles: Data were searched for number of centres, sample size and methodological quality scores (Jadad scale18 and the Delphi list19). The reviewer also determined whether the radiographic outcome was clearly defined as the primary end point.
Radiographic acquisition: The reviewer extracted data on regions assessed (eg, hand and feet), number of radiographic sessions, time between baseline session and the last session, and time between two radiographic sessions. The reviewer checked whether methods used to take the radiographs were detailed: radiographic view (eg, dorsovolar and anteroposterior), use of a standardised procedure to improve positioning, description of the image quality (film exposure and use of resolution films), and use of digitised image or not.
Radiographic assessment: The reviewer collected data on which score was used for primary and secondary radiographic analysis, and determined whether information was included on the reader’s experience (eg, it was explicitly stated that radiographs were read by a well-known expert), background (eg, radiologist) and identity (eg, initials reported); and whether there were multiple observers (ie, methods of consensus to obtain a single score described when appropriate). The reviewer also checked whether the assessment of the radiographs was blinded (ie, to treatment assignment or patient identity and clinical data). He also determined how films were read (paired reading with chronological or random order).
Reproducibility: The reviewer recorded whether intrareader and inter-reader reproducibility was evaluated and how.
Categorical variables were described with frequencies and percentages, and quantitative variables with means (SD) or medians (minimum, maximum). Two ancillary analyses were also performed. The first describes characteristics of the radiographic assessment in reports published before 2000 versus reports published in 2000 and later, and the second compares reports with radiographic outcome defined as a primary end point and reports with a non-radiographic primary outcome. All data analyses involved use of SAS for Windows, V.9.1.
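The descriptive summaries named above can be sketched in a few lines. This is only an illustration in Python, not the SAS code used for the study; the function names and the lists of scores and sample sizes are hypothetical, chosen so that the output takes the same form as the figures reported in the Results.

```python
from statistics import mean, median, stdev


def describe_categorical(count, total):
    """Return 'n (x.x%)' for a frequency out of a total."""
    return f"{count} ({100 * count / total:.1f}%)"


def describe_mean_sd(values):
    """Return 'mean (SD)' using the sample standard deviation."""
    return f"{mean(values):.1f} ({stdev(values):.1f})"


def describe_median_range(values):
    """Return 'median (minimum, maximum)'."""
    return f"{median(values):g} ({min(values):g}, {max(values):g})"


# Hypothetical quality scores for five reports (not data from the study).
jadad_scores = [2, 3, 3, 4, 2]
# Hypothetical sample sizes for five reports (not data from the study).
sample_sizes = [20, 150, 183, 400, 1446]

print(describe_categorical(40, 46))         # frequency and percentage
print(describe_mean_sd(jadad_scores))       # mean (SD)
print(describe_median_range(sample_sizes))  # median (minimum, maximum)
```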
RESULTS
Characteristics of the included articles
The search strategy generated 1004 citations. In all, 46 studies were relevant according to the title, abstract and complete retrieval of the article (fig 1). Detailed references of the selected articles are listed in appendix B.
In all, 40 (87.0%) reports concerned multicentre studies. The median sample size of the studies was 183 (20, 1446) patients. The mean (SD) methodological quality scores were 2.8 (1.0) for the Jadad scale and 6.5 (1.3) for the Delphi list. The radiographic outcome was clearly defined as the primary end point in 26 (56.5%) reports.
Radiographic acquisition
Hand and feet joints were the most frequently assessed areas (75.0%; table 1). In many reports (n = 20, 43.5%), two radiographs (including the baseline radiographs) were obtained (table 1). The median time between the baseline session and the last session was 12 months (5.5, 48 months) and that between two sessions was 53 weeks (24, 104 weeks).
The radiographic view (eg, dorsovolar) was described in 11 (23.9%) reports. Use of a standardised radiographic procedure for optimal positioning was reported in 2 (4.3%) articles. The x-ray films were described as digitised in 3 (6.5%) reports. The assessment of radiographic quality was reported in only 2 (4.3%) articles and details on film resolution (eg, single emulsion film) were provided in 2 (4.3%) reports.
Radiographic assessment
In 65.2% of the reports, ⩾2 radiographic scores were used (table 2). Among scores combining erosions and joint space narrowing, the Sharp score and the Larsen score (including their modified versions) were the most represented methods (47.8% and 34.7%, respectively).
Information on reader(s) (ie, readers’ experience in evaluating radiographs, readers’ background or identity) was described in 33 (71.7%) articles. Readers’ experience, background and identity were reported in 17 (37.0%), 25 (54.3%) and 23 (50.0%) reports, respectively (table 3).
In 4 (8.7%) reports, it was impossible to determine the number of readers. Radiographs were read by at least 2 readers in 30 (65.2%) reports and by a single reader in 12 (26.1%).
Of the 30 reports describing at least 2 readers, 22 (73.3%) described readers reading all the radiographs; in 1 (3.3%) report not all the readers read all the radiographs, and in 7 (23.3%) the reading was unclear. When all the radiographs were read (n = 22), a consensus method to produce a single radiographic score was described in 8 reports (consensus with the same readers for all disagreements (n = 5), consensus for only important disagreements (n = 2) or use of another method of measurement (n = 1)); in 10 reports the radiographic score was obtained from the mean score of the readers, and in 2 reports from the lower score of the readers (in the two remaining reports, it was unclear how a single score was obtained).
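Two of the ways of deriving a single score described above, the mean of the readers' scores and the lower of the readers' scores, can be sketched as follows. The function name, its parameters and the example scores are hypothetical; consensus methods depend on discussion between readers and are not modelled here.

```python
from statistics import mean


def combine_reader_scores(scores, method="mean"):
    """Reduce several readers' scores for one radiograph to a single score.

    method="mean"  -> average of the readers' scores
    method="lower" -> lowest of the readers' scores
    """
    if not scores:
        raise ValueError("at least one reader score is required")
    if method == "mean":
        return mean(scores)
    if method == "lower":
        return min(scores)
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical scores from two readers for the same radiograph.
reader_scores = [12, 15]
print(combine_reader_scores(reader_scores, "mean"))   # 13.5
print(combine_reader_scores(reader_scores, "lower"))  # 12
```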
Masked assessment of radiographs was described in 43 (93.5%) reports (table 3). Blinding concerned treatment assignment or patient identity (n = 37, 86.0%) and clinical data (n = 11, 23.9%); four articles reported that the assessor was blinded, with no other details given.
In 12 (26.1%) articles, no details were provided about the reading of the radiographs. When reading was described (73.9%), paired reading in random order and paired reading in chronological order were used in 18 (52.9%) and 15 (44.1%) articles, respectively. One article (2.9%) used both random and chronological paired reading.
Reproducibility of the reading
Reproducibility of the reading was described in 18 (39.1%) articles. Intrareader reproducibility was reported in 9 (19.6%) articles and inter-reader reproducibility in 13 (28.3%; 4 articles (8.7%) reported both intrareader and inter-reader reproducibility). In all, 5 (10.8%) articles reported two reproducibility measures. Use of the intraclass correlation coefficient (n = 8) and coefficient of correlation (n = 7) and reporting the smallest detectable difference (n = 6) were the most common methods used to assess reproducibility. The only other method was use of the median with minimum and maximum (n = 1).
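For readers unfamiliar with these measures, the following is a minimal sketch of how an intraclass correlation coefficient and a smallest detectable difference might be computed. It assumes one particular form of each: the two-way random-effects, absolute-agreement, single-reader coefficient, and a smallest detectable difference taken as 1.96 times the SD of the paired differences between two readings. The included reports may have used other variants (several rescalings of the smallest detectable difference exist), and the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-reader ICC.

    `ratings` is a list of rows, one per radiograph, each row holding
    the scores given by the same k readers.
    """
    n = len(ratings)     # number of radiographs
    k = len(ratings[0])  # number of readers
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way analysis of variance.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def smallest_detectable_difference(first, second):
    """1.96 x SD of the paired differences between two readings (assumed form)."""
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * stdev(diffs)


# Hypothetical scores: four radiographs, each scored by two readers.
scores = [[10, 9], [12, 13], [14, 13], [16, 17]]
print(icc_2_1(scores))
print(smallest_detectable_difference(
    [row[0] for row in scores], [row[1] for row in scores]
))
```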
When comparing reports with radiographic outcome defined as a primary end point versus other reports, we found a major difference only in the reporting of reproducibility (53.8% vs 20.0%) (table 3).
No difference was found in the reporting of radiographic assessment between reports published before 2000 and those published in 2000 and later (table 3).
DISCUSSION
Because an increasing number of clinical trials in RA emphasise radiographic outcome as a primary outcome, investigating the reporting of the radiographic methods is important. Our results show a great variability of the radiographic scores used and that radiographic assessment was better described than radiographic acquisition. Measures of reproducibility were reported in almost 40% of the assessed reports.
Reducing measurement error is an important objective, so use of a reproducible radiographic measure should be placed at a premium. Reporting reproducibility measures and describing methods of measurement are an essential step to confirm the validity of a measure.15 Controlling the reproducibility of a radiographic measurement concerns two steps: acquisition and assessment.
Although variability of acquisition can lead to measurement error,12,13 only a few articles described radiographic acquisition in detail. The standardisation of the acquisition is facilitated by centralised acquisition, and entails training in radiographic acquisition and similar conditions of assessment (eg, positioning for radiography20). Even if centralisation of the acquisition is not always possible (eg, because of multicentre studies or financial reasons), the training of radiologists could be centralised to decrease variability. Radiographic assessment also depends on the technical quality of the radiographs, such as film exposure or resolution.21 So, evaluation of radiographic quality could be a control element that allows for limiting the number of non-assessable radiographs.
When designing a trial, the choice of a radiographic score for assessing structural destruction can directly influence reproducibility.21,22 When focusing on radiographic scores used in assessed reports, we found no consensus in assessment scores used, probably because use of the radiographic score has evolved over the years.21 However, in the past few years, the modified Sharp and Larsen scores seem to have been preferentially used.23
When considering other radiographic assessment characteristics, we noted that blinding was well respected and described in the reports we assessed. Similarly, recommendations on reporting the number of observers and describing how to obtain a single measurement are followed (eg, mean radiographic score from two assessors to decrease measurement error24). Specific training, such as a calibration exercise, could be requested in order to reach an agreement when there is more than one reader. Some other parameters could be better described. For instance, experienced readers have been shown to have better agreement.22 A number of reports did not state in what order radiographs were assessed, even though this parameter can influence the measurements.23,25
Only about 40% of the articles described the reproducibility of reading, even though reporting agreement between observers is essential to assessing the quality of observations.14,26 If reproducibility was low or not assessed, then the use of radiographic outcomes as a primary end point might be questionable. Most of the methods reported, such as use of the intraclass correlation coefficient or the smallest detectable difference, were suited to evaluating reproducibility. However, the use of the correlation coefficient as a measure of agreement was found in a surprising number of articles. Correlation coefficients measure the strength of an association, not the concordance, and thus should be avoided in this indication.27
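The distinction between association and concordance can be illustrated with a small sketch. The scores below are hypothetical: one reader always scores 5 points higher than the other, so the correlation coefficient is essentially 1 although the two readers never agree. The helper functions are written for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_difference(x, y):
    """Mean of the paired differences (second reader minus first)."""
    return mean(b - a for a, b in zip(x, y))


# Hypothetical scores: reader B always scores 5 points higher than reader A.
reader_a = [10, 20, 30, 40, 50]
reader_b = [15, 25, 35, 45, 55]

print(pearson_r(reader_a, reader_b))        # close to 1: perfect association
print(mean_difference(reader_a, reader_b))  # 5: systematic disagreement
```

A measure of agreement, such as an absolute-agreement intraclass correlation coefficient, would be lowered by this systematic offset, whereas the correlation coefficient is unaffected by it.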
Van der Heijde et al14 give recommendations for improving the reporting of radiographic methods in studies of radiographic outcomes. Because the optimal interpretation of a radiograph also depends on conditions of acquisition, we suggest that the radiographic acquisition should be more detailed or referenced in reports. Such a description could detail whether a standardised protocol was followed, whether the technicians were educated, whether the radiographs were digitised, whether the quality of films was assessed, which view was chosen and what kind of film was used.
Our study is limited in that our search was restricted to articles published in high-impact-factor journals. However, articles in low-impact-factor journals may have had the same or lower methodological quality. Second, we pooled phase II and III studies with those of a more epidemiological nature, even if they were also randomised controlled trials. The degree of conformance with regulatory requirements and, consequently, required details and rigour are probably more important for reporting phase II and phase III trials. Third, the radiographic outcome was not always defined as a primary end point in our selected studies. We could have presumed a more detailed description of the conditions of measurement if we considered only reports in which the radiographic outcome was the primary end point, but results were quite similar. We also decided to include all the studies in all the fields evaluated because the primary outcome has been shown to differ between protocols and reports of studies. In fact, Chan et al demonstrated that primary outcome differed between protocol and publication in 40–62% of trials.28,29 Finally, some discrepancies may exist between the real methods used and methods reported. Some deficiencies may appear simply because of poor reporting, which does not necessarily mean that the methods were not applied.30,31 In fact, some details may not have been reported because the authors regarded them as standard and not necessary to mention (eg, radiographic view or details about film resolution). Similarly, authors are often forced by referees or editors to abbreviate the report, and so important information such as details on acquisition technique is removed from manuscripts. However, online appendices, supplemental information or longer versions of a paper could be provided; for example, the US Food and Drug Administration provided on its website most of the required details of the radiographic methods used in reporting results of a trial.
In conclusion, our results have highlighted that the reporting of results of randomised controlled trials of radiographic outcomes measured by scheduled radiography in RA showed a great variability of radiographic scores used, and that reporting of radiographic methods could be improved upon, especially the acquisition procedure and the reproducibility. Investigators are encouraged to follow guidelines on reporting radiographic data in randomised controlled trials in RA.
APPENDIX A. JOURNALS INCLUDED IN THE SEARCH STRATEGY FOR RANDOMISED CONTROLLED TRIALS WITH RADIOGRAPHIC OUTCOMES IN RHEUMATOID ARTHRITIS
Ten general and internal medicine journals (New England Journal of Medicine, Journal of the American Medical Association, The Lancet, Annals of Internal Medicine, Annual Review of Medicine, Archives of Internal Medicine, British Medical Journal, American Journal of Medicine, Medicine and Proceedings of the Association of American Physicians);
Six rheumatologic journals (Arthritis and Rheumatism, Seminars in Arthritis and Rheumatism, Annals of Rheumatic Diseases, Rheumatology [Oxford, England], Journal of Rheumatology, and Arthritis Care and Research);
Six orthopaedic journals (Osteoarthritis and Cartilage/OARS, Arthroscopy, Journal of Orthopaedic Research: Official Publication of the Orthopaedic Research Society, Journal of Bone and Joint Surgery, American Volume, Spine, and Gait & Posture);
Six rehabilitation journals (Archives of Physical Medicine and Rehabilitation, Supportive Care in Cancer: Official Journal of the Multinational Association of Supportive Care in Cancer, Journal of Electromyography and Kinesiology: Official Journal of the International Society of Electrophysiological Kinesiology, Physical Therapy, Journal of Rehabilitation Research and Development, and Scandinavian Journal of Rehabilitation Medicine).
APPENDIX B. DETAILED REFERENCES OF SELECTED ARTICLES
Published Online First 11 December 2006
Competing interests: None declared.