OBJECTIVE To compare radiographic reading procedures and evaluate their impact on sample size in hip osteoarthritis (OA) longitudinal studies.
METHODS Pelvic radiographs performed twice, three years apart, in 104 patients with hip OA were read by a single reader using the Kellgren and Lawrence system, joint space narrowing scale, and joint space width (JSW). Reading procedures were (a) films read as single radiographs, (b) films grouped by patient but read in random order, (c) films grouped by patient and chronologically ordered, all with landmarks for JSW measurements, (d) films read as single radiographs, without landmarks for JSW measurements. JSW was measured at the narrowest point with a 0.1 mm graduated magnifying glass.
RESULTS More Kellgren and Lawrence or joint space narrowing grades were modified respectively with the single (42% and 37%) than with the paired (32% and 23%) or chronologically ordered (34% and 29%) reading procedures. Variability of JSW progression was principally related to mean progression (88.3%) and landmarks (almost 10%). Standardised response means were −0.71 with the paired reading procedure with landmarks, −0.68 with the single reading procedure with landmarks, −0.65 with the single reading procedure without landmarks. With landmarks, 10% more patients would be needed using single than paired reading. Using single reading, 10% more patients would be needed without landmarks than with landmarks.
CONCLUSION Kellgren and Lawrence grading seems to be influenced by the reading procedure, as is joint space narrowing grading, for assessing hip OA. Paired reading procedure with landmarks for JSW should be recommended in longitudinal studies.
- hip osteoarthritis
- pelvic radiographs
- reading procedures
Structural morphological changes on radiographs are considered the primary outcome variables for assessing the progression of osteoarthritis (OA).1-3 Depending on the joint studied, several indices are currently used for assessing radiological progression of OA, including individual radiographic features, composite indices, and quantitative measures.4 The Kellgren and Lawrence grading system is often used despite its limitations.5-7 Joint space narrowing, recorded either as measurements of interbone distance or by visual grading, is presently one of the most common variables recommended.1 2 However, methodological problems remain when using this index. Firstly, when measuring joint space width (JSW) for assessing OA radiological progression, landmarks for measurements may be drawn on radiographs7 8 or not.9 10 The implications of drawing landmarks have not yet been studied. Secondly, it has been recommended that the date and identification of the patient of radiographs should not be known when assessing the progression of OA,11 and the type of blinding differs in studies of OA progression.7-10 12 13 Thus radiographic OA progression might be assessed with the reader aware of neither the patient's identity nor the chronological order of the radiographs (single reading procedure), aware of the patient's identity but unaware of the chronological order of the radiographs (paired reading procedure), or aware of both the patient's identity and the chronological order of the radiographs (chronologically ordered reading procedure). Yet these radiographic reading procedures seem to have different implications for the power of studies.14 15 In clinical trials on rheumatoid arthritis, 11.8% and 38% more patients would be needed to detect the same progression difference of joint damage and joint erosion scores, respectively, when reading paired films chronologically ordered rather than reading paired films in random order.14
These different reading procedures have not been compared in the assessment of OA progression. Additionally, the impact on the design of OA studies of blindness or drawing landmarks for measuring radiological progression has not been evaluated. The aims of this study were (a) to compare different reading procedures and (b) to evaluate the impact of these reading procedures and using landmarks to measure JSW on sample size, for longitudinal evaluation of hip OA.
Patients and methods
A sample of 104 patients fulfilling the American College of Rheumatology clinical and radiographic criteria for the diagnosis of hip OA, including hip pain with at least two of the following: joint space narrowing, osteophytes, and erythrocyte sedimentation rate <20 mm/1st h,16 were selected from a three year randomised, controlled trial. In this randomised trial, other inclusion criteria were: age between 50 and 75 years, presence of daily pain for at least one month in the past three months, absence of secondary hip OA (presence or past history of hip fracture, inflammatory rheumatic disease, osteonecrosis, Paget's disease, etc),17 a JSW larger than 1 mm at the narrowest point, absence of medial or axial femoral head migration, or both, on radiographs, and written informed consent. To enter our study, patients were selected using the following criteria: available radiographs for three years and absence of hip prosthesis on the target joint.
Each patient had a plain pelvic radiograph at entry into the study and at three years. The anteroposterior radiograph was performed with the patient standing on both legs. The patient's feet were 15° ± 5° internally rotated. The x ray beam was horizontal, perpendicular to the table. The source to film distance was 100 cm. Thus 208 radiographs (two for each patient) were obtained.
Radiographs were assessed by one reader (GRA). For quantitative measurement, the interbone distance at the narrowest point was measured in millimetres using a 0.1 mm graduated magnifying glass laid directly over the radiograph. In addition, joint space narrowing was graded 0–3 using a radiographic atlas.18 Overall severity of OA was graded using the Kellgren and Lawrence grading scale,19 defined as follows: 0 = normal; 1 = doubtful narrowing of joint space and possible osteophytic lipping; 2 = definite osteophytes and possible narrowing of joint space; 3 = moderate multiple osteophytes, definite narrowing of joint space, some sclerosis, and possible deformity of bone contour; 4 = large osteophytes, marked narrowing of joint space, severe sclerosis, and definite deformity of bone contour.
Four reading sessions were performed one week apart. At each session the reader used a different reading procedure and was unaware of the results of the previous sessions. Radiographs were read as single, paired, or chronologically ordered depending on whether the patient's identity or time sequence, or both, were known or unknown (table 1).
For the first three reading procedures, hip OA progression was assessed using the Kellgren and Lawrence grading scale, joint space narrowing grading scale, and JSW measurement. JSW was measured with landmarks. Landmarks were drawn before the first reading session. To draw landmarks, the two radiographs of each patient were placed side by side on a light box and landmarks were immediately drawn. The landmarks consisted of two points, one on the distal margin of the condylar cortex for the femoral surface and the other on the margin of the bright radiodense band of the subchondral cortex in the floor of the articular fossa for the acetabulum.
For the single reading procedure all 208 radiographs were read as single. Patient's identity and date of radiographs were masked with adhesive tape using a randomisation list.
For the paired reading procedure, radiographs performed at entry and at three years were grouped by patient. Only the date of radiographs was masked with adhesive tape using another randomisation list. Pairs of radiographs of the same patient (n=104) were read side by side in random order, the chronological order being unknown to the reader.
For the chronologically ordered reading procedure, radiographs were grouped by patient and read by pairs chronologically ordered, the chronological order being known to the reader.
After these first three reading sessions, landmarks were erased.
For the single reading procedure without landmarks, the 208 radiographs were read blindly as single radiographs and JSW was measured without previously drawn landmarks.
To compare the reading procedures when assessing hip OA progression by Kellgren and Lawrence and joint space narrowing grading scales, we constructed histograms of changes in grades (values at three years minus values at entry).
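The tabulation behind such a histogram can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `grade_change_counts` and the six-patient grade lists are hypothetical, and the only element taken from the text is the definition of change as the value at three years minus the value at entry.

```python
from collections import Counter


def grade_change_counts(entry_grades, year3_grades):
    """Count patients by change in grade (three-year value minus entry value)."""
    if len(entry_grades) != len(year3_grades):
        raise ValueError("need one entry grade and one three-year grade per patient")
    changes = [late - early for early, late in zip(entry_grades, year3_grades)]
    # Sort by change so the result reads like the bars of a histogram.
    return dict(sorted(Counter(changes).items()))


# Hypothetical grades for six patients (not study data).
entry = [2, 2, 3, 1, 2, 3]
year3 = [2, 3, 3, 2, 1, 4]

counts = grade_change_counts(entry, year3)
# Number of patients whose grade was modified in either direction.
changed = sum(n for change, n in counts.items() if change != 0)
print(counts, changed)
```

A negative key corresponds to an apparent improvement (change towards a lesser grade) and a positive key to progression.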
The effects of reading procedures when assessing hip OA progression by JSW measurements were assessed using the same methods. Comparisons of paired, chronologically ordered, single reading procedures with landmarks and single reading procedure without landmarks were performed using descriptive statistics. These included means and standard deviations of the differences between the values of JSW measured on the last and the first radiographs of the same patient (values at three years minus values at entry) and histograms showing the progression of JSW.
A principal component analysis was also performed. The object of this analysis was to take the four values of change in JSW measurements (that is, one value for each reading procedure) for each patient and to find combinations of these to produce mutually uncorrelated indices (named principal components). This lack of correlation between principal components then allowed better understanding of the differences emerging from the four reading procedures. Moreover, principal components were ordered so that the first component explained the largest amount of total variability, the second component explained the second largest amount of total variability, and so on. Estimations of intraclass correlation coefficients (ICCs) of pairs of reading procedures were derived in the framework of a two way random effect model. Approximate 95% confidence intervals were estimated by the Fleiss and Shrout result.20
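The principal component step described above can be sketched on simulated data. Everything numerical here is an assumption made for illustration: the values are randomly generated rather than taken from the study, the three procedures with landmarks are simulated as close to a common underlying change while the procedure without landmarks is noisier, and the decomposition is applied to the covariance matrix (the text does not state whether the covariance or the correlation matrix was used).

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients = 104  # same size as the study sample; the values are simulated

# Simulated JSW changes in mm, one column per reading procedure:
# three procedures with landmarks, then one without landmarks.
true_change = rng.normal(-0.6, 0.8, n_patients)
with_landmarks = true_change[:, None] + rng.normal(0.0, 0.15, (n_patients, 3))
without_landmarks = 0.8 * true_change + rng.normal(0.0, 0.4, n_patients)
changes = np.column_stack([with_landmarks, without_landmarks])

# Principal components are the eigenvectors of the covariance matrix.
centred = changes - changes.mean(axis=0)
covariance = np.cov(centred, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

# eigh returns ascending eigenvalues; reorder so the first component
# explains the largest share of total variability.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()  # share of variability per component
scores = centred @ eigenvectors              # mutually uncorrelated indices
print(np.round(explained, 3))
```

In a set-up like this one, the loadings of a component (the columns of `eigenvectors`) show which reading procedures it contrasts.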
The standardised response mean (SRM), a responsiveness statistic indicating the magnitude of change (mean difference) relative to the standard deviation of change,21 was estimated for the chronologically ordered, paired, and single reading procedures with landmarks and for the single reading procedure without landmarks. As it has been shown that highly responsive instruments lead to lower sample size requirements,22 the effects of these reading procedures on sample size requirements when using a paired Student's t test were evaluated by comparing the corresponding SRMs.
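As a minimal sketch of this definition, the SRM can be computed from paired measurements as the mean change divided by the standard deviation of change. The function name and the four-patient values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
import statistics


def standardised_response_mean(entry_values, year3_values):
    """Mean change divided by the standard deviation of change."""
    if len(entry_values) != len(year3_values):
        raise ValueError("need one entry and one three-year value per patient")
    changes = [late - early for early, late in zip(entry_values, year3_values)]
    # statistics.stdev is the sample standard deviation (n - 1 denominator),
    # which is an assumption here.
    return statistics.mean(changes) / statistics.stdev(changes)


# Hypothetical JSW values in mm for four patients (not study data).
jsw_entry = [3.0, 2.5, 4.0, 3.5]
jsw_year3 = [2.5, 2.5, 3.0, 3.0]

srm = standardised_response_mean(jsw_entry, jsw_year3)
print(round(srm, 2))
```

A negative SRM reflects a decrease in JSW over time; the larger its absolute value, the more responsive the measurement.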
Results
At baseline, patients had a mean (standard deviation) age of 62.3 (7.2) years, weight 69.9 (11.8) kg, and height 165.5 (9.1) cm.
When assessed with the single reading procedure, the Kellgren and Lawrence grade changed in 44 patients (42%) compared with change in 35 (34%) and 33 (32%) patients when assessed with the chronologically ordered or paired reading procedures, respectively (fig 1). Of note, on the Kellgren and Lawrence scale 14 patients had an improvement with the single reading procedure against only one with the chronologically ordered reading procedure. When assessed with the single reading procedure, the joint space narrowing grade changed in 38 patients (37%) compared with change in 30 (29%) and 24 (23%) patients when assessed with the chronologically ordered or paired reading procedures, respectively. Although an improvement in Kellgren and Lawrence or joint space narrowing grading scales by more than one grade (change towards a lesser grade) was rarely seen (<10%), the frequency of progression differed according to the radiographic feature and reading procedure. Improvement on the Kellgren and Lawrence and joint space narrowing grading scales was more often seen with the single reading procedure than with the other reading procedures. Except for two patients, those in whom improvement was seen on the Kellgren and Lawrence grading scale differed from those in whom improvement was seen on the joint space narrowing grading scale. For these two patients, improvement was seen only with the single reading procedure.
Table 2 and fig 2 show JSW progression according to the reading procedure. The progression was less with the single reading procedure without landmarks (−0.47 mm) than with the other reading procedures (at least −0.58 mm for the single with landmarks). Progression standard deviation was also lower with the single reading procedure without landmarks (0.73 mm) than with the other reading procedures (0.85 mm for the single with landmarks).
Figure 3 shows the results of principal component analysis. The two first principal components represented almost 98% of the total variability of JSW progression. The first principal component was responsible for 88.3% of the total variability and corresponded to the progression of JSW. The second principal component contrasted the reading procedures using landmarks with the procedure without landmarks and was responsible for about 10% of the total variability.
Table 3 gives details of the ICCs and their corresponding 95% confidence intervals. There was a high agreement between reading procedures performed with landmarks as ICCs reached 0.96. The agreement between each of the three reading procedures with landmarks and the one without landmarks was less, though the corresponding ICCs remained at about 0.70.
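For readers who want to see how an agreement coefficient of this kind is obtained, the following is a minimal sketch of a single-measure ICC from a two way random effects analysis of variance. Treating the reported ICCs as this particular form is an assumption, the five-patient data set is hypothetical, and the confidence interval calculation is not reproduced.

```python
def icc_two_way_random(ratings):
    """Single-measure ICC under a two way random effects model.

    `ratings` has one row per patient and one column per reading procedure.
    """
    n = len(ratings)       # patients
    k = len(ratings[0])    # reading procedures
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for patients, procedures, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_patients = ss_rows / (n - 1)
    ms_procedures = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_patients - ms_error) / (
        ms_patients
        + (k - 1) * ms_error
        + k * (ms_procedures - ms_error) / n
    )


# Hypothetical JSW changes (mm) for five patients under two procedures.
jsw_changes = [
    [-0.2, -0.3],
    [-1.1, -1.0],
    [0.0, -0.1],
    [-0.6, -0.6],
    [-1.5, -1.3],
]
icc = icc_two_way_random(jsw_changes)
print(round(icc, 2))
```

Because the procedure term enters the denominator, a systematic shift between two reading procedures lowers this coefficient even when patients are ranked identically.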
The paired reading procedure of JSW measurements with landmarks had the highest SRM (−0.71), whereas the reading procedure without landmarks had an SRM of −0.65 (table 2). The ratio between estimated sample sizes when comparing two reading procedures actually amounts to the square of the ratio of the corresponding SRM. This ratio equalled 1.10 or 1.06 when comparing the paired reading procedure with landmarks or the chronologically ordered reading procedure with landmarks with the single reading procedure with landmarks, respectively. Therefore, 10% or 6% more patients would be needed when using the single reading procedure with landmarks than when using the paired reading procedure with landmarks or the chronologically ordered reading procedure with landmarks, respectively. When comparing the paired reading procedure with landmarks with the chronologically ordered reading procedure with landmarks, this ratio equalled 1.03. That is, when using the chronologically ordered reading procedure with landmarks, 3% more patients would be needed.
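The relation between SRM and sample size can be sketched as follows. The SRM values are the rounded figures reported in the text; the function names are hypothetical, and `approx_paired_n` uses a normal approximation with a two sided α of 0.05 and 80% power, which are assumptions made here for illustration and not the authors' calculation.

```python
import math
from statistics import NormalDist


def sample_size_ratio(srm_a, srm_b):
    """Patients needed with procedure b relative to procedure a."""
    return (srm_a / srm_b) ** 2


def approx_paired_n(srm, alpha=0.05, power=0.80):
    """Normal approximation to the sample size of a paired comparison."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # n is proportional to 1 / SRM**2, which is why sample size ratios
    # reduce to the square of the ratio of the SRMs.
    return math.ceil((z / srm) ** 2)


# Rounded SRMs reported in the text.
srm_paired = -0.71
srm_single = -0.68
srm_single_no_landmarks = -0.65

ratio_single_vs_paired = sample_size_ratio(srm_paired, srm_single)
ratio_no_landmarks_vs_single = sample_size_ratio(srm_single, srm_single_no_landmarks)
print(round(ratio_single_vs_paired, 2), round(ratio_no_landmarks_vs_single, 2))
```

With the rounded SRMs both ratios come out at about 1.09 rather than 1.10; the small difference is presumably because the reported ratios were computed from unrounded SRMs.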
The ratio between estimated sample sizes for the single reading procedure without landmarks and the single reading procedure with landmarks equalled 1.10. That is, 10% more patients would be needed in longitudinal studies when using the single reading procedure without landmarks than when using it with landmarks.
Discussion
In this study we assessed the radiographic progression of hip OA on pelvic plain radiographs taken three years apart in 104 patients with hip OA. Single, paired, and chronologically ordered reading procedures resulted in a difference in the number of patients changing grades on Kellgren and Lawrence or joint space narrowing grading scales. The single reading procedure more often showed change on the Kellgren and Lawrence scale (42% of patients) and on the joint space narrowing scale (37% of patients) than each of the other reading procedures. Measurement of JSW progression on single radiographs without landmarks would require 10% more patients than on single radiographs with landmarks in longitudinal studies.
The data reported in this study emphasise the consequences of using insensitive methods for measuring radiological OA progression in terms of sample size and, therefore, of the cost of studies. For example, in longitudinal studies, measurement of JSW progression on single radiographs with landmarks would necessitate 10% more patients than on paired radiographs with landmarks.
Although several recommendations for assessing OA progression exist,1-3 the need for standardised methodology remains a challenge.23 The Kellgren and Lawrence grading scale has been shown to be poorly responsive in knee OA.24 Clearly, the results of this study suggest that this index is also dependent on the reading procedure in assessing hip OA progression. The change in the Kellgren and Lawrence scale was seen more often with the single reading procedure than with the paired reading procedures. Such findings have been reported elsewhere for other progression scores when reading paired radiographs was compared with reading single radiographs.25
Individual features, such as the joint space narrowing scale, have been presumed to be more informative in prospective studies.26 Altman et al rated joint space narrowing on pairs of radiographs as most important in determining progression of hip OA by identifying the correct time sequence of radiographs.7 We did not find that the results with the joint space narrowing scale were less influenced by reading procedures than results with the Kellgren and Lawrence scale for assessing hip OA progression. Our results actually show that the change in the joint space narrowing scale is also seen more often with the single reading procedure. Differences in methodologies used in these studies may explain these contradictory conclusions. While Altman et al used the average score of joint space narrowing rated as a percentage of narrowing by three different readers,7 we, in contrast, used a single reader who graded joint space narrowing using an atlas.18
For the reader there was no advantage in having information on patient identity or even on time sequence for measuring JSW in this study. Perhaps, such information would have been important if there had been a dramatic progression in JSW. However, measuring JSW on paired films would reduce the sample size required compared with the other reading procedures with landmarks. Although the purpose of blindness in clinical studies is, as far as possible, to reduce bias introduced by the investigator's knowledge of a previous measurement for a given subject,27 28 the blindness effect seems smaller with a more objective outcome measure, such as JSW measurements, than with outcome measures requiring observers' judgment, such as Kellgren and Lawrence or joint space narrowing scales.
More interesting is the influence of drawing landmarks when assessing JSW progression. Landmarks have been recommended for measuring JSW in OA clinical trials.3 According to descriptive statistics, reading procedures with landmarks clearly seemed similar in our study. The presence of landmarks accounted for almost 10% of the variability in JSW progression, and the reading procedures with landmarks clearly contrasted with the reading procedure without landmarks. We considered SRM for comparison of the respective effect on sample size requirements because not only the SD of JSW progression but also the mean of JSW progression differed according to the reading procedure. JSW progression was more homogeneous and at the same time the progression was less with the reading procedure without landmarks than with reading procedures with landmarks. It may be that JSW measurements are not performed in the same sites with these respective reading procedures. Therefore, 10% more patients would be needed in longitudinal studies with a single reading procedure without landmarks than with a single reading procedure with landmarks.
A limitation of this study is that we did not evaluate paired and chronologically ordered reading procedures without landmarks for JSW measurements. We considered that evaluating these reading procedures without landmarks would have a limited interest and would give results close to those with landmarks. Nor were differences between repeated readings of each procedure by the same reader assessed. For reading procedures using landmarks, these differences may be expected to be almost the same if these reading procedures are compared by pairs.
Although other studies are needed to confirm these findings, the paired reading procedure with landmarks seems to be the most appropriate procedure and should be recommended for measuring JSW progression in hip OA clinical studies.