Background Several large epidemiologic osteoarthritis studies including magnetic resonance imaging (MRI) are currently ongoing. A large proportion of these MRI datasets is being assessed semiquantitatively by teams of expert radiologist readers using validated scoring instruments. For meaningful data interpretation it is paramount to ensure both cross-sectional and longitudinal reliability between all readers. While cross-sectional reliability results between two trained and calibrated readers have been presented for all MRI scoring systems, data on longitudinal reliability (detection of change over time) and on agreement among more than two readers have not been presented to date.
Objectives The aim of this study was to determine cross-sectional and longitudinal reliability among four readers in the MOST study.
Methods The Multicenter Osteoarthritis (MOST) study is a longitudinal cohort study of subjects with or at high risk of knee OA. MRI was performed on a 1.0 T extremity system using axial and sagittal proton-density-weighted sequences and a coronal STIR sequence. Ten randomly selected subjects with 60- and 84-month MRIs available were included in substudy A. Another 10 participants with baseline, 60- and 84-month MRIs were included in substudy B. Cases were selected to represent a spectrum of disease severity and longitudinal change. MRIs were read separately by four radiologists, who knew the chronological sequence of the images. MRIs were assessed semiquantitatively using a modified WORMS system. For substudy B, readers were aware of the baseline images and scores, which they could change when needed.
Assessed were cartilage, osteophytes, bone marrow lesions, subchondral cysts, bone attrition, meniscus damage, meniscal extrusion, Hoffa-synovitis, effusion-synovitis, cruciate and collateral ligaments, popliteal cysts, tibio-fibular cysts, loose intra-articular bodies and anserine and pre-patellar bursitis. Weighted kappa statistics were applied to determine reliability between readers.
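As an illustration of the statistic named above, the following is a minimal sketch of Cohen's weighted kappa for two readers' ordinal scores, plus a helper that reports the range of kappas across all reader pairs (six pairs for four readers). The abstract does not state the weighting scheme, so linear weights are an assumption here; the function names, the reader labels and the small score lists are hypothetical and are not study data.

```python
from itertools import combinations


def weighted_kappa(a, b, n_categories, weight="linear"):
    """Cohen's weighted kappa for two readers' ordinal scores.

    Scores are integers in 0..n_categories-1. The weighting scheme
    ("linear" or "quadratic") is an assumption; the abstract does not
    specify which one was used.
    """
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    if weight not in ("linear", "quadratic"):
        raise ValueError("weight must be 'linear' or 'quadratic'")
    k, n = n_categories, len(a)
    power = 1 if weight == "linear" else 2
    # Disagreement weights: 0 on the diagonal, growing with grade distance.
    w = [[(abs(i - j) / (k - 1)) ** power for j in range(k)] for i in range(k)]
    # Observed joint proportions of (reader A grade, reader B grade).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    # Marginal grade distributions of each reader.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    observed = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa undefined: both readers used one identical grade")
    return 1.0 - observed / expected


def pairwise_kappa_range(scores_by_reader, n_categories, weight="linear"):
    """Return (min, max) weighted kappa over all pairs of readers."""
    kappas = [
        weighted_kappa(scores_by_reader[r1], scores_by_reader[r2],
                       n_categories, weight)
        for r1, r2 in combinations(sorted(scores_by_reader), 2)
    ]
    return min(kappas), max(kappas)


# Hypothetical scores for three readers on three locations (grades 0 to 2).
example = {"reader1": [0, 1, 2], "reader2": [0, 1, 2], "reader3": [0, 1, 1]}
low, high = pairwise_kappa_range(example, n_categories=3)
```

Reporting the minimum and maximum over reader pairs mirrors how the results are presented as ranges per feature; a single multi-reader coefficient would be a different design choice.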
Results Subjects were on average 65.4 years old (SD ±7.4) with 12 (60%) women and mean BMI of 29.8 (SD ±5.0). Two, 7, 6 and 5 knees had baseline Kellgren-Lawrence grades of 0, 1, 2 and 3 respectively. For substudy A, the ranges for inter-reader weighted kappas for cross-sectional and longitudinal reliability, respectively, were 0.77 to 0.87 and 0.62 to 0.78 for cartilage, 0.80 to 0.89 and 0.75 to 0.88 for BMLs, 0.92 to 0.96 and 0.75 to 0.92 for meniscal tears, and 0.47 to 0.80 and 0.43 to 0.76 for osteophytes (Table 1). Results for substudy B were similar (Table 2).
Conclusions Semiquantitative OA assessment on MRI shows good reliability for up to four trained and calibrated readers. Cross-sectional reliability appears to be slightly superior to reliability for scoring of change. Reliability did not differ between readings of three time points with baseline known to the readers and readings of two time points without knowledge of baseline scores, although direct comparison was not possible because of the different reading designs.
Disclosure of Interest: F. Roemer Shareholder of: Boston Imaging Core Lab, LLC., M. Nevitt: None declared, D. Felson: None declared, M. Crema Shareholder of: Boston Imaging Core Lab, LLC., M. Marra Shareholder of: Boston Imaging Core Lab, LLC., J. Niu: None declared, J. Lynch: None declared, I. Tolstykh: None declared, C. Lewis: None declared, J. Torner: None declared, A. Guermazi Shareholder of: Boston Imaging Core Lab, LLC., Consultant for: Astra Zeneca, Genzyme, Novartis, Stryker, Merck Serono