Objective: Quantitative MRI (qMRI) of cartilage morphology is a promising tool for disease-modifying osteoarthritis drug (DMOAD) development. Recent studies at single sites have indicated that measurements at 3.0 Tesla (T) are more reproducible (precise) than those at 1.5 T. Precision errors and stability in multicentre studies with imaging equipment from various vendors have, however, not yet been evaluated.
Methods: A total of 158 female participants (97 Kellgren and Lawrence grade (KLG) 0, 31 KLG 2 and 30 KLG 3) were imaged at 7 clinical centres using Siemens Magnetom Trio and GE Signa Excite magnets. Double oblique coronal acquisitions were obtained at baseline and at 3 months, using water excitation spoiled gradient echo sequences (1.0×0.31×0.31 mm³ resolution). Segmentation of femorotibial cartilage morphology was performed using proprietary software (Chondrometrics GmbH, Ainring, Germany).
Results: The precision error (root mean square coefficient of variation (RMS CV)%) for cartilage thickness/volume measurements ranged from 2.1%/2.4% (medial tibia) to 2.9%/3.3% (lateral weight-bearing femoral condyle) across all participants. No significant differences in precision errors were observed between KLGs, imaging sites, or scanner manufacturers/types. Mean differences between baseline and 3 months ranged from <0.1% (non-significant) in the medial to 0.94% (p<0.01) in the lateral femorotibial compartment, and were 0.33% (p<0.02) for the total femorotibial subchondral bone area.
Conclusions: qMRI performed at 3.0 T provides highly reproducible measurements of cartilage morphology in multicentre clinical trials with equipment from different vendors. The technology thus appears sufficiently robust to be recommended for large-scale multicentre trials.
Quantitative magnetic resonance imaging (qMRI) of cartilage morphology (cartilage volume, thickness, surface areas, etc.) represents a powerful tool in cartilage and osteoarthritis (OA) research, and shows great promise for evaluating the treatment response of structure/disease-modifying OA drugs (S/DMOADs).1 2 Radiography, the currently accepted method for assessment of structural change of joints and cartilage thinning by regulatory agencies, has important limitations. These include potential positioning errors, limited precision, susceptibility of joint space narrowing to meniscal extrusion,3 4 joint laxity, a small dynamic range of the measurement (floor and ceiling effects), and others.5 Recently, it has been shown that at a field strength of 3.0 Tesla (T) precision errors of quantitative measurement of cartilage morphology are smaller than at 1.5T, if a slice thickness of 1.0 mm is selected.6 3.0T measurements were found to be consistent with 1.5T measurements that have been previously validated vs external standards, for example in patients with total knee arthroplasty.7 8 Precision errors for measurement of cartilage morphology have been reported in numerous studies at 1.5 T,2 but these have generally been performed at single sites, most commonly with test/retest exams being acquired in one imaging session on the same day.
The objective of this study was to assess precision errors and stability of femorotibial cartilage morphometry in participants without radiographic OA (Kellgren–Lawrence grade (KLG) = 0), in which no change with time was expected over 3 months, under conditions of a multicentre clinical trial, at 3.0 T. Equipment from two vendors and three scanner types were used, and the dependence of the precision errors on the scanner manufacturer and scanner type was analysed.
The second objective was to test whether a significant change in cartilage morphology could be observed in participants with definite radiographic femorotibial OA (KLG 2 and 3) over a short period of 3 months. If this was not the case, the alternative objective was to report the precision errors of the measurements in this subsample with radiographic OA as well, and to compare these with those of the non-OA participants. Repeat scans were acquired at a 3-month interval to assess the stability of the measurements under clinical trial conditions; this interval was chosen as a reasonable compromise between capturing long-term measurement variability and avoiding (or minimising) the intrusion of systematic disease-related changes.
A total of 180 female participants, aged ⩾40 years, were recruited at 7 clinical centres through advertisements in the hospitals, adjacent clinics and print media, or through patient lists of consenting doctors. To recruit healthy volunteers, the patients were asked if they had a friend of similar age without knee OA or complaints who might also be willing to be screened. At some sites, non-OA participants were recruited from previous studies that included patients with OA and unaffected controls. In all, 22 participants were not included in the analysis, 13 because they withdrew from the study after the baseline acquisition, 2 because they skipped the 3-month visit, 1 because there was a protocol violation, 2 because motion was apparent in the 3-month acquisitions and 4 because they had a KLG 1 score in the adjudicated x ray reading (see below). Eventually 158 subjects (between 15 and 30 at each of the 7 sites; table 1) were analysed.
Conventional weight-bearing extended anterior–posterior knee radiographs were obtained at each site to establish the KLG status of the knees. Inclusion criteria for OA participants were frequent symptoms, mild to moderate radiographic OA (KLG 2 or 3) in the medial compartment, a body mass index (BMI) of ⩾30 and a medial tibiofemoral joint space width of ⩾2 mm in a PA modified Lyon–Schuss view.9 In patients who had bilateral radiographic knee OA the study knee was defined as the more symptomatic knee. If pain scores were identical in both OA knees, the knee with more advanced radiographic changes was selected. If pain scores and radiographic severity of OA in the two knees were identical, the knee in the dominant leg was chosen as the study knee. Healthy control participants had to show a complete absence of knee symptoms, no evidence of radiographic knee OA (KLG 0) and a BMI of ⩽28. In these participants, the knee of the dominant leg was chosen as the study knee. An experienced central reader who was blinded to the KLG assigned at the clinical centres re-read each radiograph for standardisation of the KLG status after the enrolment process was completed. If the grade assigned at the clinical centre differed from that of the central reader, a third reader adjudicated the difference. The intrareader reproducibility (determined using 30 radiographs exhibiting KLG from 0–3) showed an intraclass correlation coefficient of 0.91 and a κ of 0.66. After adjudicated central reading, 97 participants were found to display KLG 0, 31 KLG 2 and 30 KLG 3. The 97 healthy participants with KLG 0 had an age of 56.1 (8.7) years, a body height of 165.5 (6.9) cm, a body weight of 68.0 (13.6) kg and a BMI of 24.8 (4.5). The 61 participants who displayed mild to moderate radiographic medial femorotibial OA (KLG 2 to 3) had an age of 57.6 (8.3) years, a body height of 163.1 (6.8) cm, a body weight of 98.0 (16.1) kg and a BMI of 36.8 (5.3).
All participants were permitted to receive standard of care medications for pain (acetaminophen, non-steroidal anti-inflammatory drugs (NSAIDs), cyclo-oxygenase (COX)-2-selective inhibitors) and corticosteroids, but no intra-articular injections of corticosteroids or hyaluronic acid in the study knee, and no pharmacological therapy suspected to alter the rate of OA progression (eg, doxycycline). Subjects receiving glucosamine or chondroitin were allowed to participate if they had been on stable therapy for the last 3 months. This was the case for 13 of the 61 OA participants (6 KLG 2; 7 KLG 3) and for 13 of the 97 non-OA participants.
The study was conducted in compliance with the ethical principles derived from the Declaration of Helsinki and in compliance with local Institutional Review Board, informed consent regulations and International Conference on Harmonization Good Clinical Practices Guidelines.
Three of the seven sites used Siemens Magnetom Trio magnets (Siemens AG, Erlangen, Germany), two Signa Excite/Genesis Signa MRI long bore magnets and two GE Signa Excite/Genesis Signa short bore magnets (GE Healthcare Technologies, Waukesha, Wisconsin, USA). Birdcage CP coils (Transmit/Receive) with a “split top” design were manufactured specifically for the project (Clinical MR Solutions Brookfield, Wisconsin, USA) and were used at all seven imaging sites. Double oblique coronal spoiled gradient recalled acquisition at steady state (SPGR) sequences with selective water excitation (we) were acquired (fig 1), as described previously.6 Images were collected with 16 to 17 ms repetition time (TR), 7 to 8 ms echo time (TE), 12° flip angle (α), 160 mm field of view (FOV); 512×512 matrix; 120 partitions, 1 mm partition thickness, 0.31 mm×0.31 mm inplane resolution, 100% phase and slice resolution, 1 average, 130 Hz pixel bandwidth, elliptical filter on, asymmetric echo off, at an acquisition time of 8:44 to 13:01 min. A second scan was acquired 3 months later, with parameters identical to the baseline acquisition. The knees were not systematically unloaded (rested) before acquisitions of either the baseline or the 3-month follow-up scan. Whether the participants were seated or not prior to their scanning depended on their arrival time at the unit and the actual acquisition time; all participants walked to the scanner for their knee to be imaged.
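The reported voxel dimensions follow directly from the acquisition parameters above. The short sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from any scanner software.

```python
# Acquisition parameters as stated in the text
fov_mm = 160.0                 # field of view
matrix = 512                   # 512 x 512 in-plane matrix
partition_thickness_mm = 1.0   # partition (slice) thickness
partitions = 120               # number of partitions

# In-plane pixel size: FOV divided by matrix size (reported as 0.31 mm)
in_plane_mm = fov_mm / matrix

# Voxel volume and total slab coverage along the partition direction
voxel_volume_mm3 = partition_thickness_mm * in_plane_mm ** 2
slab_coverage_mm = partitions * partition_thickness_mm

print(f"in-plane resolution: {in_plane_mm} mm")
print(f"voxel volume: {voxel_volume_mm3:.4f} mm^3")
print(f"slab coverage: {slab_coverage_mm} mm")
```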
A phantom (Data Spectrum Corp., Chapel Hill, North Carolina, USA) was scanned at each patient visit, to evaluate the accuracy of the FOV in three dimensions, spatial linearity and the stability of the MR scanner in terms of geometric measures over time. The phantom was filled with a solution of saline and Magnevist (Schering AG, Berlin, Germany), and with a small amount of surfactant (to avoid air bubbles) and sodium azide (to retard bacterial growth). The phantom contained 280 holes in the x and y directions and 156 in the z direction (each 3 mm in diameter and spaced at 10 mm, centre to centre) and a total of 16 spheres (22 mm in diameter, filled with defined T1 and T2 standards). All MRI images and phantom data were sent to the Duke Image Analysis Laboratory (DIAL; Durham, North Carolina, USA) for initial quality control. On the phantom data, the holes and spheres were sampled by pattern using a computer program that fits the data to the known spacing and determines potential FOV errors for each gradient axis, in order to ensure geometric accuracy and stability of the measurements.
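The program used for the phantom analysis is not described in detail. As an illustration only, the sketch below shows one simple way a FOV scaling error along a single gradient axis could be estimated: measured hole positions are regressed on their nominal positions (10 mm centre-to-centre spacing), and the deviation of the fitted slope from 1 is expressed as a percentage. The function name and the example positions are hypothetical.

```python
def fov_scale_error_percent(nominal_mm, measured_mm):
    """Least-squares slope of measured vs nominal positions, as % deviation from 1."""
    n = len(nominal_mm)
    mean_x = sum(nominal_mm) / n
    mean_y = sum(measured_mm) / n
    sxx = sum((x - mean_x) ** 2 for x in nominal_mm)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(nominal_mm, measured_mm))
    slope = sxy / sxx  # 1.0 means the FOV is geometrically accurate on this axis
    return 100.0 * (slope - 1.0)


# Hypothetical example: holes nominally 10 mm apart, imaged with a 0.5%
# stretch and a 1 mm offset (the offset does not affect the slope)
nominal = [0.0, 10.0, 20.0, 30.0, 40.0]
measured = [1.0 + 1.005 * x for x in nominal]

error = fov_scale_error_percent(nominal, measured)
print(f"FOV scale error: {error:.2f}%")
```

Repeating such a fit for each axis at every visit would give a per-axis time series in which a systematic drift could be looked for.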
The MRI data of the participants were then sent to the image analysis centre (Chondrometrics GmbH, Ainring, Germany), where they were processed using proprietary software. Segmentation of the femorotibial cartilages was performed by seven technicians with formal training and thorough experience in cartilage segmentation. Images were read in pairs, but with blinding to the time point of the acquisition. Segmentation involved manual tracing of the total subchondral bone area (tAB) and the cartilaginous joint surface area (AC) of the medial tibia (MT), the lateral tibia (LT), the central (weight bearing) medial femoral condyle (cMF) and the central lateral femoral condyle (cLF).10 Femoral cartilages were analysed in a region of interest between the intercondylar notch and 60% of the distance to the posterior end of the femoral condyles in the coronal views.11 Quality control of all segmentations was performed by a single person (FE), reviewing all segmented slices of each data set.6 11 Additionally, automatic QC procedures were used to exclude mislabelling of medial vs lateral cartilage plates, tibial vs femoral cartilage plates and AC vs tAB contours, the software checking the distance vectors between different plates/contours and a fibular marking. The segmentations were used to compute the tAB, the AC, the cartilage volume (VC) and the mean cartilage thickness over the total area of subchondral bone (ThCtAB).6 10 11
To determine the variability of the measurements between baseline and 3 months, the SD between visits was computed for each cartilage morphology parameter, averaged across all subjects (as a root mean square) and divided by the mean value of the participants (RMS CV%); this was done for the different KLG groups and the different imaging sites. To determine the random and systematic components of the variability, the SD of the differences (between baseline and 3 months) and the mean differences were computed additionally. Mixed effects models were used to assess the impact of OA status (KLG), knee compartment and imaging site on differences between baseline and 3-month measurements. Clinical site and subject were nested random effects in the model. Each combination of knee compartment and KLG was allowed to have its own error variance. Correlations between measurements of knee compartments in individual patients were assumed to be non-zero and estimated using restricted maximum likelihood estimation. To address whether mean differences between baseline and 3-month measurements were statistically significant, paired Student t tests were applied to the total cohort and to those subgroups found significantly different based on the mixed effect models, to avoid excessive parallel testing.
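The descriptive statistics named above can be sketched as follows. This is one reading of the description (root mean square of the per-subject between-visit SD, divided by the group mean), not the authors' actual analysis code; the function names and the example values are hypothetical. The mixed effects models are not reproduced here.

```python
import math
from statistics import mean, stdev


def rms_cv_percent(baseline, followup):
    """RMS of the per-subject between-visit SD, divided by the group mean (%)."""
    squared_sds = []
    for b, f in zip(baseline, followup):
        # For two visits, the sample SD (n - 1 denominator) is |b - f| / sqrt(2),
        # so its square is (b - f)^2 / 2.
        squared_sds.append((b - f) ** 2 / 2.0)
    rms_sd = math.sqrt(mean(squared_sds))
    group_mean = mean(list(baseline) + list(followup))
    return 100.0 * rms_sd / group_mean


def paired_difference_stats(baseline, followup):
    """Mean difference (systematic), SD of differences (random), paired t statistic."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat


# Hypothetical cartilage thickness values in mm (not study data)
baseline = [2.00, 1.80, 2.20, 1.90]
followup = [2.04, 1.78, 2.25, 1.93]

cv = rms_cv_percent(baseline, followup)
mean_diff, sd_diff, t_stat = paired_difference_stats(baseline, followup)
print(f"RMS CV% = {cv:.2f}")
print(f"mean difference = {mean_diff:.3f} mm, SD = {sd_diff:.3f} mm, t = {t_stat:.2f}")
```

The t statistic would then be compared with a Student t distribution with n − 1 degrees of freedom to obtain the p value of the paired test.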
No statistically significant change in ThCtAB or VC was observed in the KLG 2 and 3 participants over the 3-month period (tables 1–3).
The precision error (RMS CV%) for measurements of ThCtAB ranged from 2.1% in MT and LT to 3.0% in cLF (table 1). The average precision error across the four femorotibial cartilage plates was 2.4% for all participants (n = 158), 2.3% in KLG 0, 2.1% in KLG 2 and 3.0% in KLG 3 participants. There was little difference in the measurement variability between the imaging sites (table 1), the average across the four cartilage plates varying between 2.0% and 2.6%. The tAB precision error was 1.2% across all participants and cartilage plates, and varied from 1.1% in KLG 0 to 1.4% in KLG 3 participants (table 2). Again, there was little difference between the sites (0.9% to 1.3%). AC errors were somewhat higher than tAB errors (1.4% on average) and VC errors somewhat higher than ThCtAB errors (2.8% on average) (table 3).
The SD of differences (tables 1–3) varied significantly between cartilage plates and KL scores, as observed in the mixed effects models. Mean differences in ThCtAB (3 month vs baseline) were +0.13% in MT, +0.09% in cMF, +0.58% in LT and +0.98% in cLF (table 1). Laterally, the increase in ThCtAB was significantly higher than medially (p<0.01; mixed effects model) and was significantly different from zero (p<0.01; t test). By contrast, the mean difference in ThCtAB between the first and second measurement was not significant in MT and cMF (table 1). The mixed effect models found no significant differences (in mean differences) between KLGs, or between the tibial and femoral plates.
The tAB increased significantly in all cartilage plates (0.32% across the 158 participants), values ranging from 0.27% in cMF and cLF to 0.32% in MT (table 2). The mean difference was significantly different from zero (p<0.02; t test) for tAB and AC (tables 2 and 3). No significant differences (in the magnitude of the mean tAB or AC differences) were observed between KLGs, the tibia and femur, or between the medial and lateral compartment in the mixed effect models. No significant differences in the AC changes and borderline significant differences in the tAB changes were observed between imaging sites. VC increased significantly in the lateral, but not in the medial femorotibial compartment (table 3). As for ThCtAB, there was no significant difference (in mean differences) between KLGs or between the tibia and femur.
The phantom measurements did not indicate any systematic drift at any of the imaging sites in x, y or z directions of the scanners, or for measurements made at 45° angles to these directions.
In this study we have investigated, for the first time, the precision error for measurements of cartilage morphology at 3.0 T under conditions of a multicentre clinical trial, and with three different scanners from two vendors. The precision errors reported here are approximately 0.5% higher than those in our previous single site trial at 3.0 T,6 but it must be borne in mind that in the current study, repeat scans were acquired at a 3-month interval and not on the same day.6 The precision errors reported here compare favourably to precision errors reported at 1.5 T at single sites,2 despite the longer interval between repeat acquisitions. Two previous studies12 13 reported that cartilage volume and thickness measurements on 1.5 T and 3.0 T scanners from different manufacturers were comparable. The current study extends these findings in showing that in a multicentre clinical trial precision errors are similar between different scanner manufacturers as well as across participants with KLG 0, 2 or 3. Although statistical testing failed to show significant differences in the precision errors between KLGs, the RMS CV% values tended to be larger for KLG 3 than for KLG 0 and 2 participants. Since the KLG 3 participants displayed joint space narrowing in radiography, it is reasonable to assume that these participants displayed more advanced cartilage and other structural damage, which renders segmentation somewhat more difficult.
Over the 3-month period, a significant increase in cartilage thickness was observed in the lateral (LT and cLF), but not in the medial femorotibial compartment (MT and cMF). Since the phantom measurements did not indicate any drift at any of the imaging sites and since the finding was not consistent between the medial and lateral compartment, it is difficult to explain these observations. Increased matrix production (hypertrophy) or swelling of the cartilage in the lateral compartment are potential explanations, swelling, for example, having been observed in “early” disease in animal models.14–16 However, the increase was also observed in healthy participants and was not observed in the medial compartment. Potentially these findings may be a statistical artefact, as a p<0.05 test result allows for a 5% chance of a false conclusion that a real change occurred, although it did not.
The increase in tAB was small in magnitude (0.32%), but statistically significant and consistent across all cartilage plates and KLGs. The observed difference translates into an annual increase of approximately 1.2% assuming a linear change over time. A cross-sectional17 and a longitudinal study18 described an increase in tibial bone area in patients with OA, the latter reporting mean (SD) increases of 2.2 (6.9)% and 1.5 (4.3)% per annum in MT and LT, respectively. A later study by the same authors19 found an increase of tibial bone area also in healthy women, with a rate of change (1.2% per annum in the medial and 0.8% in the lateral tibia) very similar to the increase in tAB observed in our current study. Age-related expansion of bone cross-sectional area has also been observed at other skeletal locations, in particular at the femoral neck,20–22 and has been found to be larger in women than in men. The current results suggest that, due to the low precision errors involved in qMRI, significant changes of tAB can be measured over periods as short as 3 months. The increase in tAB with aging may be a confounder when measuring changes of cartilage volume in OA, as decreases in cartilage thickness and increases in tAB may offset each other. For this reason, measuring changes in cartilage thickness (ThCtAB) rather than volume (VC) may be more efficient when monitoring cartilage loss in OA. This is also supported by the observation in the current study that precision errors are somewhat lower for ThCtAB than for VC.
In conclusion, our findings suggest that 3.0 T qMRI provides highly reproducible and stable measurements of cartilage morphology in multicentre clinical trials with equipment from different vendors, if appropriate image analysis strategies of central data processing and quality control are applied. The technology thus appears sufficiently robust to be recommended for large-scale multicentre trials that aim at observing structural changes of cartilage in OA and at evaluating the treatment response of potential DMOADs.
We are grateful to the dedicated group of study coordinators whose skills were essential in assuring the successful conduct of this study: Manal Al-Suqi, Emily Brown, Janie Burchett, Sandra Chapman, Wandra Davis, Eugene Dunkle, Susan Federmann, Kristen Fredley, Donna Gilmore, Joyce Goggins, Sasha Goldberg, Robert P. Marquis, Thelma Munoz, Bruce Niles, Norine Hall, Scott Squires and Kim Tally. We would also like to express our thanks to the dedicated MRI technologists, the Duke Image Analysis Laboratory staff: Maureen Ainslie, April Davis, Allison Fowlkes, Mark Ward and Scott White, the Pfizer A9001140 Team: Lydia Brunstetter, Peggy Coyle, Yevgenia Davidoff, Charles Packard, Ann Remmers, Mark Tengowski, Jeff Evelhoch (now Amgen, Thousand Oaks, California, USA) and John Kotyk (now Washington University, St Louis, Missouri, USA) and the Chondrometrics GmbH readers: Gudrun Goldmann, Linda Jakobi, Manuela Kunz, Dr Susanne Maschek, Sabine Mühlsimer, Annette Thebis and Dr Barbara Wehr for dedicated data segmentation. Kenneth Brandt is to be thanked for adjudicating the radiographic readings.
Competing interests: FE is CEO of Chondrometrics GmbH, a company providing MRI analysis services. FE provides consulting services to Pfizer, MerckSerono, AstraZeneca and Wyeth. RJB is employed by Pfizer Inc. DB receives grant support from Pfizer, Stryker, Gelita and Genzyme. HCC receives grant support from Pfizer. JC receives grant support from Pfizer. MH has a part-time appointment with Chondrometrics GmbH. DJH receives grant support from Pfizer, Merck and DonJoy. GH receives grant support from Pfizer. CJ receives research grants from Pfizer. VBK receives research grants from Pfizer. TML receives research grants from Pfizer, GlaxoSmithKline and Merck. SMaj receives research grants from Pfizer. SMaz receives grant support from, and provides consulting services to, Pfizer Inc. PVP receives research grants from Pfizer. TJS receives research grants from Pfizer. MST receives research grants from Pfizer. AV receives research grants from Pfizer. BW is employed by Pfizer Inc. M-PHLeG is employed by Pfizer Inc.
Funding: Funding was provided by Pfizer Inc.
Ethics approval: The study was conducted in compliance with the ethical principles derived from the Declaration of Helsinki and in compliance with local Institutional Review Board, informed consent regulations and International Conference on Harmonization Good Clinical Practices Guidelines.