Objectives MRI scoring systems for hand osteoarthritis (HOA) are currently not available. The present work proposes the Oslo HOA MRI (OHOA-MRI) score and examines the intrareader and inter-reader reliability.
Methods Relevant HOA features were included in the initial version of the OHOA-MRI score after literature review and informal group discussions. After a training session and two calibration exercises (with three readers), features with low reliability and/or low prevalence were excluded, and feature definitions/gradings were improved. In the reliability exercise 3 readers independently evaluated MRI scans of distal interphalangeal (DIP) and proximal interphalangeal (PIP) joints in 10 patients with HOA according to the final proposed score. The reading was repeated after 1 week. Intraclass correlation coefficients (ICCs), percentage exact agreement/percentage close agreement (PEA/PCA) and smallest detectable difference were calculated.
Results The final proposed OHOA-MRI score includes assessment of synovitis, flexor tenosynovitis, erosions, osteophytes (OPs), joint space narrowing (JSN) and bone marrow lesions (BMLs) on a 0–3 scale, and absence/presence of cysts, malalignment (frontal/sagittal plane), collateral ligaments (CLs) and BMLs at CL insertion sites. Inter-reader reliability was very good for synovitis, erosions, OPs, JSN, malalignment (frontal) and BMLs (ICCs ≥0.83, PCA ≥89%), and good for flexor tenosynovitis (ICC 0.64, PCA 80%) and CL presence (ICC 0.79, PEA 63%). Cysts, malalignment (sagittal) and BMLs at CL insertion sites showed high PEA (≥85%), but poor to moderate ICCs (0.00–0.59). Intrareader reliability was similar. The reliability was generally highest in PIP joints.
Conclusions The proposed OHOA-MRI score could reliably assess HOA features. However, further validation is needed.
Hand osteoarthritis (HOA) has traditionally been considered a non-inflammatory disease showing a distinctive set of radiographic features, including osteophytes (OPs), joint space narrowing (JSN), sclerosis and central erosions.1 2 However, the associations between radiographic features and clinical symptoms are weak to moderate,3 which may indicate that features other than those seen on conventional radiographs (CRs) contribute to pain and physical limitations. Osteoarthritis (OA) is indeed increasingly recognised to involve the whole joint, including articular cartilage, subchondral bone, synovium, capsule, ligaments and menisci if present.4 Consequently, MRI has a unique advantage with three-dimensional demonstration of all joint components.
MRI has gained acceptance as an outcome measure in knee OA and in inflammatory joint diseases. Reliable semiquantitative scoring systems have been developed for outcome assessment in clinical trials of knee OA5–7 as well as inflammatory joint diseases such as rheumatoid arthritis (RA)8 and peripheral psoriatic arthritis (PsA).9 The RA MRI score (RAMRIS) and the PsA MRI score (PsAMRIS) were both developed through Outcome Measures in Rheumatology (OMERACT) multistep consensus processes.
In HOA, CR is still the method of choice, but more sensitive imaging techniques such as ultrasound (US) and MRI have recently also been introduced for this disease. The literature concerning pathological MRI features is still sparse for HOA,10–13 and studies have been performed without standardised methods. A standardised scoring system, which incorporates important HOA features, could be a valuable tool for increased understanding of HOA and assessment of the burden of disease.
The aim of this study was to propose definitions and grading of features for the Oslo HOA (OHOA)-MRI score, and determine the intrareader and inter-reader reliability of the proposed score.
Materials and methods
Development of the initial OHOA-MRI score
The first step in the development of the OHOA-MRI score included selection of pathological features based on literature review and informal group discussions. We included structural key HOA features such as OPs, JSN, erosions, cysts and malalignment, which are traditionally assessed on CR (possibly with use of the Osteoarthritis Research Society International (OARSI) atlas).1 In analogy to PsAMRIS, we included inflammatory features such as synovitis, flexor tenosynovitis and bone marrow oedema in addition to structural features such as erosions and OPs. We added extensor tendinitis, as this feature was included in the initial exercises of PsAMRIS. Based on recent studies by Tan et al we also included assessment of collateral ligament (CL) pathology,11–13 such as presence/absence, non-thickened/thickened and non-inflamed/inflamed.
We defined and graded synovitis, flexor tenosynovitis, bone marrow oedema (0–3 scales) and erosions (0–10 scale) in analogy to the PsAMRIS. ‘Bone marrow oedema’ was changed to ‘bone marrow lesion’ (BML) due to an assumption of different pathological content in OA than in RA/PsA.14 We defined and graded OPs (0–3 scale), JSN (0–3 scale) and malalignment (absence/presence) in analogy to the OARSI atlas. The definition of cyst was similar to the definition of erosion (derived from PsAMRIS), and graded as absent/present as in the OARSI atlas. The definitions of CL absence, thickening and inflammation were descriptive, and the features were graded as absent/present.
In analogy to PsAMRIS, synovitis and tendinitis/tenosynovitis were scored on joint level and the remaining features (except malalignment and CL absence) in the distal and proximal part of the joint separately. The assessed area extended from the articular surface to a depth of 1 cm for erosions, cysts and OPs, and to the middle of the phalanx for BMLs. Further, we scored CL pathology in the radial and ulnar part of the joint, and malalignment in the frontal and sagittal plane.
Reading exercises and adjustment of the OHOA-MRI score
The next step included practical testing of the score. A short demonstration of MRI features followed by a training session was arranged. In the training session, five randomly selected patients were scored by three readers (IKH, SL, PB) separately according to the initial version of the score followed by a common demonstration and discussion of discrepancies between the readers.
Two exercises were then performed to test the score and calibrate the readers. Each exercise included 10 patients from the Oslo HOA cohort with a range of radiographic severity.15 16 All patients were examined by a rheumatologist at inclusion, and those with clinical or radiographic findings suggestive of inflammatory arthritis were excluded. The inter-reader reliability was assessed after each exercise, and additional training was performed for features with low reliability. Features such as CL thickening and inflammation and extensor tendinitis were excluded due to no/infrequent appearance and/or low reliability. The metacarpophalangeal (MCP) joints were excluded from the score after the first exercise, due to incomplete/varying coverage by the field of view. Grading of erosions was changed from a 0–10 scale (based on volumetric bone loss as in PsAMRIS) to a 0–3 scale (based on size and number of erosions), as the original scale was not able to capture the severity of erosions (bone loss <10% in most cases).
Subsequently, we performed the intrareader and inter-reader reliability exercise, from which we present the results in this paper. We used MRIs from another sample of 10 patients with HOA (9 women, mean (SD) age 69.5 (6.1) years), that is, 2 patients from each quintile of radiographic severity (estimated by summed Kellgren & Lawrence score of the interphalangeal joints in the dominant hand). All patients fulfilled the American College of Rheumatology criteria for HOA.17 The 3 readers scored the 10 MRIs over a period of 2–3 days, and repeated the scoring after 1 week. A preliminary atlas was used in the reliability exercise, and we added further example images to the atlas after the exercise (online supplementary figure 1).
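The stratified selection described above (2 patients from each quintile of summed Kellgren & Lawrence score) can be illustrated with a minimal sketch. The text does not state how patients were picked within a quintile; random sampling, the function name `sample_per_quintile` and the input format are assumptions made for illustration only.

```python
import random


def sample_per_quintile(kl_sums, per_group=2, seed=0):
    """Pick `per_group` patients from each quintile of summed KL score.

    kl_sums: dict mapping a patient identifier to the summed KL score
             (hypothetical input format).
    Random selection within each quintile is an assumption.
    """
    rng = random.Random(seed)
    # Rank patients from lowest to highest radiographic severity.
    ranked = sorted(kl_sums, key=kl_sums.get)
    n = len(ranked)
    selected = []
    for q in range(5):
        # Slice the ranked list into five (near-)equal groups.
        group = ranked[q * n // 5:(q + 1) * n // 5]
        selected.extend(rng.sample(group, per_group))
    return selected


# Hypothetical cohort of 50 patients with made-up summed KL scores.
cohort = {f"patient_{i}": i for i in range(50)}
chosen = sample_per_quintile(cohort)
```

Each quintile must contain at least `per_group` patients, otherwise `random.Random.sample` raises a `ValueError`.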
MRI of the hand
MRI was performed as part of the follow-up examination of the Oslo HOA cohort.15 16 The MRI sequences were chosen in close collaboration with an MRI technician and a musculoskeletal radiologist. The second to fifth distal interphalangeal (DIP) and proximal interphalangeal (PIP) joints of the dominant hand were examined using a high-field extremity 1.0 T MRI unit (ONI, GE Healthcare, Waukesha, Wisconsin, USA). During the examination, the patients rested in a comfortable chair with the hand resting in a cylindrical coil (diameter 10 cm). The hand was fixed to a plate and the space around the plate and hand was filled with rubber sponge to ensure extended fingers and reduce motion artefacts. Coronal, sagittal and axial T1-weighted (T1w) fat-suppressed (fs) pre/postintravenous gadolinium (Gd) (0.1 mmol Gd/kg body weight; Magnevist, Bayer Schering Pharma AG, Leverkusen, Germany) images were acquired from a three-dimensional dual-echo Dixon technique18 (repetition time (TR) 20 ms, echo time (TE) 5 ms, 1 mm slice thickness with overlap), in addition to coronal and axial short tau inversion recovery (STIR) images (TR 2850 and 3150 ms, TE 16.3 and 21 ms, 2 and 3 mm slice thickness, respectively). Total acquisition time was approximately 30 min.
The readers had different experience. SL and PB were familiar with PsAMRIS and/or RAMRIS, and had participated in OMERACT exercises. IKH had experience in reading radiographs in HOA, but no experience with MRI.
All readers evaluated the images independently on large screens (24–27 inches). During the training session, calibration exercises and first part of the reliability exercise, the readers were unaware of clinical data, but the images were not anonymised. Prior to the second part of the reliability exercise, the images were anonymised, recoded and rearranged in a different order. We used PACS Sectra (IDS5; SECTRA, Linköping, Sweden) and OsiriX (OsiriX, Geneva, Switzerland) software, of which the latter was used in the second part of the reliability exercise only.
Each MRI feature was analysed stratified for joint groups and as aggregated scores (ie, DIP and PIP). We calculated the mean (minimum/maximum) scores for each feature with all readers combined. Reliability was assessed by three statistical methods: intraclass correlation coefficient (ICC), percentage close/exact agreement (PCA/PEA) and smallest detectable difference (SDD).
Single and average measure ICCs (SmICCs and AvmICCs) were calculated using two-way mixed effect models. ICCs were expressed as the mean (95% CI) for inter-reader reliability (AvmICC) and as the median (minimum/maximum) of the three readers for intrareader reliability (SmICC). Interpretation of ICC was 0–0.20: poor, 0.21–0.40: fair, 0.41–0.60: moderate, 0.61–0.80: good, 0.81–1.00: very good agreement (similar to κ).19
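As an illustration of the single and average measure ICCs, the following minimal sketch computes both from a joints-by-readers table using two-way ANOVA mean squares, and maps a value to the interpretation bands listed above. The text does not state whether consistency or absolute agreement ICCs were used; the consistency form is assumed here, and the function names and example scores are hypothetical.

```python
from statistics import mean


def icc_two_way_mixed(scores):
    """Single and average measure ICC (consistency form, an assumption).

    scores: list of rows, one row per joint, one column per reader.
    Returns (single_measure_icc, average_measure_icc).
    """
    n = len(scores)       # number of joints (subjects)
    k = len(scores[0])    # number of readers
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


def interpret_icc(icc):
    """Map an ICC to the bands used in the text (band edges as listed there)."""
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"


# Hypothetical synovitis grades (0-3) for five joints scored by three readers.
example = [[0, 0, 1], [1, 1, 1], [2, 3, 2], [3, 3, 3], [1, 2, 1]]
single_icc, average_icc = icc_two_way_mixed(example)
label = interpret_icc(average_icc)
```

The average measure ICC is never lower than the single measure ICC for the same data when both are positive, which is one reason the two should not be compared directly across inter-reader and intrareader analyses.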
PEA was calculated as the percentage of occasions on which the scoring value was identical between all readers (ie, inter-reader) or between the first and second reading (ie, intrareader), and PEA=100% is perfect agreement. PCA was similarly calculated as the percentage of occasions on which the difference was ≤1 (not applicable for features scored as absent/present), and should approach 100%.
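A minimal sketch of the PEA and PCA calculations is given below. For three readers, "difference ≤1" is taken to mean that the highest and lowest score for a joint differ by at most 1; this reading, the function name and the example scores are assumptions.

```python
def percentage_agreement(scores, tolerance=0):
    """Percentage of occasions on which all scores lie within `tolerance`.

    scores: list of rows, one row per scored occasion (eg, one joint), with
            one value per reader (inter-reader) or per reading (intrareader).
    tolerance=0 gives PEA, tolerance=1 gives PCA.
    """
    agreeing = sum(1 for row in scores if max(row) - min(row) <= tolerance)
    return 100.0 * agreeing / len(scores)


# Hypothetical grades (0-3) for four joints scored by three readers.
readings = [[0, 0, 0], [1, 2, 1], [3, 1, 2], [2, 2, 2]]
pea = percentage_agreement(readings)               # exact agreement
pca = percentage_agreement(readings, tolerance=1)  # close agreement
```

For features scored as absent/present only the exact agreement (tolerance 0) is meaningful, in line with the text.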
The calculation of SDD was based on Bland and Altman's 95% limits of agreement method. Intrareader SDD was calculated as 1.96 multiplied by the SD of the mean difference between the two status scores,20 and inter-reader SDD as the pooled within-subject standard error of measurement (SEM) multiplied by (1.96×√2) and then divided by the square root of the number of readers (√3).21 22 The SDD represents the smallest difference that can be discriminated from the measurement error, and SDD=0 is perfect agreement. However, there is no convention regarding any upper limit.
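The two SDD formulas can be sketched as follows. Two points are assumptions rather than details given in the text: the intrareader SD is taken as the sample SD of the paired differences between the first and second reading, and the pooled within-subject SEM is taken as the square root of the mean within-subject variance across readers. Function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sdd_intrareader(first, second):
    """1.96 x SD of the paired differences between two readings."""
    differences = [a - b for a, b in zip(first, second)]
    return 1.96 * stdev(differences)


def sdd_interreader(scores):
    """SEM x 1.96 x sqrt(2) / sqrt(number of readers).

    scores: list of rows, one row per patient, one column per reader.
    SEM is computed as sqrt(mean within-subject variance), an assumption.
    """
    n_readers = len(scores[0])
    sem = sqrt(mean(variance(row) for row in scores))
    return sem * 1.96 * sqrt(2) / sqrt(n_readers)


# Hypothetical summed scores for four patients, read twice by one reader.
intra = sdd_intrareader([2, 1, 3, 0], [1, 1, 2, 1])

# Hypothetical summed scores for two patients by three readers.
inter = sdd_interreader([[0, 1, 2], [1, 1, 1]])
```

Dividing by the square root of the number of readers reflects that the inter-reader SDD applies to the mean score of the readers, whereas the intrareader SDD applies to a single reader.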
The data collection was approved by the regional ethics committee and the Data Inspectorate. All patients signed informed consent.
The proposed OHOA-MRI score
Table 1 provides the definitions and scaling of MRI features. Some features are shown in figure 1. The final atlas includes examples of each grade of pathology for all features, and is provided in the supplementary material together with the scoring sheet.
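For electronic data capture, a scoring sheet entry could be represented roughly as below. This is a simplified sketch based only on the scales summarised in the abstract (six features on a 0–3 scale, five features as absent/present); it records one value per joint and ignores the distal/proximal and radial/ulnar subdivisions defined in table 1. All field names and the joint label are hypothetical, and coding a finding as 1 is an assumption.

```python
from dataclasses import dataclass

# Features graded 0-3 and features graded absent/present (0/1).
GRADED = ("synovitis", "flexor_tenosynovitis", "erosions",
          "osteophytes", "jsn", "bml")
BINARY = ("cysts", "malalignment_frontal", "malalignment_sagittal",
          "cl_absent", "bml_cl_insertion")


@dataclass
class JointScore:
    """One joint's scores; field names are illustrative, not from table 1."""
    joint: str
    synovitis: int = 0
    flexor_tenosynovitis: int = 0
    erosions: int = 0
    osteophytes: int = 0
    jsn: int = 0
    bml: int = 0
    cysts: int = 0
    malalignment_frontal: int = 0
    malalignment_sagittal: int = 0
    cl_absent: int = 0
    bml_cl_insertion: int = 0

    def __post_init__(self):
        # Reject values outside the scale of each feature.
        for name in GRADED:
            if getattr(self, name) not in (0, 1, 2, 3):
                raise ValueError(f"{name} must be graded 0-3")
        for name in BINARY:
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 (absent) or 1 (present)")


# Hypothetical entry for the second PIP joint.
pip2 = JointScore("PIP2", synovitis=2, osteophytes=1, cysts=1)
```

Validating the range at entry time catches transcription errors before reliability statistics are calculated.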
The reliability exercise
There was a large range in severity for most MRI features among the 10 patients (table 2). Most features were present in all patients, except for cysts, sagittal malalignment and BMLs at the CL insertion sites, which were not present in five, nine and one patient(s), respectively.
Table 3 provides the inter-reader reliability measures for the DIP and PIP joints combined. Synovitis, erosions, OPs, JSN, frontal malalignment and BMLs had very good AvmICCs, high PCA (PEA for malalignment) and acceptable SDD values. The inter-reader reliability was slightly lower for flexor tenosynovitis and CL absence. Cysts, sagittal malalignment and BMLs at CL insertion sites were infrequent features with high PEA, but poor-to-moderate ICCs. The SDD values were acceptable for cysts and sagittal malalignment. The inter-reader reliability was similar in the second part of the exercise.
The intrareader reliability for the DIP and PIP joints combined was similar to the inter-reader reliability (table 4). The intrareader SDD values were generally higher than the inter-reader values.
The inter-reader and intrareader ICC values were generally higher in the PIP joints than in the DIP joints. The inter-reader ICCs for BMLs in the phalanx and CL insertion sites were poor to fair in the DIP joints, and good to very good in the PIP joints (supplementary table 1). Similarly, the intrareader ICCs for BMLs in the phalanx and CL insertion sites, synovitis and flexor tenosynovitis were poor to moderate in the DIP joints and good to very good in the PIP joints (supplementary table 2).
This study is the first to propose an MRI score in HOA. The development of the score was a multistep process inspired by previous work by OMERACT. In a multireader exercise we found that HOA key features in the proposed MRI score had good to very good reliability.
OA has traditionally been described as a non-inflammatory disease, and inflammatory/erosive HOA has been considered as a subset of HOA.23 However, MRI of knee OA has provided important insights into the role of synovitis in the symptomatology and progression of disease.24 25 US studies have similarly suggested that inflammation is common in HOA,26 27 and grey scale synovitis and vascularisation (ie, power Doppler signal) are included in a preliminary US scoring system for HOA.28 MRI has the benefit of being less operator dependent and provides a three-dimensional demonstration of the joint, which is less affected by overlying OPs than US. In the proposed MRI score we defined synovitis based on Gd enhancement, suggesting an active inflammatory process. We found synovitis to varying degrees in all patients, and the exercise indicated that a reliable assessment of synovitis is possible. Assessment of the PIP joints was more reliable than assessment of the DIP joints, which was probably due to the smaller size of DIP joints with lower potential synovial volume and therefore less distinction between categories.
A recent MRI study reported that erosions, and in particular marginal erosions, were more common than previously indicated by CR, which may question whether the distinction between erosive and non-erosive HOA is artificial and possibly incorrect.10 Although erosions also were frequent in our study, the volumetric bone loss was in most cases less than 10% (grade 1 in PsAMRIS). Hence the 0–10 scale derived from PsAMRIS, which was used in the first exercises, was not able to adequately distinguish the severity of bone damage. In addition, the MRI definition of erosions did not capture subchondral bone collapse. In light of these limitations, we adjusted the definition and scaling of erosions (0–3 scale), which provided very high reliability and seemed feasible in HOA. Whether subchondral collapse is pathologically different from marginal erosions and should be assessed as a separate feature needs to be further investigated.
BMLs have been associated with pain and disease progression or joint destruction in knee OA and RA,29–32 but the role in HOA symptomatology and pathogenesis is unclear. Recent high-resolution MRI studies demonstrated CL pathology and BMLs at CL insertion sites in addition to the usually recognised subchondral BMLs.11–13 With images from a conventional MRI extremity scanner we were able to reliably assess the absence/presence of CLs, while inflammation and thickening of CLs were excluded during the calibration exercises due to difficulties in evaluation, infrequent appearance and low inter-reader reliability (data not shown). We were able to reliably assess the BMLs in the subchondral bone and at the CL insertion sites in the PIP joints. The anatomically smaller DIP joints are more prone to partial volume effects, which may be misinterpreted as BMLs. The relatively thick STIR slices may further have complicated the assessment of BMLs at CL insertion sites, especially in the DIP joints, which may explain the poor reliability.
Intra-articular endochondral ossifications at the cartilage margins represent true OPs,33 and are considered as one of the radiographic hallmarks of HOA. Other types of OPs such as extra-articular bone formation at the insertions of tendons and ligaments reflect physiological responses to traction or inflammation,33 and can be better visualised by three-dimensional demonstration of the joint than with CR. Although the underlying mechanisms possibly differ, we did not distinguish between types of OP for feasibility reasons. Our category definitions were similar to those in the US scoring system28 and the OARSI radiographic atlas,1 and we were able to reliably capture the burden of OPs in HOA.
In addition to OPs, JSN and erosions, sclerosis is considered a key feature in HOA. Cortical bone is poorly visualised by MRI, especially in smaller joints such as the DIP and PIP joints, and subchondral sclerosis was therefore not tested as part of the OHOA-MRI scoring system. Cartilage-specific sequences were not used in this study, and JSN was therefore assessed as an indirect measure of cartilage damage.
Developing an MRI score also includes recommendations about MRI sequences. The T1w images are required for demonstration of CLs, OPs, JSN, erosions and cysts. Post-Gd T1w images were used to assess synovitis and flexor tenosynovitis, as omitting Gd contrast has shown decreased sensitivity for synovitis in RA.34 However, due to contraindications and possible serious side effects in older people with a non-life-threatening disease such as HOA,35 comparative studies with and without Gd should also be performed in HOA. The sequences must be obtained in at least two planes or, as in this study, by using a three-dimensional technique with small isotropic voxels in one plane and subsequent reconstruction. In this study we used fat-suppressed (fs) T1w images; however, the distinction between bone and soft tissue such as tendons may be clearer on non-fs images. Lastly, the assessment of BMLs requires a T2w fs or a STIR sequence in two planes. In this study, we used a 1.0 T MRI scanner, which has lower signal-to-noise ratio and possibly poorer visualisation of anatomical and pathological structures compared to higher field strength.
Some study limitations are noteworthy. The readers had varying degrees of MRI experience, but the reliability was overall very good and similar to ICCs from previous OMERACT exercises.36 37 Due to technical difficulties with the anonymisation, we introduced new software prior to the second reading, which may have affected the intrareader reliability. The interval between the readings was only 1 week, which may be considered a short interval for intrareader reliability. However, due to the comprehensive scoring system we considered it unlikely that the readers remembered the scoring values from the first reading. The reliability was tested in a small hospital-based sample, and the external validity of the score to the general HOA population has not yet been determined. Finally, the reliability of the proposed OHOA-MRI score was tested for the DIP and PIP joints only, as the field of view of the extremity coil in most patients did not permit scanning of the MCP joints, and visualisation of the carpometacarpal (CMC) joint would have required a separate MRI acquisition. Although reliability has not yet been tested, we assume that the proposed MRI definitions can be applied to the MCP and CMC joints.
In conclusion, this study is the first to present a reliable MRI scoring system for assessment of inflammation and joint damage in HOA. An atlas is presented, which facilitates implementation of the score. Key features such as synovitis, flexor tenosynovitis, erosions, OPs, JSN, malalignment and BMLs showed good to very good intrareader and inter-reader reliability. The score is extensive and time consuming, and infrequent and/or less reliable features such as cysts, sagittal subluxation and CL pathology could possibly be excluded from the score. However, we first recommend that future studies confirm the reliability of the OHOA-MRI score. The responsiveness and validity of the score should be evaluated against other imaging and patient-reported outcomes.
We thank the participants of the Oslo HOA cohort for helping us to perform this study.
Funding This study was supported by grants from the South-Eastern Norway Regional Health Authority.
Competing interests Professor Johannes Bijlsma was the handling editor.
Patient consent Obtained.
Ethics approval This study was conducted with the approval of the Regional Ethical Committee (Norway).
Provenance and peer review Not commissioned; externally peer reviewed.