
Extended report
Metric properties of advanced imaging methods in osteoarthritis of the hand: a systematic review
Michael S Saltzherr1,2, Ruud W Selles3,4, Sita M A Bierma-Zeinstra5, Galied S R Muradin1, J Henk Coert3, Johan W van Neck3, Jolanda J Luime2

1 Department of Radiology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
2 Department of Rheumatology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
3 Department of Plastic, Reconstructive and Hand Surgery, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
4 Department of Rehabilitation Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
5 Department of Family Practice, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands

Correspondence to Michael S Saltzherr, Department of Radiology, Erasmus MC, PO Box 2040, 3000 CA Rotterdam, The Netherlands; m.saltzherr{at}


Objective To assess the value of advanced imaging techniques in the detection of hand osteoarthritis (OA) and hand OA progression.

Methods PubMed/Medline and Embase were searched until April 2012 for studies on imaging of hand OA that presented quantitative data on validity, reliability or responsiveness. Articles presenting only data on conventional radiography (CR) were excluded. Methodological quality was assessed by the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist for validity, the Quality Appraisal of Reliability Studies (QAREL) for reliability and the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) for responsiveness.

Results Of 627 citations, 25 studies on ultrasonography (US), MRI or scintigraphy were included. No studies on CT, positron emission tomography or single photon emission CT met our eligibility criteria. Validity was generally assessed against healthy controls, CR or clinical examination. Overall, US and MRI detected more disease than CR and found significant differences between patients and healthy controls. Scintigraphy detected fewer pathological joints than CR. Intra- and inter-reader reliability varied for US (κ=0.01–1.0) and MRI (κ=0.15–0.84 and intraclass correlation coefficient=0.21–0.99) and was good for scintigraphy (κ=0.61–0.84). There were no responsiveness studies for MRI. US responsiveness studies showed a reduction of soft-tissue changes after treatment which correlated with decrease in pain (r=0.7–0.8). For scintigraphy, scores decreased over time while CR showed progression of hand OA.

Conclusions MRI and US seem to be the most promising candidates for early detection of hand OA and for future use in clinical trials. However, further research is needed to improve scoring methods, to compare US with MRI, to confirm reliability of MRI and to further determine the responsiveness of US and MRI.

  • Hand Osteoarthritis
  • Ultrasonography
  • Magnetic Resonance Imaging


Hand osteoarthritis (OA) is a disabling disease, with prevalence of up to 70% among the elderly.1 ,2 Patients typically present with intermittent joint pain and stiffness,3 loss of joint mobility and loss of grip strength causing impairment in daily activities.1 ,4–6 Hand OA is characterised by degradation of articular cartilage, synovial inflammation and bone deformation. Possible treatments are limited, but new pharmacological treatments are being developed.7

Conventional radiography (CR) is the standard imaging method for assessing structural changes in OA.8 ,9 It can display joint space narrowing (JSN), an indirect measurement of cartilage destruction and bone deformation. Although four major scoring systems are available for evaluating hand OA on CR,10–13 there is no consensus on the optimal system. These scoring systems have demonstrated good reliability,14 ,15 but low sensitivity to change within 1 year.14 CR does not show inflammation and seems unable to show the start of cartilage degradation.16 CR is therefore not optimal for identifying early OA or for monitoring disease progression for periods of <1 year.17

Several other imaging techniques can be considered for detecting and monitoring OA-related changes, each with their own advantages and disadvantages. These include CT, ultrasonography (US), MRI and nuclear imaging methods like positron emission tomography (PET), single photon emission CT (SPECT) and scintigraphy. CT is the best method for imaging structural bony changes, but cannot depict cartilage or the joint capsule. US can visualise cartilage and other soft tissues, but the ultrasonic waves may be blocked by bony structures, hindering imaging of the whole joint. MRI visualises both bone and the soft tissues, but has a lower resolution than other imaging techniques, is time consuming and relatively expensive. Nuclear imaging methods do not visualise structural anatomy, but show metabolic activity within the joints, which can often be detected before radiographic changes.

To assess the value of advanced imaging techniques for detection of hand OA and its progression, we performed a systematic review of the literature to assess validity, reliability and responsiveness for CT, US, MRI, PET, SPECT and scintigraphy.


Search strategy and selection

The electronic databases Medline and Embase were searched for articles up to April 2012. The search terms included keywords such as ‘osteoarthritis’, ‘hand joints’ and ‘imaging techniques’ (see online supplementary text S1). No language restrictions were used. Titles and abstracts were independently screened by two reviewers (MSS, JJL or RWS) to identify eligible articles. If one of the reviewers selected an abstract, the full-text article was retrieved, screened and, if eligible, selected for review. Selection disagreements were resolved by consensus. Reference lists of retrieved articles were checked for additional records.

Papers were eligible if (1) the paper was a full-length primary paper on hand OA; (2) CT, MRI, US, PET, SPECT or scintigraphy was used to image one or multiple hand joints in patients diagnosed with, or suspected of having, hand OA or if one of these techniques was used to assess hand OA-related characteristics in healthy controls; (3) one or more of the following joints were imaged: first carpometacarpal (CMC1), scapho-trapezio-trapezoidal, metacarpophalangeal (MCP), proximal interphalangeal (PIP) or distal interphalangeal (DIP) joint; and (4) a quantification of validity, reliability or responsiveness was presented.

Both criterion validity and construct validity studies were included. Criterion validity is determined by comparison with an optimal reference standard, which we considered to be a comparison against histology or arthroscopy. Construct validity is determined by comparison with other techniques measuring similar properties and we therefore included comparisons against other imaging techniques, clinical examination and healthy controls. Reliability studies were included if any form of inter-reader or intrareader reliability was reported. Responsiveness studies were included if they measured change and compared this change with another method.

We excluded articles if CR was the only imaging technique used or if descriptive data only were reported, without hypothesis testing. We also excluded articles that assessed a patient group of diverse arthritides and data from patients with OA was not reported separately. The primary reviewer (MSS) extracted all the data, which included study design, patient characteristics, details of imaging technique, method of image analysis and outcome measures.

Quality assessment

Methodological quality was assessed using three checklists: the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool with additional QUADAS items for validity,18 ,19 the Quality Appraisal of Reliability Studies (QAREL) checklist for reliability,20 and the responsiveness checkbox of the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) for responsiveness.21 The checklists were adapted for our specific purpose (see online supplementary text S2). Questions were answered with ‘yes’, ‘no’ or ‘unclear’. If studies investigated multiple outcome measures, then multiple quality assessments were performed. Quality assessment was performed independently by five reviewers (MSS, SMABZ and RWS for QUADAS; MSS, JJL and JWvN for QAREL; and MSS and JJL for COSMIN). Disagreements were resolved by discussion.


Selection of studies

Our search identified 869 records (313 Medline and 556 Embase), including 242 duplicates (figure 1). We considered 106 relevant and retrieved them in full text. Seventy-seven articles were excluded, including three because they were not in English.22–24 Four articles25–28 reported data about the same cohort and we included the most informative article.28 Two other articles also reported data from the same study population,29 ,30 of which one was kept.29 Reference checking did not result in any additional records.
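The record flow described above can be checked with simple arithmetic. The sketch below only restates the counts given in the text; the variable names are illustrative, and it assumes that the articles dropped for overlapping cohorts are counted separately from the 77 exclusions, which is the reading that reproduces the number of included studies.

```python
# Counts taken from the text; variable names are illustrative only.
medline, embase = 313, 556
identified = medline + embase             # all records retrieved
duplicates = 242
screened = identified - duplicates        # unique citations screened

full_text = 106                           # articles retrieved in full text
excluded = 77                             # excluded after full-text review
overlap_cohort_a = 4 - 1                  # four articles on one cohort, one kept
overlap_cohort_b = 2 - 1                  # two articles on one population, one kept

# Assumption: overlapping articles are not part of the 77 exclusions.
included = full_text - excluded - overlap_cohort_a - overlap_cohort_b

print(identified, screened, included)     # 869 627 25
```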

Figure 1

Results of systematic search and selection process.

Study characteristics

Twenty-five articles were included in this review:28 ,29 ,31–53 fourteen articles on US, five on MRI, five on scintigraphy and one on both US and MRI. Abstract screening yielded two PET and one SPECT article on hand OA, which were excluded because no quantification of validity, reliability or responsiveness was presented,54 ,55 or because patients with diagnoses other than hand OA were included.56 We did not identify any CT study. The characteristics of the included studies are summarised in table 1.

Table 1

Characteristics of the included studies

The inclusion criteria varied between studies from symptomatic hand OA without abnormalities on CR or positive American College of Rheumatology criteria,62 to erosive hand OA on CR. This heterogeneity in patient populations reflects the variation in disease duration, which ranged from a few months to more than 10 years. Age and sex distributions were consistent among most studies (mean or median age of patients >55 and 61–100% being female). The scored joints ranged from a single CMC1, DIP or PIP joint to a 30-joint examination of thumb base, DIP, PIP and MCP joints of both hands. One scintigraphic study also included the radial and ulnar part of the wrist.28

Methodological quality

The results of the presented studies pose some limitations and should be interpreted with caution (see online supplementary text S2 for details). The optimal spectrum of patients should consist of a mix of patients who are likely to undergo imaging for diagnosis or follow-up of hand OA. However, some studies only included patients with severe OA, while others added healthy controls to the patient group. Other general limitations included insufficient description of sample size determination and lack of information about the training and experience of the examiner.

In the validity studies, the use of only severely affected patients might have increased sensitivity, while the use of healthy volunteers as reference standard might have increased specificity or overestimated correlations.19 In the reliability studies, agreement might have been inflated in samples where results are obvious—for example, in patients with extreme disease status or healthy controls.20 Examiner blinding was insufficiently described in reliability studies. As incomplete blinding may affect reliability results,20 it should be described extensively. Responsiveness studies often lacked a priori hypotheses of the expected change, which are recommended as it is easy retrospectively to create alternative explanations for low correlations or differences between changes.21 It was also often unclear whether raters could review their prior ratings. This is important as not knowing previous results minimises expectation bias, but gives a higher measurement error.63


Validity

Eleven US, five MRI and three scintigraphy articles examined validity (table 2). None of the studies determined criterion validity by comparing with histology or arthroscopy. Construct validity was determined by using different comparators, such as healthy controls, CR, joint pain, joint swelling or MRI.

Table 2

Validity of US, MRI and scintigraphy studies for hand OA

Four of 11 US studies compared patients with hand OA with healthy controls and reported significant differences in JSN,42 osteophytes,42 synovitis,31 ,42 ,48 power Doppler (PD) signal,31 ,42 ,48 and joint effusion,31 ,37 ,48 while no significant differences were found for tendon effusion.31 Five studies compared structural US changes with CR, and US generally detected more osteophytes,41 ,46 ,51 ,52 erosions51 ,52 and JSN.41 Only one study detected fewer erosions with US (sensitivity=0.73, specificity=1.0).38 Joint pain, tender joints and swollen joints were used as comparator in four studies and agreed poorly with US greyscale measurements of synovitis, effusion, PD measurements, JSN and osteophytes.31 ,42 ,45 ,46
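As a reminder of how sensitivity and specificity figures such as those quoted above are derived, the following minimal sketch computes both from a 2×2 table in which CR serves as the comparator. The function name and the joint counts are hypothetical and are not taken from any of the reviewed studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 counts against a comparator."""
    sensitivity = tp / (tp + fn)   # comparator-positive joints also positive on the index test
    specificity = tn / (tn + fp)   # comparator-negative joints also negative on the index test
    return sensitivity, specificity


# Hypothetical joint counts, chosen only for illustration.
sens, spec = sensitivity_specificity(tp=11, fn=4, tn=25, fp=0)
print(round(sens, 2), round(spec, 2))   # 0.73 1.0
```

Because CR is itself an imperfect comparator, values computed this way describe agreement with CR rather than true diagnostic accuracy.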

One out of five MRI studies compared patients with hand OA with healthy controls, reporting significantly more ligament abnormalities, tendon abnormalities, cartilage abnormalities, joint effusion, osteophytes, bone marrow lesions (BML), erosions and cysts in patients.29 Two other studies compared MRI with CR and found that MRI detected significantly more osteophytes and erosions, while CR detected significantly more cases with malalignment.33 ,36 A fourth study investigated associations between MRI and joint pain on palpation and found the highest associations for synovitis (OR=2.4, 95% CI 1.6 to 3.8) and bone attrition (OR=2.5, 95% CI 1.5 to 4.1).35 One study compared US with MRI and reported moderate agreement between these modalities (κ=0.41–0.55). US detected more osteophytes and effusion, while MRI detected more erosions and synovitis.53

Three scintigraphy studies compared isotope uptake in bone with joint pain and CR. Isotope uptake was correlated with joint pain (τ=0.24),28 and OA on CR (r=0.50–0.61).32 ,50 Scintigraphy detected fewer pathological joints than CR.


Reliability

Eight US, four MRI and two scintigraphy studies examined reliability (table 3). Four US studies assessed inter-reader reliability. In two studies agreement was good (κ=0.83–0.99) for synovitis, PD, effusion, osteophytes and erosions,52 ,53 while in one study this varied for synovitis, PD and osteophytes (κ=0.229–0.530).40 Intra-reader reliability was assessed in five studies. In four studies, intra-reader reliability assessed by one reader was moderate to good (κ=0.62–0.94) for synovitis, PD, JSN, effusion and osteophytes and good for cartilage thickness (intraclass correlation coefficient (ICC)=0.96).42 ,46 ,48 ,51 The fifth study reported intra-reader reliability for seven readers, ranging from poor to good (κ=0.172–1.0) for synovitis, PD and osteophytes.40

Table 3

Reliability of US, MRI and scintigraphy studies for hand OA

Three MRI studies reported that inter-reader reliability was high for erosions, JSN, BML, malalignment and ligament absence (κ=0.76–0.84 and ICC=0.79–0.97); moderate to good for synovitis and tenosynovitis (κ=0.58 and ICC=0.48–0.51); low for cysts (ICC=0.21) and variable for osteophytes (κ=0.15 and ICC=0.88).33 ,34 ,53 MRI intra-reader reliability was assessed in two studies and was high for synovitis, osteophytes, erosions, JSN, BML, malalignment and ligaments (κ=0.71–0.84 and ICC=0.84–0.99); moderate for cysts (κ=0.66 and ICC=0.59) and variable for tenosynovitis (κ=0.30 and ICC=0.63).34 ,35

One scintigraphy study reported high inter-reader reliability (κ=0.61–0.82),49 and one scintigraphy study reported high intrareader reliability (κ=0.84).39
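The κ values reported in this section express agreement between readers after correction for chance. The sketch below illustrates an unweighted Cohen's κ for two readers scoring the same joints as present (1) or absent (0). The scores are hypothetical, and the reviewed studies may have used weighted κ or other variants, so this is an illustration of the statistic rather than a reconstruction of any study's analysis.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers rating the same items."""
    n = len(reader_a)
    if n == 0 or n != len(reader_b):
        raise ValueError("both readers must rate the same non-empty set of items")
    # Observed proportion of items on which the readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical synovitis scores (1 = present, 0 = absent) for ten joints.
reader_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(reader_a, reader_b)
print(round(kappa, 2))
```

In this example the readers agree on 8 of 10 joints, yet κ is noticeably lower than 0.8 because about half of that agreement would be expected by chance alone. The same mechanism explains why samples dominated by obvious cases can inflate raw agreement.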


Responsiveness

Two US and three scintigraphy studies assessed change scores over time and included a comparator. Only two of these studies assessed true responsiveness by calculating a correlation coefficient between the changes (table 4).

Table 4

Change and responsiveness of US and scintigraphy studies for hand OA

One US study reported a significant decrease in PD and effusion in patients treated with intra-articular hyaluronic acid injections. These decreases correlated with a significant reduction of pain (r=0.7 and r=0.8).44 The other US study reported a small non-significant decrease in grey-scale synovitis and PD in patients treated with intramuscular methylprednisolone injections, while there was a significant decrease in pain.43

In the scintigraphy studies, no interventions were used, but change during disease progression was measured. In all three scintigraphic studies, scores decreased over time while the disease progressed and radiographic and pain scores increased.28 ,32 ,50 Changes in the radiographic scores were weakly correlated with changes in the scintigraphic scores (r=0.13).32
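Responsiveness in the sense used here is the correlation between change on the imaging measure and change on a comparator. A minimal sketch of that calculation is given below, using Pearson's r on per-patient change scores. All values are hypothetical, and the reviewed studies may have used a different correlation coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-patient scores before and after treatment.
imaging_baseline = [3, 2, 3, 1, 2]        # e.g. a semiquantitative imaging grade
imaging_followup = [1, 1, 2, 1, 0]
pain_baseline = [70, 55, 60, 40, 65]      # e.g. pain on a 0-100 scale
pain_followup = [40, 45, 50, 35, 30]

# Change scores: follow-up minus baseline for each patient.
imaging_change = [f - b for b, f in zip(imaging_baseline, imaging_followup)]
pain_change = [f - b for b, f in zip(pain_baseline, pain_followup)]

r = pearson_r(imaging_change, pain_change)
print(round(r, 2))
```

A high positive r indicates that patients whose imaging scores fell most also reported the largest reduction in pain, which is the pattern that supports responsiveness.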


This systematic review shows that there is growing evidence on validity, reliability and responsiveness of advanced imaging methods in hand OA. US and MRI seem the most promising candidates, with US being the most investigated modality. Few studies have compared US directly with MRI. Wittoek et al53 reported that MRI was more sensitive for synovitis and erosions, but US detected more effusion and osteophytes. This last finding, however, is in contrast with a recent publication by Mathiessen et al,64 in which osteophytes were more often detected with MRI (87% vs 75%). According to Mathiessen, the MRI might have underperformed in the study by Wittoek, as they did not use standardised scoring methods and had poor inter-reader reliability.

US and MRI were both more sensitive for detecting osteophytes and erosions than CR, with the exception of one US study. US and MRI also showed significant differences between patients and healthy controls for structural and soft-tissue changes, including ligament abnormalities, which were only investigated with MRI, and cysts and BML, which cannot be assessed with US. Correlations between US and clinically assessed synovitis were low, as also found in hip and knee OA studies.65 Reported reliabilities were mostly moderate to good for US and MRI, although some variability was seen in the few MRI studies for synovitis, tenosynovitis, cysts and osteophytes. Responsiveness was only evaluated in US, which demonstrated that reduction of soft-tissue lesions correlated with pain decrease. More studies should therefore focus on reliability of MRI, responsiveness of US and MRI and comparison of US and MRI.

Bone scintigraphy seems less promising for detection and follow-up of hand OA. Scintigraphy was weakly correlated with clinical symptoms and detected fewer pathological joints than CR. The reliability of scintigraphy was good, but scintigraphy scores decreased over time, while the disease progressed clinically and radiographically. This responsiveness pattern is comparable to the results from a systematic review of knee OA,66 and inherent to the technique. Scintigraphy shows increased uptake of bone tracers, representing osteophyte and cyst formation.67 As the new osteophytes become visible on imaging techniques showing structural damage, they will relieve stress on the joint and scintigraphic findings will diminish.67

No studies on CT, PET or SPECT reported validity, reliability or responsiveness. However, these properties may be less favourable for CT, PET and SPECT than for US and MRI. Although CT is more sensitive than MRI and US for detecting erosions,68–70 it does not visualise cartilage or other soft tissues. PET and SPECT use radiopharmaceutical agents that target bone and may therefore have similar limitations to those described for scintigraphy. However, this may change when cartilage-specific tracers become available.71 ,72

A variety of scoring methods was used in the reviewed studies. These methods were often newly devised by the authors (based on rheumatoid arthritis literature), or not properly described. In both US and MRI literature only a single scoring method was used in multiple studies. The US method by Keen et al40 was used in eight articles, although mostly with additions or alterations to the original method. The MRI scoring method by Haugen et al34 has so far been used only in articles by the author's own study group and has undergone one change in subsequent studies. As seen in knee OA,73 scoring methods can improve over time and with new insights into OA. These improvements may lead to shorter scoring times, further improvement of reliability, validity and responsiveness and, hopefully, a widely accepted consensus method.

A number of issues should be taken into account when interpreting the results of this review. Our search was extensive but we might still have missed publications. Three articles were excluded because of language difficulties,22–24 as we could not reliably determine methodological quality and extract data. We found no criterion validity studies in which histology or arthroscopy was used as a reference standard, probably because these are not easily obtained for hand OA. Not all included validity studies were primarily designed to assess validity, which might have limited their methodological quality. Comparison of construct validity studies was hindered by differences in pathology definition, statistical analysis and comparators. Homogeneity of study design and reporting should therefore be improved in future studies.

We included data on DIP, PIP, MCP, CMC1 and scapho-trapezio-trapezoidal joints, but did not assess differences between these joints. However, anatomical differences may affect imaging performance. For example, limited resolution of MRI may hamper assessment of the smaller DIP joints,34 while US may not fully assess the third and fourth MCP joints, owing to a restricted acoustic window.74 Both MRI and US have advanced technologically in recent years and results from older studies might therefore not be comparable with those of the newer studies. This may also explain why the only study in which US was less sensitive than CR was also the oldest study that compared the two methods.38

In conclusion, MRI and US seem to be the most promising candidates for early detection of hand OA and for future use in clinical trials. However, further research is needed to improve scoring methods, compare US with MRI, confirm reliability of MRI and better determine responsiveness of US and MRI.


  • Handling editor Tore K Kvien

  • Correction notice This article has been updated since it was published Online First. The author affiliations have changed.

  • Contributors All authors contributed to the conception and design of the study, drafted or revised the manuscript and approved the final version of the article.

  • Competing Interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.