Objective: To examine the interobserver agreement of commonly used clinical tests and diagnoses in patients with shoulder pain, and the accuracy of these tests and ultrasonographic findings in comparison with arthroscopic findings.
Methods: Eighty-six patients with longstanding shoulder joint pain were examined “blindly” by two trained doctors using several clinical tests. Ultrasonography was performed in all patients, and arthroscopy in 42 (49%).
Results: Tests for impingement showed poor to moderate agreement. Muscle tenderness, muscle weakness, and tests for labral lesions also showed poor agreement. Pain during muscle contraction showed moderate agreement. Agreement on clinical diagnoses was poor, and their accuracy was low in comparison with arthroscopy. Ultrasonography was accurate for full thickness supraspinatus tendon tears but inaccurate for partial tears and labral lesions.
Conclusions: Most clinical tests showed poor agreement. Clinical and ultrasonographic diagnoses had low accuracy in comparison with arthroscopy.
More research is needed to establish uniform methods of defining shoulder disorders.1 Good agreement should be obtainable when different clinicians use clinical classification criteria of shoulder problems, and if possible these criteria should be validated—for example, by arthroscopy.
In a study that aimed to classify shoulder pain, cluster analysis yielded only three clinical diagnoses:2 one group with movement restriction (synovial group), a small group with restriction in all directions (frozen shoulder), and a group without restriction of shoulder joint movements (shoulder girdle group).
Even though most shoulder problems resolve within a few weeks to months, long-lasting shoulder problems are common.3 Many clinical tests are used to establish the different shoulder diagnoses (impingement/tendinitis, different tendon lesions, acromioclavicular disease, labral lesions). These are often supplemented by magnetic resonance imaging or ultrasonography, both of which have been found to be accurate for diagnosing full thickness rotator cuff tears.4,5 We wanted to answer the following questions about the diagnosis of chronic shoulder pain:
What is the interobserver agreement of historical information, clinical tests, and the final diagnosis?
To what extent are clinical and ultrasonographic diagnoses accurate in comparison with arthroscopic diagnoses as the “gold standard”?
The study group comprised consecutive patients with shoulder pain who fulfilled the following inclusion criteria:
Clinical suspicion of labral or rotator cuff lesion or
Shoulder pain originating from the shoulder joint/tendons, with a duration of at least two months (severe pain), three months (moderate pain), or six months (mild pain). Steroid injections must have been tried without lasting effect or judged not to be indicated. This primary evaluation was based on history, symptoms, and the tests described below, and was performed by trained specialists.
Exclusion criteria were diffuse pain syndromes; disc, joint, or nerve disease; severe medical disease; psychiatric disease; pregnancy or lactation; previous shoulder operation; and fracture.
Consecutive patients from the participating departments were included. An ultrasonographic examination of the shoulder was first performed by one of two experienced ultrasonographers who was unaware of the patient's clinical condition. The patients were then examined clinically, in random order, by an orthopaedic surgeon and a rheumatologist, both of whom had long experience with shoulder disorders. Before the study started, each clinical test was discussed carefully to optimise interobserver agreement.
Both clinical examinations were performed on the same day, initially without knowledge of the patient's history. History and symptoms were then recorded in a standardised interview. After this, the examiners were given the result of the ultrasonography. After each step the examiners recorded their suggested diagnoses. Finally, the two clinicians discussed their findings and agreed on a diagnosis and a treatment plan. Arthroscopy was judged to be indicated in 42 patients.
Range of motion was assessed clinically with the patient standing, without a goniometer, for abduction in the plane of the scapula (scaption), flexion (active and passive), and external rotation. Tenderness was assessed by manual pressure over the muscles and tendons of the shoulder; the pressure was not standardised. Pain and weakness during isometric contraction were assessed for abduction/elevation, internal and external rotation, and elbow supination. Impingement tests were carried out by active and passive flexion and scaption.6 Flexion or scaption to 90 degrees was combined with internal rotation (Hawkins test).7 These impingement tests were regarded as positive if they elicited pain. The impingement release test was performed by pressing the humeral head downward during abduction.8
A number of instability/labrum tests were carried out. The apprehension test was performed with the patient supine.9 The relocation test was performed by applying a posteriorly directed force during a supine apprehension test.9 The anterior release test was performed by pressing the humerus backward with the patient supine and then releasing the force.10 The crank and anterior slide tests were performed by applying a force along the axis of the humerus.11 In these tests laxity, clicks, and pain were registered. The “load-shift” test was used to examine anterior and posterior laxity by pulling and pushing the humeral head forward and backward. The sulcus test was performed by applying firm downward pressure on the arm in slight abduction.
Abnormal subluxation and tenderness of the acromioclavicular and sternoclavicular joints were registered. Laxity was registered by applying downward pressure on the shaft of the clavicle, irrespective of whether arm adduction or scapular elevation provoked pain from the joints.
Ultrasonography was performed with Siemens Versa-pro equipment, using a 7.5 MHz linear array transducer and a 5 MHz curvilinear transducer. The rotator cuff tendons, the posterior labrum, and the subacromial bursa were examined with the patient seated.5 The anterior labrum was assessed with the patient supine.12
Anterior and posterior portals were used for arthroscopy, and the humeroscapular joint was examined systematically in all patients.
Interobserver agreement for range of motion (degrees) was assessed by calculating the standard deviation of the difference between the two examiners' measurements.13 All other clinical tests were scored as either positive or negative, and their interobserver agreement (reliability) was determined with κ statistics.14 κ values <0.4 indicate low agreement, values of 0.4–0.6 moderate agreement, and values >0.6 good agreement. The clinical diagnoses and the ultrasonographic findings were compared with the arthroscopic findings, allowing calculation of sensitivity, specificity, and accuracy.
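The two agreement measures described above can be sketched in Python. This is a minimal illustration, not the authors' analysis code. The function names are hypothetical, the standard deviation of the difference is assumed to be the sample SD (n - 1 denominator), κ is assumed to be the unweighted Cohen's κ for two raters with binary (positive/negative) scores, and all patient data in the example are invented.

```python
from statistics import stdev


def sd_of_differences(rater_a, rater_b):
    """Sample SD of paired differences (degrees) between two examiners (assumed definition)."""
    if len(rater_a) != len(rater_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    return stdev(diffs)  # n - 1 denominator; needs at least two pairs


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring each patient 1 (positive) or 0 (negative)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion rated positive by examiner A
    p_b = sum(rater_b) / n  # proportion rated positive by examiner B
    # Agreement expected by chance if the two examiners rated independently
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def agreement_category(kappa):
    """Cut-offs stated in the text: <0.4 low, 0.4-0.6 moderate, >0.6 good."""
    if kappa < 0.4:
        return "low"
    if kappa <= 0.6:
        return "moderate"
    return "good"


if __name__ == "__main__":
    # Invented ratings for 10 patients; examiner A scores twice as many tests positive
    surgeon = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    rheumatologist = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    k = cohen_kappa(surgeon, rheumatologist)
    print(round(k, 2), agreement_category(k))  # 0.44 moderate

    # Invented passive flexion measurements (degrees) for four patients
    print(round(sd_of_differences([150, 160, 170, 140], [145, 160, 175, 130]), 1))  # 6.5
```

With these invented ratings the raw agreement is 70%, yet κ is only about 0.44, because part of that agreement would be expected by chance alone.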
The project was approved by the local ethics committee, and informed consent was obtained from the patients.
Eighty-six patients (52% women) with a mean symptom duration of 25 months were included in the study.
Agreement on the responses to the interview questions varied (table 1). Agreement between the two observers on the clinical tests was generally low (table 2). The results of some tests are not shown in table 2: (a) the orthopaedic surgeon generally scored labrum tests positive 1.3–3 times more often than the rheumatologist, and all labrum tests showed poor agreement (κ<0.4); (b) tests for muscle weakness and for abnormality of the acromioclavicular and sternoclavicular joints all showed poor agreement (all κ<0.4).
The interobserver agreement of diagnoses based on clinical examination alone was poor (all κ<0.4) and did not improve when the patient's history was known.
Ultrasonography and arthroscopy
Ultrasonography showed full thickness supraspinatus rupture in four patients, partial rupture in seven, anterior labral lesion in five, and infraspinatus rupture in one. In 57 (66%) no lesions were registered. Arthroscopy showed full thickness supraspinatus rupture in three patients, partial rupture in four, infraspinatus tendon rupture in two, anterior labral tears in eight, and superior labral tears in three.
Accuracy of clinical tests and diagnoses compared with arthroscopy
Only the surgeon's diagnosis of full thickness supraspinatus rupture had reasonable sensitivity (67%), with a specificity of 90% (accuracy 79%). The accuracy of the diagnoses of partial supraspinatus ruptures and labral lesions, and of all the rheumatologist's diagnoses, was <65%. Accuracy did not improve when information on the patient's history was added, and it was only marginally higher, and only for supraspinatus ruptures, when the result of the ultrasonography was available.
Accuracy of clinical ultrasonography
Ultrasonography was reasonably accurate for full thickness rotator cuff tears, detecting two of three (sensitivity 67%), with a specificity of 100%. Its sensitivity for partial thickness tears and for labral tears was <35% in both cases.
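The sensitivity and specificity above follow from a 2×2 table of ultrasonographic against arthroscopic findings. The Python sketch below uses a hypothetical `TwoByTwo` class. Its counts for full thickness tears are derived from the text: three tears at arthroscopy, two of them detected, and a specificity of 100% among the 42 arthroscoped patients, which implies no false positives and 39 true negatives. The text does not state how "accuracy" was calculated, so the `accuracy` property is an assumption (overall proportion of correct classifications) and need not reproduce the accuracy figures reported in the article.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test (clinical or ultrasonographic) versus arthroscopy as reference standard."""
    tp: int  # test positive, arthroscopy positive
    fp: int  # test positive, arthroscopy negative
    fn: int  # test negative, arthroscopy positive
    tn: int  # test negative, arthroscopy negative

    @property
    def sensitivity(self) -> float:
        # Proportion of arthroscopically confirmed lesions that the test detected
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of lesion-free shoulders that the test correctly called negative
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Assumed definition: overall proportion of correct classifications
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


if __name__ == "__main__":
    # Full thickness tears: 3 at arthroscopy, 2 detected by ultrasonography,
    # no false positives among the remaining 39 arthroscoped patients
    us = TwoByTwo(tp=2, fp=0, fn=1, tn=39)
    print(f"sensitivity {us.sensitivity:.0%}, specificity {us.specificity:.0%}")
    # prints "sensitivity 67%, specificity 100%"
```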
The historical information, obtained by the two examiners through a standardised interview, had low κ values. κ exceeded 0.4, which is usually regarded as the lowest acceptable level, for only a few of the diagnostic tests: only the Hawkins test, pain on abduction/internal rotation/external rotation, tenderness of the infraspinatus muscle, and the anterior drawer test had acceptable agreement, while agreement between the two observers for the other tests was poor. Of the range of motion tests, passive abduction and flexion had the best agreement. The two examiners had gone through each test carefully before the study to make sure that they performed the tests similarly. The poor agreement is therefore probably not caused by completely different ways of performing the tests, but may be partly explained by differences in interpreting the patients' responses. The examiners' different clinical backgrounds (rheumatology, orthopaedic surgery) may also have influenced the results.
One other study of interobserver agreement had results comparable to ours.15 In that study, a long duration of symptoms was associated with disagreement on diagnoses. It should be emphasised that patients with primarily shoulder girdle pain or frozen shoulder were not included in our study. Interobserver agreement would probably be higher if these patient groups were included and if fewer diagnostic groups were used.2
The accuracy of clinical tests and diagnoses in comparison with the arthroscopic findings was low and only slightly better when the results of ultrasonographic evaluation became available in addition to the clinical examination. Better results for both agreement and accuracy of clinical and ultrasonographic diagnosis are probably obtainable in patients with a shorter history and with more clearly defined conditions—for example, in patients with traumatic shoulder instability or full thickness tears of the rotator cuff.
We thank the Danish Rheumatism Association for financial support.