Objective: To assess the intrarater and interrater reliability of a standardised protocol for measuring shoulder movements with a gravity inclinometer when used by rheumatologists.
Methods: After instruction, six rheumatologists independently assessed eight shoulder movements (total and glenohumeral flexion, total and glenohumeral abduction, external rotation in neutral and in abduction, internal rotation in abduction, and hand behind back) in random order in six patients with shoulder pain and stiffness, using a standardised protocol and a 6×6 Latin square design. The assessments were then repeated. Analysis of variance was used to partition total variability into components of variance in order to calculate intraclass correlation coefficients (ICCs).
Results: The intrarater and interrater reliability of different shoulder movements varied widely. The movement of hand behind back and total shoulder flexion yielded the highest ICC scores for both intrarater reliability (0.91 and 0.83, respectively) and interrater reliability (0.80 and 0.72, respectively). Low ICC scores were found for the movements of glenohumeral abduction, external rotation in abduction, and internal rotation in abduction (intrarater ICCs 0.35, 0.43, and 0.32, respectively), and external rotation in neutral, external rotation in abduction, and internal rotation in abduction (interrater ICCs 0.29, 0.11, and 0.06, respectively).
Conclusions: The measurement of shoulder movements using a standardised protocol by rheumatologists produced variable intrarater and interrater reliability. Reasonable reliability was obtained only for the movement of hand behind back and total shoulder flexion.
- range of motion
- outcome measurement
Shoulder pain is common in the general population: its reported point prevalence ranges from 7% to 51%, and it is known to increase with age. Restricted range of motion and shoulder pain can interfere with activities of daily living and are associated with work absenteeism and use of medical services.1–5 Many patients are evaluated by a family doctor, rheumatologist, orthopaedic specialist, or physical therapist.3
A physical examination is often used both for diagnosis and for evaluating treatment success in patients with shoulder pain. One aspect of the physical assessment of the shoulder is the evaluation of range of motion, for which no “gold standard” measurement method is yet available. Clinical trials that have assessed the efficacy of interventions for shoulder pain have commonly used shoulder range of motion as an outcome measure.6 To be of value in clinical trials or routine care, the reliability of such a measure (that is, the extent to which repeated administration of an instrument to a stable population yields the same results) should be established.
Multiple methods for estimating shoulder range of motion have been used in the past, including visual estimation, the two-armed goniometer, and the gravity referenced goniometer.7–16 In many of these studies the methods were poorly described, and most examined passive range of motion only, which may reflect a less functional outcome.14 Their results may therefore not be applicable to symptomatic patients with varying degrees of pain and stiffness. Only a few studies have included subjects with shoulder pain.7,8,13–16
We previously developed a standardised protocol for measuring active range of motion in clinical trials for shoulder pain and assessed it in a similar patient group.13 Its intrarater and interrater reliability were acceptable when physiotherapists performed the range of motion assessments, and we concluded that it would be appropriate for use by physiotherapists in both research and clinical practice. The gravity inclinometer chosen is fast and easy to use, but it has not been evaluated with observers from a different professional background.
To determine whether our protocol may also be of value when rheumatologists perform the measurements, this study examined the interrater reliability (that is, agreement between measurements made by different raters) and the intrarater reliability (that is, agreement between repeated measurements made by the same rater) of the standardised protocol when used by six rheumatologists. Although this study replicates a previous study in different subjects with similar complaints, differences in professional training and in the daily use of manual examination techniques between physiotherapists and rheumatologists might influence the reliability of the measurements. For this reason, we compared the results with those obtained by six physiotherapists.
The standardised protocol for the measurement of shoulder movement with the Plurimeter-V gravity inclinometer (Dr Rippstein, Zurich, Switzerland) has been described in detail previously.13 To minimise potential sources of variation, we specified for each movement the position of the patient, the placement of the instrument on the patient, the stabilisation of the joint being measured, and how the end point of movement was determined. Eight movements were assessed in total. The gravity referenced inclinometer was used to measure total and glenohumeral flexion, total and glenohumeral abduction, internal rotation (performed in abduction only), and external rotation in both neutral and abduction. Hand behind back, assessed by palpation to the nearest spinous process, was also included.
Six patients with varying degrees of pain and stiffness in the shoulder, and a range of underlying conditions, were recruited from a private rheumatology practice. Subjects were excluded if repeated testing was likely to aggravate the condition.
Six rheumatologists undertook a one hour training session to familiarise themselves with the procedure and to standardise the measurement methods between examiners. All were given a detailed manual. All had experience in measuring shoulder range of motion, but none had used an inclinometer. The examiners were instructed to demonstrate each movement to the patient and to ask the patient to move the arm until the first sensation of pain, so that all examiners used the same definition of the onset of pain. Care was taken in aligning the inclinometer on the patient's arm and in ensuring consistent positioning of adjacent joints. Baseline data collected for each participant included age, sex, duration of shoulder pain, and any history of trauma or shoulder surgery. In addition, the patients rated the severity of their shoulder complaint on an ordinal five point scale (1, not at all severe; 5, maximum severity).
Before the measurements the patients were taken through a series of warm up exercises to reduce the risk of a mobilisation effect from the repeated movement assessments. According to a 6×6 Latin square design, each patient was examined by each of the six examiners in round 1 and again in round 2. Each examiner spent about five minutes with each patient in each round. Between rounds 1 and 2 there was a one hour break for both patients and examiners. The order of the examiners and the order of the eight movements were randomly assigned for each round separately. In addition, the examiners were in separate rooms and had no contact with each other during the measurement procedures. Thus the examiners were unaware of each other's scores and of their own scores from the previous round.
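The allocation described above can be sketched in code. This is a minimal illustration, not the randomisation procedure the study actually used. It assumes that rows of the square are examiners, columns are time slots, and entries are patients, so that in each round every examiner sees every patient once and no two examiners see the same patient in the same slot. It builds a cyclic Latin square and randomises it by permuting rows, columns, and symbol labels, which preserves the Latin property (although it does not sample uniformly from all 6×6 Latin squares). The function names and the seed are hypothetical.

```python
import random

MOVEMENTS = [
    "total flexion", "glenohumeral flexion",
    "total abduction", "glenohumeral abduction",
    "external rotation in neutral", "external rotation in abduction",
    "internal rotation in abduction", "hand behind back",
]


def random_latin_square(n, rng):
    """Return an n x n Latin square with symbols 0..n-1.

    Starts from the cyclic square (i + j) mod n, then shuffles rows,
    columns and symbol labels; each shuffle preserves the Latin property.
    """
    base = [[(i + j) % n for j in range(n)] for i in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    relabel = rng.sample(range(n), n)
    return [[relabel[base[r][c]] for c in cols] for r in rows]


def is_latin(square):
    """Check that every row and every column contains each symbol once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


rng = random.Random(2002)  # hypothetical seed
# Assumed layout: row = examiner, column = time slot, entry = patient index.
# A fresh square is drawn for each round, as the order was randomised per round.
rounds = {r: random_latin_square(6, rng) for r in (1, 2)}
for r, square in rounds.items():
    print(f"Round {r}")
    for examiner, slots in enumerate(square, start=1):
        print(f"  Examiner {examiner}: " + " ".join(f"P{p + 1}" for p in slots))

# Movement order drawn independently for each examiner-patient visit.
visit_order = rng.sample(MOVEMENTS, len(MOVEMENTS))
print("Example movement order:", visit_order)
```

Drawing a separate square per round mirrors the statement that the examiner order was randomised for each round separately.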
Analysis of variance was used to partition the total variability into components of variation due to patients, examiners, round, two-way interactions of individual components, and residual error (using the SAS VARCOMP procedure).17,18 All sources were considered as random effects.
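For a balanced, fully crossed design such as this one (6 patients × 6 examiners × 2 rounds, one reading per cell), the partition can be sketched with the classical ANOVA (method-of-moments) estimator, which equates the observed mean squares to their expected values under a random-effects model. This is an assumption for illustration: SAS VARCOMP offers several estimation methods, and this sketch is not necessarily the one used in the study. With one reading per cell, the patient × examiner × round interaction cannot be separated from residual error, and negative estimates are truncated at zero by convention. The function name and the simulated data are hypothetical.

```python
import numpy as np


def variance_components(y):
    """ANOVA (method-of-moments) variance components for a balanced,
    fully crossed patients x examiners x rounds design, one reading per cell.

    y has shape (a patients, b examiners, c rounds). All effects are random.
    """
    a, b, c = y.shape
    m = y.mean()
    mp, me, mr = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=(0, 1))
    mpe, mpr, mer = y.mean(axis=2), y.mean(axis=1), y.mean(axis=0)

    # Sums of squares for main effects and two-way interactions.
    ss = {
        "P": b * c * np.sum((mp - m) ** 2),
        "E": a * c * np.sum((me - m) ** 2),
        "R": a * b * np.sum((mr - m) ** 2),
        "PE": c * np.sum((mpe - mp[:, None] - me[None, :] + m) ** 2),
        "PR": b * np.sum((mpr - mp[:, None] - mr[None, :] + m) ** 2),
        "ER": a * np.sum((mer - me[:, None] - mr[None, :] + m) ** 2),
    }
    # Residual = three-way interaction confounded with error.
    ss["error"] = np.sum((y - m) ** 2) - sum(ss.values())
    df = {"P": a - 1, "E": b - 1, "R": c - 1,
          "PE": (a - 1) * (b - 1), "PR": (a - 1) * (c - 1),
          "ER": (b - 1) * (c - 1), "error": (a - 1) * (b - 1) * (c - 1)}
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for each component.
    v = {"error": ms["error"]}
    v["PE"] = (ms["PE"] - ms["error"]) / c
    v["PR"] = (ms["PR"] - ms["error"]) / b
    v["ER"] = (ms["ER"] - ms["error"]) / a
    v["P"] = (ms["P"] - ms["PE"] - ms["PR"] + ms["error"]) / (b * c)
    v["E"] = (ms["E"] - ms["PE"] - ms["ER"] + ms["error"]) / (a * c)
    v["R"] = (ms["R"] - ms["PR"] - ms["ER"] + ms["error"]) / (a * b)
    return {k: max(float(val), 0.0) for k, val in v.items()}


# Hypothetical simulated readings in degrees (not study data).
rng = np.random.default_rng(0)
a, b, c = 6, 6, 2
y = (100.0
     + rng.normal(0, 20, size=(a, 1, 1))   # patient effects
     + rng.normal(0, 5, size=(1, b, 1))    # examiner effects
     + rng.normal(0, 3, size=(a, b, c)))   # residual error
comps = variance_components(y)
total = sum(comps.values())
for k, val in comps.items():
    print(f"{k:>5}: {100 * val / total:5.1f}% of total variance")
```

Dividing each component by their sum gives a percentage breakdown of the kind reported in Table 3, from which ICCs can then be formed.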
Results are presented as the total variability and the percentage of the total variability attributable to each variance component. Intrarater and interrater reliability were calculated from the variance (var) components through appropriate intraclass correlation coefficients (ICCs)19 according to the following formulae:
The closer the ICC is to 1, the better the reliability. Statistical uncertainty in the computed ICCs is expressed through approximate 95% confidence intervals obtained from 5000 bootstrap resamples of both subjects and rheumatologists.
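The two-way resampling can be sketched as follows, assuming the readings are stored as an array of shape (patients, examiners, rounds). Patients and examiners are drawn independently with replacement, each resampled patient-examiner pair keeps both of its rounds, and a percentile interval is taken from the bootstrap distribution. The percentile method is an assumption here; other bootstrap interval methods exist. The `statistic` argument is a placeholder that in practice would compute an ICC from the variance components of the resampled array; the `patient_share` function below is a hypothetical stand-in, not one of the study's ICCs.

```python
import numpy as np


def two_way_bootstrap_ci(y, statistic, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling patients (axis 0) and examiners
    (axis 1) independently with replacement. Rounds (axis 2) stay together."""
    rng = np.random.default_rng(seed)
    n_pat, n_exam = y.shape[0], y.shape[1]
    boot = np.empty(n_boot)
    for i in range(n_boot):
        pat = rng.integers(0, n_pat, size=n_pat)
        exam = rng.integers(0, n_exam, size=n_exam)
        # np.ix_ selects the full grid of resampled patients x examiners,
        # keeping every round for each selected pair.
        boot[i] = statistic(y[np.ix_(pat, exam)])
    alpha = 1.0 - level
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


def patient_share(y):
    """Hypothetical stand-in statistic: variance of patient means as a
    fraction of total variance (not one of the study's ICCs)."""
    return y.mean(axis=(1, 2)).var() / y.var()


# Hypothetical simulated readings in degrees (not study data).
rng = np.random.default_rng(1)
y = (100.0
     + rng.normal(0, 20, size=(6, 1, 1))
     + rng.normal(0, 5, size=(1, 6, 1))
     + rng.normal(0, 3, size=(6, 6, 2)))
lo, hi = two_way_bootstrap_ci(y, patient_share)
print(f"approximate 95% CI: {lo:.2f} to {hi:.2f}")
```

With only six patients and six raters, intervals from any such scheme are approximate and can be wide.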
Interexaminer reliabilities were also calculated for each round separately:
Six patients (two male, four female), with a mean age of 72 years (range 54–82), took part in the study. The right shoulder was assessed in four patients and the left shoulder in two. The mean duration of symptoms was 16 months (range 1–48). Two patients had a history of shoulder trauma and one patient had had previous shoulder surgery. The severity rating by each patient ranged from 2 (mild severity) to 3 (moderate severity) on the ordinal five point scale.
Table 1 shows the means and standard deviations for each of the six patients in rounds 1 and 2, pooling the scores of the six rheumatologists for each movement. The range of motion varied considerably between patients. For example, the mean total shoulder flexion in round 1 varied between 66.7° (SD 22.3) and 136.5° (12.1). The differences between rounds 1 and 2 were smaller than the differences between patients.
Table 2 shows the intrarater and interrater ICCs calculated for each of the eight shoulder movements together with their approximate 95% confidence intervals. The movement of hand behind back and total shoulder flexion gave the highest ICC scores for both intrarater reliability (0.91 and 0.83, respectively) and interrater reliability (0.80 and 0.72, respectively). Low ICC scores were found for the movements of glenohumeral abduction, external rotation in abduction, and internal rotation in abduction (intrarater ICCs 0.35, 0.43, and 0.32, respectively); and external rotation in neutral, external rotation in abduction, and internal rotation in abduction (interrater ICCs 0.29, 0.11 and 0.06, respectively).
Table 3 gives the individual components of variation as a percentage of the total variance. For the movements of external rotation in neutral, external rotation in abduction, and internal rotation in abduction, the proportion of variance attributable to examiners was considerably greater than the proportion attributable to patients, thus yielding very low interrater ICCs for these movements (see the respective formula).
In a previous study, we evaluated the reliability of shoulder measurements made by physiotherapists, using the same design in a similar group of patients with shoulder pain.13 Figures 1 and 2 compare the ICCs of the rheumatologists with those of the physiotherapists. Overall, the physiotherapists achieved higher intrarater and interrater ICCs than the rheumatologists, especially for the movements of external rotation and internal rotation.
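One way to see why a large examiner component depresses the interrater ICC, offered as an illustration rather than as the study's own formula, is the two-way random-effects, absolute-agreement ICC for a single rater on a single occasion. Here the examiner variance sits in the denominator alongside the patient, patient × examiner, and residual variance. With hypothetical components in degrees squared, raising the examiner component while leaving everything else unchanged lowers the ICC sharply:

```latex
\mathrm{ICC}_{\mathrm{inter}}
  = \frac{\sigma^2_{P}}{\sigma^2_{P} + \sigma^2_{E} + \sigma^2_{PE} + \sigma^2_{e}}

\text{with } \sigma^2_{P} = 20,\ \sigma^2_{PE} + \sigma^2_{e} = 20:
\quad \sigma^2_{E} = 5 \;\Rightarrow\; \frac{20}{45} \approx 0.44,
\qquad \sigma^2_{E} = 60 \;\Rightarrow\; \frac{20}{100} = 0.20
```

In this form an interrater ICC can only be high when the between-patient variance dominates all examiner-related terms.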
In this study we found that using a standardised protocol for the measurement of shoulder movements produced variable intrarater and interrater reliability. Reasonable reliability (ICC>0.7) was obtained only for the movements of hand behind back and total shoulder flexion. The other movements in our protocol produced fair to poor reliability, especially those demanding more complex stabilisation and handling skills, such as external rotation and internal rotation. There appeared to be a learning effect, suggested by the increase in interrater reliability in round 2, although we did not formally test this apparent increase for statistical significance. We nevertheless think that reliability can be improved with practice.
As far as we know, no previously published studies have investigated differences in reliability between different groups of raters measuring shoulder movements. Physiotherapists and rheumatologists may differ in their use of manual techniques, especially for movements requiring fixation, which physiotherapists perform more often. Determinants other than the professional background of the raters, such as experience in examining shoulders, the verbal and non-verbal cues given to patients, and level of confidence, may also play a role. The differences in reliability might also depend on the selection of the individual raters; for example, the physiotherapists selected in our previous study might have been a more homogeneous group. In addition, we cannot exclude the possibility that the observed differences between the ICCs of the physiotherapists and rheumatologists were due to real differences between the patients. Although both studies recruited patients with stiff, painful shoulders with similar duration of symptoms and pain scores, some minor differences were noted, including more homogeneous pain scores among the patients in this study. Unfortunately, the small number of patients in our study does not allow further analyses of the sources of disagreement between the rheumatologists and physiotherapists.
Other studies have reported interrater ICCs ranging from 0.26 to 0.95 in patients with shoulder disorders.7,13–15 Our reliability scores are consistent with the overall conclusion of previous studies that reliability varies widely between shoulder movements. However, the measurement instruments, the methods, and the reliability statistics (such as the limits of agreement16) used in previous studies also differed. Previous publications have discussed the use of appropriate ICCs, and for clarity we provide the formulas of the ICCs that we used. Studies that have reported on the reliability of shoulder movements in healthy subjects9,10,12,20 may either overestimate or underestimate reliability when their results are applied in a clinical setting. ICC values are known to depend on the variation in the study group:21 in a more homogeneous group the between-patient variance, usually the main contributor to total variance, is small, so ICCs may be lower even when measurement error is unchanged. As the end point of the range of movement was determined by the patients' subjective report of pain, the results of this study apply only to patients with shoulder pain.
With further training, and with standardisation of the methods of measurement, the rheumatologists may reach higher levels of reliability. Given the variability in reliability, it may be questioned whether patients need to perform all isolated shoulder movements, especially under time constraints. Scapular fixation may require extra training to perform in a standardised way. One way to overcome this would be to eliminate complex shoulder movements and choose the most reliable ones: for example, hand behind back instead of internal rotation in abduction, as both assess internal rotation. The examiners took only a single reading of each movement; the ICCs might also improve if two readings were taken.
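How much a second reading might help can be roughly projected with the Spearman-Brown prophecy formula, which gives the reliability of the mean of k readings from the reliability of a single reading. This is a hedged sketch: it assumes the readings are parallel measurements with independent errors of equal variance, which repeated readings by the same examiner in one session may not satisfy, so the projection is likely optimistic. The inputs are the low intrarater ICCs reported in this study; the outputs are hypothetical projections, not study results, and the function name is illustrative.

```python
def spearman_brown(icc_single, k):
    """Projected reliability of the mean of k readings (Spearman-Brown).

    Assumes the k readings are parallel: same true value, independent
    errors with equal variance. Correlated errors make this optimistic.
    """
    if not 0.0 <= icc_single <= 1.0:
        raise ValueError("icc_single must lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * icc_single / (1 + (k - 1) * icc_single)


# Intrarater ICCs reported in this study for glenohumeral abduction,
# external rotation in abduction and internal rotation in abduction.
reported = {"GH abduction": 0.35, "ER in abduction": 0.43, "IR in abduction": 0.32}
for movement, icc in reported.items():
    print(f"{movement}: one reading {icc:.2f}, "
          f"projected mean of two {spearman_brown(icc, 2):.2f}")
```

Even under these favourable assumptions, averaging two readings would leave these movements below the ICC>0.7 threshold used above.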
In conclusion, we recommend that for clinical trials of painful shoulder disorders where shoulder movements are an end point the assessments should be performed by the same examiner whenever possible. The examiner requires training and practice in a standardised protocol. For rheumatologists, one hour's training may not be enough. Consideration should be given to choosing only those measures with known reliability.
We thank Electro-Med, Melbourne, for supplying the Plurimeter-V inclinometer.