Two pragmatic trials of treatment for shoulder disorders in primary care: generalisability, course, and prognostic indicators
- Primary Care Sciences Research Centre, Keele University, Keele, North Staffordshire, UK
- Institute for Research in Extramural Medicine, VU University Medical Centre, Amsterdam, Netherlands
- Correspondence to:
Dr Elaine Thomas
Primary Care Sciences Research Centre, Keele University, North Staffordshire ST5 5BG, United Kingdom
- Accepted 15 December 2004
- Published Online First 7 January 2005
Objective: To investigate predictors of long term prognosis in patients treated for shoulder pain in primary care.
Methods: Data were taken from two pragmatic randomised clinical trials investigating the effectiveness of conservative treatments for shoulder pain presenting to primary care. Shoulder pain severity, disability, and perceived recovery measured in the long term (UK, 18 months; Netherlands, 12 months) were considered as outcome measures. Prognostic indicators measured before randomisation were determined by linear regression (pain severity and disability) and logistic regression (perceived recovery).
Results: 316 adults with a new episode of shoulder pain were recruited (UK, n = 207; Netherlands, n = 109). In multivariate analysis, greater shoulder disability at follow up was associated with higher baseline disability score, concomitant neck pain, and a gradual onset and longer duration of shoulder symptoms. Pain scores at follow up were higher in women and in those with longer baseline duration of symptoms and higher baseline pain or disability scores. Being female, reporting gradual onset of symptoms, and a higher baseline disability score each independently reduced the likelihood of perceived recovery.
Conclusions: The results suggest that there is no long term difference in outcome between patients with shoulder pain treated with different clinical interventions in different clinical settings, or having different clinical diagnoses. Baseline clinical characteristics of this consulting population, rather than the randomised treatments which they received, were the most powerful predictors of outcome. Whether this highlights the need for earlier intervention or reflects different natural histories of shoulder pain is a topic for further research.
- NRS, numerical rating scale
- RCT, randomised controlled trial
- SDQ, shoulder disability questionnaire
- VAS, visual analogue scale
Shoulder problems are common, with up to 47% of adults in the general population reporting such symptoms in a one year period.1 In terms of presentation to general practice, the annual consultation rate for new episodes of shoulder pain is approximately 1%.2 The current evidence from both observational studies3–5 and randomised clinical trials in primary6–8 and secondary care9,10 suggests that many sufferers have an unfavourable long term outcome, irrespective of treatment. Identifying those groups of individuals with shoulder pain who have poor long term outcome would have several advantages, including the ability to advise individual patients on their likely course.
The objectives of this analysis were threefold: first, to investigate the generalisability of the findings from two trials by determining clinical heterogeneity across the two studies in terms of participants, interventions, and outcome; second, to determine the course of shoulder complaints in the complete sample over the follow up period; and third, to investigate potential prognostic indicators for poor long term outcome, using data collected before randomisation.
The trial by Van der Windt et al6 compared the effectiveness of a local intra-articular injection (by a posterior route) of 40 mg triamcinolone acetonide and a course of physiotherapy, in 109 participants presenting to primary care in and around Amsterdam with a new episode of painful stiff shoulder (capsular syndrome).
The trial by Hay et al7 compared the effectiveness of a subacromial local corticosteroid injection of 40 mg of methylprednisolone and 4 ml 1% lignocaine (lidocaine) and a course of community based physiotherapy. This study was based in North Staffordshire and randomised a total of 207 participants attending their general practitioner (GP) with a new episode of shoulder pain. In contrast to the trial of Van der Windt et al, the participants in the Hay trial had a broad range of shoulder problems without focus on a particular diagnosis.
In both studies, consecutive patients consulting in primary care for shoulder pain were eligible for recruitment. The following inclusion criteria were applied in both studies: age 18 years and over, ability to complete questionnaires in the relevant language, and ability to give informed consent. Exclusion criteria in both studies included: bilateral symptoms, contraindication to the treatments being evaluated, recent treatment with either a corticosteroid or physiotherapy, and previous surgery, dislocation, or fracture in the shoulder area. However, Hay et al7 additionally excluded patients who had consulted their GP with shoulder pain during the preceding 12 months.
In both studies, patient characteristics and potential prognostic factors were recorded by a research nurse at an initial visit before randomisation. Demographic and clinical characteristics included age, sex, duration of current shoulder complaint, and use of painkillers.
Both studies assessed the following: disability associated with the shoulder pain; pain severity during the day; and participants’ perception of the outcome. This information was collected at three follow up points: short term (six weeks in the UK, seven weeks in the Netherlands), mid-term (six months in both studies), and long term (18 months in the UK, 12 months in the Netherlands). However, there were minor differences between the two studies in terms of the scaling used in these three outcome measures.
Different shoulder disability questionnaires (SDQ) were used in the two studies (SDQ-UK11 and SDQ-NL12). To record pain severity, Van der Windt et al6 used a 0–100 visual analogue scale (VAS), while Hay et al7 used a 10 point numerical rating scale (NRS). To standardise these two outcome measures across both studies, measurements from the Hay study were transformed to 0–100 scales, where 100 indicates maximum pain or disability. The SDQ-UK comprises 23 areas in which shoulder disability is assessed (for example, fastening clothing or a reduced role in household jobs). To put this transformed 0–100 scale of disability into context, four points on the 0–100 scale are approximately equal to one more area in which the participant reported difficulty on the original 23 item version of the SDQ-UK.
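The rescaling described above can be sketched as a simple linear transformation. This is an illustration only, not the authors' code: the function names are hypothetical, and because the endpoints of the NRS are not stated here, the raw minimum and maximum are passed in as arguments rather than assumed.

```python
def rescale_to_100(value, raw_min, raw_max):
    """Linearly map a raw score onto 0-100, where 100 is the worst outcome."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    if not raw_min <= value <= raw_max:
        raise ValueError("value lies outside the stated raw range")
    return 100.0 * (value - raw_min) / (raw_max - raw_min)


SDQ_UK_ITEMS = 23  # number of areas stated in the text


def sdq_uk_score(items_with_difficulty):
    """Assumed scoring: count of areas with reported difficulty, rescaled to 0-100."""
    return rescale_to_100(items_with_difficulty, 0, SDQ_UK_ITEMS)


# One additional area of difficulty is worth 100 / 23 points,
# which is the "approximately four points" quoted in the text.
points_per_item = sdq_uk_score(1) - sdq_uk_score(0)
```

The same conversion factor gives a rough reading of regression coefficients reported on the 0–100 scale: dividing a coefficient by `points_per_item` expresses it as a number of SDQ-UK areas.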
To rate person perceived recovery from baseline, both studies used a Likert scale: a 5 point scale in the Hay study and a 6 point scale in the Van der Windt study. Here, the scores from both studies were standardised by dichotomising into two groups: (i) those who had not improved or had worsened (“unchanged”, “worse”, “much worse”), and (ii) those who had improved (“recovered”, “improved” (UK); “recovered”, “much improved”, “somewhat improved” (Netherlands)).
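A minimal sketch of this dichotomisation, using the response labels quoted above. The function name and the country keys are hypothetical, and the sketch assumes that the labels match exactly after lowercasing.

```python
# Response labels as quoted in the text; names of the constants are illustrative.
NOT_IMPROVED = {"unchanged", "worse", "much worse"}
IMPROVED = {
    "UK": {"recovered", "improved"},
    "Netherlands": {"recovered", "much improved", "somewhat improved"},
}


def improved(country, response):
    """Return True for an improved response, False for not improved.

    Raises ValueError for a label that is not on that study's scale,
    so that miscoded responses are not silently misclassified.
    """
    label = response.strip().lower()
    if label in IMPROVED[country]:
        return True
    if label in NOT_IMPROVED:
        return False
    raise ValueError(f"unexpected response {response!r} for {country}")
```

Rejecting unknown labels is a design choice: "much improved", for example, exists only on the 6 point scale, so it should not pass unnoticed in UK data.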
We investigated differences between the two study populations in the demographic and clinical characteristics collected at baseline. Summary data were calculated: proportions for categorical variables, and means and standard deviations for numerical variables. For categorical data, differences in proportions and their associated 95% confidence intervals (CI) were calculated; for numerical data, mean differences and their associated 95% CI were calculated. Differences between the two study populations in baseline pain and disability scores were also calculated: first, the unadjusted mean differences and 95% CI; second, the adjusted mean differences and 95% CI, allowing for any differences in the demographic or clinical characteristics between the studies (linear regression).
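For illustration, a difference in two proportions and its confidence interval could be computed as below. The interval method is not stated here, so the normal approximation (Wald) interval is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions(x1, n1, x2, n2, level=0.95):
    """Difference p1 - p2 with a normal approximation (Wald) confidence interval.

    x1, x2 are counts with the characteristic; n1, n2 are group sizes.
    Returns (difference, lower limit, upper limit) as proportions.
    """
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Made-up counts, purely to show the call; these are not trial data.
diff, lower, upper = diff_in_proportions(50, 100, 40, 100)
```

An interval that excludes zero corresponds to a difference that is significant at the matching level.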
Comparisons of the course between the two trials, and between the two treatment groups within the trials, were made. Univariate and multivariate analyses were used to investigate the associations between potential prognostic indicators and outcome in the long term. A separate model was built for each of the three outcome measures examined (disability, pain, and perceived recovery), each parameterised to determine factors associated with a poor outcome: that is, a higher score for disability or pain (linear regression), or failing to improve or worsening (logistic regression). The variables “country” (Netherlands, UK) and “treatment” (injection, physiotherapy) were included in all models as covariates. All putative prognostic factors showing a univariate association with the outcome at issue (p<0.10) were put forward into a multivariate analysis (backward elimination (p<0.10)) to determine a group of factors that were independently associated with a poor outcome. We chose this cut off of p<0.10 to represent significance rather than the more conventional, but no less arbitrary, value of 0.05, the use of which has been shown to fail to identify factors known to be of importance.13 All analyses were carried out using Stata (version 7.0).14
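The two stage selection described above can be sketched as follows. This is not the Stata code used in the analysis. It assumes a hypothetical callable, `fit_pvalues`, that fits the relevant regression model to a list of variable names and returns a p value for each, and it assumes the usual reading of backward elimination: the candidate with the largest p value is dropped until every remaining candidate has p<0.10. Country and treatment are forced into every model and are never eligible for removal.

```python
def univariate_screen(candidates, forced, fit_pvalues, threshold=0.10):
    """Keep candidates with p < threshold when added alone to the forced covariates."""
    return [c for c in candidates if fit_pvalues(forced + [c])[c] < threshold]


def backward_eliminate(candidates, forced, fit_pvalues, threshold=0.10):
    """Repeatedly drop the least significant candidate until all have p < threshold."""
    kept = list(candidates)
    while kept:
        pvalues = fit_pvalues(forced + kept)  # refit with the current variable set
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < threshold:
            break  # every remaining candidate meets the criterion
        kept.remove(worst)
    return forced + kept


# Stand-in for a real model fit: fixed, made-up p values for illustration only.
_demo_p = {"country": 0.50, "treatment": 0.80,
           "neck_pain": 0.02, "age": 0.40, "duration": 0.07}


def demo_fit(variables):
    return {name: _demo_p[name] for name in variables}


forced = ["country", "treatment"]
screened = univariate_screen(["neck_pain", "age", "duration"], forced, demo_fit)
final_model = backward_eliminate(screened, forced, demo_fit)
```

In a real analysis `fit_pvalues` would refit the linear or logistic model each time, so p values would change as variables are removed; the fixed table here only demonstrates the control flow.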
In all, 203 patients were referred from the 60 participating GPs in the trial based in the Netherlands and 109 (53.7%) were randomised (56 to physiotherapy and 53 to corticosteroid injection). Reasons for exclusion were: diagnosis of capsular syndrome could not be confirmed (n = 73), no consent (n = 6), not eligible (n = 10), or they had recovered (n = 5).6 In the study by Hay et al,7 207 of 237 patients (87.3%) referred to the trial by the participating GPs were randomised (103 to physiotherapy and 104 to corticosteroid injection). Reasons for exclusion were no consent (n = 12), not eligible (n = 11), or they had improved (n = 7).
Table 1 presents the demographic and clinical characteristics and measurements for both studies at baseline. The two studies were similar with respect to mean age, proportion of women, proportion with the dominant side affected, and onset of current symptoms. However, participants in the trial of Van der Windt et al reported a significantly longer duration of current symptoms, a higher percentage of concomitant neck pain, and a lower percentage of recent use of painkillers. With respect to baseline measures of pain and disability, differences were apparent between the trials. Disability scores were significantly higher in the Dutch study, while conversely pain scores were significantly higher in the UK trial. After adjusting for demographic and clinical characteristics, the difference in pain severity between the two studies was reduced. However, the difference in disability scores persisted after this adjustment.
Course of shoulder symptoms
Despite a significant difference in improvement rates in the short term for the Dutch trial (difference = 17.6% (95% CI, 5.0% to 30.3%)), the pattern of improvement rates was similar over the longer term both between countries and between treatments within countries (table 2).
Figure 1 presents the course of “severity of shoulder disability” for each intervention, separately. At the long term follow up point (12/18 months), a decrease in disability score from baseline was seen for almost all participants (90.1%), regardless of treatment or country. The course of participants who received a corticosteroid injection was slightly more favourable in the short term for the Dutch trial, but in the mid- and long term both treatment groups were similar. The course for the two treatment groups from the UK trial was almost identical. Comparing the data from the two countries, combining the treatment groups, the average disability scores fell by 68% in the UK trial compared with 57% in the Dutch trial. Hence, despite a lower long term disability score in the UK trial, the change from baseline was similar in both trials, as the Netherlands trial had a greater mean disability score at recruitment. A similar pattern to that observed for disability was seen for pain severity during the day (fig 2). Again, despite different mean scores at baseline, the UK participants having higher scores, all four treatment groups had substantially improved at long term follow up.
Disability score at long term follow up
In the univariate analysis, after adjusting for country and treatment, the following were all associated with a higher disability score at long term follow up: concomitant neck pain, gradual onset of symptoms (that is, over a few weeks), longer duration of symptoms at recruitment, and higher baseline pain and disability scores (table 3). In the multivariate analysis, concomitant neck pain, a gradual onset of symptoms, longer duration of symptoms at recruitment, and higher baseline disability score each increased the long term disability score (R2 = 23.7%).
At baseline, the mean disability score was 55 points on a scale of 0–100. By long term follow up this had reduced to a mean of 21 points. A substantial effect on follow up disability score was attributable to the presence of concomitant neck pain at baseline and to a gradual onset of the shoulder symptoms, with each of these factors being linked to an approximate 7 point increase in the follow up disability score among participants with these characteristics compared with those without. This is equivalent to having two additional areas of limited everyday functioning reported on the SDQ-UK. Longer duration of symptoms at baseline also increased disability score at follow up; comparing two participants, alike in all other respects, each extra month of recorded duration would increase the follow up score by 0.5 points. Not surprisingly, higher disability at baseline led to a higher score at follow up; this is equivalent to stating that for each two additional areas of limited everyday functioning recorded at baseline, one would be retained at follow up.
Pain severity during the day at long term follow up
In the univariate analysis, after adjusting for country and treatment, the following were associated with higher pain severity in the day at long term follow up: female sex, longer duration of symptoms at recruitment, and higher baseline pain and disability scores (table 4). In the multivariate analysis, being female, having a longer duration of symptoms recorded at baseline, and the severity of both baseline pain and disability scores each independently increased the long term pain scores (R2 = 9.22%).
At baseline, the mean pain score was 54 points on a scale of 0–100. By long term follow up this had reduced to a mean of 13 points. Sex had a substantial effect on follow up pain score, with women having scores 6 points higher than men. As seen for long term disability, pain severity scores at long term follow up were higher for those with longer symptom duration at baseline; each additional six months of duration at baseline increased the pain score at follow up by approximately 2 points. Pain at long term follow up was associated with both baseline pain and disability score.
Perceived recovery at long term
Here, as the outcome measure is dichotomous (recovered or not recovered), the results are presented as odds ratios (the odds of not recovering given the presence of the risk factor compared with the odds of not recovering given its absence). In the univariate analysis, after adjusting for country and treatment, the following were all associated with a poor outcome (“not improving”) at long term follow up: female sex, gradual onset of symptoms, longer duration of symptoms at recruitment, and higher baseline pain and disability scores (table 5). In the multivariate analysis, being female, reporting a gradual onset of symptoms, and higher baseline disability scores were independently associated with not recovering.
Women compared with men, and those who reported a gradual compared with a sudden onset, had threefold increased odds of not recovering. For each additional point on the disability score at baseline, the odds of a poor outcome were increased by 3%; hence for two participants who were 10 disability points apart at baseline, the one with the higher score would have roughly one third higher odds of persistent symptoms at long term follow up (an odds ratio of 1.03 compounded over 10 points is about 1.34).
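Because odds ratios from a logistic model multiply rather than add, the per point effect can be scaled to a larger difference as sketched below. The function names are hypothetical, and the per point odds ratio of 1.03 is taken from the rounded 3% figure in the text, so the result is approximate.

```python
def odds_ratio_over(per_point_or, points):
    """Odds ratio for a difference of `points`, assuming a constant per point OR."""
    return per_point_or ** points


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by applying an odds ratio to a baseline probability."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Ten points apart at baseline: 1.03 ** 10 is about 1.34.
or_10 = odds_ratio_over(1.03, 10)
```

The second helper shows why an odds ratio is not the same as being "more likely" in the everyday sense: for any baseline probability, `apply_odds_ratio(p, or_10)` is smaller than `p * or_10`, and the gap widens as the baseline probability rises.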
Comparing data from two large recent randomised clinical trials of shoulder pain in primary care gave us the opportunity to investigate the generalisability of these findings. Our analysis confirmed that, as expected from the inclusion and exclusion criteria, there were differences between the two study populations in terms of their characteristics at entry to the trial. Despite these differences, however, the long term effect of treatment appears to be similar both within each trial and across both trials.
The group of prognostic indicators associated with each of the outcome measures examined differed with only one factor (disability score at baseline) common to each model. Disability, symptom duration and baseline pain level were the only factors to reach moderate to high evidence for predicting outcome in a recent systematic review of cohort studies.15 Prognostic models are unsuitable for making inferences on interventions to improve prognosis and so the models derived here are suitable for predicting long term outcome only—that is, they cannot imply causality.
Some of the heterogeneity seen in the clinical characteristics of the two study populations partly reflects the different exclusion criteria and definitions of “shoulder complaint” used. For example, Hay et al,7 unlike Van der Windt et al,6 excluded patients who had previously consulted for the same shoulder problem in the past 12 months. However, for the majority of the Dutch participants, the consultation leading them into the trial was their first in that year period. Van der Windt et al attempted to assemble a group of patients with a single diagnosis (capsular syndrome). This differed from the more general definition of “shoulder pain” as used by Hay et al. The higher level of baseline shoulder disability and higher prevalence of concomitant neck pain seen in the Dutch trial could be related to the different diagnostic criteria used. Indeed, when a subgroup of UK participants with shoulder restriction (either in active abduction or external rotation) was compared to those without restriction, those with restriction had higher baseline disability scores. The shorter duration of symptoms at baseline in the UK participants is likely to reflect the requirement that participants should not have consulted with their affected shoulder in the previous 12 months.
It is curious that the Dutch participants had higher baseline disability but lower pain scores than the UK participants. This finding suggests that the shoulder disability questionnaires used are indeed measuring something other than pain. This is likely to be particularly so for the SDQ-UK, which includes various questions about the more general effects of shoulder pain on health status (for example, irritability). By contrast, the SDQ-NL is more restricted in its content, including questions mainly focusing on the effect of pain on limitation of function. This finding has been reported previously, where a higher correlation was seen between the SDQ-UK and the EuroQol, a generic health outcome measure, than between the EuroQol and the SDQ-NL.16
There was no evidence from either study that local steroid injection conferred long term benefit. Local steroid injection offered some benefit in terms of improvement in short term pain and disability only in the Dutch trial. This difference between the trials might relate to different patient selection, different steroid preparations, or differences in injection techniques. For example, the majority (75%) of the Dutch participants randomised to injection received two or three injections in the treatment period compared to one in the UK trial.
Pooling data from randomised trials potentially allows the detection of important differences in secondary outcome measures that the original trials were not individually powered to detect. In our study such analysis was hampered by a lack of consistency in the use of outcome measures. Although we attempted to standardise the two SDQs used in the trials, there appeared to be some differences in the content of these two tools, which compromises the validity of this approach.16 Hence the authors agree that a consensus on a core set of outcome measures for shoulder pain is needed.16,17
Despite the clinical heterogeneity apparent in the two study populations, the overall findings of the two trials suggest that shoulder injection and physiotherapy are similarly effective in the long term at reducing both pain and disability in patients presenting to primary care with shoulder pain. The results of this analysis suggest that there is no long term difference in outcome between patients treated with different clinical interventions in different clinical settings, or having different clinical diagnoses. Baseline characteristics of the population (gradual onset, duration and severity of symptoms) were the most powerful predictors of outcome. This has important implications for future interventions for shoulder pain; whether it highlights the need for earlier intervention or reflects different natural histories of shoulder pain is a topic for further research. However, the percentage of the variance explained in the models is quite low, which means that there are other factors not included in the model (either measured or not measured) that may explain a further amount of the variability in outcome among patients with shoulder disorders.
We thank the GPs, staff, physiotherapists, and participants involved in the two trials. The trials were funded by the Arthritis Research Campaign (UK), the Netherlands Organisation for Scientific Research (NWO), and the Fund for Investigative Medicine of the Health Insurance Council (Netherlands). ET would like to thank the NWO/British Council for funding a research visit to the Institute for Research in Extramural Medicine, Amsterdam.