See article on page 82
Shoulder pain is common, affecting 15–30% of adults at any one time,1 of whom 1 in 20 will visit a general practitioner in the course of a year.2 What is the most useful way to categorise such a common symptom to measure its impact, study its aetiology, and determine the efficacy and effectiveness of treatment?
The clinical literature gives primacy to classifications based on presumed pathology. There is little concrete evidence that exalting most shoulder pain with terms such as tendonitis, bursitis or impingement syndrome is either reliable or useful, and such classifications cannot anyway provide a measure of outcome. Clinical measures of shoulder function, such as range of movement, provide a means to classify shoulder problems into subgroups and to assess change over time. However, they do not necessarily reflect patient well being or the ability to carry out usual activities. So the field is open for self completed questionnaires that assess symptom severity and the impact of shoulder pain on everyday living.
At least four instruments that measure function in this way have appeared recently.3-7 They include the Dutch Shoulder Disability Questionnaire (SDQ), which receives further study in this issue;6 ,7 two American schedules, the SPADI3 and the Shoulder Rating Questionnaire (SRQ);4 and a British disability questionnaire.5 A number of others, such as the Dutch Shoulder Function Assessment,8 incorporate range of movement measurements by an observer, in addition to items completed by the patient. Each, like many new scales accepted for publication these days, has met some or all of the technical criteria of a “good” instrument (repeatability, validity, and responsiveness to change). Readers who are seeking a simple standard tool for use in clinical practice, audit or research may justifiably be bewildered at the number, range, and persuasiveness of these instruments. Can we ask some simple questions of such measures to choose between them?
A first question is whether instruments specific to the shoulder are superior to measures of general function and health status such as the Short-Form 36.9 The latter can be used to compare the effects of different musculoskeletal conditions on daily functioning in different patients or to summarise the aggregate effect of different health problems in one person. However, studies that have compared these two types of measure (in osteoarthritis, for example10) have concluded that outcome studies are more powerfully served by specific measures. Winters and colleagues have also pointed to the practical constraints on using long generic measures in routine clinical practice and to the redundancy of much of the information obtained from them in subjects with isolated shoulder problems.11 More difficult to judge, in the absence of comparative studies, is the place of instruments that relate to a wider area than the shoulder itself, such as the Disabilities of the Arm, Shoulder and Hand schedule (DASH).12
The four questionnaire schedules referred to above3-7 were all developed as specific instruments for measuring the impact of shoulder problems on daily life. Beyond the jargon and technicalities of questionnaire assessment, what validation is likely to be useful? We want such instruments to reflect real life. The Dutch group behind the SDQ for example6 ,7 considered this by selecting items from real clinical histories, consulting with 273 physiotherapists to identify what they perceived to be the 15 most important items, and finally piloting the 15 item schedule with a new group of patients.
The real life that is reflected, however, will depend on the setting chosen for developing and testing the questionnaire. The American schedules, for example, relate to outpatients and specialist office populations in America, where, for example, in the SRQ study4 56 of the 100 patients studied had an operation in the year after the first administration of the questionnaire. This is clearly a different context to the two settings in which the British questionnaire5 was developed (the general population and primary care) or to the outpatient and primary care fields in which the Dutch SDQ has now been tested.6 ,7 Other types of shoulder assessment have been tested in disease specific patient groups, such as those with rheumatoid arthritis,8 or in occupational settings.12
Another potentially desirable quality is the range of “domains” covered by the instrument. The British study for example was developed from a generic disability questionnaire, drawing questions from 11 of 12 categories of daily living,5 and the American SRQ was based on six different domains, including one specifically considering patient satisfaction.4 By contrast the SPADI covered only two domains,3 and this meant that sleep disturbance for example did not appear in the SPADI despite it being a frequently reported problem in other studies. Although the inclusion of a wide range of activities potentially affected by shoulder pain seems an attractive quality, we need evidence that it is relevant and useful. Some symptoms, such as irritability or bad temper, may be weak or redundant measures of the overall impact of shoulder pain on daily life.
How important are the technical characteristics of a questionnaire? Repeatability for example can be crucial to observer administered items, such as range of movement measures, but may be a rather over-hyped characteristic for self reported symptoms and daily activities. Reports of poorly repeatable questionnaires in the literature are rare and fluctuations over time in an episodic condition such as shoulder pain will tend to be attributed to change in the condition rather than to poor repeatability.
In the absence of a “gold standard” measure of disability, criterion validity asks: “Do the results using the new instrument reflect results from other measures which might also reflect severity or change?” For example, the Dutch SDQ was compared with a pain scale and a single question about function,6 ,7 the American SRQ with a generic measure of arthritis impact,4 and the other two with range of movement at the shoulder.3 ,5 Criterion validity carries the danger of circularity, particularly in the musculoskeletal literature. For example you might read one week that a range of movement scale of shoulder problems is valid because it correlates with a pain score; the next week you read about a pain severity scale that has been validated by its ability to reflect range of movement. Such exercises are in the end a comfort rather than a proof of validity.
Most interest nowadays concerns the use of questionnaires as outcome measures, and in particular their responsiveness to change over time. Of the measures considered here, the Dutch SDQ can now claim to have the most meticulous evidence about this,7 although the American SRQ provides data on preoperative and postoperative comparisons.4 Responsiveness can be judged in relation to another measure of change, such as, in the SDQ study, the patient’s own judgement of progress on a simple six point scale. Such a simple scale is an appealing yardstick, but why not use it anyway instead of the more complex instrument? The arguments in favour of complexity are both practical and technical. A complex scale allows change in specific activities to be explored, more variation between individuals across a wider range of domains to be summarised, and individual responses to be standardised.
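As a rough illustration of how responsiveness might be put into numbers, the sketch below computes one commonly used index, the standardised response mean (the mean change in score divided by the standard deviation of the change scores). The scores, the 0 to 100 scale and the function name are hypothetical, and this is not presented as the analysis used in the SDQ or SRQ studies.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores.

    Assumes higher scores mean worse disability, so improvement is
    baseline minus follow-up.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    if len(baseline) < 2:
        raise ValueError("at least two patients are needed")
    changes = [b - f for b, f in zip(baseline, follow_up)]
    spread = stdev(changes)  # sample standard deviation of the changes
    if spread == 0:
        raise ValueError("all change scores are identical; index is undefined")
    return mean(changes) / spread


# Hypothetical questionnaire scores for five patients (0 = no disability).
baseline = [80, 60, 70, 90, 50]
follow_up = [40, 50, 30, 60, 45]

srm = standardised_response_mean(baseline, follow_up)
print(f"Standardised response mean: {srm:.2f}")
```

A larger value suggests that the instrument registers change more clearly relative to its own variability, which is one way of comparing a multi-item schedule with a single global question.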
One solution to the problem of the patient’s own perception being the main gold standard of change was proposed by Guyatt.13 This entails asking the patient at baseline to select the item they perceive to be their most important restriction and then measuring changes in it over time as the main outcome. Two of the reviewed questionnaires4 ,6 ,7 incorporate a version of this idea. One reported difficulty with it is that patients change their perception of their most important problem by the time of the follow up visit.
Some of the studies considered the concept of “minimal important change”.13 This is the actual change in a questionnaire score that reflects clinically important change in the patient’s condition. This is a nice idea, getting away from statistical analyses and presenting actual changes. But how is it estimated? Well once again the patient’s own estimate of their progress is trundled out to validate it. The added value of this new measure remains to be clarified.
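To make the dependence on the patient’s own estimate concrete, here is a minimal sketch of one anchor-based way of estimating such a threshold: take the mean change in questionnaire score among patients who rate themselves as only slightly improved. The data, the rating labels and the function name are assumptions made for illustration; the studies cited may have used a different procedure.

```python
from statistics import mean


def anchor_based_change(changes, ratings, anchor="slightly improved"):
    """Mean score change among patients whose global rating equals `anchor`.

    `changes` are improvements in questionnaire score; `ratings` are the
    patients' own global judgements of progress (hypothetical labels).
    """
    selected = [c for c, r in zip(changes, ratings) if r == anchor]
    if not selected:
        raise ValueError(f"no patients rated themselves as {anchor!r}")
    return mean(selected)


# Hypothetical improvements in score and self-rated progress for six patients.
changes = [40, 10, 12, 30, 5, 14]
ratings = [
    "much improved",
    "slightly improved",
    "slightly improved",
    "much improved",
    "unchanged",
    "slightly improved",
]

threshold = anchor_based_change(changes, ratings)
print(f"Estimated minimal important change: {threshold} points")
```

The estimate is only as good as the anchor: the threshold is defined entirely by which patients fall into the chosen global rating category, which is the circularity noted above.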
The final questions to raise about these instruments are more basic. Questionnaires should be simple, easy to use, and look like common sense. There should be evidence that they will work in the setting in which you would like to use them, and so the paper in this journal7 that takes an instrument developed in an outpatient department6 and re-examines it in a general practice setting is to be applauded. Different instruments may be required for different patient groups, but this must be balanced against the need to standardise results from different studies by the use of a single instrument.
This last point is one reason why we should attempt to identify a first choice. There are at least four reasonable shoulder specific questionnaires in the literature, many more when generic and clinical instruments are included.14 The questionnaires considered here seem sensible and incorporate efforts to reflect real life. They are also intriguingly different. They have been developed in different languages, different cultures, and, most importantly, different patient groups, and they emphasise different aspects of the “shoulder experience”. One focuses on the hurt that is associated with various shoulder movements,6 ,7 another on the difficulty that is experienced in doing the various tasks,5 while the American schedules consider both pain and activity restriction.3 ,4 Two have more obvious credentials for use in the community or in primary care5-7; one is well tested in a group undergoing surgery.4 They all go some way to meeting the technical demands of questionnaire design.
An editorial should leap off the fence, but my only rational conclusion is to urge those of you who want to design another shoulder questionnaire to reconsider. A systematic review of shoulder schedules would be useful in laying out a baseline of all published instruments, in helping decisions and in designing changes or comparative studies; and there is at least one excellent example in the occupational literature that reviewed 52 instruments related to the neck and upper limb.14 The authors of the paper in this issue suggest that more comparisons should be carried out7 and this seems good advice: more empirical testing of published questionnaires to build up our practical knowledge: how useful are they? how easy to use? how relevant to different situations? While admitting to one seductive but totally unacceptable argument in favour of using a particular questionnaire (“I’m one of the authors”5), in the name of science and sanity I would encourage critical comparative studies of published schedules to determine a standard measurement scale for studies of shoulder problems.
I am grateful for the many discussions with the shoulder research teams at the Arthritis and Rheumatism Council’s Epidemiology Research Unit, at the EMGO Institute of the Free University in Amsterdam, and at the University Hospital in Leiden; to Sue Willson for the manuscript; and to Mike Doherty for his patience.