Objective: To compare the performance of the several different diagnostic criteria sets currently in use for polymyalgia rheumatica (PMR).
Methods: 213 patients attending eight rheumatological centres in eight different European countries were studied. All had recently been referred and were considered by the senior investigator at each centre, selected because of their experience in treatment of PMR, to have this condition. By use of a standard international proforma, the requisite diagnostic points in each criteria set were sought. Sensitivity for each criterion from each set was then calculated, as well as the sensitivity of each criteria set as a whole.
Results: Of four criteria sets compared, the Bird (1979) criteria performed best with a sensitivity of 99.5%, and the Hunder (1982) criteria second best, with sensitivity of 93.3%. These both performed significantly better than the two other criteria sets, though each of these was admittedly developed for rather specialised reasons.
Conclusions: Although this study is a test of homogeneity only (sensitivity was assessed but specificity was not), we suggest that the Bird 1979 or Hunder 1982 criteria should be used whenever possible. Studies that have used alternative criteria may have been less sensitive in diagnosis.
- PMR, polymyalgia rheumatica
- polymyalgia rheumatica
Although the disease first described by Bruce in 1888 [1] was probably polymyalgia rheumatica (PMR), this term was not used formally until first suggested by Barber in 1957 [2]. Working from a spa centre in the north of England, Barber drew on his clinical experience with large numbers of patients to define a condition for which no diagnostic test existed. Unless PMR is associated with giant cell arteritis and its typical histological features, this situation still holds true. As a result, a variety of clinical diagnostic criteria sets have been suggested over the last 25 years to aid future research. To date, there has been no formal comparison of these diagnostic criteria sets in a clinical setting.
The first criteria set to be formally proposed came from a collaboration between 11 United Kingdom rheumatology units in 1979, which led to the Bird/Wood criteria [3]. This was soon followed by two further criteria sets (from Jones and Hazleman in 1981 [4] and from Hunder and colleagues in 1982 [5]), both of which were based more on clinical expertise than on epidemiological analysis. Two further criteria sets were developed for specific purposes. That from Wilke (1985) [6] was specifically designed for use in giant cell arteritis and so has not been included in this study, and the Nobunaga criteria of 1989 [7] were designed specifically for a Japanese population and their requirements.
On an initiative of the European League Against Rheumatism’s Standing Committee on Clinical Trials Including Therapeutic Trials (ESCISIT), a European collaborating PMR group has been established. As a result, an opportunity arose to compare the several different diagnostic criteria sets. This study was undertaken in eight rheumatology centres in eight countries across Europe, each of which felt able to participate in the protocol, which had been circulated to representatives on the committee as well as to all European centres that had published extensively on PMR in the previous decade.
This study formed part of a larger study from the European collaborating PMR group, using mainly the same group of patients, which has led to the definition of formal response criteria for the condition [8].
It was a prerequisite for participating centres that each should have a senior physician with substantial previous experience in the diagnosis and management of PMR, preferably with previous publications in the field, and that adequate facilities should exist for follow up and for reasonable investigations to exclude conditions that might mimic PMR, though the extent of such investigation was left to the discretion of the contributor. However, where there was clear diagnostic ambiguity, these investigations were mandatory. This method of selection of centres was judged preferable to attempting a true pan-European distribution. Centres were only included for analysis if they contributed at least 10 patients over a four year period starting in 1998. On this basis, data from several other centres that could contribute only smaller numbers were not used.
At first visit a full history was taken from each patient with suspected PMR. Blood was taken for measurement of erythrocyte sedimentation rate (ESR) in the clinic and, where possible, for C reactive protein (providing this assay was available in the local laboratory).
A proforma was completed, allowing an assessment of the extent to which symptoms, signs, and initial results accorded with the different criteria sets. If there was diagnostic confusion, investigations were done to exclude the presence of alternative diseases, as stipulated in table 1. Where diagnosis was in doubt patients returned after up to one week for these results to be scrutinised.
Where the only diagnosis, on the basis of these tests, was still felt to be PMR in the opinion of the senior recruiting physician, the patient was recruited to the study and treatment was started immediately. On ethical grounds this was left to the discretion of the physician but was normally prednisolone (non-enteric coated) 20 mg/day until the symptoms responded, when the dose was reduced in steps according to the judgement of the clinician.
It was mandatory for patients to return within two weeks of starting treatment for the physician to determine whether a significant response to the steroid had occurred. Further follow up was encouraged (though it was not mandatory) at eight and 16 weeks for adjustment of the steroid dose as felt necessary by the rheumatologist. At each subsequent visit the diagnosis of PMR was further reviewed. At six months, if there was any doubt about the diagnosis or if any other condition, such as malignancy, had developed, the patient was excluded from analysis.
Those patients from this group for whom longer (up to two years) and more frequent follow up could be provided were also included (at the discretion of the investigator) in the parallel running “response criteria” study [8].
Data recorded on the proforma comprised the following: initials, date of birth and sex, shoulder stiffness, neck involvement, shoulder involvement, upper arm involvement, hip involvement, thigh involvement, and whether these were unilateral or bilateral, early morning stiffness, rapid onset of symptoms, associated depression and/or weight loss, visual disturbance, headache, jaw ache, and neurological features. Early morning stiffness was estimated in minutes, and tenderness of the shoulders or upper arm and swelling of the shoulder joint were both recorded. ESR was measured (in one centre, Leeds, viscosity was substituted) and C reactive protein was also estimated where practical.
Comparison of the data was then made with the four selected diagnostic criteria sets. Sensitivity (the proportion of all patients with the disease who are positive for the feature) was calculated for each attribute and for each criteria set. The separate sets were compared on this basis.
Local ethical approval was obtained at each participating centre.
The number of patients participating in the study from each centre is shown in table 2.
The performance of each of the criteria sets and their component parts is shown in tables 3 to 6. The first column lists the criterion, the second the number of patients possessing this criterion set against the number of patients in the study for whom this criterion was evaluable on the data provided, and the third lists this figure as a percentage (the sensitivity of the criterion). The sensitivity of the performance of the criteria set as a whole is also given at the bottom of each table.
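The arithmetic behind each table row can be illustrated with a minimal sketch. The function name and the example data are hypothetical and are not taken from the study; `None` is used here, as an assumption, to mark a patient for whom the criterion could not be evaluated on the data provided.

```python
from typing import Optional, Sequence


def sensitivity(findings: Sequence[Optional[bool]]) -> float:
    """Proportion of evaluable patients in whom the criterion is present.

    Each entry is True (criterion present), False (absent), or None
    (not evaluable, so excluded from the denominator).
    """
    evaluable = [f for f in findings if f is not None]
    if not evaluable:
        raise ValueError("no evaluable patients for this criterion")
    return sum(evaluable) / len(evaluable)


# Hypothetical data for one criterion: five patients, one not evaluable.
example = [True, True, False, None, True]
result = sensitivity(example)
print(f"{sum(f is True for f in example)}/"
      f"{sum(f is not None for f in example)} = {100 * result:.1f}%")
```

Excluding non-evaluable patients from the denominator mirrors the way the second column is described, where the count is set against the number of patients for whom the criterion was evaluable.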
Some of the criteria sets studied used the absence of a particular disease as one of the criteria required. Because exclusion of such conditions was inherent to the study, we have assumed that this criterion was present in all patients included, though this was not always documented in respect of each individual patient. We have indicated where we have made this assumption.
Unfortunately it proved particularly hard to evaluate the Jones/Hazleman criteria, first because our proforma did not distinguish disease duration longer than two weeks, making a precise estimate of their two month cut off point impossible, and second because they use C reactive protein. Even when C reactive protein determinations could be obtained, the varying normal range of the many different assays used throughout Europe prevented us from checking this criterion accurately.
The proposal of diagnostic criteria remains a problem, particularly for diseases where at present no clear diagnostic test exists. Even though more information may be available by way of laboratory and radiological investigations of diagnostic validity, ultimately diagnosis is still based on the consensus of a group of experienced clinicians. The ARA diagnostic criteria for rheumatoid arthritis [9] and for systemic lupus erythematosus [10] provide similar examples. In the absence of a pathological gold standard (except for cranial arteritis) we are unable to propose any alternative method of formulating criteria, and our methodology for the comparison of existing criteria sets was based upon this premise. Response criteria are as important as diagnostic criteria. The American College of Rheumatology has moved in this direction more recently for more accurate assessment of response in clinical trials [11]. Our collaborating PMR group recognises this, and the assessment of diagnostic criteria formed only one part of a larger study in which response criteria were evaluated (for the first time in the literature) [8], largely from patients who participated in this and other studies but were available for longer term follow up.
A full assessment of the value of diagnostic criteria needs consideration of both the sensitivity and the specificity of each criterion. Each criterion is then selected on the basis of a combination of sensitivity and specificity. To behave well a criterion should have high values for each. Sometimes sensitivity and specificity values are added together to produce a “relative value”, which many find easier to understand than the Youden index [3]. A fundamental weakness of this study remains that the data available did not allow the calculation of specificity for each attribute, as it was not possible to enrol a control group of patients with conditions that mimic PMR for identical study in each of the many participating centres, largely on the grounds of expense. Therefore, our results might be better described as a “test of homogeneity” and it is possible our results are biased towards the study of “true” PMR rather than a clinical syndrome that might appear indistinguishable initially. In addition, patients with giant cell arteritis were not specifically sought and, by implication, excluded, though this reflects conventional practice throughout Europe whereby the majority of such patients are initially referred to ophthalmologists rather than to rheumatologists.
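For reference, the two summary measures mentioned in this paragraph can be written out as follows. The symbols Se (sensitivity), Sp (specificity), J (Youden index), and R (“relative value”) are shorthand introduced here for illustration. The expression for J is the standard definition of the Youden index, and the expression for R follows the description above of adding the two values together.

```latex
\begin{align}
J &= \mathrm{Se} + \mathrm{Sp} - 1 \\
R &= \mathrm{Se} + \mathrm{Sp} = J + 1
\end{align}
```

Because the two measures differ only by a constant, they rank criteria identically. Neither can be computed from the present data, since specificity was not available.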
It is also accepted that our selection of centres (and therefore the geographical distribution of patients) is a little arbitrary, based as it was on the countries and centres available for collaboration at the initiation of the study. For diagnostic criteria, we felt it important to insist on the diagnostic opinion of a senior physician experienced in this disease. However, the inclusion of patients from eight different centres ensured that no single opinion predominated. We also excluded centres that were unable to contribute 10 or more patients, mainly for logistical reasons and partly on the grounds of expense.
With recruiting centres selected primarily for these reasons, we were then dependent upon the local customs at each centre. Some European countries have greater access to longer term follow up than others, and the mechanics of the study deterred participation of centres in several other European countries that were unable to provide long term follow up or which had experienced increasing difficulty in recruitment, for PMR is a disease now often treated by the primary care physician and not necessarily referred to hospital. The facilities for diagnosis also varied between centres, restricting the collection of investigations on patients with diseases that mimic PMR, and preventing an analysis by specificity as well as by sensitivity. A particular difficulty was the lack of C reactive protein at many of the participating centres, which relied upon ESR alone. This made comparison of the Jones/Hazleman 1981 criteria particularly difficult, a point that had not been appreciated when the study was designed and the centres recruited.
Although pan-European collaboration is to be encouraged, we also have some anxiety about whether all participants adequately understood the English required and whether, when translation by the clinician was needed, individuals were adequately able to separate “stiffness” from “pain”, a distinction that is a feature of some criteria sets.
One potential source of error, requiring clarification, is that the senior physician in each of the recruiting centres might have had preconceptions about the various criteria sets used, inevitably defining PMR by comparison (conscious or subconscious) with an existing criteria set. A specific concern is the large number of patients recruited from a single centre (Leeds), the address of the principal author of this paper and the first author on one of the papers defining a criteria set studied. In order to avoid any bias, for the purposes of this study only patients treated by a consultant colleague (CP) were used in this analysis. The first author (HB) participated in a coordinating role only. Although this still may not completely exclude such bias, we think it unlikely that this has occurred. Moreover, the weight of the many other participating centres included in the study would also reduce it.
Those criticisms apart, to our knowledge this represents the first systematic comparison of the several different criteria sets currently in use. It is of interest that criteria sets based on the option of including some of a group of criteria perform better than those for which the presence of all of a set of criteria is required. The optimum study, which would have allowed study of both sensitivity and specificity (the best criteria performing well on both of these attributes), is perhaps better suited to a more detailed study of larger numbers of patients in just one or two centres in the most “Westernised” countries, providing patients are seen in the hospital rather than in primary care. Therefore, a “best buy” criteria set for clinical trials may have eluded us but at least our analysis, based only on sensitivity, ensures that with the use of the sets that perform best, patients with the condition will not be missed in clinical practice.
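The observation about “some of” versus “all of” rules can be illustrated with a small sketch. The patient data, the threshold of two, and the function names are all hypothetical and do not reproduce any published criteria set. The sketch shows only that, on the same findings, a rule requiring any k of n criteria can never be less sensitive than a rule requiring all n.

```python
from typing import Callable, Sequence

Findings = Sequence[bool]


def meets_k_of_n(findings: Findings, k: int) -> bool:
    """True if at least k of the listed criteria are present."""
    return sum(findings) >= k


def meets_all(findings: Findings) -> bool:
    """True only if every listed criterion is present."""
    return all(findings)


def set_sensitivity(patients: Sequence[Findings],
                    rule: Callable[[Findings], bool]) -> float:
    """Proportion of patients (all assumed to have the disease) meeting the rule."""
    return sum(rule(p) for p in patients) / len(patients)


# Hypothetical findings for four patients on four criteria.
patients = [
    (True, True, True, True),
    (True, True, False, True),
    (True, False, True, False),
    (False, False, True, False),
]

sens_some = set_sensitivity(patients, lambda p: meets_k_of_n(p, 2))
sens_all = set_sensitivity(patients, meets_all)
print(f"any 2 of 4: {sens_some:.2f}; all 4: {sens_all:.2f}")
```

Any patient who satisfies every criterion also satisfies any k of them, so the “some of” rule identifies at least as many patients. The trade-off, which this study could not measure, is that such a rule may also be less specific.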
The study has also provided a salutary insight into pan-European collaboration, a laudable aim that is one of the founding principles behind EULAR. This study, which had the support of ESCISIT, was one of the early endeavours initiated soon after this committee’s reconstitution in the mid-1990s. A firm policy decision was taken to encourage participation on the basis of enthusiasm and an interest in the disease rather than on the basis of scientific credibility alone, which might have restricted the study to western Europe. As a result, several countries in what was then “eastern” Europe participated even before the enlargement of the European Union. The study therefore provides an interesting exercise in the advantages and disadvantages of this particular method of selecting participating centres. Inevitably, with the meagre resources available at certain centres, the benefits of multinational collaboration have here taken precedence over the benefits of scientific homogeneity, which would have been more restrictive.
The following workers in the following centres provided additional clinical support: Roberto Caporali (Pavia), Dusan Logar (Ljubljana), Michael Mates and Moshe Sonnenblick (Jerusalem), Alena Tuchynova (Piestany), and Raili Vaas (Tartu).