Efficacy of non-steroidal anti-inflammatory drugs for low back pain: a systematic review of randomised clinical trials
- a Institute for Research in Extramural Medicine, Vrije Universiteit Amsterdam, the Netherlands; b Institute for Rehabilitation Medicine, Erasmus University Rotterdam, the Netherlands
- Dr B W Koes, Institute for Research in Extramural Medicine, Faculty of Medicine, Vrije Universiteit, Van der Boechorststraat 7, 1081 BT Amsterdam, the Netherlands.
- Accepted 10 January 1997
PURPOSE To assess the efficacy of non-steroidal anti-inflammatory drugs (NSAIDs) for low back pain.
DATA SOURCES Computer aided search of published randomised clinical trials and assessment of the methods of the studies.
STUDY SELECTION 26 randomised clinical trials evaluating NSAIDs for low back pain were identified.
DATA EXTRACTION Score for quality (maximum = 100 points) of the methods based on four categories: study population; interventions; effect measurement; data presentation and analysis. Determination of success rate per study group and evaluation of different contrasts. Statistical pooling of placebo controlled trials in similar patient groups and using similar outcome measures.
RESULTS The methods scores of the trials ranged from 27 to 83 points. NSAIDs were compared with placebo treatment in 10 studies. The pooled odds ratio in four trials comparing NSAIDs with placebo after one week was 0.53 (95% confidence interval 0.32 to 0.89) using the fixed effect model, indicating a significant effect in favour of NSAIDs compared with placebo. In nine studies NSAIDs were compared with other (drug) therapies. Of these, only two studies reported better results with NSAIDs than with paracetamol with and without dextropropoxyphene. In the other trials NSAIDs were not better than the reference treatment. In 11 studies different NSAIDs were compared, of which seven studies reported no differences in effect.
CONCLUSIONS There are flaws in the design of most studies. The pooled odds ratio must be interpreted with caution because the trials at issue, including the high quality trials, did not use identical outcome measures. The results of the 26 randomised trials that have been carried out to date suggest that NSAIDs might be effective for short-term symptomatic relief in patients with uncomplicated low back pain, but are less effective or ineffective in patients with low back pain with sciatica and patients with sciatica with nerve root symptoms.
Low back pain is an important medical and socioeconomic problem in western societies.1-3 A variety of therapeutic interventions are available, but their efficacy often remains unknown.4 5 Consequently, decisions regarding optimal management strategies are not easy for physicians and therapists involved in the care of patients with low back pain. Possibly as a consequence of this situation, the management of low back pain typically shows large variation.6-9 The Quebec Task Force on Spinal Disorders reported in 1987 that the efficacy of most interventions had not been demonstrated by sound randomised clinical trials.4 In our recent series of review articles we assess the available randomised clinical trials to evaluate the scientific evidence of common interventions for low back pain. In earlier review articles we have reported on the efficacy of exercise therapy, spinal manipulation and mobilisation, bed rest and orthoses, back schools, traction therapy, and epidural corticosteroid injections.10-15 In this article we will focus on the efficacy of non-steroidal anti-inflammatory drugs (NSAIDs) for low back pain.
Worldwide, NSAIDs seem to be the most commonly prescribed medications,16 and they are also widely used for patients with rheumatic disorders, including low back pain.9 17 The US clinical guidelines for the management of acute low back pain state that there is fair to good evidence for the prescription of NSAIDs for symptom control when the patients’ response to non-prescription analgesics is inadequate.18 Their recommendation is based on only four randomised clinical trials that met their selection criteria.19-22 The clinical guidelines from the UK, based on the same information, also recommend prescription of NSAIDs (and simple analgesics) in the early management strategy for symptomatic pain relief to prevent disability.23
The rationale of NSAIDs treatment for low back pain is based both on their analgesic potential and their anti-inflammatory action.24 25 To determine the current situation regarding the efficacy of NSAIDs for low back pain, we systematically assessed the evidence from published randomised clinical trials. As even randomised clinical trials may show biased outcomes related to methodological shortcomings in the design,15 strong emphasis is laid on the methodological quality of the trials.
SELECTION OF STUDIES
A MEDLINE literature search was carried out for the period 1966-1994 (keywords (MeSH): backache, low back pain, anti-inflammatory agents, non-steroidal (including all minor sub-headings)). An EMBASE (Drugs and Pharmacology) search was carried out for the period 1980-1994 (keywords: non-steroid anti-inflammatory agent, backache, low back pain). In addition, the references given in relevant publications were further examined. Abstracts and unpublished studies were not selected. Studies had to meet the following criteria: (1) the study was a randomised clinical trial; (2) one treatment regimen included an NSAID (additional interventions were permitted); (3) the study subjects suffered from low back pain (or at least a subgroup did, for which the results are presented separately); and (4) the article was written in English.
ASSESSMENT OF METHODOLOGICAL QUALITY
All eligible trials were scored according to the criteria listed in table 1. The criteria are based on generally accepted principles of intervention research. Similar criteria have previously been used to assess the methodological quality of trials evaluating other therapeutic interventions for low back pain.10-14 26 27 To each criterion a weight was attached indicating its putative relative importance. The maximum score for each study was 100 points. Items B, C, E, J, K, M, and O are relevant for assessing the internal validity of the trials.13 All trials were assessed by two reviewers (RJPMS, JMAM) independently of each other. In a subsequent meeting they had to reach consensus on each criterion they initially disagreed upon. Where disagreement persisted, a third reviewer (BWK) made the final decision. The assessments resulted in a hierarchical list in which higher scores indicate studies of higher methodological quality. The outcome of the studies will be discussed in relation to their methodological scores.
OUTCOME OF THE STUDIES AND STATISTICAL POOLING
A study was judged to be positive if the authors concluded that the NSAID at issue was more effective than the reference treatment (for example, placebo capsules, other NSAIDs, or other (drug) therapy). Usually this meant that the difference in effect for the primary outcome was statistically significant at the conventional 5% level. In a negative study the authors reported no differences between the study treatments, or even better results in favour of the reference treatment.
Pooling was to be limited to studies of which the characteristics (that is, NSAID treatment/reference treatment, patients, and outcome) were clinically sufficiently similar. After assessment of the trials we agreed that only the placebo controlled trials were sufficiently similar to permit statistical pooling. We attempted to pool data for acute and chronic low back pain patients separately, using the (forced) success rates determined one and two weeks after randomisation. The results of a subset of trials were pooled statistically using Peto’s ‘observed minus expected’ method. We included a test for homogeneity of the odds ratios (ORs) of the randomised controlled trials.28 If there was heterogeneity, we present ORs and 95% confidence intervals (CIs) using the fixed effects model as well as the more conservative random effects model.29 Results are presented as ORs with corresponding 95% CIs. Treatment failures were compared between the intervention groups: an OR below 1 indicates a better outcome of the NSAID at issue. Sensitivity analysis was carried out by performing separate meta-analyses on subsets of trials based on methodological quality (that is, those higher and lower than 50 points).
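The pooling procedure described above can be illustrated with a minimal sketch of Peto's "observed minus expected" method for treatment failures, including the homogeneity statistic. The function name `peto_pool`, the dictionary keys, and the trial counts in `example_trials` are hypothetical and chosen for illustration only; they are not data from the trials in this review, and the sketch covers the fixed effect calculation only.

```python
import math

def peto_pool(trials, z_crit=1.96):
    """Pool 2x2 tables with Peto's observed-minus-expected method.

    trials: list of (failures_nsaid, n_nsaid, failures_placebo, n_placebo).
    Counting treatment failures means a pooled OR below 1 favours the NSAID.
    Assumes every trial has at least one failure and one success overall,
    otherwise the hypergeometric variance is zero and the trial must be dropped.
    """
    sum_oe = 0.0   # sum of (observed - expected) failures in the NSAID arm
    sum_v = 0.0    # sum of hypergeometric variances
    sum_sq = 0.0   # sum of (O - E)^2 / V, used for the homogeneity test
    for a, n1, c, n2 in trials:
        n = n1 + n2
        m = a + c                                   # total failures in the trial
        expected = n1 * m / n
        var = n1 * n2 * m * (n - m) / (n * n * (n - 1))
        oe = a - expected
        sum_oe += oe
        sum_v += var
        sum_sq += oe * oe / var
    log_or = sum_oe / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return {
        "or": math.exp(log_or),
        "ci_low": math.exp(log_or - z_crit * se),
        "ci_high": math.exp(log_or + z_crit * se),
        "q": sum_sq - sum_oe ** 2 / sum_v,          # chi-square homogeneity statistic
        "df": len(trials) - 1,
    }

# Hypothetical counts, for illustration only (not taken from the review).
example_trials = [(10, 40, 20, 40), (15, 50, 22, 50), (8, 30, 12, 30)]
result = peto_pool(example_trials)
print(result)
```

The homogeneity statistic `q` is referred to a chi-square distribution with `df` degrees of freedom; a random effects model would additionally require an estimate of the between-trial variance, which this sketch omits.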
A total of 26 trials met the inclusion criteria and were included in this review. Of these, four trials were published between 1960 and 1970, four between 1971 and 1980, 13 between 1981 and 1990, and five were published after 1991. Table 2 presents the trials in hierarchical order, according to their methodological quality.
Initially, there was disagreement between the two independent reviewers in 207 (20%) of the 1040 items scored. Disagreement mainly occurred because of reading and interpretation errors, and in the assessment of the two studies using a crossover design.22 30 Most of the disagreement was solved in a subsequent consensus meeting. The third reviewer had to make a final decision in 15 instances, mainly relating to criterion (C) ‘comparability of baseline characteristics’ in the case of the two crossover trials.
Table 2 shows the wide range in methodological scores (range 27-83). There were nine studies that scored more than 50 points (maximum score = 100). The median score was 48 points, indicating the overall moderate methodological quality of the trials. The most prevalent methodological shortcomings were (B) no description of randomisation procedure, (C) non-similarity regarding relevant baseline characteristics, (D) no adequate description of drop outs, (F) the small size of the study populations included, (H) no placebo control group, (M) no blinded outcome measurement, (N) no long term (six months or longer) follow up, and (O) no intention to treat analysis, including a worst case analysis in cases with more than 10% loss to follow up.
If we consider the validity items only (items B, C, E, J, K, M, and O from table 1) there seem to be no important changes in the hierarchy of the trials. At the top of the list Hosie31 remains the best study with 34 (71%) out of the maximum of 48 points for validity, followed by Goldie32 with 33 points. At the bottom of the list Postacchini19 remains with two points.
Overall, there were nine positive and 12 negative studies. In two studies positive results were reported for a subgroup only, and in three studies no conclusion was drawn. As the NSAIDs were compared with different reference treatments we present the results for comparisons with placebo (table 3), other (drug) therapy (table 4), and other NSAIDs (table 5), separately.
COMPARISONS WITH PLACEBO THERAPY
In five of 10 trials in which an NSAID was compared with a placebo the authors reported better results with the NSAID (table 3). Two trials reported positive results in a subgroup only, and in two other trials the authors reported no differences between the NSAID and the placebo. In one trial no conclusion was drawn. Of the five studies with methodological scores above 50 points, two reported a favourable outcome of NSAID in patients with acute low back pain. One reported favourable results of NSAID in a subgroup of acute low back pain (that is, those with initial moderate to severe pain) only. The two other studies reported no differences in effect between the NSAID and the placebo in patients with (a) acute low back pain and sciatica and (b) acute sciatica with nerve root symptoms.
In four of 10 trials patients were allowed to use rescue analgesics, usually paracetamol and codeine. In two of these the patients in the placebo group used significantly more rescue analgesics.20 47 In the two other trials there were no significant differences between the study groups regarding the use of additional analgesics.22 33 In three trials no rescue analgesics were permitted32 38 50 and in three other publications rescue analgesics are not mentioned at all.21 36 43
COMPARISONS WITH OTHER (DRUG) THERAPIES
There were nine trials comparing NSAIDs with other (drug) therapies (table 4). In five trials NSAIDs were not better than the reference treatment in patients with acute low back pain (four studies) and in chronic low back pain (one study). In three trials NSAIDs were reported to be better than the reference treatment in acute low back pain (two studies) and in chronic low back pain (one study). In one study no conclusion was drawn. Unfortunately, only one study scored more than 50 points. In this study NSAIDs were found to be more effective than paracetamol in patients with chronic low back pain.34
COMPARISONS BETWEEN DIFFERENT NSAIDS
In 11 trials a comparison was made between different NSAIDs (table 5). In seven of these, there were no differences in effect between the NSAIDs for patients with acute or chronic low back pain. In three studies positive results were reported of one NSAID over the other(s) and in one study no conclusion was drawn. There were only three studies with more than 50 points. All three showed no difference in effect between the NSAIDs under study, although the authors of one study were more positive about one of the drugs.37
Complications or side effects of NSAIDs were reported in most of the trials included in this review. The number of patients reporting side effects varied from 0% to 31%. The side effects usually concerned mild to moderately severe events, such as abdominal pain and diarrhoea, and other side effects such as oedema, dry mouth, rash, dizziness, headache, tiredness, etc. There seemed to be no clear difference in the reported number or severity of side effects, or both, between the different types of NSAIDs.
Only the placebo controlled studies were regarded as sufficiently similar to permit statistical pooling of the data. In general, the methodological quality of the placebo controlled trials was higher than that of the trials investigating other contrasts. Most placebo controlled trials involved patients with acute low back pain (duration less than six weeks). Of the 10 trials, seven involved patients with acute low back pain, two with chronic low back pain, and in one study the duration was not described.50 We refrained from pooling the two trials on chronic low back pain, because in one of these22 success rates could not be extracted. Of the seven placebo controlled trials on acute low back pain, three trials presented insufficient data to extract success rates. Unfortunately, this concerned two studies with relatively high methods scores.20 33 Efforts to contact the authors to obtain additional information did not succeed.
Four studies were included in the meta-analysis in which the short-term results were pooled (fig 1).21 32 38 47 The χ² value for homogeneity of the ORs was 4.34 (3 df; p = 0.227). The pooled odds ratio for the success rate determined after one week was 0.53 (95% CI 0.32 to 0.89) using the fixed effect model, indicating a significant effect in favour of NSAIDs compared with placebo. Using the more conservative random effect model the point estimate remained similar (0.54) with a wider confidence interval (95% CI 0.29 to 1.00).
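As a rough consistency check on the pooled estimates above, the standard error of the log odds ratio can be recovered from a reported 95% CI and turned into a z statistic. This is a minimal sketch that assumes the interval is symmetric on the log scale (normal approximation with a critical value of 1.96); the function names `se_from_ci` and `z_and_p` are hypothetical, and the calculation is not part of the original analysis.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of ln(OR), assuming a CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def z_and_p(or_point, lower, upper, z_crit=1.96):
    """z statistic and two-sided p value for the null hypothesis OR = 1."""
    se = se_from_ci(lower, upper, z_crit)
    z = math.log(or_point) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p

# Pooled estimates after one week, as reported in the text.
z_fixed, p_fixed = z_and_p(0.53, 0.32, 0.89)     # fixed effect model
z_random, p_random = z_and_p(0.54, 0.29, 1.00)   # random effect model
print(z_fixed, p_fixed)
print(z_random, p_random)
```

Under these assumptions the fixed effect result falls below the 5% level, whereas the random effect result sits close to it, which agrees with an upper confidence limit of 1.00.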
A separate meta-analysis of the two studies with methods scores above 50 points resulted in an OR of 0.61 (95% CI 0.28 to 1.32) (fixed effects model). The meta-analysis of the two studies with methods scores less than 50 points resulted in a lower OR of 0.48 (95% CI 0.25 to 0.95), indicating somewhat larger effects reported in the methodologically weaker trials.
Three studies that reported results after two weeks were pooled. All three had methods scores above 50 points.32 36 38 The χ² value for homogeneity was 4.58 (2 df; p = 0.101). The pooled OR for the success rate determined after two weeks was 0.46 (95% CI 0.30 to 0.72) using the fixed effect model, indicating a significant effect in favour of NSAIDs compared with placebo. Using the random effects model the point estimate was 0.58 (95% CI 0.20 to 1.68).
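The p values quoted for the two homogeneity tests can be reproduced from the upper tail of the chi-square distribution. The sketch below uses the closed-form tail probabilities for 2 and 3 degrees of freedom so that only the standard library is needed; the function name `chi2_upper_tail` is hypothetical, and other degrees of freedom are deliberately not handled.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only df = 2 and df = 3 are supported, using their closed forms.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("this sketch only supports df = 2 or df = 3")

p_one_week = chi2_upper_tail(4.34, 3)    # four trials pooled, 3 df
p_two_weeks = chi2_upper_tail(4.58, 2)   # three trials pooled, 2 df
print(round(p_one_week, 3), round(p_two_weeks, 3))
```

Neither statistic reaches the conventional 5% level, so the tests give no strong evidence of heterogeneity, although such tests have low power when only three or four trials are pooled.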
This review shows some important methodological shortcomings in randomised trials evaluating the efficacy of NSAIDs in low back pain. The randomisation procedure was seldom described, making it impossible for the reader of the article to discover if procedures were used that definitely excluded bias.52 Data on similarity of relevant baseline characteristics were often not presented, making it difficult to assess whether the study groups were sufficiently similar regarding their prognosis. Perhaps even more disturbing was the finding that the number of drop outs and the reasons for them were often not reported, while selective drop out of patients and loss to follow up may easily cause bias.
The small size of the study populations was also a commonly identified problem. For this reason, studies may lack the statistical power to detect clinically relevant differences in effects between the interventions under study, which of course is a problem only if pooling is not feasible. Another problem with smaller sample sizes is that important (un)known prognostic variables might not be in balance between the study groups after randomisation. Such situations may lead to biased outcomes if, by chance, patients in one group had a more favourable prognosis.
Another problem refers to the blinding of patients with respect to the interventions under study. Although it can be argued that patients will not be able to detect the content of the drug given, one should preferably evaluate whether the blinding was indeed successful by asking the patients to indicate which intervention they thought they had received.
The wide range of scores for methodological quality suggests that there is much room for improvement in future studies. It must be noted, however, that the reported methodological flaws are not unique to clinical trials evaluating the efficacy of NSAIDs. In general, the NSAIDs trials (median 48, range 27-83) seem to score somewhat higher than trials evaluating other interventions for low back pain. For example, trials evaluating the efficacy of spinal manipulation and mobilisation (median 35, range 20-56),10 exercise therapy (median 40, range 24-61),11 back schools (median 36, range 16-70),12 and traction therapy (median 36, range 23-66)14 all had a lower median methods score. It must also be noted that the methodological assessment was focused on the publication of the trial at issue. It might well be that the authors of a trial in fact conducted a high quality trial, meeting most of the criteria from table 1, but for some reason did not report the details in their article.
Another problem relating to the reporting of the trials is the clinical description of the study population. The descriptions and definitions of back pain (for example, acute and chronic, recurrence status, sciatica) varied widely among the studies included in this review. In some instances the definitions were not described at all. This situation hampers the interpretation of the study results. In future studies some standardisation of the description and classification of patients with low back pain (for example, the classification of the Quebec Task Force on Spinal Related Disorders)4 might be desirable.
The results of the 26 randomised trials that have been published to date suggest that NSAIDs are effective for symptomatic short-term relief in patients with uncomplicated low back pain. The placebo controlled studies with methods scores above 50 points suggest that NSAIDs are effective in patients with (uncomplicated) low back pain, but are less effective or ineffective in patients with low back pain with sciatica and patients with sciatica with nerve root symptoms. The latter seems somewhat surprising because in patients with sciatica and nerve root symptoms an inflammatory process is suggested to be part of the cause of the symptoms. One might have expected that NSAIDs would be effective in these patients because of the anti-inflammatory component of the drug. Whether NSAIDs are more effective than other (drug) therapies, including simple analgesics, remains unclear. Another question, still unanswered, concerns the long term effects of NSAIDs. Only one trial included a follow up measurement after six months, with unclear results.19
The pooling of results from individual trials was confined to the placebo controlled trials on acute low back pain only. The trials investigating other contrasts were considered to be too heterogeneous regarding methodological quality, patients’ characteristics, contrasts under study, and outcome measurements, to allow pooling of their results. The meta-analysis of the results after one week gave a pooled OR of 0.53 using the fixed effects model, indicating that NSAIDs were significantly more effective than placebo. The random effects model resulted in more or less the same point estimate with a wider confidence interval, so that the result reached borderline significance only. Given the last remark, and because the sensitivity analysis indicated that the weaker trials reported larger effects, the positive short-term effects of NSAIDs must be viewed with some caution. Caution is also indicated because the outcome measures in the pooled analysis were not identical. All four outcome measures consisted of a global (subjective) assessment of the clinical progress of the patient measured on an ordinal scale. In all four studies we were able to dichotomise the outcomes into ‘successes’ (for example, complete relief of pain, noticeable improvement, definitive positive effect) and ‘failures’ (for example, slight improvement, no change/improvement, worse). However, whether these outcome measures are similar enough to permit statistical pooling remains an arbitrary judgement. The outcomes after two weeks were more or less similar to those after one week. Again significant positive results were found for NSAIDs compared with placebo with the fixed effect model; however, the more conservative random effect model resulted in non-significant findings.
Numerous articles have reported on the side effects of NSAIDs, especially gastrointestinal events. In the studies presented in this review, side effects were also frequently reported, including abdominal pain, diarrhoea, oedema, dry mouth, rash, dizziness, headache, tiredness, etc. Most side effects were considered to be mild to moderately severe according to the authors of the studies. There seemed to be no clear difference in the reported number or severity, or both, of side effects between the different types of NSAIDs in the studies included in this review. However, the sample sizes of the studies were, in general, relatively small, permitting only an imprecise estimate of side effects. Therefore, from the trials described in this review no clear conclusion can be drawn regarding the risks for gastrointestinal and other side effects when using NSAIDs.
There are certain limitations to the methods used in this systematic review. Publication bias cannot be ruled out, so it is possible that trials that were not published because of their (negative) results were missed. As we, for practical reasons, included English language papers only, there might also be a possibility of language bias, in the sense that the results of trials published in other languages might systematically differ from those of trials published in the English literature. Furthermore, the two independent reviewers were not blinded with respect to the source and outcome of the trials. However, the methodological criteria were quite strict and easy to apply. These criteria have been used for a number of reviews on conservative interventions for low back pain. In addition, a recent meta-analysis of spinal manipulation for low back pain demonstrated that the results of our scoring method were similar to results obtained by the scoring method of Chalmers et al.26 53 One of the drawbacks of using this list of methodological criteria might be that trials showing a ‘fatal mistake’ (for example, irrelevant outcome measures, drop out rate exceeding 50%) may end up with a comparatively high score because they meet most of the other criteria. Studies with the highest methods scores should therefore be checked regarding such fatal flaws. No ‘fatal flaws’ were identified in the best studies (methods scores more than 60 points) in this review.
In conclusion, there are flaws in the design of most studies. The results of the 26 randomised trials that have been carried out to date suggest that NSAIDs might be effective for short-term symptomatic relief in patients with uncomplicated low back pain, but are less effective or ineffective in patients with low back pain with sciatica and patients with sciatica with nerve root symptoms.
This study was supported by a grant from the Dutch Health Insurance Executive Board.
Explanation of the criteria from table 1. Each criterion must be applied independently of the other criteria.
- Description of inclusion and exclusion criteria (1 point). Restriction to a homogeneous study population (1 point).
- Similarity for: duration of complaints, value of outcome measures, age, recurrence status, and radiating complaints (1 point each).
- Randomisation procedure described (2 points). Randomisation procedure that excludes bias (for example, sealed envelopes) (2 points).
- Information from which group and with reason for withdrawal.
- Loss to follow up: all randomised patients minus the number of patients at main moment of effect measurement for the main outcome measure, divided by all randomised patients times 100.
- Smallest group immediately after randomisation.
- NSAID therapy explicitly described (5 points). All reference treatments explicitly described (5 points).
- Comparison with an existing treatment modality.
- Other medical interventions are avoided in the design of the study (except analgesics, advice on posture or use at home of heat, rest, or a routine exercise scheme).
- Comparison with a placebo therapy.
- Placebo controlled: attempt at blinding (3 points), blinding evaluated and fully successful (2 points). Pragmatic study: patients fully naive (3 points), or time restriction (no NSAID for at least one year) (2 points), naiveness evaluated and fully successful (2 points).
- Use (measured and reported) of: pain, global measure of improvement, functional status (activities of daily living), spinal mobility, return to work (or to normal activities) (2 points each).
- Effect measurement (partly) by a blinded assessor (10 points).
- Moment of measurement during or just after treatment (3 points). Moment of measurement 6 months or longer (2 points).
- When loss to follow up is less than 10%: all randomised patients for most important outcome measures, and on the most important moments of effect measurement minus missing values, irrespective of non-compliance and co-interventions. When loss to follow up >10%: intention to treat as well as an alternative analysis that accounts for missing values.
- For most important outcome measures, and on the most important moments of effect measurement. In the case of (semi)continuous variables: presentation of the mean or median with standard error or percentiles.
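The loss to follow up criterion in the list above is defined by a simple formula, and the 10% threshold determines which analysis the intention to treat criterion expects. A minimal sketch follows; the function names are hypothetical, and because the criteria do not say how a loss of exactly 10% should be handled, the sketch treats it as not exceeding the threshold, which is an assumption.

```python
def loss_to_follow_up_pct(n_randomised, n_at_main_measurement):
    """Loss to follow up as a percentage of all randomised patients.

    (all randomised - patients present at the main moment of effect
    measurement for the main outcome measure) / all randomised * 100
    """
    if n_randomised <= 0:
        raise ValueError("n_randomised must be positive")
    if not 0 <= n_at_main_measurement <= n_randomised:
        raise ValueError("n_at_main_measurement must lie between 0 and n_randomised")
    return (n_randomised - n_at_main_measurement) / n_randomised * 100

def needs_alternative_analysis(loss_pct):
    """True when loss to follow up exceeds 10%.

    Above the threshold the criteria ask for an intention to treat analysis
    plus an alternative analysis that accounts for missing values.
    Assumption: exactly 10% is treated as not exceeding the threshold.
    """
    return loss_pct > 10

# Hypothetical trial: 120 patients randomised, 102 assessed at the main measurement.
pct = loss_to_follow_up_pct(120, 102)
print(pct, needs_alternative_analysis(pct))
```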