Background: Acetaminophen (paracetamol) is recommended as the initial pharmacological treatment for knee or hip osteoarthritis. However, survey and clinical trial data indicate greater efficacy for non-steroidal anti-inflammatory drugs and cyclo-oxygenase-2 specific inhibitors.
Design: Two randomised, double blind, placebo controlled, crossover multicentre clinical trials, Patient Preference for Placebo, Acetaminophen or Celecoxib Efficacy Studies (PACES).
Patients: Osteoarthritis of knee or hip.
Intervention: “Wash out” of treatment; randomisation; 6 weeks of celecoxib 200 mg/day, acetaminophen 1000 mg four times a day, or placebo; second “wash out;” crossover to 6 weeks of second treatment.
Measurements: Western Ontario McMaster Osteoarthritis Index (WOMAC), visual analogue pain scale, patient preference between two treatments.
Results: Celecoxib was more efficacious than acetaminophen in both periods in both studies; WOMAC and pain scale scores differed at p<0.05 in period II and both periods combined of PACES-a and in periods I and II and both periods combined in PACES-b, but not in period I of PACES-a. Acetaminophen was more efficacious than placebo, generally at p<0.05 in PACES-b and p>0.05 in PACES-a. Patient preferences were 53% celecoxib v 24% acetaminophen in PACES-a (p<0.001) and 50% v 32% in PACES-b (p = 0.009); 37% acetaminophen v 28% placebo in PACES-a (p = 0.340) and 48% v 24% in PACES-b (p = 0.007). No clinically or statistically significant differences were seen in adverse events or tolerability among the three treatment groups.
Conclusions: Greater efficacy was seen for celecoxib v acetaminophen v placebo, while adverse events and tolerability were similar. Variation in results and statistical significance in the two different trials are of interest.
- MDHAQ, Multidimensional Health Assessment Questionnaire
- NSAIDs, non-steroidal anti-inflammatory drugs
- SF-36, Short Form-36
- WOMAC, Western Ontario McMaster Osteoarthritis Index
- VAS, visual analogue scale
Acetaminophen (paracetamol) is recommended as the initial drug treatment for patients with osteoarthritis of the knee or hip,1–3 based on: (a) a high level of gastrointestinal adverse events with non-specific non-steroidal anti-inflammatory drugs (NSAIDs), the primary treatment until the 1990s4,5; (b) a perception that acetaminophen is safer than these traditional agents 6; (c) clinical trials which were interpreted to indicate clinical equivalence of acetaminophen and ibuprofen7 or naproxen.8
In recent years the possible advantages of acetaminophen for most patients with osteoarthritis have been questioned.9–11 Two patient surveys indicated that most patients rated NSAIDs as a better treatment than acetaminophen.12,13 A randomised, double blind, crossover clinical trial indicated that 57% of patients reported a preference for diclofenac/misoprostol versus acetaminophen, while 21% preferred acetaminophen and 22% expressed no preference.14 Reanalysis of clinical trials which were interpreted to show similar efficacy of acetaminophen and non-steroidal anti-inflammatory drugs7,8 indicated greater effect sizes for ibuprofen and naproxen versus acetaminophen, similar to diclofenac/misoprostol, suggesting that absence of statistical significance resulted from low numbers of patients.9
A recent report indicated little efficacy of acetaminophen,15 although this study also included relatively few patients. Acetaminophen in doses of 4 g/day, as recommended for osteoarthritis by the American College of Rheumatology3 and other organisations, has been associated with a rate of gastrointestinal adverse events similar to standard NSAIDs,16,17 although these observations are confounded, in part, by a likelihood that acetaminophen is given to patients at highest risk for gastrointestinal events.
An important recent advance has been the development of cyclo-oxygenase-2 specific inhibitors, which have efficacy similar to that of non-specific NSAIDs, but fewer gastrointestinal adverse events.18,19 A recent study indicated significantly greater efficacy of rofecoxib than of acetaminophen in patients with osteoarthritis, with comparable adverse events.20 In this report we present the results of two randomised, placebo controlled, two period, three treatment, crossover, double blind clinical trials, Patient Preference for Placebo, Acetaminophen or Celecoxib Efficacy Studies (PACES), in which patients were given two of the three treatments. Results of both studies are similar, although variation of results and statistical significance in the two studies are of interest.
PACES-a and PACES-b had an identical design—a two period, double blind, double dummy, crossover trial of celecoxib, acetaminophen, and placebo in patients with osteoarthritis of the knee or hip. Inclusion criteria were age 45 or greater, radiographic Kellgren-Lawrence grade 2–4, a score of 40–90 mm on a visual analogue pain scale, and designation by the treating physician that the patient was a candidate for long term treatment with a cyclo-oxygenase-2 specific inhibitor drug or an analgesic drug. The primary exclusion criteria were significant medical comorbidities, rheumatoid arthritis or other inflammatory arthritis, acute joint trauma, chronic pain syndrome, expected need for surgery during the course of the study, oral or parenteral corticosteroids within 2 months, or intra-articular injections of hyaluronic acid within 9 months. Women of childbearing potential were required to use contraception; pregnant or lactating women were excluded. These studies were approved by the institutional review boards of all participating study sites. All patients signed written informed consent before participation.
Patients were assigned randomly to one of six treatment sequence groups to receive a sequence of two of three treatments, each for 6 weeks (fig 1), celecoxib 200 mg/day—taken in the morning, acetaminophen 1000 mg four times a day, or placebo. Each patient received a drug and a placebo or two placebos in each period. All assessments were conducted using patient and assessor questionnaires, sent by facsimile to the Pharmacia (Pfizer) data centre in Markham, Ontario, Canada. Data were entered into a clinical database using character recognition software; each entry was verified manually by a clinical data validation specialist and double checked by a senior validation specialist. Any discrepancies were sent to the site assessors for clarification and verification.
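The six sequence groups follow from ordering two of the three treatments across the two periods. A minimal Python sketch of that enumeration is shown here; the function and variable names are illustrative assumptions, not taken from the trial's randomisation software, and the double dummy packaging is not modelled.

```python
from itertools import permutations

# The three study treatments named in the text.
TREATMENTS = ("celecoxib", "acetaminophen", "placebo")


def sequence_groups(treatments=TREATMENTS):
    """Return every ordered (period I, period II) pair of distinct treatments."""
    return list(permutations(treatments, 2))


groups = sequence_groups()
for period_1, period_2 in groups:
    print(f"period I: {period_1:<13} -> period II: {period_2}")
print(len(groups), "sequence groups")
```

Because order matters (celecoxib then placebo differs from placebo then celecoxib), three treatments taken two at a time give 3 × 2 = 6 sequences.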
Visit 1 was a screening visit to review inclusion and exclusion criteria and obtain informed consent. Patient volunteers were given an extensive list of NSAIDs and analgesic drugs to discontinue and not to take throughout the study, and began a 3–7 day washout period. Propoxyphene (Darvon) 65 mg up to four times per day was given as a rescue analgesic drug; codeine 60 mg or tramadol (Ultram) 100 mg, up to four times per day, were provided as alternatives to fewer than 5% of patients if propoxyphene was poorly tolerated or ineffective. Patients were instructed not to take any rescue drug within 12 hours of any visit.
Visit 2 was conducted 3–7 days after visit 1 to randomise patients and provide the 6 weeks of study drug and placebo or two placebos for period I. Visit 3 occurred at the end of the first 6 week treatment period to evaluate period I study drug. Patients who discontinued in period I before completion of 6 weeks had a visit 3 assessment at the time of discontinuation, and were invited to participate in treatment period II. Patients who continued had a second 3–7 day washout period before treatment period II. Visit 4 occurred 3–7 days after visit 3, at which time 6 weeks of the second study drug and placebo or two placebos for period II were provided. Visit 5 occurred at the end of period II, or earlier if the patient chose not to complete period II, to evaluate period II study drug. Patients were queried at visit 5 about their overall preference between the two treatment periods.
Patient and assessor questionnaires
Patients completed a standard eight page questionnaire at each visit, which included three instruments: a “disease-specific” Western Ontario McMaster Osteoarthritis Index (WOMAC),21,22 which contains 24 visual analogue scales, each 100 mm long, to assess pain, stiffness, and function, and which was administered both as a general WOMAC and as a WOMAC directed at the primary affected joint indicated by the patient; a “general arthritis” Multidimensional Health Assessment Questionnaire (MDHAQ),23–26 which includes 100 mm visual analogue scales to assess pain, global status, fatigue, and gastrointestinal distress; and a “generic” Short Form-36 (SF-36) health survey.27,28
Patients also completed a three page questionnaire unique to each visit to query general status and possible adverse events. The unique patient questionnaire for the final visit 5 included a query: “Please compare control of your arthritis during the first and second study periods,” with five response options, “much better during the first study period,” “better during the first study period,” “no difference between the first and second study periods,” “better during the second study period,” “much better during the second study period.” Responses of “much better” or “better” were merged in the analyses presented in this report.
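The merging of the five response options into a three level preference can be sketched as follows. This is an illustrative Python fragment, not the study's analysis code; the dictionary and function names are assumptions, while the response wording is copied from the questionnaire item quoted above.

```python
# Map each of the five questionnaire responses to the period it favours.
# "much better" and "better" are merged, as described in the text.
RESPONSE_TO_PERIOD = {
    "much better during the first study period": "first",
    "better during the first study period": "first",
    "no difference between the first and second study periods": "none",
    "better during the second study period": "second",
    "much better during the second study period": "second",
}


def preferred_treatment(response, period_1_treatment, period_2_treatment):
    """Translate a period preference into a treatment preference."""
    favoured = RESPONSE_TO_PERIOD[response]
    if favoured == "first":
        return period_1_treatment
    if favoured == "second":
        return period_2_treatment
    return "no preference"


print(preferred_treatment(
    "much better during the second study period",
    "acetaminophen",
    "celecoxib",
))
```

Because patients and investigators were blinded, the response refers only to the period; the treatment is recovered afterwards from the patient's assigned sequence.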
Investigators completed unique questionnaires at each visit, which included the investigator assessment of the patient’s global status and change in global status on a 100 mm visual analogue scale, as well as a standard report used by the Food and Drug Administration for adverse events. The questionnaires are described in greater detail in the cited references and in the report of a previous study,14 in which virtually identical questionnaires were used.
The three major efficacy outcomes were the total WOMAC score, 100 mm visual analogue pain scale score, and patient preference for period I treatment versus period II treatment. The sample size was planned to have at least 0.90 power at the 0.05 significance level for treatment comparisons of the total WOMAC score and the 100 mm visual analogue pain scale during period I for at least 150 patients for each of the placebo, acetaminophen, and celecoxib groups. Furthermore, the paired preference comparison would have at least 0.90 power at the 0.05 significance level with at least 100 patients for each of the two sequences for acetaminophen before/after celecoxib.
The false positive (or type I) error rate was kept at the conventional 5% level for the global tests to compare all three treatments in the study for all three major efficacy outcomes through a closed testing procedure,29 with a fixed sequence of tests for which significance at all preceding steps was required to proceed to the next step. For PACES-a, the order of testing was total WOMAC for period I, followed by pain score for period I, and then patient preference response. For PACES-b the order of testing was patient preference, followed by the total WOMAC for period I, and pain score for period I. Pairwise tests between the three treatments were conducted only if the global test was found to be significant (which was the case in both studies). These methods for multiple comparisons essentially made the total WOMAC score have the primary role in PACES-a and the patient preference have the primary role in PACES-b.
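The fixed sequence logic described above, in which each test is interpreted only if every earlier test in the prespecified order was significant, can be illustrated with a short Python sketch. The function name and the p values in the usage example are hypothetical and are not trial results; the sketch covers only the ordering rule, not the global and pairwise tests themselves.

```python
def fixed_sequence_test(ordered_tests, alpha=0.05):
    """Apply a fixed sequence (gatekeeping) procedure.

    ordered_tests: list of (name, p_value) pairs in the prespecified order.
    Returns the names declared significant. Testing stops at the first
    p value that is not below alpha, so later tests are never claimed,
    which keeps the overall type I error rate at alpha.
    """
    significant = []
    for name, p_value in ordered_tests:
        if p_value >= alpha:
            break
        significant.append(name)
    return significant


# PACES-a order from the text, with hypothetical p values for illustration.
paces_a_order = [
    ("total WOMAC, period I", 0.010),
    ("pain score, period I", 0.020),
    ("patient preference", 0.001),
]
print(fixed_sequence_test(paces_a_order))
```

Reordering the same list, as in PACES-b where patient preference came first, changes which outcome acts as the gatekeeper for the others.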
Secondary end points included patient global assessment, also measured on a 100 mm visual analogue scale, SF-36 pain scores, MDHAQ activities of daily living scale scores, investigator assessment of patient global status, and investigator assessment of patient change in global status. The last observation carried forward procedure was used to manage missing data. Sites that enrolled fewer than five patients were pooled to allow for adjustment for site in the analyses.
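Last observation carried forward replaces each missing value with the most recent observed value for the same patient. A minimal Python sketch follows, assuming visits are stored in chronological order and missing values are represented as `None`; the function name and the example scores are hypothetical.

```python
def locf(values):
    """Fill missing entries (None) with the last observed value.

    Leading missing values have nothing to carry forward and stay None.
    """
    filled = []
    last_observed = None
    for value in values:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Hypothetical pain scores (mm) for a patient who missed the final visits.
print(locf([62.0, 55.0, None, None]))
```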
Analyses of total WOMAC score, pain score, and paired patient preference were conducted on three groups: (a) “intention to treat”—all patients who received a single dose of the study drug in period I; (b) “protocol adherent”—omitting patients with major protocol violations such as non-compliance; (c) “all completers”—patients with complete data for the relevant period(s). Results from all three analyses were similar. Patient preference results are presented for patients in the intention to treat population who provided preference data (requiring participation in both periods) and for the protocol adherent population. Only the intention to treat results are presented for all other end points.
For all six sequence groups, means and their corresponding standard errors were computed to describe the distribution of continuous demographic, clinical, and patient/assessor questionnaire response variables. Distributions of categorical demographic, patient questionnaire response variables, and adverse events were described for each group with frequencies and/or percentages. The extent of random imbalances in comparisons of the groups at screening visit 1 was described with p values from a χ2/Fisher’s exact test for dichotomous variables and a Kruskal-Wallis test for continuous variables.
Efficacy of the three treatments was compared for continuous variables using analysis of covariance for the change during each of periods I and II, with screening and baseline scores included as covariates, along with fixed effects for treatment and centre. Repeated measures analysis of covariance through generalised estimating equations30 provided supportive analyses for both periods combined. This method had a full statistical model with covariates for treatment, period, screening (visit 1), baseline of the patient (visit 2), and baseline for the period (visit 2 for period I and visit 4 for period II), with patient as the random sampling unit. The absence of noteworthy carryover effects for the residual impact of period I on period II was confirmed through a corresponding expanded model with their additional inclusion. Patient preference was assessed using conditional logistic regression,31 with components for period and treatment.
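One way to write the single period analysis of covariance described above is shown below. The notation is an assumption introduced here for illustration and does not appear in the original report: $\Delta Y_i$ is the change in score during the period for patient $i$, $\tau$ is the fixed treatment effect, $\gamma$ is the fixed centre effect, and $S_i$ and $B_i$ are the screening and baseline scores entered as covariates.

```latex
\Delta Y_{i} = \mu + \tau_{t(i)} + \gamma_{c(i)} + \beta_{1} S_{i} + \beta_{2} B_{i} + \varepsilon_{i}
```

Here $t(i)$ and $c(i)$ denote the treatment and centre of patient $i$, and pairwise treatment comparisons correspond to contrasts between the $\tau$ terms.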
The prevalence of adverse events was compared among treatment groups using conditional logistic regression. Tolerability was also assessed according to change in a gastrointestinal distress 100 mm visual analogue scale on the MDHAQ.
Baseline and overview
In PACES-a, 524 patients were enrolled and randomised, and 556 patients were enrolled and randomised in PACES-b (table 1). No atypical imbalances with p<0.05 were seen in values at screening between the six sequence groups or between the three period I treatment groups for age, sex, race, education level, duration of osteoarthritis, previous NSAID or analgesic use, proportion of patients who took aspirin for cardiac prophylaxis, patient identified index joint, radiographic grade of the index joint, global severity of osteoarthritis, WOMAC scores, and pain scores (table 1), other than the WOMAC pain score (p = 0.031) in PACES-a. This latter result is probably spurious relative to multiple comparisons and is of no consequence, as baseline scores are covariates in the analyses of covariance for the changes during periods I, II, and both periods combined. In both studies all global tests to compare the three groups were statistically significant (p = 0.01 in both) for the three major efficacy end points, allowing us to proceed to the test for the respective pairs of treatments. Also, all tests for carryover effects confirmed their absence.
In PACES-a, period I comparisons were addressed as the prespecified primary end point of that study, with differences between celecoxib and acetaminophen (p = 0.180) and between acetaminophen and placebo (p = 0.080) not significant, while differences between celecoxib and placebo were significant (p = 0.002) (table 3A). During period II and both periods combined, differences between celecoxib and acetaminophen (p<0.009) and celecoxib and placebo (p<0.007) were significant, but not between acetaminophen and placebo (p = 0.080 for both periods) (table 3A). In PACES-b, differences between celecoxib and acetaminophen, celecoxib and placebo, and acetaminophen and placebo were all significant (p<0.001 to 0.03) (table 3B). Similarly, statistically significant differences were seen for period II and both periods combined. Percentage improvements from baseline in WOMAC scores averaged over all four treatment periods in PACES-a and PACES-b were 21.6% for celecoxib (range 17.6–26.0%), 13.0% for acetaminophen (range 9.5–16.3%), and 7.9% for placebo (range 5.3–9.5%) (tables 2A and B).
Pain visual analogue scale scores
Pain VAS results were similar to the WOMAC results (tables 2 and 3). Differences between celecoxib and acetaminophen were not significant (p = 0.193) in PACES-a, period I, while differences between acetaminophen and placebo (p = 0.031) and celecoxib and placebo (p<0.001) were significant (table 3A). In period II (table 2A, fig 2), differences between celecoxib and acetaminophen (p = 0.003) and celecoxib and placebo (p = 0.002) were significant, while differences between acetaminophen and placebo (p = 0.651) were not (table 3A). In both periods combined, differences between celecoxib and acetaminophen (p = 0.001), celecoxib and placebo (p<0.01), and acetaminophen and placebo (p = 0.02) were significant (table 3A). In PACES-b (table 2B, fig 2), differences in pain scores between celecoxib and acetaminophen, celecoxib and placebo, and acetaminophen and placebo in periods I, II, and both periods combined were significant (<0.021), other than differences between celecoxib and acetaminophen in period II (p = 0.054) (table 3B). Percentage improvements from baseline in pain scores averaged over all four treatment periods in PACES-a and PACES-b were 27.5% (range 22.5–33.4%) for celecoxib, 18.3% (range 10.9–25.8%) for acetaminophen, and 10.2% (range 3.7–16.4%) for placebo (tables 2A and B).
Paired patient preference
In the PACES-a “all completers” population, among 173 patients who received celecoxib and acetaminophen, 52.6% rated celecoxib as “much better” or “better”, 24.3% rated acetaminophen as “much better” or “better”, and 23.1% reported “no difference” (table 4, fig 3). Odds ratios were 2.07 for preference of celecoxib versus acetaminophen (p<0.001), 2.51 for celecoxib versus placebo (p<0.001), and 1.21 for acetaminophen versus placebo (p = 0.340). Similar results were seen for the intention to treat and protocol adherent populations (table 4).
In the PACES-b “all completers” population, odds ratios were 1.47 for patient preference of celecoxib versus acetaminophen (p = 0.009), 2.47 for celecoxib versus placebo (p<0.001), and 1.68 for acetaminophen versus placebo (p = 0.007). Again, similar results were seen for both the intention to treat and protocol adherent populations (table 4).
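As a rough cross-check of the PACES-a celecoxib versus acetaminophen preference, the counts implied by the reported percentages can be put through an exact sign test. This Python sketch is not the conditional logistic regression used in the trials: it ignores period effects, and the counts are reconstructed here by rounding the percentages of 173 patients, which is an assumption.

```python
from math import comb


def sign_test_two_sided(k, n):
    """Exact two-sided sign test p value for k of n preferences (null p = 0.5)."""
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


n_patients = 173
prefer_celecoxib = round(0.526 * n_patients)      # 91 patients
prefer_acetaminophen = round(0.243 * n_patients)  # 42 patients

# Patients reporting "no difference" carry no information for a sign test.
n_with_preference = prefer_celecoxib + prefer_acetaminophen
p_value = sign_test_two_sided(prefer_celecoxib, n_with_preference)

print(f"crude preference ratio: {prefer_celecoxib / prefer_acetaminophen:.2f}")
print(f"sign test p < 0.001: {p_value < 0.001}")
```

The crude ratio of 91 to 42 is about 2.17, close to but not identical with the reported odds ratio of 2.07, which comes from a model that also includes a component for period.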
Other efficacy end points
In both PACES-a and PACES-b, analyses of the patient global scale, MDHAQ activities of daily living scale, investigator assessment of patient global status, investigator assessment of patient change in global status, and SF-36 pain scores disclosed patterns similar to those of the primary end points (data not shown—available on request).
Adverse events were reported by 23–29% of patients in the three groups. No significant differences were seen among the groups in the proportion of patients reporting any gastrointestinal event (specifically diarrhoea, dyspepsia, nausea, and flatulence), upper respiratory infection, headache, or any event (table 5).
The MDHAQ gastrointestinal distress scale indicated small changes over 6 weeks for all three treatments in both studies (tables 2A and B), suggesting no clinically significant gastrointestinal intolerability with any of the three treatments.
In PACES-a, eight adverse events were classified as serious because they required admission to hospital: one in the celecoxib group (intestinal obstruction and neuropathy); three in the acetaminophen group (one case of anxiety, one of cholelithiasis, and one of cholecystitis); and four in the placebo group (one patient with raised liver function tests, one with urinary tract malformation and rectal disorder, one with an accidental fracture, and one with sepsis). Two of the events, the intestinal obstruction in the patient taking celecoxib and the raised liver enzymes in a patient taking placebo, were regarded by the investigators as potentially related to the study drug. The other serious adverse events were regarded as probably not related to the study drug.
In PACES-b, four adverse events were classified as serious, because they required admission to hospital: two in the celecoxib group—one case of cholelithiasis and one case of unstable angina; one in the acetaminophen group—chest pain, probably musculoskeletal in origin, and one in the placebo group—angina pectoris. All events were considered unrelated to the study drug by the investigators.
The data indicate a gradient of efficacy from celecoxib to acetaminophen to placebo. Although overall trends in the two studies are similar, numerical advantages in efficacy of celecoxib over acetaminophen, and acetaminophen over placebo according to WOMAC and pain scores in period I of PACES-a were not significant by the criterion of p<0.05. Differences between celecoxib and acetaminophen were significant in period II and both periods combined in PACES-a, and periods I, II, and both periods combined in PACES-b. Patient preference data for celecoxib versus acetaminophen or placebo were significant in PACES-a and PACES-b. Patient preference for acetaminophen versus placebo was significant in PACES-b, but not in PACES-a.
The rate of adverse events was low and similar for celecoxib, acetaminophen and placebo, with few serious adverse events. All three treatments were well tolerated at comparable levels. Although the patient preference inquiry was based primarily on efficacy, the absence of gastrointestinal intolerability with celecoxib was probably incorporated by the patients into an assessment of preference.
Greater efficacy of celecoxib compared with acetaminophen was also seen using other proposed measures of improvement in osteoarthritis clinical trials. A 20% improvement criterion proposed by Case et al,15 was seen for WOMAC scores for three of the four periods with celecoxib, compared with none of four with acetaminophen or placebo. For the pain visual analogue scale, the 20% improvement criterion was met in four of four periods for celecoxib, two of four for acetaminophen, and none of four for placebo. A 10 mm change in WOMAC scores proposed by Ehrich et al32 was seen for celecoxib in period I of PACES-a and both periods of PACES-b, and was not seen for acetaminophen or placebo in either trial.
PACES-a and PACES-b are presented as individual, rather than pooled, studies to illustrate natural variation in results and p values in two identically designed clinical trials. Numerical differences between celecoxib and acetaminophen were similar in PACES-a and PACES-b. The p value of 0.18 in PACES-a for differences between celecoxib and acetaminophen in period I indicates a result that could occur by chance roughly one in five times, while the p value of 0.007 in PACES-b indicates an occurrence of less than 1 in 130 times by chance. The data illustrate that a focus only on the statistical criterion of p<0.05 may not be optimal for discerning differences in the efficacy of one drug versus another.
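The "one in N" phrasing is the reciprocal of the p value, read as the frequency with which a difference at least this large would arise if the two drugs were truly equivalent. A small Python sketch of that arithmetic follows; the function name is an illustrative assumption.

```python
def one_in_n(p_value):
    """Express a p value as 'one in N' occurrences under the null hypothesis."""
    return 1.0 / p_value


for label, p in (("PACES-a, period I", 0.18), ("PACES-b, period I", 0.007)):
    print(f"{label}: p = {p} is about 1 in {one_in_n(p):.1f}")
```

For p = 0.18 the reciprocal is between 5 and 6, and for p = 0.007 it exceeds 130, consistent with the wording in the text.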
The PACES clinical trials have several limitations. Firstly, comparisons between three drugs ideally might be conducted with each patient taking all three drugs. However, pragmatic considerations limited each patient to two treatment periods, and the six sequence design still allowed all three treatments to be compared. Patient attrition also presents a limitation, as in any clinical trial, although completion rates for both periods of 73% of patients in PACES-a and 74% of patients in PACES-b compare favourably with single period clinical trials in osteoarthritis.
The patient preferences for celecoxib versus placebo or acetaminophen in this study were similar to those seen for diclofenac/misoprostol compared with acetaminophen in the ACTA study,14 and trends are similar to a trial of rofecoxib, celecoxib, and acetaminophen,33 although responses to acetaminophen were greater in that trial than in the PACES trial reported here. The rate of gastrointestinal events was considerably lower with celecoxib in PACES than with diclofenac/misoprostol in ACTA. Therefore, the results indicate greater efficacy of celecoxib versus acetaminophen for patients with osteoarthritis, with similar tolerability and safety. Although a substantially higher probability is seen that patients will respond to celecoxib compared with acetaminophen or placebo, individual variation is seen, as 1 in 3–5 patients expressed a preference for acetaminophen, and 1 in 3–5 expressed no preference in both trials.
In conclusion, we have found that the efficacy of and patient preference for celecoxib are greater than those for acetaminophen, and that the efficacy of and patient preference for acetaminophen are greater than those for placebo. These results may have implications for an optimal pharmacological approach to the management of patients with osteoarthritis using drugs available at this time.
We recognise the contributions of Reinhard Schuller in designing, Dr Joseph Kohles in managing, and all the investigators in conducting the PACES clinical trials, and the expert assistance of Dr Barbara J Struthers in the design of the figures.
Sponsored by Pfizer Corporation.