Extended report
Does knee replacement surgery for osteoarthritis improve survival? The jury is still out
  1. Devyani Misra1,
  2. Na Lu1,2,
  3. David Felson1,
  4. Hyon K Choi2,
  5. John Seeger3,
  6. Thomas Einhorn1,
  7. Tuhina Neogi1,
  8. Yuqing Zhang1
  1. 1Department of Medicine, Boston University School of Medicine, Boston, Massachusetts, USA
  2. 2Department of Medicine, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts, USA
  3. 3Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA
  1. Correspondence to Dr Devyani Misra, Department of Medicine, Boston University School of Medicine, 650 Albany Street, Suite X200, Clin Epi Unit, Boston MA 02118, USA; Demisra{at}


Background The relation of knee replacement (KR) surgery to all-cause mortality has not been well established owing to potential biases in previous studies. Thus, we aimed to examine the relation of KR to mortality risk among patients with knee osteoarthritis (OA) focusing on identifying biases that may threaten the validity of prior studies.

Methods We included knee OA subjects (ages 50–89 years) from The Health Improvement Network, an electronic medical records database in the UK. Risk of mortality among KR subjects was compared with propensity score-matched non-KR subjects. To explore residual confounding bias, subgroup analyses stratified by age and propensity scores were performed.

Results Subjects with KR had 28% lower risk of mortality than non-KR subjects (HR 0.72, 95% CI 0.66 to 0.78). However, when stratified by age, a protective effect was noted only in older age groups (>63 years) but not in younger subjects (≤63 years). Further, among KR subjects, the mortality rate decreased as candidacy (propensity score) for KR increased, but no such consistent trend was noted among non-KR subjects.

Conclusions While a protective effect of KR on mortality cannot be ruled out, findings of lower mortality among older KR subjects and those with higher propensity scores suggest that prognosis-based selection for KR may lead to intractable confounding by indication; hence, the protective effect of KR on all-cause mortality may be overestimated.

  • Orthopedic Surgery
  • Knee Osteoarthritis
  • Outcomes research


Knee replacement (KR) surgery is considered to be a definitive treatment option for patients with advanced knee osteoarthritis (OA), a disease that affects millions of older adults and has few effective pharmacological treatment options available at this time.1 ,2 It is a common procedure, with an estimated 600 000 procedures performed annually in the USA alone.3 KR surgery is associated with improvement in symptoms (knee pain), physical function and quality of life in the majority of patients with knee OA.4 ,5 While chronic pain and poor physical function (eg, slow gait speed) have been associated with increased mortality in older adults,6 ,7 it is unclear whether the improvement in pain and function from KR translates into a survival benefit. Prior studies of KR and mortality have yielded conflicting results, with reports of excess,8 reduced9 and no difference10 in mortality with KR, mostly compared with a general population.

These discrepant findings reflect, in part, the challenges of studying mortality with KR surgery in an observational setting, particularly when using administrative data or electronic medical records (EMRs). Because KR is an elective but invasive procedure, prognosis is an important consideration when deciding upon a patient's surgical candidacy. A survey of orthopaedic surgeons found variation in surgeons' selection criteria for KR, with factors such as obesity and poor physical function status, among others, adversely impacting surgeons' decisions for KR.11 Other factors, including the patient's access to home care and physical therapy, surgeon volume and the patient's relation with the referring physician, were also found to impact the decision to perform or refer for KR in another survey of orthopaedic surgeons and physicians.12 In the same survey, the authors described how physicians would prefer to refer a younger patient (50 or 60 years) over a 92-year-old patient needing KR, based on the perception of low functional status of the older patient.12 Factors contributing to prognosis-based selection (eg, functional status in elders) are not adequately captured in administrative data or EMRs. Thus, prior studies of KR and mortality may have overestimated the mortality benefit, as patients selected for KR may be ‘healthier’ (have a better prognosis) in order to undergo the surgery. This is particularly relevant to older subjects, given the high prevalence of comorbidities and physical frailty that might render many of them ‘unfit’ to undergo surgery. In contrast, older subjects who do undergo surgery may be exceptionally ‘robust’. Thus, a major challenge to examining mortality risk related to KR is in adequately addressing confounding by indication resulting from selection of healthier subjects for surgery (or exclusion of ‘sicker’ subjects from surgery).
This issue of confounding by indication is highlighted in a recent study in which subjects undergoing knee or hip replacement were found to be less likely to have all-cause 5-year hospitalisation and 10-year mortality, despite comprehensive matching for baseline characteristics.13 Interestingly, in the same study, prior to matching, participants undergoing elective knee or hip replacement were noted to be younger, healthier (fewer comorbidities) and from a higher socioeconomic group, compared with those not selected for joint replacement, highlighting that healthier candidates are selected for surgery.13

Thus, our objective in this study was twofold. First, we aimed to evaluate the relation of KR to the risk of all-cause mortality among subjects with knee OA, with particular attention to addressing potential sources of confounding bias that may account for the effect of KR on mortality. Second, we aimed to perform additional analyses (differential mortality risk by age and candidacy for KR) to explore potential residual confounding by indication despite our best efforts at mitigating this bias.


Study sample

The Health Improvement Network (THIN) is a UK primary care electronic database that has anonymised health data on approximately 10 million patients who were systematically followed in 558 primary care practices starting in 1986.14 The information available in THIN is collected by general practitioners (GPs) as part of their routine patient care, which is deidentified and integrated into a central database for research purposes.14 Diagnoses and test procedures are recorded with Read codes.14 Prescriptions written by primary care physicians are recorded automatically in the database as Drug codes, with the use of a coded drug dictionary (Multilex).15 Quality is checked regularly and the information from this database has been found to be representative of the UK population as a whole.14 ,16

Eligible participants of the current study consisted of men and women aged 50–89 years during 2000–2012 with a diagnosis of knee OA (Read code) who had been enrolled within THIN for at least 2 years (N=602 733).

Exclusion criteria and study design

Exclusion criteria

Subjects with a concomitant diagnosis of rheumatoid arthritis (defined by diagnosis code and use of disease modifying antirheumatic drugs) were excluded. To improve comparability between KR and non-KR subjects, we then excluded subjects with conditions that may render them ineligible for KR surgery due to increased risk of mortality (ie, make them less likely to be a surgical candidate), such as: body mass index (BMI) >40 kg/m2, history of joint infections, cancers with high risk for mortality (pancreatic, oesophageal, gastric or metastatic) and comorbidities with poor prognosis (eg, end-stage renal disease on dialysis and chronic lung disease with use of nasal cannula oxygen). The sample selection as well as inclusion and exclusion criteria are illustrated in figure 1.

Figure 1

Schematic representation of subject selection, inclusion and exclusion criteria. BMI, body mass index; KR, knee replacement; THIN, The Health Improvement Network.

Propensity score-matched cohort

Propensity score matching is a statistical matching method used for mitigating the effects of confounding by indication, especially in the presence of a large number of covariates, in epidemiological studies.17 Thus, to address confounding by indication, we performed propensity score matching, as follows. The time period between 2000 and 2012 was divided into twelve 1-year blocks, known as cohort accrual blocks. Within each cohort accrual block, among the remaining subjects (N=475 286) after exclusion described above, we identified subjects with incident (new-onset) KR (total or partial) using Read codes and calculated propensity scores for KR using logistic regression. The variables included in the model were risk factors that were associated with both all-cause mortality and decision-making for KR: knee OA duration and severity (referral to orthopaedic clinic after knee OA diagnosis, analgesic medications), sociodemographic factors (age at time of KR, sex, BMI and socioeconomic status (Townsend Deprivation Index)18), comorbidities (hypertension, diabetes, hyperlipidaemia, ischaemic heart disease, heart failure, atrial fibrillation, stroke, dementia/cognitive impairment, depression, seizure disorder, peripheral vascular disease, venous thromboembolism, chronic obstructive lung disease, lung infection, renal disease, liver disease, cancers except skin cancer, cellulitis, falls, hip fracture, anaemia and peptic ulcer disease), lifestyle factors (smoking status and alcohol use), healthcare usage (number of GP visits and hospitalisations), health status (albumin level) and medication use (non-steroidal anti-inflammatory medications, opioid or non-opioid analgesics, antihypertensive, cholesterol lowering, insulin/oral hypoglycaemic agents, bisphosphonates, raloxifene, strontium, glucocorticoids and antiepileptics). 
Subjects with propensity scores below 2.5% and above 97.5% were excluded, to remove subjects who either underwent KR or did not undergo KR contrary to prediction.19 ,20 The covariate assessment period was 2 years prior to the index date (date of surgery for KR subjects and a randomly selected date within the cohort accrual block for non-KR subjects) for medications and healthcare usage, the most recent visit prior to the index date for sociodemographic and lifestyle factors, and any time before the index date for comorbidities.
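
The trimming step can be illustrated with a minimal sketch. It assumes that the 2.5% and 97.5% cut-offs are percentiles of the estimated propensity score distribution and that the scores have already been estimated; the function name `trim_by_percentile` and the linear-interpolation percentile rule are illustrative assumptions, not the procedure used in the study.

```python
def trim_by_percentile(scores, lower=2.5, upper=97.5):
    """Return indices of subjects whose propensity score lies between
    the lower and upper percentiles (inclusive).

    Assumption: cut-offs are percentiles of the score distribution.
    """
    ordered = sorted(scores)
    n = len(ordered)

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        pos = (n - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    lo_cut, hi_cut = percentile(lower), percentile(upper)
    return [i for i, s in enumerate(scores) if lo_cut <= s <= hi_cut]


# Hypothetical scores 0.00, 0.01, ..., 1.00 for 101 subjects.
example_scores = [i / 100 for i in range(101)]
kept = trim_by_percentile(example_scores)
```

In this toy example the subjects in both tails are dropped and the remainder would go forward to matching.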

Based on propensity scores, within each cohort accrual block, KR subjects were matched 1:1 to non-KR subjects using the greedy matching method (pairs are fixed once they are established), a common method for creating propensity score-matched cohorts (figure 1).21 ,22 Out of 14 045 KR subjects, only three did not have a suitable non-KR match. The study outcome was all-cause mortality, which was determined by the date of death recorded in THIN.
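
As an illustration of greedy 1:1 matching within a single cohort accrual block, the following minimal sketch pairs each KR subject with the nearest still-unmatched non-KR subject and never revisits a pair. The function name `greedy_match`, the processing order and the optional `caliper` argument are assumptions made for illustration; the text does not specify these details.

```python
def greedy_match(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    treated, controls: dicts mapping subject id -> propensity score.
    caliper: optional maximum allowed score difference (hypothetical).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet matched
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control for this treated subject.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no suitable match; subject stays unmatched
        pairs.append((t_id, c_id))
        del available[c_id]  # pair is fixed once established
    return pairs


# Hypothetical scores for one accrual block.
kr = {"t1": 0.30, "t2": 0.32}
non_kr = {"c1": 0.31, "c2": 0.50, "c3": 0.33}
matched = greedy_match(kr, non_kr)
```

Because earlier pairs are never reconsidered, the result can depend on the order in which KR subjects are processed; this is the usual trade-off of greedy matching against optimal matching.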

Statistical analysis

Subjects with complete data were included for the analyses of this study. Follow-up started from the index date and continued until death, loss to follow-up, or end of the study (31 December 2012). The all-cause mortality rate for each group was calculated by dividing the number of deaths by the total person-years of follow-up. Kaplan–Meier curves were plotted to determine the cumulative incidence of all-cause mortality for the KR and non-KR cohorts, and the relation of KR to risk of all-cause mortality was examined using Cox proportional hazards regression.
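
The two descriptive quantities above can be sketched in a few lines. This is a minimal illustration rather than the SAS analysis used in the study; the function names and the toy follow-up records are assumptions, while the deaths and person-years for the KR group are those reported in the Results.

```python
def mortality_rate_per_1000(deaths, person_years):
    """Crude mortality rate: deaths per 1000 person-years of follow-up."""
    return 1000.0 * deaths / person_years


def kaplan_meier(records):
    """Kaplan-Meier survival estimate.

    records: list of (followup_years, died) tuples; died=False means the
    subject was censored (lost to follow-up or reached end of study).
    Returns a list of (time, survival_probability) at each death time.
    """
    events = {}  # time -> (deaths, subjects leaving the risk set)
    for t, died in records:
        d, n = events.get(t, (0, 0))
        events[t] = (d + (1 if died else 0), n + 1)

    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted(events):
        d, n = events[t]
        if d > 0:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= n
    return curve


# Reported KR-group figures: 1159 deaths over 61 014.8 person-years.
kr_rate = mortality_rate_per_1000(1159, 61014.8)

# Toy follow-up data (hypothetical): two deaths, two censored subjects.
toy = [(1.0, True), (2.0, False), (3.0, True), (4.0, False)]
curve = kaplan_meier(toy)
# Cumulative mortality at each death time is 1 minus survival.
cumulative_mortality = [(t, 1.0 - s) for t, s in curve]
```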

To explore potential residual confounding by indication, we first examined the differential effect of KR on mortality risk by quartiles of age using Cox proportional hazards regression. A decrease in mortality risk with KR with increasing age may be possible, but more likely would indicate the presence of residual confounding bias due to selection of healthier candidates for KR, particularly in the very elderly. Next, we examined the relation of KR to all-cause mortality stratified by deciles of propensity score (ie, predicted probability for KR representing candidacy), using Cox proportional hazards regression. While potential effect measure modification cannot be excluded, a difference in mortality related to KR according to propensity score (ie, candidacy) would indicate the presence of confounding by unmeasured factors.
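
The stratification itself can be illustrated with a minimal sketch that assigns subjects to rank-based groups (quartiles of age or deciles of propensity score) and computes the crude mortality rate in each group. It does not fit the Cox models described above; the function names, the rank-based grouping rule and the toy data are assumptions for illustration only.

```python
def assign_quantile_groups(values, n_groups):
    """Assign each value a group index 0..n_groups-1 by rank
    (n_groups=4 for quartiles, n_groups=10 for deciles)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups


def rates_by_group(groups, died, years, n_groups):
    """Crude mortality rate per 1000 person-years within each group.
    Returns None for a group with no follow-up time."""
    deaths = [0] * n_groups
    person_years = [0.0] * n_groups
    for g, d, y in zip(groups, died, years):
        deaths[g] += int(d)
        person_years[g] += y
    return [
        1000.0 * deaths[g] / person_years[g] if person_years[g] > 0 else None
        for g in range(n_groups)
    ]


# Toy data (hypothetical): four subjects split into two score groups.
scores = [0.1, 0.2, 0.3, 0.4]
died = [True, False, False, False]
years = [5.0, 5.0, 4.0, 6.0]
groups = assign_quantile_groups(scores, 2)
rates = rates_by_group(groups, died, years, 2)
```

Running the same calculation separately for KR and non-KR subjects would give stratum-specific rates that can be compared for a trend across strata.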

SAS V.9.3 (Cary, North Carolina, USA) was used for all analyses, with two-sided α of 0.05 for significance testing.

The Institutional Review Board at Boston University Medical Campus and THIN proposal review committee approved the study.


We identified 14 042 matched pairs of subjects with knee OA (mean age 71 years; 57% women; mean BMI 29 kg/m2), with and without KR (99.9% KR subjects found non-KR matches). The mean total follow-up time was 4.42 (SD=2.96) and 4.31 (SD=2.98) years for KR and non-KR subjects, respectively. All covariates were well balanced between the KR and non-KR cohorts (table 1).

Table 1

Baseline characteristics in the propensity score-matched cohort

During follow-up, 1159 deaths occurred in the KR group and 1418 deaths in the non-KR group. As shown in figure 2, cumulative mortality was higher in the non-KR group (blue dotted line) than in the KR group (red solid line). In the overall propensity score-matched study sample, crude mortality rates per 1000 person-years (total person-years) for the KR and non-KR cohorts were 19 (61 014.8) and 25 (58 293.9), respectively. Subjects who underwent KR had a 28% lower risk of all-cause mortality compared with the non-KR subjects (HR=0.72, 95% CI 0.66 to 0.78).

Figure 2

Cumulative mortality curves for time to death by knee replacement (KR) status in a propensity score-matched cohort of men and women with knee osteoarthritis.

The relation of KR to all-cause mortality according to age strata is shown in table 2. In the youngest age quartile (<63 years), KR subjects experienced slightly higher, although not statistically significant, all-cause mortality than non-KR subjects (HR=1.20, 95% CI 0.84 to 1.71). In contrast, in the other three age quartiles, subjects with KR had lower all-cause mortality than their counterparts. The HRs were 0.80 (95% CI 0.66 to 0.96) in age quartile 2, 0.75 (95% CI 0.66 to 0.86) in age quartile 3 and 0.65 (95% CI 0.55 to 0.77) in age quartile 4, respectively (test for interaction p<0.0001).

Table 2

Relation of KR to mortality risk by age quartiles in a propensity score-matched* cohort of men and women with knee osteoarthritis

In the analyses stratified by deciles of propensity score, as the propensity score decile increased (ie, greater likelihood of KR), the mortality rate among KR subjects consistently decreased (from 21 to 15 deaths per 1000 person-years), with the lowest mortality rate occurring in the highest decile category of propensity score (mortality rate 15 per 1000 person-years). However, no such consistent trend was noted among non-KR subjects (table 3).

Table 3

Relation of KR surgery to mortality risk, stratified by deciles of propensity scores, in a propensity score-matched cohort of men and women with knee osteoarthritis


In this large population-based time-varying propensity score-matched cohort of knee OA subjects, subjects who underwent KR had lower long-term all-cause mortality than those who did not. This survival benefit was confined to older subjects, with a slightly increased risk of mortality among subjects <63 years old. While it is possible that the survival benefit seen in older patients with KR is a true effect because it is in this group that greater physical activity is particularly important to survival, more likely it is a result of residual confounding because subject selection is rigorous in this age group due to vulnerability. Further, we found lower mortality risk among KR subjects compared with non-KR subjects across the full range of the propensity score for KR, including the lowest decile of propensity score, which suggests that irrespective of candidacy for KR, subjects selected for KR surgery likely have a better prognosis for survival.

Because KR improves pain and function, the resulting improved mobility is postulated to translate into decreased long-term mortality risk. However, previous studies evaluating long-term mortality risk with KR have found conflicting results.8–10 ,23–25 For example, while one large study using Swedish Knee Arthroplasty registry data found a lower standardised mortality rate with KR,9 no difference in mortality risk related to KR was found in another study from Germany, with both studies using the general population as the comparison.10 These differences in results largely reflect the challenge of studying mortality in the context of KR surgery in an observational setting using administrative or EMR datasets, primarily due to confounding by indication, as described earlier in this manuscript. Many prior studies have only been able to adjust for age and sex when calculating the standardised mortality ratio due to lack of access to additional confounder information.8–10 ,24 ,25 In the current study, we addressed this issue by restricting our study sample to those with knee OA, excluding subjects who were deemed ineligible for KR due to mortality risk and by using propensity score matching to minimise confounding.

Nonetheless, despite our efforts to address confounding by indication, our results suggest the presence of residual confounding. For example, the observation of improved survival immediately after KR, despite the expectation of a potential short-term increase in postoperative mortality risk, supports the presence of residual confounding. A similar observation was noted in a study evaluating mortality risk following hip replacement, where a rapid decline in mortality among the hip replacement group within 3 months of the surgery was observed, which is too quick for the putative beneficial effects of joint replacement to occur. The authors postulated that this observation was likely related to the low inherent risk of mortality in the group selected for surgery after the rigorous preoperative evaluation process.26 Further, our finding of a survival benefit of KR compared with non-KR being confined to older adults, with a slightly detrimental effect among the younger subjects, supports this selective practice pattern, that is, older adults who undergo KR are likely to be much healthier than their age-specific counterparts, whereas this selection is less pronounced in younger patients. Furthermore, we found that all-cause mortality was lower in the KR group than in their non-KR counterparts irrespective of the propensity score decile, suggesting that, regardless of the propensity for KR, subjects who were selected for KR had better survival than their non-KR counterparts. This was true even for subjects in the lowest propensity score decile category. However, among KR subjects, mortality risk decreased as propensity score increased, supporting the notion that subjects who had a greater probability of selection for KR (ie, better candidacy for KR) were less likely to die than those who were less likely to be selected for KR. Such a trend, however, was less apparent for the non-KR cohort.
To explore whether the propensity score for KR is itself a strong predictor of all-cause mortality and whether this varies by age, we conducted a post hoc analysis stratified by age (data shown in online supplementary appendix table S1). For every 0.2 increase in propensity score for KR, the hazard ratios for all-cause mortality decreased (HR 0.83, 0.69, 0.54 and 0.42, respectively, among subjects aged ≤63, >63 to ≤71, >71 to ≤79 and >79 years; p for interaction <0.002). These results further illustrate that with the same increment in propensity score, its protective effect on all-cause mortality was much greater among older subjects with knee OA than their younger counterparts. Thus, while a true protective effect of KR cannot be excluded, we believe the protective effect detected in the current study reflects confounding by indication, at least in part, despite extensive efforts to mitigate this bias.

Supplementary table

Association between propensity score for knee replacement surgery and mortality risk, stratified by age quartiles

Limitations of our study are primarily related to the data available in THIN, an EMR database, which limited our ability to fully capture all factors that are pertinent in the decision to undergo KR (eg, physical function measures, severity of pain). Additionally, the underlying study sample of adult patients with knee OA identified in THIN was based on Read codes, which have not been specifically validated for the diagnosis of knee OA. However, the definition of OA has included a single medical contact in previous studies,27 ,28 including validation studies.29 Nonetheless, if a GP has identified a knee complaint in this age group, the most likely diagnosis is knee OA, and knee complaints are the primary indication for seeking and undergoing KR.30 Another limitation of this study is that while we included subjects with total and partial (unicompartmental) KR, due to too few individuals with partial KR (n=435, 3.1%) we were unable to evaluate their mortality risk separately from those with total KR. Similarly, there were too few subjects with BMI >40 kg/m2, which is still generally considered a relative contraindication for KR, to conduct separate analyses among these subjects; however, in sensitivity analyses, their inclusion in the main analyses did not alter the results.

Our study has many strengths. We used a large population-based data source that included a wide-ranging array of factors that may be taken into account when considering referral to an orthopaedic surgeon, a surgeon's recommendation for surgery and/or a patient's agreement to undergo surgery. All subjects have health insurance in the UK, and therefore, insurance status should not impact decisions regarding KR. We used a time-stratified, propensity score-matched cohort approach to account for changes in the relative importance of confounding variables at different calendar times and to account for secular trends. We conducted additional analyses to explore potential residual confounders that may affect the study findings.

In summary, we found a strong protective effect of KR on all-cause long-term mortality risk, particularly among older adults. However, our sensitivity analyses suggest potential residual confounding. Factors that may contribute to such decision-making are unlikely to be adequately captured in EMRs or administrative databases. Nonetheless, KR does not appear to be associated with increased risk of all-cause mortality, and while we cannot rule out that KR may potentially reduce the risk of mortality over the long-term, the true extent of that potential benefit is difficult to discern due to confounding by indication in observational studies using administrative data or electronic health records.



  • Handling editor Tore K Kvien

  • TN and YZ are co-last authors.

  • Contributors DM was involved in study design, interpreting results and drafting and revising manuscript. NL was involved in data extraction, analyses and interpreting results. YZ and TN were involved in study design, interpreting results and manuscript preparation. DF, HKC and JS were involved in interpreting results and manuscript preparation.

  • Funding This study was funded by Arthritis Foundation Postdoctoral Fellowship award and supported by NIAMS P60AR47785, ACR Rheumatology Research Foundation Investigator Award and Boston University CTSI KL2 scholarship (grant number 5KL2TR001411-02).

  • Competing interests None declared.

  • Ethics approval Boston University School of Medicine.

  • Provenance and peer review Not commissioned; externally peer reviewed.
