Objective To assess patient-physician discordance in global assessment of disease activity in early axial spondyloarthritis (axSpA) over time and determinants of discordance.
Methods DESIR (Devenir des Spondyloarthropathies Indifférenciées Récentes) is a French, multicentre, longitudinal cohort of patients with early inflammatory back pain suggestive of axSpA. Patient global assessment (PGA) and physician global assessment (PhGA) were rated with a 0–10 numerical rating scale, every 6 months during 2 years then at 3 years. Discordance was defined by the absolute difference |PGA–PhGA|≥3 (range 0–10) and was analysed at each visit. Determinants of (PGA−PhGA) were assessed at the visit level by a generalised linear mixed model.
Results A total of 702 patients were analysed at baseline (401 with complete data over 3 years): mean age 33.8±8.6 years, 379 (54.0%) female, mean symptom duration 18.1±10.5 months. Mean PGA values were always higher than mean PhGA values with a mean absolute difference of 1.8 points. At baseline, 202 (28.8%) patients had discordance, mainly by PGA>PhGA; over 3 years the frequency of discordance was stable (range 25.5–28.8%). Discordance was not stable at the patient level: 118 (29.4%) patients were discordant once, 88 (22.0%) twice and only 92 (22.9%) more than twice. Determinants of (PGA−PhGA) were spine pain (β=0.24, p<0.001) and fatigue (β=0.13, p<0.001).
Conclusions Discordance concerned a quarter of patients with early axSpA. Over 3 years of follow-up, discordance did not decrease (no ‘reference shift’). Discordance was not a stable trait, indicating discordance is not a patient characteristic.
- Disease Activity
In the management of chronic diseases and in rheumatology in particular, the medical decision should be based on a consultation between the physician and the patient.1–3 In rheumatic diseases, including rheumatoid arthritis (RA), axial spondyloarthritis (axSpA) and psoriatic arthritis, the integration of the patient in therapeutic decisions is an important aspect of management.4–6 However, the patient's opinion of disease activity does not always reflect the physician's opinion.7,8 Disagreements in the assessment of these diseases are a real problem, with an impact on treatment decisions and shared decision-making.9 One way to explain the gap in assessment of disease activity is to explore disagreements between patient global assessment (PGA) and physician global assessment (PhGA). Patient-physician discordance in the overall assessment of the disease can lead to patient dissatisfaction with treatment decisions, which could negatively affect medical care through poor adherence, with an impact on the evolution of the disease and a cost to society.9,10 Most published data on patient-physician discordance concern RA.11–14 Discordance is usually defined as a difference in ratings of global assessment ≥3 points on a 0–10 scale.11,14,15 Around a third of patients with RA have ‘significant’ discordance with the physician in global assessment of disease, and it appears that patients often rate their disease activity higher than the physician does.11 In RA, PGA may be based more heavily on patients’ subjective perception of pain and functional incapacity; in contrast, PhGA focuses on inflammation, that is, swollen and tender joint counts and acute phase reactants.11 This could argue for the impact of chronic widespread pain syndrome on discordance.
In axSpA, little is known about patient-physician discordance: only one article has explored discordance in axSpA, and it was not based on PGA and PhGA.16
Evolution of discordance over time is unknown. Given the impact of axSpA on quality of life with an alteration over time, it is possible that there may be a ‘reference shift’ in early disease leading to changes in discordance over the first few years. Indeed, there may be an adaptation of the patient over time, and the assessment of change would be biased if patients use a different ‘referential frame’ to judge their disease activity.17 Furthermore if discordance reflects patient characteristics and/or manifestations of chronic widespread pain syndrome, then we would expect discordance to be noted over consecutive visits for a given patient. However, longitudinal assessments of discordance are lacking.
The objectives of this study were to assess the presence and levels of discordance over time in early axSpA and the determinants of discordance.
Patients and methods
Study population and study design
Devenir des Spondyloarthropathies Indifférenciées Récentes (DESIR) is a French prospective longitudinal cohort, involving 25 rheumatology centres, of patients with early inflammatory back pain suggestive of axSpA, as previously published.18 Patients were recruited if they had inflammatory back pain suggestive of axSpA lasting more than 3 months and less than 3 years.18 A total of 708 patients were included in the cohort.
PGA and PhGA
PGA and PhGA were rated at each visit with a 0–10 numerical rating scale. PGA was assessed as ‘Please place a vertical mark on the scale below to indicate the effect your disease has had on your well-being over the last week’ with ‘none’ and ‘very severe’ as anchors, taken from the Bath Ankylosing Spondylitis Patient Global Score;19 PhGA was assessed as ‘Please mark below your overall assessment of the activity of the rheumatic disease’, with ‘inactive disease’ and ‘active disease’ as anchors.
The difference (PGA−PhGA) was analysed as a continuous value. Then, discordance was defined as a binary variable. There is no accepted definition of relevant discordance; a cut-off of ≥3/10 points is the value most frequently chosen for discordance in RA.11,14,15 In the present study, an absolute difference in rating between PGA and PhGA of ≥3/10 points was considered a relevant discordance. Based on the difference between PGA and PhGA, three groups were identified: concordant rating group (difference between PGA and PhGA within ±2), higher patient rating group (PGA exceeding PhGA by ≥3) and lower patient rating group (PGA lower than PhGA by ≥3). A sensitivity analysis was also done considering a difference in rating between PGA and PhGA of ≥2/10 points as a relevant discordance.
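The three-group definition above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_discordance` and the example ratings are hypothetical and are not taken from the DESIR database; the default cut-off of 3 and the alternative of 2 follow the definitions in the text.

```python
def classify_discordance(pga, phga, cutoff=3):
    """Classify one visit from 0-10 PGA and PhGA ratings.

    cutoff=3 is the main definition in the text; cutoff=2 corresponds
    to the sensitivity analysis.
    """
    diff = pga - phga  # signed difference (PGA - PhGA)
    if diff >= cutoff:
        return "higher patient rating"
    if diff <= -cutoff:
        return "lower patient rating"
    return "concordant"


# Hypothetical ratings, for illustration only (not DESIR data).
example_visits = [(7, 3), (4, 5), (2, 6), (5, 3)]
labels = [classify_discordance(pga, phga) for pga, phga in example_visits]
for (pga, phga), label in zip(example_visits, labels):
    print(f"PGA={pga} PhGA={phga} diff={pga - phga:+d} -> {label}")
```

Calling `classify_discordance(5, 3, cutoff=2)` moves the last example from the concordant group to the higher patient rating group, which is why the lower cut-off yields more discordance.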
Discordance over time
Patients were followed every 6 months during the first 2 years then at 3 years: a total of six visits were performed during the 3 years for each patient. The difference (PGA−PhGA) and the frequency of discordance were analysed at each visit.
Determinants of discordance
Several potential explanatory factors were assessed at baseline. Patient demographic/general variables included age, sex, body mass index, ethnicity, level of education (postsecondary defined as longer than high school) and paid work status. Disease characteristics included duration of symptoms, fulfilment of Assessment of SpondyloArthritis international Society (ASAS) axSpA criteria,20 HLA-B27 status, past history of peripheral arthritis, dactylitis or extra-articular manifestations (uveitis, psoriasis, inflammatory bowel disease); and sacroiliitis (radiological or MRI, the evaluation was performed at two levels: local then central18). At each visit, pain (spine, joints and entheses) and fatigue (from the Bath Ankylosing Spondylitis Disease Activity Index, BASDAI21) were assessed by a 0–10 numerical rating scale. Physical function was evaluated by the Bath Ankylosing Spondylitis Functional Index (BASFI)22 and quality of life by the Short-Form 36 Health Survey questionnaire.23 Physicians assessed 53 joints for tenderness, 28 joints for swelling and enthesitis count by the Maastricht Ankylosing Spondylitis Enthesitis Score (range 0–13).24 C reactive protein (CRP) was also measured at each visit.
Statistical analysis
At baseline, all patients with PGA and PhGA available were analysed. Demographic and clinical characteristic variables were expressed as means (SD) for quantitative data. A Bland-Altman plot was used to explore the agreement between PGA and PhGA. Association between baseline characteristics and discordance (higher/lower/concordant groups) was assessed using one-way analysis of variance, and factors related to (PGA−PhGA) were analysed by linear multivariate regression.
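As an illustration of the quantities behind a Bland-Altman plot, the following Python sketch computes the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 SD of the differences). The function name and the ratings are hypothetical; the study's analyses were run in R, and this is not the authors' code.

```python
from statistics import mean, stdev


def bland_altman(pga, phga):
    """Return the bias, 95% limits of agreement and plot coordinates."""
    diffs = [p - q for p, q in zip(pga, phga)]        # y axis: PGA - PhGA
    means = [(p + q) / 2 for p, q in zip(pga, phga)]  # x axis: pair mean
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(means, diffs)),
    }


# Hypothetical ratings, for illustration only (not DESIR data).
pga_scores = [5, 7, 3, 6]
phga_scores = [4, 4, 3, 5]
result = bland_altman(pga_scores, phga_scores)
print(result["bias"], result["lower"], result["upper"])
```

A positive bias corresponds to patients rating higher than physicians on average; the limits of agreement show how far individual pairs can be expected to differ.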
For longitudinal analyses, only patients with all data available for PGA and PhGA for each of the six visits were included. The percentage of patients with discordance was also calculated, as a sensitivity analysis, only for patients who fulfilled ASAS axSpA criteria (clinical or radiological). Mean absolute difference |PGA−PhGA| and the percentage of patients with discordance were analysed at each time point and compared by paired t tests, with a Bonferroni correction for multiple comparisons (significance threshold p=0.0083). Discordance over time was calculated as an agreement statistic (κ coefficient) at each visit. Intraclass correlation for repeated data (PGA and PhGA) was calculated over all visits based on a generalised linear mixed model; the 95% CI was calculated with a bootstrap procedure.
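Two of the quantities above can be sketched in Python. The reported threshold of 0.0083 is consistent with 0.05 divided by 6; that the divisor is the six visits is an assumption made here, not something the text states. The κ function is a standard Cohen's κ for two binary ratings, applied to hypothetical discordance flags (1 = discordant) of the same patients at two visits.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 flags."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                 # marginal rates of 1s
    p_exp = pa * pb + (1 - pa) * (1 - pb)           # agreement expected by chance
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Assumption: 0.05 split over six comparisons.
threshold = bonferroni_threshold(0.05, 6)
print(round(threshold, 4))

# Hypothetical discordance flags for 8 patients at two visits (not DESIR data).
visit_1 = [1, 1, 0, 0, 1, 0, 0, 0]
visit_2 = [1, 0, 0, 1, 0, 0, 0, 0]
kappa = cohen_kappa(visit_1, visit_2)
print(round(kappa, 3))
```

A κ close to 0 means that being discordant at one visit says little about being discordant at the other, which is the sense in which low κ values indicate that discordance is not carried by the same patients.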
To identify variables explaining PGA−PhGA over time, a generalised linear mixed model was used with random effects for subjects and visits. Variables in the model (as fixed effects) were: clinical characteristics at each visit (CRP, components of the BASDAI, BASFI) and demographic characteristics (age, sex, ethnicity, level of education, paid work status and ASAS axSpA criteria). Variables that were highly associated were not included in the model (HLA-B27, radiological or MRI sacroiliitis, Short-Form 36 Health Survey). Analysis was done in two steps: first, without imputation of missing data, then with multivariate imputation by chained equations only for the covariates of the mixed model. R (V.3.1.1) was used for all statistical analyses.
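One way to write such a model is sketched below, for patient i at visit j. The Gaussian error, the identity link and the symbols (x for the fixed-effect covariates, u and v for the subject and visit random intercepts) are assumptions made for illustration; the text does not specify the distribution or the link.

```latex
\[
(\mathrm{PGA}-\mathrm{PhGA})_{ij}
  = \beta_0 + \sum_{k} \beta_k \, x_{k,ij} + u_i + v_j + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0,\sigma_u^2),\quad
v_j \sim \mathcal{N}(0,\sigma_v^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
\]
```

Under this form, each coefficient \(\beta_k\) is read as the change in (PGA−PhGA) associated with a one-unit increase in covariate k, the other covariates being held fixed.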
Results
Demographic and clinical characteristics
A total of 702 patients were analysed at inclusion; six patients were excluded because of missing data on PGA. The mean age was 33.8±8.6 years, 379 (54%) were female, mean symptom duration was 18.1±10.5 months, 483 (68.8%) fulfilled the ASAS axSpA criteria and 409 (58.3%) carried HLA-B27 (table 1). The mean tender joint count was 3.2±6.2, the mean swollen joint count was <1 and the mean enthesitis count was 1.2±2.3. Mean body mass index was 24.0±4.7 kg/m2. Longitudinal analysis of discordance included a total of 401 patients: 184 patients missed at least one visit and 42 were present at all visits but had PGA or PhGA missing. The baseline characteristics of these patients were similar (table 1).
The absolute difference |PGA−PhGA| over time
Mean PGA values were always higher than mean PhGA values (table 2). However, while some patients had PGA>PhGA, others had PGA<PhGA. The mean absolute difference |PGA−PhGA| was 1.7 to 1.8 (across visits), with a median difference of 1 point at each visit.
Discordance at baseline
The mean scores for PGA and PhGA were respectively 5.1±2.6 and 4.3±2.2 (p<0.001) (table 2). Most patients and their physicians did not differ too greatly in their scores, as 500 (71.2%) patients had a global rating within 2 points of their physician's ratings. A total of 151 (21.5%) patients scored their disease activity 3 points or more above their physicians’ (higher patient rating group), whereas 51 (7.3%) patients scored their disease activity lower than their physicians’ (lower patient rating group) (table 2, figure 1 and see online supplementary figure S1). The difference (PGA−PhGA) at baseline did not differ greatly according to the level of symptoms (see online supplementary figure S2).
Discordance over time
The percentage of patients with discordance was 28.8% (202 patients) at baseline; over 3 years, the frequency of discordance was stable (range 25.5–28.8%) with a proportion of ‘higher patient rating group’ versus ‘lower patient rating group’ stable over time (table 2).
The sensitivity analysis using a difference in rating between PGA and PhGA ≥2/10 points showed more discordance, as expected. Around 50% of patients were discordant across visits (table 2 and see online supplementary table S1). The percentage of patients with discordance (using a difference in rating between PGA and PhGA ≥3/10 points) who fulfilled ASAS axSpA criteria was slightly lower (range 22.9–25.1%, data not shown).
Of 401 patients with six visits, 298 (74.3%) patients were discordant at least once: 118 (29.4%) were discordant once, 88 (22.0%) twice, 89 (22.2%) three to five times and only 3 (0.8%) patients were always discordant. Only 103 (25.7%) patients were not discordant at any time point (see online supplementary table S1). κ coefficients for discordance across two visits indicated it was not the same patients who were discordant (κ range 0.03–0.28). Agreement assessed by intraclass correlation between PGA and PhGA over time was 0.43 (95% CI 0.39 to 0.47).
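The per-patient tabulation above (number of visits at which each patient was discordant) can be sketched as follows. The data structure, the patient identifiers and the ratings are hypothetical and serve only to show the counting logic with the |PGA−PhGA|≥3 definition.

```python
from collections import Counter


def discordant_visit_counts(ratings_by_patient, cutoff=3):
    """Count, per patient, the visits where |PGA - PhGA| >= cutoff."""
    return {
        patient: sum(abs(pga - phga) >= cutoff for pga, phga in visits)
        for patient, visits in ratings_by_patient.items()
    }


# Hypothetical (PGA, PhGA) pairs over six visits, not DESIR data.
ratings = {
    "A": [(6, 2), (5, 4), (4, 4), (3, 3), (5, 5), (4, 3)],
    "B": [(5, 5), (4, 4), (3, 3), (2, 2), (3, 2), (2, 2)],
    "C": [(8, 3), (7, 3), (2, 6), (5, 4), (6, 6), (4, 4)],
}
counts = discordant_visit_counts(ratings)
distribution = Counter(counts.values())  # number of patients per count
print(counts)
print(sorted(distribution.items()))
```

The `distribution` counter is the analogue of the breakdown reported in the text (never, once, twice, three to five times, always discordant).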
Determinants of discordance
At baseline, compared with the concordant rating group, the higher patient rating group (PGA>PhGA) had more female subjects, a lower proportion of Caucasian patients, less frequent fulfilment of ASAS axSpA criteria, fewer HLA-B27 carriers, less sacroiliitis (radiological or MRI) and more patient complaints (pain and fatigue). The lower patient rating group (PGA<PhGA) had fewer female subjects, a higher proportion of patients fulfilling ASAS axSpA criteria, more HLA-B27 carriers and an elevated CRP; pain, fatigue and BASFI were lower in this group (data not shown). In linear multivariate regression, only pain (p<0.001) and fatigue (p<0.001) were independently associated with (PGA−PhGA) (data not shown).
Over time (table 3), the generalised linear mixed model showed that spine pain and fatigue were the two predictors of the difference (PGA−PhGA) (respectively, β=0.24 and β=0.13, p<0.001 for both). Higher spine pain and higher fatigue were related to a higher difference between PGA and PhGA, through a higher PGA. Demographic variables and BASFI were not associated with discordance (table 3). Results with multiple imputations on covariables were similar (data not shown).
Discussion
This study brings important information on the gap between patient and physician assessment and discordance over time in early axSpA. Over 3 years follow-up, mean PGA values were always higher than mean PhGA values with an absolute mean difference (0–10) around 1.8 points. Discordance defined by |PGA−PhGA|≥3 concerned 25.5–28.8% of patients and this percentage was stable across visits. However, discordance did not concern the same patients. The most important determinants of the difference between PGA and PhGA were spine pain and fatigue, which were related to a higher PGA.
Discordance between PGA and PhGA in the present study was around 25%. The only other publication in 203 patients with axSpA indicated that patients and physicians had different views on disease activity with discordant opinions in around 30% but using a different methodology from us.16 The rate of discordance in the present study is rather lower than discordance described in RA (around 36%), with the same cut-off (≥3/10) even in early RA.11,14,25 Some hypotheses for this difference might be that in RA there are objective criteria (joint counts, acute phase reactants), and the physician will base his PhGA on these criteria separately from PGA.11 In axSpA, there are few valid objective activity criteria: clinical examination is poor, MRI is not used for follow-up,26 CRP is often normal, as was indeed the case in 68.5% in the present study. Thus in axSpA PhGA is probably more influenced by PGA (or BASDAI) than in RA. In established RA, structural damage could explain a high PGA even in the absence of inflammation. In early axSpA (<3 years’ duration in DESIR cohort), in the absence of disease activity there are often no patients’ complaints (as spinal ankylosis is low).18
There is no standardised way to define discordance.15 A strength of the present study is that the difference between PGA and PhGA was analysed as a continuous value and as a binary value (similarly to previous studies in RA) but with two cut-offs based on the difference between PGA and PhGA. The cut-off of ≥3/10 points difference to define discordance has been frequently used in RA literature and has been reported by some authors as the minimum clinically important improvement in global assessment.27 In the present study, twice as many patients were discordant across visits when a difference in rating between PGA and PhGA of ≥2/10 points was used (around 50% vs 25%). This difference between the two cut-offs is explained by an absolute mean difference between PGA and PhGA of 1.8 points. Few patients had a difference between PGA and PhGA ≥4/10 points. Our interpretation is that PhGA and PGA often differ by only 1–2 points in axSpA.
Over 3 years of follow-up, although patients probably had a better understanding of their disease, discordance in global assessment did not decrease, suggesting there is no ‘reference shift’. The impact of health state changes on an individual's quality of life has gained increased attention in social and medical clinical research.28 An emerging construct of relevance to this line of investigation is the response shift phenomenon.28 This construct refers to the changes in the meaning of an individual's self-evaluation of a target construct, such as health-related quality of life, and can affect the interpretation of change in measures of the construct collected over time.29 Response shift should be taken into account when assessing health-related quality of life to allow a more valid interpretation of treatment effects.30 Nevertheless, there are no published data on this phenomenon in chronic inflammatory rheumatic diseases. We expected that discordance would decrease over time as patients would have a better understanding of the disease. However, this was not the case.
An important point highlighted in this study was that discordant patients were mostly not the same across visits; indeed 74.3% of patients were discordant at least once, but only 10.5% more than three times. Given the possible coexistence of axSpA and fibromyalgia,31,32 a hypothesis could be that patients with discordance might have fibromyalgia and would then be discordant at all visits. However, this was not the case for most patients with discordance. This suggests that discordance is not a patient-specific trait, but rather characterises a visit. These findings may profoundly change current approaches to the patient-physician discordance gap in assessment. Taking into account shared decision-making, as discordance may occur only once, it should not affect the physicians’ therapeutic decision at the discordant visit. However, it is noteworthy that 10.5% of patients were repeatedly discordant: in such a situation the physician may want to reconsider the patient status. Physician characteristics probably do not explain the difference between PGA and PhGA: in DESIR, a given patient is always evaluated by the same physician. Nevertheless, no study has examined physicians’ characteristics in patient-physician discordance.
Most of the discordance was associated with patients rating axSpA activity higher than their physician. Who is right? Some hypotheses might be that physicians did not detect signs of disease activity; this may be the case if patients had transient flares.33,34 Of note, discordance was not related to the level of symptoms (see online supplementary figure S2). Physicians might not consider the occurrence of a personal life event affecting the PGA. Another cause might be the potential impact of unmeasured cultural factors on disease activity assessment by the patient.
In the present study, spine pain was overwhelmingly the most important determinant of the difference (PGA−PhGA) followed by fatigue. Both these factors explained a higher PGA than PhGA, at baseline and over time.
In the axSpA study on patient-physician perspective of disease activity, patients based their judgement on the presence of complaints related to axSpA and the impact of spinal mobility was almost neglected; whereas, the physician's judgement about disease activity was importantly influenced by assessment of spinal mobility and function.16
In the present study, the second determinant of discordance was fatigue. The feeling of invisibility and difficulty to describe the experience of fatigue might explain why this is less well taken into consideration by the physician. Furthermore, there might exist a mutual reinforcement of fatigue and pain.36 Depressive symptoms may be a determinant of discordance.37,38 Unfortunately, a depression measure was not available in the present study. Two studies in RA reported the Health Assessment Questionnaire to be associated with higher PGA,13,14 whereas another did not.12 In our study, we did not analyse the Health Assessment Questionnaire but rather the BASFI in the generalised linear mixed model, and the BASFI was not associated with discordance.
This study has strengths and weaknesses. The questions used for PGA and PhGA were not identical. The Bath Ankylosing Spondylitis Patient Global Score reflects the effect of axSpA on the patient's well-being over the last week, that is, it is truly a global assessment of the patient.19 On the other hand, PhGA evaluated disease activity. This could explain, at least partly, discordance. However, other studies on patient-physician discordance in RA used similar questions for PGA and PhGA.12,14 Patients completed the study questionnaire prior to the physician and the physicians were aware of PGA. However, this is the case in clinical practice. Thus, this study from that point of view reflects discordance as might be seen in clinical practice. The present study included patients with inflammatory back pain suggestive of axSpA; however, 31.2% of the patients did not fulfil the ASAS axSpA criteria, indicating a heterogeneous population. It is noteworthy that discordance was slightly higher in patients without a confirmed axSpA. The understanding of the disease when axSpA is not confirmed may be more difficult.
A main strength of this study was the number of patients with repeated data over 3 years follow-up, whereas all published data on discordance are cross-sectional. This is, to our knowledge, the first study of discordance over several time points and the fact that discordance affects many patients only once may profoundly change the way discordance is currently approached.
In conclusion, in axSpA care, where objective measures are often limited, subjective patient-reported information becomes the only available option for treatment decision-making.39 More work needs to be done on how the discordance between PGA and PhGA impacts clinical outcomes for our patients and whether interventions to reduce discordance would improve outcomes.
The DESIR cohort is conducted under the control of Assistance Publique-Hopitaux de Paris via the Clinical Research Unit Paris-Centre and under the umbrella of the French Society of Rheumatology and INSERM (Institut National de la Santé et de la Recherche Médicale). The database management is performed within the department of epidemiology and biostatistics (Professor Paul Landais, DIM, Nîmes, France). The authors thank the different regional participating centres: Pr Maxime Dougados (Paris—Cochin B), Pr André Kahan (Paris—Cochin A), Pr Olivier Meyer (Paris—Bichat), Pr Pierre Bourgeois (Paris—La Pitié-Salpetrière), Pr Francis Berenbaum (Paris—Saint Antoine), Pr Pascal Claudepierre (Créteil), Pr Maxime Breban (Boulogne Billancourt), Dr Bernadette Saint-Marcoux (Aulnay-sous-Bois), Pr Philippe Goupille (Tours), Pr Jean-Francis Maillefert (Dijon), Dr Xavier Puéchal (Le Mans), Pr Daniel Wendling (Besançon), Pr Bernard Combe (Montpellier), Pr Liana Euller-Ziegler (Nice), Pr Philippe Orcel (Paris—Lariboisière), Pr Pierre Lafforgue (Marseille), Dr Patrick Boumier (Amiens), Pr Jean-Michel Ristori (Clermont-Ferrand), Dr Nadia Mehsen (Bordeaux), Pr Damien Loeuille (Nancy), Pr René-Marc Flipo (Lille), Pr Alain Saraux (Brest), Pr Corinne Miceli (Le Kremlin Bicêtre), Pr Alain Cantagrel (Toulouse), Pr Olivier Vittecoq (Rouen).
Handling editor Tore K Kvien
Contributors Conception and design: CD, LG and BF. Analysis and interpretation of data: CD, LG and BF. Drafting the manuscript: CD and LG. Critical revision of the manuscript for important intellectual content: CD, LG, BF, AS, BG and AM. Statistical analysis: CD, BG and AM. Supervision: LG, BF, BG, AM and AS.
Funding French Society of Rheumatology, Master grant (2802). The DESIR cohort is financially supported by unrestricted grants from the French Society of Rheumatology and Pfizer, France.
Competing interests None declared.
Patient consent Obtained.
Provenance and peer review Not commissioned; externally peer reviewed.