Abstract
Objective To evaluate the impact of laboratory results on scoring of the Physician Global Assessment (PGA) of disease activity in systemic lupus erythematosus.
Methods Fifty clinical vignettes were presented via an online survey to a group of international lupus experts. For each case, respondents scored the PGA pre and post knowledge of laboratory test results (pre-lab and post-lab PGAs). Agreement between individual assessors and relationships between pre-lab and post-lab PGAs, and PGAs and Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K) were determined. Respondents were also asked about factors they incorporate into their PGA determinations.
Results Sixty surveys were completed. The inter-rater PGA reliability was excellent (pre-lab intraclass correlation coefficient (ICC) 0.98; post-lab ICC 0.99). Post-lab PGAs were higher than pre-lab PGAs: median (IQR) pre-lab PGA 0.5 (1.05), post-lab PGA 1 (1.3) (p<0.001), with a median (IQR) difference of 0.2 (0.45). In general, all abnormal labs including elevated anti-double stranded DNA antibody level (dsDNA) and low complement impacted PGA assessment. Cases with weakest correlations between pre-lab and post-lab PGA were characterised by laboratory results revealing nephritis and/or haematological manifestations. Both pre-lab and post-lab PGAs correlated with SLEDAI-2K. However, a significantly stronger correlation was observed between post-lab PGA and SLEDAI-2K. Multiple factors influenced PGA determinations. Some factors were considered by an overwhelming majority of lupus experts, with less agreement on others.
Conclusions We found excellent inter-rater reliability for PGAs in a group of international lupus experts. Post-lab PGA scores were higher than pre-lab PGA scores, with a significantly stronger correlation with the SLEDAI-2K. Our findings indicate that PGA scoring should be performed with knowledge of pertinent laboratory results.
- systemic lupus erythematosus
- outcomes research
- disease activity
Key messages
What is already known about this subject?
The physician global assessment (PGA) is a key outcome measurement of lupus disease activity used in longitudinal cohorts and clinical trials.
What does this study add?
The inter-rater reliability of the PGA has not been previously assessed, and in this study was excellent among lupus experts.
Including laboratory test results when determining the PGA generally results in a higher PGA score.
PGAs scored with knowledge of laboratory values have a better correlation with other validated disease activity measures than PGAs scored prior to knowledge of these data.
How might this impact on clinical practice or future developments?
Scoring of the PGA with access to laboratory test results is recommended. Simple, clear and precise instructions for PGA scoring may improve its performance and consistency in clinical practice and in clinical trials.
Introduction
The Physician Global Assessment (PGA) is a frequently used, subjective outcome measure of disease activity in systemic lupus erythematosus (SLE) which encapsulates the physician’s judgement of overall (global) disease activity. It is used in clinical trials, observational cohorts and in clinical care, often supplementing other assessments such as the SLE Disease Activity Index (SLEDAI) and the British Isles Lupus Assessment Group (BILAG) Index. The PGA is also an integral part of SLE treatment response indices. It is included in the SLE Responder Index (SRI) and BILAG-based Combined Lupus Assessment (BICLA),1 2 and used to define low disease activity (Lupus Low Disease Activity State, LLDAS) and remission.3 4 Surprisingly, PGA scoring is typically performed with minimal guidance and, accordingly, the factors determining the physician global assessment of lupus disease activity are incompletely understood. Some clinicians score the PGA at the time of the patient encounter while others determine the PGA after receipt of laboratory results. A single-clinician pilot study performed in a U.S. outpatient setting suggested that there is significant variability between the PGA scored before and after receipt of laboratory test results; furthermore, the PGA scored with knowledge of laboratory results had a better association with SLEDAI.5
The current study was initiated to formally evaluate the effect of laboratory test results on PGA scoring in SLE and was performed on 50 lupus case scenarios by a group of international lupus experts. Agreement between individual assessors was determined. Additionally, factors influencing the physician assessment of lupus disease activity were explored.
Methods
Survey design
An invitation to participate was sent to a group of 194 international rheumatologists with expertise in SLE. None were involved in the survey design. Fifty de-identified clinical vignettes based on real-life cases, spanning the spectrum of SLE manifestations and severity (SLEDAI-2K range: 0 to 28), were presented via an online survey using SurveyMonkey.
Each case was organised in the same format and included demographic descriptors (sex, age and occupation), past SLE history, detailed current disease features, treatment and laboratory test results. Clinical characteristics of the cases are presented in table 1. Case pairs with and without inclusion of laboratory test results were presented in random order to each expert, who was asked to first score the PGA without laboratory test results (‘pre-lab’). The case vignette was then re-presented with inclusion of results of laboratory tests and the respondent was next asked to score the ‘post-lab’ PGA. Only forward progression through the survey was allowed; however, participants were able to log in multiple times until the survey was fully completed. Scoring of the PGA with the usual instructions ‘How do you rate your patient’s current disease activity?’ was performed using the well-established anchored scale of 0 to 3 with 0=none, 1=mild, 2=moderate and 3=most active disease imaginable.6 Following the assessment of cases, respondents were asked to answer 10 qualitative questions in order to clarify how they incorporated data into their PGA scoring. Multiple reminders were sent to complete the study and only completed surveys were used in this data analysis.
Patients were not involved in this research.
Statistical analysis
Measures of central tendency and spread were used to describe responses, and Pearson’s correlation coefficient (CC) was used to evaluate the relationship between pre-lab and post-lab PGA and between pre-lab and post-lab PGA and SLEDAI-2K. Inter-rater reliability of PGA responses was assessed using the intraclass correlation coefficient (ICC) with a two-way random effect model (2,k). Data were analysed using Stata V.15 (StataCorp, College Station, Texas).
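As an illustration of the ICC(2,k) statistic named above, the following minimal Python sketch computes it from a complete cases-by-raters matrix using the two-way ANOVA mean squares described by Shrout and Fleiss. It is not the authors' Stata analysis: the function name `icc_2k` and the toy ratings are hypothetical, and the sketch assumes every rater scored every case.

```python
def icc_2k(ratings):
    """ICC(2,k): two-way random effects, average of k raters.

    ratings: list of rows, one row per case, one column per rater.
    Assumes a complete matrix (no missing scores).
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for cases, raters, and the residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-cases mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (bms - ems) / (bms + (jms - ems) / n)


# Hypothetical PGA scores (0 to 3) for four cases scored by three raters
toy_ratings = [
    [0.2, 0.3, 0.2],
    [1.0, 1.2, 1.1],
    [2.0, 1.8, 2.1],
    [2.5, 2.6, 2.4],
]
print(round(icc_2k(toy_ratings), 3))
```

Because ICC(2,k) describes the reliability of the mean of k raters, it rises as the panel grows, so values from a large panel are not directly comparable with single-rater reliability.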
Results
Respondent demographics
Completed surveys were received from 60 respondents, providing a data set of 3000 unique paired responses. Demographic characteristics of the respondents are presented in table 2 (see Acknowledgments for a list of contributors).
Pre-laboratory and post-laboratory PGA scores
Pre-lab PGA scores ranged from 0 to 2.25 and post-lab PGA scores ranged from 0.4 to 2.5. Inter-rater PGA reliability among lupus experts was excellent (pre-lab PGA ICC 0.98; post-lab PGA ICC 0.99). Descriptive statistics (median, IQR) for the individual case PGA scores are shown in online supplementary table S1. The spread of PGA scores according to SLEDAI-2K score groupings dividing cases into mild, moderate or severe disease activity (SLEDAI-2K <6, 6 to 11 and >11) is shown in online supplementary figure S1A (pre-lab PGA) and S1B (post-lab PGA). There was significantly greater variability of pre-lab PGAs for cases with higher SLEDAI-2K and significantly greater spread of post-lab PGAs for cases with intermediate SLEDAI-2K compared with cases with low or high SLEDAI-2K scores. Knowledge of laboratory results influenced PGA scoring. The median post-lab PGA (1.0, IQR 1.3) was higher than the median pre-lab PGA (0.5, IQR 1.05) (p<0.001) in all but two cases (table 3). In two cases, the post-lab median PGA was lower than the pre-lab PGA score; these cases represented two of the six vignettes with no abnormal laboratory results. The delta-PGA was defined as the difference between the post-lab and pre-lab PGA. The median (IQR) delta-PGA for all case pairs was 0.2 (0.45) (table 3).
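The delta-PGA summary can be sketched in a few lines of Python. This is only an illustration of the definition given above (post-lab minus pre-lab, summarised as median and IQR width); the scores are invented, the function names are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption that may differ from the one used in the original analysis.

```python
from statistics import median, quantiles


def delta_pga(pre, post):
    """Paired differences, post-lab minus pre-lab, rounded to 2 decimals."""
    return [round(b - a, 2) for a, b in zip(pre, post)]


def median_iqr(values):
    """Return (median, IQR width) of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


# Hypothetical paired scores on the 0 to 3 PGA scale
pre_lab = [0.5, 1.0, 1.5, 0.2]
post_lab = [0.7, 1.5, 1.5, 0.1]

deltas = delta_pga(pre_lab, post_lab)
med, iqr = median_iqr(deltas)
print(deltas, round(med, 2), round(iqr, 2))
```

A negative delta, as in the last hypothetical pair, corresponds to a case where the laboratory results lowered the assessed disease activity.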
The correlation between pre-lab and post-lab PGA was moderately strong (Pearson correlation coefficient=0.682 (95% CI 0.030 to 0.621)). In 20 cases, the CC was ≥0.8; in 14 cases, the CC was 0.6 to 0.79 and in 16 cases, the CC was ≤0.59. The 16 cases with the poorest correlation between pre-lab and post-lab PGA (CC ≤0.59) were primarily vignettes in which laboratory data revealed lupus nephritis and/or haematological manifestations (online supplementary table S2). The median delta PGA was ≥0.3 in 19 cases. A higher pre-lab PGA appeared to be predictive of a median delta PGA of ≥0.3, as the delta PGA was ≥0.3 in half of the cases with a pre-lab PGA between 1 and 2, in comparison to 35% of those with pre-lab PGA between 0 and 1 (online supplementary table S3).
Influence of laboratory abnormalities on PGA score
In general, all abnormal laboratory results, including elevated anti-dsDNA antibody and low complement levels impacted the scoring of the PGA (table 3). Laboratory abnormalities with the greatest impact on post-lab PGA were urinalysis (presence of urinary casts, pyuria, haematuria and proteinuria), thrombocytopenia and elevations of erythrocyte sedimentation rate (ESR) >40 mm/hour and C-reactive protein (CRP) >10 mg/L, each with a median delta PGA greater than 0.5. Hypocomplementaemia, mild anaemia, leucopenia and elevations of ESR >20 mm/hour and CRP >5 mg/L had less influence on PGA scores, but each was associated with a median delta PGA ≥0.3.
Relationship between PGA and SLEDAI-2K
Disease activity assessed using the SLEDAI-2K ranged from 0 to 28. There was a moderately strong correlation between SLEDAI-2K and PGA, which was significantly higher for post-lab (CC=0.79, 95% CI 0.66 to 0.88) than pre-lab (CC=0.67, 95% CI 0.48 to 0.80) PGA (Steiger’s test p value=0.038) (figure 1A and B).
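The paper does not state how the confidence intervals for these correlation coefficients were derived. One common approach is the Fisher z-transformation, sketched below in Python. The function name `fisher_ci` is hypothetical, and the sample size of 50 is an assumption (one correlation computed across the 50 case vignettes).

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via Fisher's z.

    r: sample correlation coefficient, n: number of paired observations.
    z_crit defaults to the two-sided 95% normal quantile.
    """
    z = math.atanh(r)                 # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Reported coefficients, with n = 50 cases assumed
for label, r in [("post-lab", 0.79), ("pre-lab", 0.67)]:
    lo, hi = fisher_ci(r, 50)
    print(f"{label}: r={r:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The interval is asymmetric around r because the transformation stretches values near 1; it says nothing about whether two correlations differ, which in the study was assessed with Steiger’s test for dependent correlations.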
Qualitative factors determining PGA scoring
Most respondents (86.7%) found the instructions for PGA scoring (‘How do you rate this patient’s current disease activity’) understandable, although several respondents desired a more specific definition of ‘current’, that is, the duration of time incorporated by the PGA (table 4). The majority of respondents (83.3%) also found the Visual Analogue Scale of 0 to 3 easy to use; a suggestion for improvement was to ‘change the scale to 0 to 10 to allow greater precision in differentiating disease activity’. Another comment was that ‘subdecimal differences were not meaningful’.
Factors influencing respondents’ PGA scoring are shown in table 4. Approximately two-thirds of respondents indicated that they take prior disease manifestations into account, largely to give context to a patient’s current disease manifestations, and to help judge new activity compared with chronic ongoing activity. The amount of time incorporated into the definition of ‘current disease activity’ prior to the PGA assessment visit varied; 73% considered 7 to 10 days (36.7%) or 1 month (36.7%) as their timeframe for ‘current’ disease activity (see table 4). Disease activity during the day of the visit or during the previous 1 to 2 days or up to 3 or 6 months was considered by 10%. The majority of respondents indicated that they also factor current medications into their PGA score. Approximately one-third of respondents indicated that they took prior damage and medications into account when scoring the PGA and over 80% of respondents considered patient-reported factors such as pain, fatigue, functional limitation and impaired quality of life in their PGA determinations. Serological markers, that is, hypocomplementaemia and the presence of dsDNA antibodies, were incorporated into PGA scoring by 88.7% and 85% of respondents, respectively, while 50% reported that ESR or CRP values influenced their scoring of the PGA. Lastly, respondents reported that before including any feature in their overall assessment of disease activity, they needed to be at least 80% certain that the feature was attributable to SLE.
Discussion
This study is the first to formally assess factors that impact the scoring of the PGA in evaluating SLE disease activity. The PGA is a subjective measure which is dependent on the assessor’s clinical judgement. It is the summation of the clinical and laboratory information for a particular patient encounter. The data presented here demonstrate that when scored by physicians with expertise in SLE, the PGA performs well across continents and cultures. Although some cases had a wide range of PGA scores, there was a high degree of agreement among these respondents, as ICCs for inter-rater reliability between the PGAs of assessors were 0.98 and 0.99 for pre-lab and post-lab PGAs, respectively. This finding, along with the good correlation between the PGA and SLEDAI-2K, provides reassurance that the PGA is a valid outcome measure. This is an important conclusion given that the PGA is a component of multiple composite measures used in clinical trials evaluating new treatments for SLE.
An essential finding is that PGAs scored with inclusion of laboratory data were consistently and significantly higher than those scored without these data. Importantly, PGAs scored with knowledge of laboratory data correlated significantly better with the SLEDAI-2K, a validated instrument of lupus disease activity. This suggests that the post-lab PGA is a more accurate measure of SLE disease activity than a PGA determined before knowledge of laboratory results. Our vignettes were based on real-life patients and included cases in which the only abnormalities were elevated inflammatory markers and/or serological abnormalities. Not surprisingly, the cases with the largest differences between pre-lab and post-lab PGA determinations were those in which laboratory results revealed abnormalities in the renal or haematological domains. We found that in 40% of the cases evaluated in this study, meaningful differences (>0.3 units) were noted between PGA scores with and without laboratory results. This is particularly relevant as both the SRI and BICLA, composite response indices used in clinical trials, require that the PGA not increase by more than 0.3 units.
The PGA was first incorporated into an instrument assessing SLE disease activity in the Lupus Activity Index (LAI) in the 1990s.7 As the final score of the LAI included laboratory variables in a separate domain, the PGA in this context was based solely on the physician’s assessment of disease activity at the time of the patient visit, that is, without laboratory values. The PGA has subsequently been used as a stand-alone instrument in clinical trials as well as in longitudinal lupus cohorts to measure lupus disease activity. While waiting for laboratory data may be logistically cumbersome and may delay the completion of the PGA, most assessors incorporate laboratory values into their PGA scores, as intuitively, ‘global’ disease activity integrates all features of disease including laboratory data. Similarly, treatment decisions in both the clinic and in clinical trials are usually finalised only after laboratory values are known. While some clinical trials allow PGA scoring to occur when laboratory values have been received, other trials have mandated that the PGA be completed at the time of the clinical trial visit, that is, without knowledge of the laboratory values accompanying the visit. This inconsistent scoring process can lead to discrepancies between the PGA score and treatment decisions.
Following the determinations of pre-PGA and post-PGA scoring of the clinical vignettes, respondents completed a qualitative survey regarding the factors they consider relevant when scoring the PGA, and its instructions. Despite the excellent agreement of PGA scores between respondents, we noted marked differences in the factors influencing their scoring. Clear, precise and concise instructions will likely improve PGA accuracy and consistency in clinical practice as well as in SLE clinical trials.
There are several limitations of this study. Although strengths of this study include the large number of cases and the number of international experts evaluating each case, the excellent observed inter-rater reliability is based on respondents who are lupus experts. These data may not be generalisable to general rheumatologists or other physicians who treat fewer patients with SLE. Although the case vignettes were based on real patient scenarios, the use of vignettes as opposed to real patients is an additional limitation and may have resulted in higher inter-rater reliability than if actual patients had been evaluated.
In conclusion, the current study reveals a very high level of agreement in PGA scoring of case vignettes among lupus experts, and shows that the availability of laboratory data significantly impacts PGA scoring. Scoring the PGA with knowledge of laboratory test results yields a score that is higher and that correlates significantly more strongly with the SLEDAI-2K. The variability of factors reported by respondents which contribute to their PGA determination supports further evaluation and refining of PGA scoring instructions. Precise guidelines for the completion of the PGA will likely improve its performance as an outcome measure in both observational studies and clinical trials.
Acknowledgments
We thank the following rheumatology experts for their participation: Graciela Alarcon, Simone Appenzeller, Martin Aringer, Sang-Cheol Bae, H. Michael Belmont, George Bertsias, Ricard Cervera, Megan Clowse, Nathalie Costedoat-Chalumeau, Mary E Cronin, Maria Dall'Era, Andrea Doria, Thomas Dörner, Rebecca Fischer-Betz, Richard Furie, Gary Gilkeson, Dafna Gladman, Fiona Goldblatt, Laniyati Hamijoyo, Murat Inanc, Peter Izmirly, Kenneth Kalunian, Diane Kamen, David Karp, Yasuhiro Katsumata, Alfred Kim, Kyriakos Kirou, Evandro M Klumb, Kichul Ko, Fotios Koumpouras, Aisha Lateef, Deborah Levy, Roger A Levy, Juan Javier Lichauco, Anita Lim, Julie Li-Yu, Worawit Louthrenoo, Odirlei Andre Monticielo, Eric Morand, Peter Nash, Sandra V Navarra, Ola Nived, Marzena Olesinska, Sean O'Neill, Anisur Rahman, Rosalind Ramsey-Goldman, Fancine Machao Ribeiro, Violeta Rus, Amit Saxena, Matthias Schneider, Allan Sturgess, Katherine Thanou, Zahi Touma, Murray Urowitz, Carlos Vasconcelos, Alexandre Voskuyl, Cesarius Singgih Wahono, Daniel Wallace, Kristy Yap, Elena Zakharova.
Footnotes
Handling editor Josef S Smolen
CA and AA contributed equally.
Presented at This work was presented as a poster at the 2018 American College of Rheumatology/Association of Rheumatology Professionals Annual Meeting: Aranow C, Askanase A, Huq M, et al. Laboratory Investigation Results Influence Physician’s Global Assessment of Disease Activity in Systemic Lupus Erythematosus (abstract). Arthritis Rheumatol. 2018;70 (suppl 10).
Correction notice This article has been corrected since it published Online First. A typographical error has been corrected in the title and the funding statement has been updated.
Contributors Study design: CA, AA, MN. Data collection: CA, AA, MN, AC. Data analysis: CA, AA, MN, SO, AC, EFM, MH. Interpretation of findings: CA, AA, MN, SO, AC, EFM, MH. Preparation of manuscript: CA, AA, MN, SO, AC, EFM, MH. All authors read and approved the final manuscript. Please note that CA and AA contributed equally to this work.
Funding This work was in part supported by funding from the Lupus Research Alliance/Lupus Therapeutics (LRA/LT) to Cynthia Aranow and Anca Askanase. Mandana Nikpour holds an NHMRC Fellowship (APP1126370).
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting or dissemination plans of this research.
Patient consent for publication Not required.
Ethics approval Ethics approval for the study was obtained from the Human Research Ethics Committee of St Vincent’s Hospital Melbourne, Australia.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.