Abstract
Objectives This study tested the concurrent validity of the systemic lupus erythematosus responder index (SRI) in assessing improvement in juvenile-onset systemic lupus erythematosus (jSLE).
Methods The SRI considers changes in the SELENA–SLEDAI, BILAG and a 3-cm visual analogue scale of physician-rated disease activity (PGA) to determine patient improvement. Using prospectively collected data from 760 unique follow-up visit intervals of 274 jSLE patients, we assessed the sensitivity and specificity of the SRI using these external standards: physician-rated improvement (MD-change), patient/parent-rated major improvement of wellbeing (patient-change) and decrease in prescribed systemic corticosteroids (steroid-change). Modifications of the SRI that considered different thresholds for the SELENA–SLEDAI, BILAG and 10-cm PGA were explored and agreement with the American College of Rheumatology/PRINTO provisional criteria for improvement of jSLE (PCI) was examined.
Results The sensitivity/specificity in capturing major improvement by the MD-change were 78%/76% for the SRI and 83%/78% for the PCI, respectively. There was fair agreement between the SRI and PCI (kappa=0.35, 95% CI 0.02 to 0.73) in capturing major improvement by the MD-change. Select modified versions of the SRI had improved accuracy overall. All improvement criteria tested had lower sensitivity when considering patient-change and steroid-change as external standards compared to MD-change.
Conclusions The SRI and its modified versions based on meaningful changes in jSLE have high specificity but at most modest sensitivity for capturing jSLE improvement. When used as an endpoint of clinical trials in jSLE, the SRI will provide a conservative estimate regarding the efficacy of the therapeutic agent under investigation.
- Systemic Lupus Erythematosus
- Epidemiology
- Disease Activity
Introduction
Juvenile-onset systemic lupus erythematosus (jSLE) is a complex autoimmune disease often with multi-organ involvement. However, inflammation in the various organ systems is often discrepant.1 No single clinical sign or laboratory test can be used to determine whether a clinically relevant improvement of jSLE has occurred or not. When assessing the overall course of jSLE, changes in all the various organ systems that are affected need to be integrated.2
Several composite indices are available to capture improvement of children or adults with systemic lupus erythematosus (SLE).3–6 The Pediatric Rheumatology International Trials Organization/American College of Rheumatology (ACR) provisional response criteria (PCI) were developed and validated for jSLE,7,8 while the systemic lupus erythematosus responder index (SRI) has been used in clinical trials of adults with SLE in support of the efficacy of belimumab for SLE treatment9 and is being tested in clinical practice. Information about the performance characteristics of the SRI in jSLE is lacking. Phenotypic differences between adults and children with SLE mean that one cannot assume a priori that response criteria suitable for measuring treatment responses in adults with SLE would be suitable for use in jSLE.10 However, to compare the efficacy of medications between adults with SLE and children with jSLE, it is highly desirable to use similar, or if possible identical, efficacy measures.
Therefore, the objectives of this study were: to determine the accuracy of the SRI in jSLE; to evaluate potential modifications of the SRI for assessing clinically relevant improvement of jSLE; and to compare the performance of the SRI with that of the PCI for capturing improvement in jSLE.
Materials and methods
Patients and visits
We combined data from three prospective jSLE cohorts.8,11,12 For these cohorts, children (n=274) fulfilling ACR classification criteria for SLE before the age of 16 years were recruited from 15 paediatric rheumatology centres and studied every 3 months for up to 18 months.13 All patients included had a minimum of two visits, with a median number of four visits (range 0–10). Patients were recruited over a period of 64 months for the three cohorts. The disease course of jSLE was rated by the treating paediatric rheumatologist and the parent (or patient) at each study visit. The disease activity measures were filled out by the site principal investigators. The study was approved by the institutional review boards of the participating centres. Informed consent and, as appropriate, assent were obtained.
Disease activity
The summary score of the safety of oestrogens in lupus erythematosus: national assessment version of the systemic lupus erythematosus disease activity index (SELENA–SLEDAI) is the sum of 24 weighted items that were present within the preceding 10 days.14 We also collected the physician-rated disease activity from the 3 cm visual analogue scale (PGA3: 0=none; 1=mild; 2=moderate; 3=severe) component of the SELENA–SLEDAI and the 10 cm visual analogue scale (PGA10: 0=inactive, 10=very active).
The British Isles Lupus Activity Group index (BILAG) features alphabetical domain scores to reflect the changing severity of clinical manifestations and physician's intention to treat across eight organ domains.15 The most commonly employed scheme to convert the alphabetic into numeric BILAG scores in jSLE is that by Stoll et al16 (BILAGStoll: A=9, B=3, C=1, D/E=0).
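As an illustration only, the BILAGStoll conversion quoted above can be written as a lookup table. Summing the converted scores across the eight organ domains is an assumption of this sketch, and the function name and the example patient are hypothetical.

```python
# Numeric equivalents of the alphabetic BILAG domain scores, as quoted
# in the text for the Stoll et al scheme (A=9, B=3, C=1, D/E=0).
BILAG_STOLL = {"A": 9, "B": 3, "C": 1, "D": 0, "E": 0}


def bilag_stoll_total(domain_scores):
    """Sum the numeric equivalents of the alphabetic domain scores.

    Summation across domains is an assumption of this sketch.
    """
    return sum(BILAG_STOLL[score.upper()] for score in domain_scores)


# Hypothetical patient with eight organ domain scores.
example_domains = ["A", "B", "C", "D", "E", "E", "C", "B"]
example_total = bilag_stoll_total(example_domains)  # 9 + 3 + 1 + 0 + 0 + 0 + 1 + 3
```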
Measures of improvement (response criteria) tested
Systemic lupus responder index
The SRI considers changes of the SELENA–SLEDAI, the BILAG and the PGA3. The SRI defines a responder as a patient whose disease course fulfils all of the following: (1) reduction of the SELENA–SLEDAI score of 4 or more; (2) no new BILAG A domain score and no more than one new BILAG B domain score; and (3) no worsening of the PGA3 by 0.3 or more points. These cut-off values have been shown in the past to constitute clinically meaningful changes of these disease measures in adults with SLE.17
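The three SRI conditions can be expressed as a short function. This is a minimal sketch, not the study's SAS implementation: the function and parameter names are hypothetical, and it assumes that a 'new' BILAG A is a domain scored A at follow-up that was not A at baseline, and that a 'new' BILAG B is a domain scored B at follow-up that was C, D or E at baseline.

```python
def is_sri_responder(sledai_base, sledai_follow,
                     bilag_base, bilag_follow,
                     pga3_base, pga3_follow):
    """Return True if all three SRI conditions described in the text hold.

    bilag_base and bilag_follow are equal-length sequences of domain
    letters ("A" to "E"), in the same domain order.
    """
    # (1) SELENA-SLEDAI reduced by 4 points or more.
    sledai_ok = (sledai_base - sledai_follow) >= 4

    # (2) No new A and at most one new B (definition of "new" is an assumption).
    pairs = list(zip(bilag_base, bilag_follow))
    new_a = sum(1 for base, follow in pairs if follow == "A" and base != "A")
    new_b = sum(1 for base, follow in pairs
                if follow == "B" and base in ("C", "D", "E"))
    bilag_ok = new_a == 0 and new_b <= 1

    # (3) PGA3 has not worsened by 0.3 points or more.
    # Rounding guards against floating-point noise at the threshold.
    pga_ok = round(pga3_follow - pga3_base, 6) < 0.3

    return sledai_ok and bilag_ok and pga_ok


# Hypothetical visit interval: SELENA-SLEDAI 10 -> 4, one domain B -> C, PGA3 1.5 -> 1.0.
base_domains = ["B", "C", "D", "E", "E", "E", "E", "E"]
follow_domains = ["C", "C", "D", "E", "E", "E", "E", "E"]
responder = is_sri_responder(10, 4, base_domains, follow_domains, 1.5, 1.0)
```

Because all three conditions must hold, a large SELENA–SLEDAI decrease does not count as a response if a new BILAG A appears or the PGA3 worsens.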
Modified versions of the SRI
Table 1 provides details about five modified versions (SRIa–e) of the SRI assessed in this study. The SRIa–e versions consider previously published minimal clinically important differences (MCID) of the SELENA–SLEDAI, BILAG and PGA when used in jSLE.11 In particular, modifications considered included: (1) traditional standard error of measurement (SEM) MCID thresholds at ±1 SEM (SRIa and SRIb); (2) MCID thresholds at ±1.645 SEM (SRIc); and (3) MCID thresholds that reflect the 70% and 90% probability of detecting meaningful changes (SRId and SRIe, respectively). Additional details about these threshold values are provided elsewhere.11
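For readers unfamiliar with SEM-based thresholds, the sketch below shows how a ±1 SEM and a ±1.645 SEM threshold relate to each other. It assumes that SEM is read as the standard error of measurement and uses the textbook formula SEM = SD × √(1 − reliability). The input values are hypothetical and chosen only to make the arithmetic easy; they are not taken from this study or from the cited MCID work.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Textbook formula: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def mcid_threshold(sd, reliability, multiplier=1.0):
    """MCID threshold expressed as a multiple of the SEM."""
    return multiplier * standard_error_of_measurement(sd, reliability)


# Hypothetical inputs, for illustration only (not study data).
sd_example = 4.0
reliability_example = 0.75

one_sem = mcid_threshold(sd_example, reliability_example)                       # 1 SEM
wider_sem = mcid_threshold(sd_example, reliability_example, multiplier=1.645)   # 1.645 SEM
```

A larger multiplier gives a larger threshold, so a change has to be bigger before it counts as meaningful.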
PRINTO/ACR criteria of improvement of jSLE (PCI)
Improvement of jSLE as per the PCI is based on relative changes of the jSLE core response variables (CRV). They are the PGA10, visual analogue scale of parent or patient assessment of wellbeing, the child health questionnaire—physical function summary score, scores of a disease activity index (here SELENA–SLEDAI), and timed proteinuria (here spot urine protein : creatinine ratio). According to the PCI, a clinically relevant improvement of jSLE has occurred if there is at least a 50% improvement of at least two CRV without concomitant worsening of more than one of the remaining CRV by more than 30%.18
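The PCI rule described above can be sketched as follows. The function names, the CRV labels and the example values are hypothetical. The sketch also assumes that each CRV change has been converted to a signed percentage in which positive values mean improvement, since the direction of 'better' differs between CRV.

```python
def percent_improvement(baseline, follow_up, higher_is_better=False):
    """Signed relative change in percent; positive means improvement."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a baseline of 0")
    change = (follow_up - baseline) / abs(baseline) * 100.0
    return change if higher_is_better else -change


def pci_improved(pct_by_crv):
    """Apply the PCI rule quoted in the text.

    pct_by_crv maps each CRV name to its signed percent improvement.
    """
    improved = [name for name, pct in pct_by_crv.items() if pct >= 50]
    worsened = [name for name, pct in pct_by_crv.items() if pct < -30]
    # At least two CRV improved by 50% or more, and no more than one
    # of the remaining CRV worsened by more than 30%.
    return len(improved) >= 2 and len(worsened) <= 1


# Hypothetical visit interval (values are illustrative, not study data).
example_changes = {
    "pga10": percent_improvement(6, 3),
    "wellbeing": percent_improvement(5, 6, higher_is_better=True),
    "chq_physical": percent_improvement(40, 44, higher_is_better=True),
    "sledai": percent_improvement(10, 4),
    "proteinuria": percent_improvement(0.5, 0.7),
}
example_result = pci_improved(example_changes)
```

In this example two CRV improve by at least 50% and only one worsens by more than 30%, so the rule is met.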
External standards
Four commonly employed external standards were used to assess the concurrent validity of the response criteria tested.
Physician-rated change of jSLE disease activity (MD-change)
The treating rheumatologist was asked to complete the sentence stem, ‘Compared to the last study visit 3 months ago and the patient's overall disease, the patient experienced a _____’, with the response denoting change in the disease course between consecutive visits on a five-point Likert scale (major improvement of disease, minor improvement of disease, no change in disease, minor flare of disease or major flare of disease). For the analysis, we dichotomised the responses of MD-change using two approaches: (1) MD-change major: ‘major improvement’ versus ‘no major improvement’ (ie, minor improvement of disease, no change in disease, minor flare of disease or major flare of disease); and (2) MD-change any: ‘any improvement’ (ie, major or minor improvement of disease) versus ‘no improvement’ (ie, no change in disease, minor flare of disease or major flare of disease).
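The two dichotomisations of the five-point MD-change scale can be sketched as below. The level labels are shortened versions of the response options in the text, and the function names are hypothetical.

```python
# Five response levels of the MD-change Likert scale (labels shortened).
MD_LEVELS = (
    "major improvement",
    "minor improvement",
    "no change",
    "minor flare",
    "major flare",
)


def _check(response):
    if response not in MD_LEVELS:
        raise ValueError(f"unknown MD-change response: {response!r}")


def md_change_major(response):
    """'major improvement' versus everything else."""
    _check(response)
    return response == "major improvement"


def md_change_any(response):
    """Major or minor improvement versus no change or flare."""
    _check(response)
    return response in ("major improvement", "minor improvement")


major_flags = [md_change_major(level) for level in MD_LEVELS]
any_flags = [md_change_any(level) for level in MD_LEVELS]
```

Only a rating of minor improvement is classified differently by the two approaches.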
Patient/parent rating of patient wellbeing with jSLE (patient-change)
The patient or his/her parent completed the sentence stem, ‘Compared to the last study visit 3 months ago, and when considering medications, school, work, life at home, doctor visits, pains, and feelings, the overall well-being is _____’, with the responses denoting change in the patient/parent's perspective of the course of jSLE (much improved, somewhat improved, unchanged, somewhat worse, or much worse). For the analysis, we dichotomised the responses to ‘much improved’ versus ‘not much improved’, ie, somewhat improved, unchanged, somewhat worse or much worse.
Decrease in prescribed systemic corticosteroids (steroid-change)
We also assessed the change in the average daily dose of systemic corticosteroids prescribed to the patient to define a clinically relevant improvement of jSLE. Improvement was defined as any decrease in oral steroid dose and frequency plus cessation of intravenous pulse steroids.
Statistical analysis
Visits with SELENA–SLEDAI scores less than 2 were excluded from the analysis as the smallest MCID for the SELENA–SLEDAI in the response criteria tested is a change of two points. We determined sensitivity, specificity, positive predictive values (PPV) and negative predictive values (NPV) of the various response criteria compared to the external standards (MD-change major, MD-change any, patient-change and steroid-change). As done in the past, the overall accuracy of each response criterion was estimated by multiplying its sensitivity by its specificity.7,19 We tested the strength of agreement among various response indices using the kappa statistic with agreement interpreted as follows: 0, poor; 0.01–0.2, slight; 0.21–0.4, fair; 0.41–0.6, moderate; 0.61–0.8, substantial; and 0.81–1, almost perfect.20
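The measures named above can be sketched in a few lines. This is an illustration under stated assumptions, not the SAS code used in the study: the function names and the example counts are hypothetical, Cohen's kappa for two binary ratings is assumed, and the interpretation bands are those listed in the text, with values of 0 or below treated as 'poor'.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and the sensitivity x specificity product."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # 'Accuracy' as defined in the text: sensitivity multiplied by specificity.
        "accuracy": sensitivity * specificity,
    }


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of booleans."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_a = sum(ratings_a) / n
    p_b = sum(ratings_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Undefined (division by zero) when expected agreement equals 1.
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa to the agreement bands listed in the text."""
    if kappa <= 0:
        return "poor"
    for upper, label in ((0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts and ratings, for illustration only.
metrics = diagnostic_metrics(tp=78, fp=24, fn=22, tn=76)
kappa = cohen_kappa([True, True, False, False], [True, False, False, False])
```

Note that the product of sensitivity and specificity is a summary specific to this line of research; it differs from the proportion of correctly classified cases.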
In exploratory analyses, we assessed the effect of differences in the baseline characteristics of the patients on the performance of the response criteria tested. In particular, we tested whether the measurement characteristics of the response criteria differed in patients with low to moderate disease activity (SELENA–SLEDAI 4–7) versus high disease activity (SELENA–SLEDAI ≥8) at baseline.
We evaluated the relative contribution of the SELENA–SLEDAI, BILAG and PGA3 to the overall accuracy of the SRI (see supplementary table S1, available online only). We also evaluated the accuracy of the candidate response criteria using data when baseline SELENA–SLEDAI scores were 4 or greater, 6 or greater and 12 or greater (see supplementary tables S2–S4, available online only).
We considered p values of 0.05 or less to be statistically significant and performed statistical analysis with SAS (V.9.2) and Microsoft Excel (V.2008).
Results
Patient characteristics and disease activity
We considered 274 jSLE patients and their data on 760 unique between-visit intervals (ie, follow-up visits every 3 months). Patients constituted a convenience sample recruited during routine clinic visits to the paediatric rheumatology providers for jSLE care. Demographics and clinical characteristics of the patients at the baseline visit are shown in table 2. The patient composition of the study population was comparable to North American cohorts.21,22
Physician-rated improvement of jSLE
Table 3 summarises the performance of the various response criteria when considering the four external standards used in this study to define the presence versus absence of jSLE improvement.
In capturing MD-change major, the sensitivity of the SRI was 78% with a specificity of 76%, PPV of 33%, NPV of 93% and accuracy of 0.59. Among the modified versions of the SRI, the SRIa and SRIb had higher sensitivity at 83% and 85%, respectively, with comparable specificity to the SRI. The SRIc and SRId approximated the performance of the SRI with similar sensitivity and specificity. The SRIe had higher specificity but lower sensitivity than the SRI (see supplementary figure S1, available online only). Compared to the SRI, the sensitivity of the PCI (83%) was better for capturing physician-rated major improvement. The specificity, PPV, NPV and accuracy of the PCI were 78%, 43%, 93% and 0.65, respectively.
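As a quick consistency check, the reported accuracy values can be reproduced from the reported sensitivity and specificity using the product definition given in the statistical analysis. The variable names are hypothetical, and because the inputs are the rounded percentages from the text, agreement is only expected to two decimal places.

```python
# Rounded values as reported in the text for MD-change major.
sri_sensitivity, sri_specificity = 0.78, 0.76
pci_sensitivity, pci_specificity = 0.83, 0.78

# Accuracy is defined in the text as sensitivity multiplied by specificity.
sri_accuracy = round(sri_sensitivity * sri_specificity, 2)
pci_accuracy = round(pci_sensitivity * pci_specificity, 2)

print(f"SRI accuracy: {sri_accuracy:.2f}")
print(f"PCI accuracy: {pci_accuracy:.2f}")
```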
All of the response criteria considered had lower sensitivity with expected higher specificity in capturing MD-change any compared to MD-change major (table 3).
Among the disease measures comprising the SRI, the SELENA–SLEDAI contributed most to the overall accuracy of the SRI. Exclusion of the BILAG or PGA3 from the SRI did not change the accuracy of the SRI in detecting improvement using MD-change as the external standard (see supplementary table S1, available online only).
Family-rated improvement of jSLE
The accuracy of the response criteria was lower for capturing jSLE improvement when rated by the families (patient-change) compared to jSLE improvement rated by the treating physician (MD-change any, MD-change major). The PCI, SRI and SRIc were the most sensitive indices (40–43%), while the SRIe was the most specific (87%) response index for patient-change (table 3).
jSLE improvement based on tapering of systemic steroids
The accuracy of the response criteria considered was also poor for capturing steroid-change due to low sensitivity. The SRI had sensitivity (33–35%) comparable to that of the modified SRI versions and the PCI when considering steroid-change as the external standard, while the SRIe, again, was the most specific (84%, table 3).
Measurement characteristics of response criteria for patients with low to moderate versus high disease activity at baseline
There were limited events with ‘major improvement’ for low–moderate baseline SELENA–SLEDAI. Nevertheless, the SRI had lower accuracy than the PCI when considering different levels of baseline disease activity as defined by the SELENA–SLEDAI (table 4 and see supplementary tables S2–S4, available online only). The SRI was more specific while the PCI was more sensitive when the baseline disease activity was low to moderate (ie, SELENA–SLEDAI 4–7). At higher baseline disease activity (ie, SELENA–SLEDAI ≥8), the converse was observed (ie, the SRI was more sensitive and the PCI more specific) (table 4).
Agreement among response criteria
There was fair to moderate agreement between the SRI and the PCI (κ=0.35, 95% CI 0.02 to 0.73; p=NS and κ=0.58, 95% CI 0.45 to 0.72; p=NS) for physician-rated improvement (MD-change major, MD-change any). As expected, the degree of agreement among the modified versions of the SRI depended on the similarities in their thresholds for change in the SELENA–SLEDAI, BILAG and PGA10 (table 5).
Discussion
Well-performing response criteria must have both high sensitivity and specificity to test medical interventions with the smallest possible patient sample size.23–25 This is particularly relevant for orphan diseases such as jSLE.26 For jSLE, the SRI is highly specific but only modestly sensitive for capturing improvement irrespective of the external standards considered. We confirmed better performance of the PCI compared to the SRI in identifying major response to therapy of patients with jSLE. However, none of the response criteria appeared well suited to capturing mild or moderate improvement with acceptable sensitivity. Regardless of the external standards used, we found that the sensitivity of the SRI improved when modified (ie, SRIa and SRIb) to account for MCID thresholds that are relevant in jSLE.11 For the SELENA–SLEDAI, the ±1 SEM MCID threshold is a decrease in the summary score by 2 points or more. Therefore, improvement in a single domain such as mucocutaneous system or serositis, or a combination of haematological and serological improvements is perceived as sufficient for jSLE patients to be considered improved.27 As expected, other modified versions of the SRI considering ‘harder to achieve’ (larger) SELENA–SLEDAI improvement thresholds had even lower sensitivity than the original SRI. The lower sensitivity of these SRI modifications makes them poorly suited to capture a clinically important improvement of jSLE accurately.
Our analyses also suggest that the SRI and PCI were only fair in agreement when detecting major improvement of jSLE. This may be explained by different ‘drivers’ that contribute to the sensitivity of the SRI compared to the PCI. The former is centred around changes of the SELENA–SLEDAI while the latter relies on concurrent changes of the CRV for jSLE.
The sensitivity and accuracy of the PCI in capturing either major or any improvement as rated by the treating physician were better in patients with low to moderate disease activity than in patients with high disease activity as measured by the SELENA–SLEDAI. For the PCI, this may be because at low to moderate disease activity, achieving the required 50% improvement in two CRV is easier (smaller absolute change) than at higher disease activity. As an example, a decrease in the SLEDAI from 4 to 2 and the PGA10 of 2 to 1 is easier to achieve than a decrease in the SLEDAI from 16 to 8 and the PGA10 from 8 to 4. For the SRI, the specificity is lower at higher disease activity. This may be because of the physicians’ differential interpretation of organ system response at high disease activity. Improvement in singular domains (eg, musculoskeletal) or a combination of domains (eg, complement levels and alopecia) that constitute a decrease of 4 points in the SELENA–SLEDAI may be insufficient for a patient to be considered truly relevantly improved when more severe or multiple target organ involvement is present. Moreover, the physician's perception of improvement may require larger SLEDAI decrements at higher disease activity.
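The SLEDAI example in the paragraph above can be made concrete with a short calculation. It contrasts the relative change that drives the PCI with the absolute 4-point SELENA–SLEDAI decrease required by the SRI. The function and variable names are hypothetical, and the two score pairs are the ones quoted in the text.

```python
def describe_change(baseline, follow_up, sri_threshold=4):
    """Return relative drop, absolute drop and whether the SRI threshold is met."""
    absolute_drop = baseline - follow_up
    relative_drop = absolute_drop / baseline
    return (baseline, follow_up, relative_drop, absolute_drop,
            absolute_drop >= sri_threshold)


# SLEDAI score pairs quoted in the text: 4 -> 2 and 16 -> 8.
rows = [describe_change(4, 2), describe_change(16, 8)]

for baseline, follow_up, rel, absolute, meets_sri in rows:
    print(f"SLEDAI {baseline} -> {follow_up}: "
          f"{rel:.0%} relative drop, {absolute} points absolute, "
          f"meets 4-point SRI threshold: {meets_sri}")
```

Both changes are a 50% relative decrease, but only the change from 16 to 8 reaches the SRI's absolute threshold, which is one way to see why the two criteria can behave differently across levels of baseline disease activity.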
Given the extreme diversity of jSLE phenotypes and the small patient populations available for clinical research, improvement criteria based on continuous measures may be preferable.8 For example, the disease activity scale based on 28 joints is such a continuous measure that is used to measure the course of rheumatoid arthritis with high accuracy.28 Continuous outcome measures also lend themselves for better use in meta-analysis and comparative effectiveness research.
Our study supports the notion that the accuracy of the jSLE response criteria considered in this study more closely reflects the physician's perspective than that of the patients and their parents. When considering the parent or patient perception of jSLE improvement as an external standard, the SRI and PCI both had very low sensitivity despite high specificity. This may be because patients/parents assign different importance to certain disease features compared to the treating physicians, as we have previously reported.29 This finding also suggests that additional patient-reported outcomes need to be included in future jSLE clinical trials in order to capture patient and parent-perceived treatment effects adequately.30
The accuracy of the response criteria was even lower when using changes in prescribed systemic corticosteroids as the external standard. This may reflect the steroid-sparing properties of concomitantly used immunosuppressive medications. Alternatively, the wide variation in steroid prescribing practices among providers of these patients or non-adherence to steroids may have contributed to the observation that changes in corticosteroid regimens are poor surrogates for jSLE improvement.27
Our study should be viewed in light of certain limitations; the foremost being that the data analysed in this validation study are not part of a clinical trial dataset. However, the data collection was prospective in nature and included standardised training of the investigators in completing disease indices, ensuring that the quality of the data is high, and is likely to be comparable to clinical trial data. An advantage of our dataset is that the studied cohort was representative of children and adolescents with jSLE seen in tertiary care centres. Compared to the highly selected patient populations usually included in clinical trials, we were able to assess the accuracy of response criteria in a cohort with highly diverse disease phenotypes, including those with low disease activity and variable degrees of damage. The absence of a generally accepted criterion or gold standard for evaluating the course of jSLE is a significant limitation of any similar study; therefore, we relied on external standards most commonly used in previous jSLE research.7 Of note, the MD-change used as one of the four external standards is a Likert scale distinct from the PGA component of the response criteria. We also found that the exclusion of the PGA from the SRI did not change the accuracy of the SRI substantially in detecting improvement using MD-change as the external standard (see supplementary table S1, available online only).
In conclusion, this study supports the theory that the SRI has at most modest sensitivity but high specificity for capturing jSLE improvement. Therefore, when used as an endpoint of clinical trials in jSLE, the SRI will provide a conservative estimate regarding the efficacy of the therapeutic agent under investigation. Select modified versions of the SRI, which consider clinically meaningful changes in jSLE, appear to perform better than the original SRI. In this validation study, the PCI has overall greater accuracy than the SRI for capturing major jSLE improvement regardless of external standards used and baseline disease activity. The overall higher accuracy of the PCI than the SRI, in detecting improvement regardless of the external standard used, must be weighed against the importance of using identical outcome measures for both adults and children with SLE enrolled in clinical trials.
Acknowledgments
CCHMC: Jamie Meyers-Eaton and Joshua Pendl (site coordination and database management). Texas Scottish Rite Hospital: Shirley Henry (site coordination). University of Chicago Comer Children's Hospital: Becky Pupluva (site coordination). Children's Memorial Hospital: Blair Dina and Adlin Cedeno (site coordination). British Columbia Children's Hospital: Angelyne Sarmiento and America Uribe (site coordination). Cohen Children's Medical Center of New York: Marilynn Orlando (site coordination). Columbia University Medical Center: Margaret Carson (site coordination). Medical College of Wisconsin: Judyann Olson (data collection). University of Oklahoma Health Sciences Center: Kathy Redmond (site coordination). Duke Children's Hospital and Health Center: Janet Wooton and Jennifer Stout (site coordination). UCSF: Deborah Carlton (site coordination). University Hospitals Cleveland: Michelle Wallette (site coordination). Joseph M. Sanzari Children's Hospital: Mary Ellen Riordan (site coordination), Justine Zasa (database management). Hospital for Sick Children: Lawrence Ng (site coordination). Dupont Hospital for Children: Sivia Lapidus (data collection).
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online figure
- Data supplement 2 - Online tables
Footnotes
Handling editor Tore K Kvien
Contributors All authors named qualify for the definition given by the International Committee of Medical Journal Editors for authorship.
Funding HIB is supported by the NIH grants: 5U01-AR51868, P60-AR047884 and 2UL1RR026314.
Competing interests None.
Patient consent Obtained.
Ethics The study was approved by the institutional review boards of the participating centres.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The authors have access to any data on which the manuscript is based and will provide such data on request to the editors.