Objective: To assess the reliability of Systemic Lupus Erythematosus Disease Activity Index (SLEDAI)-2000 index in routine practice and its ability to capture disease activity as compared with the British Isles Lupus Assessment Group (BILAG)-2004 index.
Methods: Patients with systemic lupus erythematosus from 11 centres were assessed separately by two raters in routine practice. Disease activity was assessed using the BILAG-2004 and SLEDAI-2000 indices. The level of agreement for items was used to assess the reliability of SLEDAI-2000. The ability to detect disease activity was assessed by determining the number of patients with a high activity on BILAG-2004 (overall score A or B) but low SLEDAI-2000 score (<6) and number of patients with low activity on BILAG-2004 (overall score C, D or E) but high SLEDAI-2000 score (⩾6). Treatment of these patients was analysed, and the increase in treatment was used as the gold standard for active disease.
Results: 93 patients (90.3% women, 69.9% Caucasian) were studied: mean age was 43.8 years, mean disease duration 10 years. There were 43 patients (46.2%) with a difference in SLEDAI-2000 score between the two raters and this difference was ⩾4 in 19 patients (20.4%). Agreement for each of the items in SLEDAI-2000 was between 81.7 and 100%. 35 patients (37.6%) had high activity on BILAG-2004 but a low SLEDAI-2000 score, of which 48.6% had treatment increased. There were only five patients (5.4%) with low activity on BILAG-2004 but a high SLEDAI-2000 score.
Conclusions: SLEDAI-2000 is a reliable index to assess systemic lupus erythematosus disease activity but it is less able than the BILAG-2004 index to detect active disease requiring increased treatment.
Systemic lupus erythematosus (SLE) is a complex multisystem autoimmune disease with diverse immunological and clinical manifestations. Assessment of disease activity poses a challenging problem as any organ system can be affected and it is well known that SLE may mimic manifestations of other diseases. As there is no single biomarker that adequately reflects disease activity, numerous composite clinical indices have been developed for the assessment of disease activity.1 Two commonly used disease activity indices in clinical studies are the classic British Isles Lupus Assessment Group (BILAG) index and the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI).
The classic BILAG index was developed on the principle of the physician’s intention to treat.2 3 It is a transitional index that is able to capture changing severity of clinical manifestations. It has ordinal scales by design and does not have a global score. Instead it produces, at a glance, disease activity across the different systems. Over time, several deficiencies were noted by members of BILAG, which prompted a major revision, giving rise to the BILAG-2004 index.4
The SLEDAI index was developed in Canada and has 24 items.5 It was subsequently noted that it focused on new or recurrent manifestations and failed to capture ongoing activity. This led to a revision, giving rise to the SLEDAI-2000 index.6 This index produces a global score ranging from 0 to 105, and weighting is used so that individual item scores range from 1 to 8. To date, almost all the validation studies of the SLEDAI index have involved the original index, and SLEDAI-2000 has not been fully validated.7–14 The SLEDAI-2000 index has only been validated retrospectively against the SLEDAI index, which showed that SLEDAI-2000 correlated with its predecessor.6
The purpose of this study was to assess the reliability of SLEDAI-2000 in routine clinical practice and to compare its performance with BILAG-2004 with regard to the ability to detect disease activity requiring increased treatment. The reliability of the BILAG-2004 index has been reported elsewhere.4 15
PATIENTS AND METHODS
This was a multicentre cross-sectional study involving 11 centres across the United Kingdom. This study was done in parallel with the inter-rater reliability study (second reliability exercise) of BILAG-2004 index.15 Patients with SLE who satisfied the American College of Rheumatology criteria for the classification of SLE were recruited.16 17 Patients were excluded from the study if they were pregnant, under the age of 18 years or unable to give valid consent. This study received multicentre research ethical approval from Hull and East Riding Research Ethics Committee as well as approval from local research ethics committees of all participating centres. Written consent was obtained from all patients. This study was carried out in accordance with the Helsinki Declaration.
Patients were assessed separately by a local rheumatologist and an external rater (first author). In total, 14 raters were involved in this study. This study was performed in the setting of routine clinical practice and medical records were available to both of the raters. Disease activity was assessed using the BILAG-2004 and SLEDAI-2000 indices by each rater. Before the study, training was provided to the raters on both of the disease activity indices.
The BILAG-2004 index is an ordinal scale index with nine systems (Constitutional, Mucocutaneous, Neuropsychiatric, Musculoskeletal, Cardiorespiratory, Gastrointestinal, Ophthalmic, Renal and Haematology). It records disease activity occurring over the past 4 weeks as compared with the previous 4 weeks. Like the original BILAG index, it is based on the principle of a physician’s intention to treat. It is not intended to be a global score but provides a review of disease activity across the nine systems. It categorises disease activity into five different levels from A to E. Grade A represents very active disease requiring immunosuppressive drugs and/or prednisolone dose of >20 mg daily (or equivalent). Grade B represents moderate disease activity requiring a lower dose of corticosteroids, topical steroids, topical immunosuppressives, antimalarials or non-steroidal anti-inflammatory drugs. Grade C indicates mild stable disease while grade D implies no disease activity but the system had previously been affected. Grade E indicates no current or previous disease activity. Even though it was developed based on the principle of intention to treat, this index was devised to capture manifestations of disease activity and the treatment has no bearing on the scoring of this index.
The SLEDAI-2000 index has 24 items, of which 16 are clinical items, whereas the remaining eight items are based solely on laboratory results (urinary casts, haematuria, proteinuria, pyuria, low complements, increased DNA binding, thrombocytopenia and leucopenia). A manifestation is recorded if it has been present over the past 10 days regardless of severity or whether it has improved or worsened. A previous study has shown that a score of ⩾6 is consistent with active disease requiring treatment.18
The reliability of the SLEDAI-2000 index was assessed using the level of agreement for each item in the index. The ability to detect disease activity was assessed by determining the number of patients with discordant BILAG-2004 and SLEDAI-2000 scores. These discordant scores can be divided into those with high activity on BILAG-2004 (overall score of A or B) but low SLEDAI-2000 score (<6), and those with low activity on BILAG-2004 (overall score of C, D or E) but high SLEDAI-2000 score (⩾6). The overall BILAG-2004 score for a patient was determined by the highest score achieved by any system in the index. Treatment of these patients was analysed, and an increase in corticosteroids, antimalarials or cytotoxic treatment was used as the gold standard for active disease. Assessment by the external rater was used for this analysis to avoid rater effect and to minimise bias, as treatment decisions were made by the local rater. Analysis with different SLEDAI-2000 cut-off scores for the definition of active disease was also performed. Statistical analyses were performed with Stata for Windows version 8 (Stata Corporation, College Station, Texas, USA).
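The classification rule described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study (which was run in Stata); the function names `overall_bilag` and `classify`, the label strings and the example patient are hypothetical and serve only to show how the overall BILAG-2004 grade and the SLEDAI-2000 cut-off combine.

```python
from typing import Dict

# BILAG-2004 grades ordered from most active (A) to never affected (E).
GRADE_ORDER = "ABCDE"


def overall_bilag(system_grades: Dict[str, str]) -> str:
    """Return the highest (most active) grade recorded in any system."""
    return min(system_grades.values(), key=GRADE_ORDER.index)


def classify(system_grades: Dict[str, str], sledai_total: int, cutoff: int = 6) -> str:
    """Label a patient as concordant or as one of the two discordant groups."""
    bilag_high = overall_bilag(system_grades) in ("A", "B")
    sledai_high = sledai_total >= cutoff
    if bilag_high and not sledai_high:
        return "high BILAG-2004, low SLEDAI-2000"
    if sledai_high and not bilag_high:
        return "low BILAG-2004, high SLEDAI-2000"
    return "concordant"


# Hypothetical patient, not taken from the study data.
patient = {"Mucocutaneous": "B", "Musculoskeletal": "C", "Renal": "D", "Haematology": "E"}
print(overall_bilag(patient))             # B
print(classify(patient, sledai_total=4))  # high BILAG-2004, low SLEDAI-2000
```

The `cutoff` parameter is exposed so that the same rule can be re-applied with alternative SLEDAI-2000 cut-off scores, as was done in the sensitivity analysis.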
RESULTS
Ninety-three patients were recruited and the demographics of the patients are summarised in table 1.
Inter-rater agreement of SLEDAI-2000
There were 43 patients (46.2%) with a difference in the total SLEDAI-2000 score between the two raters. Of these, 19 patients (20.4%) had a score difference between raters of 4 or more. There was a good level of agreement in the items of SLEDAI-2000, ranging from 81.7% to 100% for each item in the index (table 2). However, all the clinical items with perfect agreement had null score by both raters.
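As an illustration of how an item-level percentage agreement of this kind can be computed, the following Python sketch compares two raters' item records patient by patient. The function name `item_agreement` and the four example patients are hypothetical and are not taken from the study data.

```python
from typing import Dict, List


def item_agreement(rater1: List[Dict[str, bool]],
                   rater2: List[Dict[str, bool]]) -> Dict[str, float]:
    """Percentage of patients for whom both raters recorded the same value, per item."""
    n = len(rater1)
    return {
        item: 100.0 * sum(a[item] == b[item] for a, b in zip(rater1, rater2)) / n
        for item in rater1[0]
    }


# Hypothetical records for four patients and two items (True = item scored as present).
rater1 = [
    {"arthritis": True, "rash": False},
    {"arthritis": False, "rash": False},
    {"arthritis": True, "rash": False},
    {"arthritis": False, "rash": False},
]
rater2 = [
    {"arthritis": True, "rash": False},
    {"arthritis": True, "rash": False},
    {"arthritis": True, "rash": False},
    {"arthritis": False, "rash": False},
]

print(item_agreement(rater1, rater2))  # {'arthritis': 75.0, 'rash': 100.0}
```

In this made-up example the "rash" item reaches 100% agreement only because neither rater ever scored it, which mirrors the caveat above about items with perfect agreement and a null score from both raters.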
Ability to detect disease activity
There were 54 patients (58.1%) with high activity according to the BILAG-2004 index (overall score of A or B) and of these, 29 patients (53.7%) had their treatment increased while five patients (9.3%) had their treatment reduced. However, there were far fewer patients (24 patients, 25.8%) with high activity according to the SLEDAI-2000 index (score of ⩾6) and of these, 14 patients (58.3%) had their treatment increased whereas two patients (8.3%) had their treatment reduced.
Thirty-five patients (37.6%) had high activity on BILAG-2004 but a low SLEDAI-2000 score, whereas there were only five patients (5.4%) with low activity on BILAG-2004 but a high SLEDAI-2000 score (table 3). This difference was statistically significant (p = 0.015). When data from the local rater were used, there were more patients (41; 44.1%) with high activity on BILAG-2004 but a low SLEDAI-2000 score and fewer patients (four; 4.3%) with low activity on BILAG-2004 but a high SLEDAI-2000 score.
The treatment of these patients with discordant BILAG-2004 and SLEDAI-2000 scores is summarised in tables 4 and 5. Of those patients with high activity on BILAG-2004 but low SLEDAI-2000 scores, 48.6% had their treatment increased. The results were similar when data from the local rater were used (data not shown).
We examined the effect of using different SLEDAI-2000 cut-off scores to define active disease (tables 4 and 5). With a lower cut-off score, the number of patients with high activity on BILAG-2004 but low SLEDAI-2000 scores decreased, particularly when the cut-off score was 4 or below. However, the proportion of these patients who had their treatment increased remained the same (about 50%). Even at a SLEDAI-2000 score of zero, there were four patients (4.3%) with high activity on BILAG-2004 and two (50%) of them had their treatment increased. Lowering the cut-off SLEDAI-2000 score did not make any difference to the number of patients with low activity on BILAG-2004 but high SLEDAI-2000 scores who had their treatment increased. Therefore, it appears that the SLEDAI-2000 index is less able to capture active disease requiring increased treatment than the BILAG-2004 index.
DISCUSSION
This is the first study of the reliability of the SLEDAI-2000 index in routine clinical practice. It demonstrated good inter-rater agreement for the individual items, which is reassuring and suggests that the index is reliable in the assessment of SLE disease activity. This is consistent with the results of reliability studies involving the original SLEDAI index.7 12 13 19 However, the reliability of the SLEDAI-2000 index was not as good as would be expected, with disagreement in the scores between the two raters in 46% of patients. This is despite training being provided to the raters and the fact that this index is considered to be the least complicated disease activity index.
More importantly, it appears that SLEDAI-2000 is less able than BILAG-2004 to detect disease activity requiring an increase in treatment, which is not surprising as it has far fewer items (24 vs 97). SLE is a multisystem disease that may affect any organ system and has varied manifestations within the patient and between patients; therefore, a comprehensive index is required to capture all manifestations of active disease. Unfortunately, SLEDAI-2000 fails to capture several clinically important manifestations of disease activity such as peripheral neuropathy, myelopathy, interstitial alveolitis and haemolytic anaemia. In addition, some manifestations could not be scored in SLEDAI-2000 as the criteria and definitions set out were too stringent. In our experience, it was difficult to score organic brain syndrome, arthritis, pleurisy and pericarditis in SLEDAI-2000 even when patients had these manifestations, as they did not meet the criteria set out in the definition. As an example, to score for pleurisy in SLEDAI-2000, the requirements are pleuritic chest pain with pleural rub, effusion or pleural thickening. However, it is not uncommon for patients with SLE who have pleurisy to present with just pleuritic chest pain in the absence of pleural rub, pleural effusion or pleural thickening. The situation is the same for arthritis, pericarditis and organic brain syndrome.
Other contributing factors include the weighting system used and the inability of the index to distinguish between different severities of a manifestation. Furthermore, it is unable to detect improvement or worsening of a manifestation as this can only be recorded as either absent or present. For example, thrombocytopenia (defined as a platelet count <100×10⁹ per litre) has a weighted score of 1. The index does not differentiate between severe thrombocytopenia (platelet count <25×10⁹ per litre) and mild thrombocytopenia (platelet count >50×10⁹ per litre). The former would warrant an increase in treatment whereas the latter would not, but both would derive the same score in SLEDAI-2000. Owing to the weighted score of 1, patients with only severe thrombocytopenia (platelet count <25×10⁹ per litre) as the clinical manifestation of disease activity would have a total score of 1 (or 5 if the patient also has low complements and elevated anti-dsDNA antibodies), which falls short of the cut-off of 6 for active disease.18 We have shown that even with a lower cut-off, the SLEDAI-2000 index still fails to capture significant numbers of clinically important manifestations of active disease.
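The arithmetic in the thrombocytopenia example can be made explicit with a short Python sketch. The weight of 1 for thrombocytopenia is stated above; the weights of 2 each for low complements and increased DNA binding are an assumption taken from the published SLEDAI-2000 scoring and are consistent with the total of 5 quoted above. The names `WEIGHTS` and `sledai_total` are hypothetical, and only the three items needed for the example are included.

```python
# Subset of SLEDAI-2000 item weights needed for this example.
# Thrombocytopenia = 1 is stated in the text; the two serological
# weights of 2 are assumed from the published index (1 + 2 + 2 = 5).
WEIGHTS = {
    "thrombocytopenia": 1,
    "low_complement": 2,
    "increased_dna_binding": 2,
}
ACTIVE_CUTOFF = 6  # score of >=6 taken as active disease requiring treatment


def sledai_total(present_items):
    """Sum the weights of the items recorded as present."""
    return sum(WEIGHTS[item] for item in present_items)


# Severity is not part of the score: a platelet count of 20 or 90 (x10^9 per litre)
# both count simply as "thrombocytopenia present".
only_platelets = sledai_total(["thrombocytopenia"])
with_serology = sledai_total(["thrombocytopenia", "low_complement", "increased_dna_binding"])

print(only_platelets, only_platelets >= ACTIVE_CUTOFF)  # 1 False
print(with_serology, with_serology >= ACTIVE_CUTOFF)    # 5 False
```

Under these assumptions, neither combination reaches the cut-off of 6, however low the platelet count.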
We have used the increase in treatment as the gold standard for active disease as there is no other good alternative standard available. With this as the benchmark, we are looking at clinically significant manifestations of active disease that are being treated, which makes it very unlikely that the BILAG-2004 index is overestimating disease activity. Although the BILAG-2004 index was developed based on the principle of intention to treat, using change in treatment as the gold standard will not bias the analysis in favour of BILAG-2004 as change in treatment is not taken into account in the scoring scheme (only the presence of active manifestations will influence the scoring). Furthermore, the scoring of the index was not available to the local rater when the treatment decision was made. To further minimise this possible bias, we have used the external rater score for analysis as the treatment decisions were made by the local rater.
As the BILAG-2004 index was developed as an ordinal scale index, it was not intended for individual system scores to be summated to provide a global score. We felt that the best way to represent overall disease activity in any individual patient was to use the highest score achieved by any system within the index. This seems logical as any patient with any system scoring a grade A or B should be categorised as having active disease regardless of how many systems have a score of A or B. In fact, using this overall score would create a ceiling effect and may actually put BILAG-2004 at a disadvantage from an analysis point of view. For example, a patient with mild mouth ulcers and mild inflammatory arthritis would score grade C in Mucocutaneous and Musculoskeletal systems leading to an overall score of grade C but may have treatment increased with hydroxychloroquine or a small increase in corticosteroid dose.
The results on the ability to capture active disease need to be interpreted with caution as this is a cross-sectional study that only provides a snapshot of disease activity at the time of assessment. It does not take into account the level of disease activity before the assessment. This may explain the reduction in treatment in some patients with high activity, if the current level of activity represents an improvement from a previous higher level, such as from category A to B in BILAG-2004 or a SLEDAI-2000 score of 18 to 10. Other unaccounted factors that will influence treatment decisions include treatment history (particularly if there has been recent initiation of cytotoxic treatment, where a further increase in treatment is unlikely) and patients’ opinion (such as refusal to increase treatment). One method to overcome this is to use the physician’s intention to treat (rather than actual change in treatment), but this may bias the result in favour of BILAG-2004 as the physician may record an intention to increase treatment when manifestations are recorded in the BILAG-2004 index.
Further study with a larger number of patients is required to determine the optimal SLEDAI-2000 cut-off score for active disease. This is clearly important as it is used in clinical studies to differentiate patients between the two disease states (active or inactive) and to determine eligibility for inclusion in clinical trials. Apart from that, the sensitivity of change for both of these indices would need to be established with a longitudinal study.
This study was supported by a grant from the Arthritis Research Campaign (Grant No. 16081). We would like to thank the nurse specialists of all participating centres, the Wellcome Trust Clinical Research Facility (Birmingham), Lupus (UK) and Arthritis Research Campaign for their support.
Funding: AP was funded by an unrestricted educational grant from Actelion Pharmaceuticals. CG received consulting fees and/or honoraria from Bristol-Myers Squibb, Genentech, Immunomedics, Roche, UCB Pharma and Aspreva.
Competing interests: None.