The European League Against Rheumatism and the American College of Rheumatology are jointly supporting multiphase development of systemic lupus erythematosus (SLE) classification criteria based on weighted criteria and a continuous probability scale. Prior steps included item generation, item reduction and hierarchical organisation of candidate criteria using an evidence-based approach. Our objectives were to determine relative weights using multicriteria decision analysis (MCDA) and to set a provisional threshold score for SLE classification. An SLE Expert Panel (8 European, 9 North American) submitted 164 real, unique cases with a wide range of SLE probability in a standardised format. Using the candidate criteria, experts scored and rank-ordered 20 representative cases. At an in-person meeting, experts reviewed inter-rater reliability of scoring, further refined criteria definitions and participated in an MCDA exercise. Based on expert consensus decisions on pairwise comparisons of criteria, 1000minds software calculated criteria weights and rank-ordered the remaining 144 cases based on their additive scores. The score of the lowest-ranked case for which complete expert consensus was achieved defined the provisional threshold classification score. Inter-rater reliability of scoring cases with the candidate criteria was good. MCDA involved 74 pairwise decisions and was repeated for the arthritis and mucocutaneous domains when the initial ranking of some cases did not match expert opinion. After criteria weights and additive scores were recalculated once, experts reached consensus for SLE classification for all cases scoring >83. Using an iterative process, the candidate criteria definitions were refined, preliminary weights were calculated and a provisional threshold score for SLE classification was determined.
- systemic lupus erythematosus
- clinical research
A multinational effort to develop new classification criteria for systemic lupus erythematosus (SLE) for clinical research, jointly supported by the European League Against Rheumatism (EULAR) and American College of Rheumatology (ACR), is underway. The overarching goal is to develop a system that identifies potential participants for clinical research studies, requiring a degree of homogeneity among subjects while simultaneously dealing with the extreme heterogeneity of SLE.1 The aim was to design a system with the maximum combination of sensitivity and specificity for SLE, retaining face validity. While the classification criteria are not intended for diagnosis or clinical care, it is acknowledged that the only available ‘gold standard’ for the presence of SLE is expert clinician opinion.
A 12-member Steering Committee was formed with input from EULAR and ACR leadership to oversee a four-phase process.2 In Phase 1, items were generated using a Delphi exercise,3 an early SLE cohort4 and an SLE patient survey5; and antinuclear antibody (ANA) was evaluated as a potential entry criterion.6 7 During Phase 2, the list of potential criteria was narrowed using nominal group technique.8 9 Phase 3 began with a literature review for test performance characteristics of candidate criteria and data-driven organisation of criteria into domains.1 This report outlines the latter part of Phase 3: criteria weighting and threshold score identification through a consensus-based multicriteria decision analysis (MCDA) approach.10–12 The goal was to develop a criteria system producing a continuous measure of the relative probability that a case (ie, particular combination of clinical features) could be characterised as SLE, and a provisional threshold score above which a case could be definitely classified as SLE for clinical research.13 14 Phase 4 involves the determination of the final threshold, followed by validation of the classification system.
An international panel of SLE experts collected and rank-ordered patient case scenarios, participated in an in-person consensus meeting and held post-meeting email and telephone discussions.
SLE expert panel
The Steering Committee invited 6 additional experts (3 European, 3 North American) to form a 17-person SLE Expert Panel (‘SLE experts’) to assist with this phase and establish external validity of the criteria development process. SLE experts were senior clinicians focused on SLE, many of whom direct SLE clinics at their institutions, and senior clinical investigators with expertise in SLE.
Development of patient case scenarios
Each of the 17 SLE experts submitted 10 deidentified real cases based on adult patients from his/her own cohort in a standardised online form using REDCap (Research Electronic Data Capture), a secure, web-based application for research studies.15 Each expert was asked to submit five cases with ‘definite’ or ‘likely’ SLE and five cases in which they considered but ultimately did not diagnose SLE and/or diagnosed a condition mimicking SLE such as rheumatoid arthritis, other inflammatory arthritis, Sjögren’s syndrome, antiphospholipid antibody syndrome or viral infection. ANA≥1:80 was required of all cases. The REDCap form included three options for each clinical and laboratory criterion: yes (present), no (absent) and unknown.
Rank ordering and scoring of cases
From 164 deidentified cases, three authors of this manuscript (KHC, RPN, SKT) chose a representative sample of 20 reflecting a range of possible SLE cases. Each case was abstracted into standardised paragraph format. Laboratory tests that had not been performed were treated as unknown. SLE experts were asked to rank the cases based on their confidence that the case should be classified as SLE. This exercise introduced SLE experts to the challenge of assessing the relative influence of individual criteria in pointing towards or away from SLE.
SLE experts then scored the 20 cases using a standardised REDCap form reflecting the draft SLE classification criteria as of September 2016, based on the Phase 2 nominal group technique exercise16 and subsequent work by the Steering Committee.1 The REDCap form included 10 domains; each domain included 2–6 options. Experts were provided written instructions for scoring and a list of proposed definitions for each criterion. The instructions specified that within each domain, criteria were ordered from least to most supportive of SLE and if multiple criteria were present in one domain only the single criterion furthest down the list (ie, most supportive of SLE) should be scored. The instructions specified that a criterion should not be scored if a cause more likely than SLE existed (eg, other autoimmune disease, malignancy, medication). Criteria did not need to occur simultaneously and could occur before or after the detection of ANA≥1:80 as long as another explanation more likely than SLE did not exist.
In-person consensus meeting, November 2016
During a 1.5-day in-person meeting, RPN and AH moderated discussions among SLE experts leading to consensus decisions. Goals of this meeting included achieving full consensus on criteria definitions, calculating criteria weights via an MCDA exercise and establishing a provisional threshold score for SLE classification.
Review of case scoring and criteria refinement. Experts reviewed a summary of the REDCap scoring exercise. Discrepancies in scoring individual cases were discussed in depth to understand the underlying reasons. Criteria definitions were discussed in the context of these discrepancies and refined based on consensus agreement.
MCDA to determine weights. The MCDA exercise is based on the PAPRIKA method (Potentially All Pairwise RanKings of all possible Alternatives),17 as implemented by 1000minds software (http://www.1000minds.com). This method and software have been used extensively since 2010 for developing classification criteria.10 11 18 Experts voted on a series of pairwise decisions about hypothetical cases, each defined by two criteria from two domains. For example, hypothetical case A: ‘oral ulcers’ (mucocutaneous domain) and ‘acute pericarditis’ (serositis domain) versus hypothetical case B ‘alopecia’ (mucocutaneous domain) and ‘pleural effusion’ (serositis domain). Experts were asked to decide whether they would more likely classify hypothetical case A or B as SLE, presuming all else was equal about the cases. Voting was conducted anonymously, but where opinions diverged cases were discussed until full consensus was reached. Consensus opinion was based on the specificity of each manifestation for SLE and how much its presence would increase the likelihood of SLE (although specificity for some manifestations has not been formally evaluated, as discussed in Ref. 1). Such pairwise-ranking questions were repeated with different pairs of hypothetical cases—always involving trade-offs between different combinations of criteria, two at a time—until enough information about expert preferences had been collected to determine relative criteria weights for all criteria. Each time experts ranked a pair, all other cases that could be pairwise ranked via the logical property of ‘transitivity’ were identified and eliminated. For example, if experts ranked hypothetical case A over B and B over C, then by transitivity A is also ranked over C (and experts are not asked to choose between A and C). This procedure ensures the number of pairwise-ranking questions posed is minimised, and experts end up having pairwise ranked all possible cases defined on two criteria at a time. 
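The transitivity step described above can be illustrated with a minimal sketch. It is not the PAPRIKA algorithm as implemented in 1000minds: it handles only strict rankings (no ties), and the function names and the four toy cases are assumptions made for illustration.

```python
from itertools import product

def transitive_closure(ranked_pairs):
    """Return every (better, worse) pair implied by the explicit rankings."""
    closure = set(ranked_pairs)
    changed = True
    while changed:
        changed = False
        # If a > b and b > d are both known, then a > d follows by transitivity.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def pairs_still_to_ask(cases, ranked_pairs):
    """List the unordered pairs not yet resolved explicitly or by transitivity."""
    implied = transitive_closure(ranked_pairs)
    remaining = []
    for i, a in enumerate(cases):
        for b in cases[i + 1:]:
            if (a, b) not in implied and (b, a) not in implied:
                remaining.append((a, b))
    return remaining

# Hypothetical cases: experts ranked A over B, and B over C.
cases = ["A", "B", "C", "D"]
explicit = [("A", "B"), ("B", "C")]
implied = transitive_closure(explicit)
remaining = pairs_still_to_ask(cases, explicit)
```

Here A over C is implied and is never posed as a question, so only the pairs involving D remain to be asked.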
Consensus decisions were entered into 1000minds software, which uses linear programming techniques to derive weights for each criterion.17
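Whatever weights the software derives must reproduce every consensus decision when the weights of the criteria in each hypothetical case are added up. The sketch below checks that condition for a candidate weight set; it does not perform the linear programming itself. The weights and the recorded outcomes are invented for illustration and are not the values from this project.

```python
def total(case, weights):
    """Additive score of a hypothetical case defined by a tuple of criteria."""
    return sum(weights[criterion] for criterion in case)

def violated_decisions(decisions, weights):
    """Return the pairwise decisions that the candidate weights fail to reproduce.

    Each decision is (case_a, case_b, outcome), where outcome is '>', '<' or '='
    ('=' meaning the experts judged both cases equally likely to be SLE).
    """
    violations = []
    for case_a, case_b, outcome in decisions:
        score_a, score_b = total(case_a, weights), total(case_b, weights)
        ok = (
            (outcome == ">" and score_a > score_b)
            or (outcome == "<" and score_a < score_b)
            or (outcome == "=" and score_a == score_b)
        )
        if not ok:
            violations.append((case_a, case_b, outcome))
    return violations

# Hypothetical weights and outcomes (assumptions, not the project's results).
weights = {"oral_ulcers": 3, "alopecia": 2,
           "acute_pericarditis": 7, "pleural_effusion": 5}
decisions = [
    (("oral_ulcers", "acute_pericarditis"), ("alopecia", "pleural_effusion"), ">"),
    (("alopecia", "acute_pericarditis"), ("oral_ulcers", "pleural_effusion"), ">"),
]
violations = violated_decisions(decisions, weights)
```

An empty `violations` list means the candidate weights are consistent with all recorded decisions; a linear programme searches for weights with exactly this property.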
Assessment of the face validity of the weights. Criteria weights were summed to produce an additive score for each case. Only the highest-weighted criterion in each domain was counted towards the additive score, as specified in the instructions (Box 1). The remainder of the 164 cases were scored and arranged in rank order from highest to lowest score. SLE experts reviewed a spreadsheet listing the criteria present in each case and anonymously voted whether they would classify each as SLE. For cases where expert opinion differed, RPN facilitated discussion to achieve full consensus about case classification. Cases were discussed in descending rank order (confidence that the case should be classified as SLE) until agreement on classification could not be reached.
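The additive scoring rule, in which only the highest-weighted criterion in each domain counts, can be sketched as follows. The weights and the criterion-to-domain mapping are placeholders chosen for illustration, not the provisional weights from this project.

```python
def additive_score(present, weights, domain_of):
    """Sum the single highest-weighted criterion present in each domain."""
    best_per_domain = {}
    for criterion in present:
        domain = domain_of[criterion]
        best_per_domain[domain] = max(best_per_domain.get(domain, 0),
                                      weights[criterion])
    return sum(best_per_domain.values())

# Hypothetical weights (assumptions for illustration only).
weights = {"oral_ulcers": 3, "acute_cutaneous_lupus": 9, "synovitis": 8}
domain_of = {
    "oral_ulcers": "mucocutaneous",
    "acute_cutaneous_lupus": "mucocutaneous",
    "synovitis": "musculoskeletal",
}

# Two mucocutaneous criteria are present, but only the heavier one is counted.
case = ["oral_ulcers", "acute_cutaneous_lupus", "synovitis"]
score = additive_score(case, weights, domain_of)
```

With these placeholder weights the case scores 9 (mucocutaneous) plus 8 (musculoskeletal); the oral ulcers add nothing because acute cutaneous lupus outweighs them within the same domain.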
Determination of an upper threshold score. The score of the last case for which the group achieved consensus on classification as SLE was the initial threshold.
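As a rough sketch of this rule, the cases can be walked in descending score order until the first case without unanimous agreement. The tuple layout and the vote counts below are assumptions for illustration; in the actual exercise, consensus was reached through facilitated discussion rather than a single vote tally.

```python
def initial_threshold(cases):
    """Return the score of the last case, in descending score order,
    that every expert agreed to classify as SLE.

    cases: list of (score, votes_for_sle, n_experts) tuples.
    Returns None if even the top-scoring case lacks full consensus.
    """
    ordered = sorted(cases, key=lambda case: case[0], reverse=True)
    last_consensus_score = None
    for score, votes_for_sle, n_experts in ordered:
        if votes_for_sle != n_experts:
            break  # stop at the first case without full consensus
        last_consensus_score = score
    return last_consensus_score

# Toy data with hypothetical vote counts for a 17-member panel.
cases = [(120, 17, 17), (95, 17, 17), (71, 17, 17), (70, 15, 17), (64, 17, 17)]
threshold = initial_threshold(cases)
```

Note that the case scoring 64 is not considered even though it has a unanimous vote in the toy data: the review stops at the first case on which agreement fails.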
Review of cases below the threshold. The cases with scores immediately below the initial threshold were individually reviewed. The threshold thus functioned as a way to focus the discussion on these ‘borderline’ cases, and the individual criteria present in each of these. SLE experts reached consensus that several of these cases should have been classified as SLE. Experts discussed discrepancies between expert opinion and the initial weights assigned to some of the criteria.
Weighting and upper threshold revision. The MCDA exercise was repeated once for those criteria whose calculated weights were inconsistent with expert opinion. Weights for all criteria were recalculated using 1000minds and additive scores were recalculated. SLE experts again anonymously voted on classifying each case as SLE, followed by discussion facilitated by RPN to achieve consensus. The score of the last case for which expert consensus was achieved was the provisional full consensus upper threshold score. Phase 4 involves further refinement of the upper threshold score.
Box 1 Provisional SLE classification criteria organisation and definitions
A history of a positive ANA by Hep 2 immunofluorescence ≥1:80 is required for consideration of a person for SLE classification.
For each criterion, do not score if a cause more likely than SLE exists (such as infection, malignancy, medication, rosacea, endocrine disorder, other autoimmune disease).
Occurrence of a criterion on at least one occasion is sufficient.
Criteria need not occur simultaneously.
At least one clinical criterion must be present.
Within each domain, only the highest weighted criterion is counted towards the total score.
Clinical domains and criteria
Fever: >38.3°C with no other source identified.
Thrombocytopaenia: Platelets <100 000/mm³.
Autoimmune haemolysis: (1) evidence of haemolysis, such as reticulocytosis, low haptoglobin, elevated indirect bilirubin, elevated LDH and (2) positive Coombs (direct antiglobulin) test.
Delirium: characterised by (1) change in consciousness or level of arousal with reduced ability to focus and (2) symptom development over hours to <2 days and (3) symptom fluctuation throughout the day and (4) either (4a) acute/subacute change in cognition (eg, memory deficit or disorientation) or (4b) change in behaviour, mood or affect (eg, restlessness, reversal of sleep/wake cycle and so on).
Psychosis: characterised by (1) delusions and/or hallucinations without insight and (2) absence of delirium.
Seizure: primary generalised seizure or partial/focal seizure, with independent description by a reliable witness. If EEG is performed, abnormalities must be present.
Non-scarring alopecia, observed by a clinician*
Oral ulcers, observed by a clinician*
Subacute cutaneous lupus (SCLE) or discoid lupus (DLE): SCLE is characterised by annular or papulosquamous (psoriasiform) cutaneous eruption observed by a clinician,* usually photodistributed. If skin biopsy is performed, typical changes must be present.26 DLE is characterised by erythematous-violaceous cutaneous lesions with secondary changes of atrophic scarring, dyspigmentation, often follicular hyperkeratosis/plugging (scalp), observed by a clinician,* leading to scarring alopecia on the scalp. Lesions have a preference for the head and neck, especially the conchal bowl, but may be found in nearly any location. If skin biopsy is performed, typical changes must be present.26
Acute cutaneous lupus: Malar rash (localised) or maculopapular rash (generalised) observed by a clinician,* with or without photosensitivity. If skin biopsy is performed, typical changes must be present.26
Pleural or pericardial effusion: imaging evidence (such as ultrasound, X-ray, CT scan, MRI) of pleural or pericardial effusion or both
Acute pericarditis: ≥2 of: (1) pericardial chest pain (typically sharp, worse with inspiration, improved by leaning forward), (2) pericardial rub, (3) EKG with new widespread ST-elevation or PR depression, (4) new or worsened pericardial effusion on imaging (such as ultrasound, X-ray, CT scan, MRI)
Synovitis in ≥2 joints: characterised by joint swelling and tenderness, observed by a clinician*
Proteinuria >0.5 g/24 hours: on 24-hour urine collection or spot urine protein-to-creatinine ratio representing >0.5 g protein/24 hours
Renal biopsy with Class II or V lupus nephritis, per International Society of Nephrology/Renal Pathology Society (ISN/RPS) 2003 classification27
Renal biopsy with Class III or IV lupus nephritis, per International Society of Nephrology/Renal Pathology Society (ISN/RPS) 2003 classification27
Immunological domains and criteria
Anticardiolipin IgG (>40 GPL units) or anti-β2GP1 IgG (>40 units) or lupus anticoagulant positive
Low C3 or low C4
Low C3 and low C4
*Direct observation may include physical examination or review of a photograph.26
Determining a lower threshold score
SLE experts attempted to set an upper threshold for definite SLE classification and a lower threshold for very low probability for classification. Individuals with scores falling between these two thresholds might be candidates for inclusion in observational studies or SLE prevention trials. Due to insufficient time at the November 2016 meeting, the lower threshold was addressed in emails, secondary exercises and conference calls in the next 2 months. SLE experts were asked to rate the cases that fell below the upper threshold score as ‘probable SLE’, ‘possible SLE’ or ‘unlikely SLE’. The score of the case for which ≥70% of experts indicated ‘unlikely SLE’ was assigned as the lower threshold.
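The ≥70% rule can be sketched as below. The text does not say which case is taken when several meet the 70% mark; this sketch assumes the highest-scoring such case, and the scores and ratings are toy values.

```python
def unlikely_fraction(ratings):
    """Fraction of experts who rated a case 'unlikely'."""
    return sum(rating == "unlikely" for rating in ratings) / len(ratings)

def lower_threshold(cases, cutoff=0.70):
    """Return the highest score among cases rated 'unlikely' by at least
    `cutoff` of the experts (assumption: the highest such case is used).

    cases: list of (score, ratings) where ratings holds one label per expert.
    """
    qualifying = [score for score, ratings in cases
                  if unlikely_fraction(ratings) >= cutoff]
    return max(qualifying) if qualifying else None

# Toy ratings from a hypothetical 17-member panel.
cases = [
    (40, ["unlikely"] * 5 + ["possible"] * 8 + ["probable"] * 4),
    (27, ["unlikely"] * 12 + ["possible"] * 5),   # 12/17 is about 71%
    (15, ["unlikely"] * 16 + ["possible"] * 1),
]
threshold = lower_threshold(cases)
```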
At the in-person meeting, SLE experts agreed that classification as SLE means a patient is appropriate for inclusion in SLE clinical research—and that classification as SLE should not guide clinical decisions about SLE diagnosis or treatment. Experts agreed that the threshold score should have high specificity for SLE, ensuring a high degree of homogeneity among classified patients and facilitating comparisons across clinical studies. SLE experts reached consensus that patients with overlap syndromes could be classified as SLE if they met SLE classification criteria, allowing clinical investigators to decide whether to include or exclude patients with overlap syndromes in specific research studies.
Review of scoring and criteria refinement
There was considerable inconsistency between SLE experts using the REDCap form to score cases. Each expert scored a total of 200 items (20 cases, 10 domains); all 17 experts scored 127/200 (64%) domains exactly the same. Reasons for discrepant data entry included human error in data entry, not following the instructions, variability in interpreting the candidate criteria based on context and different interpretations of criteria definitions (see online supplement 1 for details).
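The agreement figure above counts the items (case and domain combinations) that every expert scored identically. A minimal sketch of that count, using a toy panel of three experts and four items rather than the study data:

```python
def exact_agreement(scores_by_expert):
    """Count items on which all experts entered the same value.

    scores_by_expert: one list per expert, each with one entry per
    (case, domain) item, in the same item order for every expert.
    Returns (identical_items, total_items, proportion).
    """
    n_items = len(scores_by_expert[0])
    identical = sum(
        1 for i in range(n_items)
        if len({expert[i] for expert in scores_by_expert}) == 1
    )
    return identical, n_items, identical / n_items

# Toy data: the second expert differs on one of four items.
expert_1 = ["none", "oral_ulcers", "synovitis", "none"]
expert_2 = ["none", "alopecia", "synovitis", "none"]
expert_3 = ["none", "oral_ulcers", "synovitis", "none"]
result = exact_agreement([expert_1, expert_2, expert_3])
```

This is a strict all-or-nothing measure: one expert's divergent entry makes the whole item count as discordant, which is part of why it is sensitive to data entry errors.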
Review of the rank-ordering exercise
There was agreement on the cases that the majority of SLE experts ranked the highest and lowest, but a spectrum of ranking for cases in between (figure 1). This reflected the different relative weights that individual experts attached to particular criteria.
MCDA exercise to determine consensus weights using 1000minds software. SLE experts anonymously voted on 74 pairs of hypothetical cases. Sometimes it was agreed that hypothetical cases A and B were equally likely to be SLE. For a handful of pairwise comparisons, consensus could not be reached; the group deferred each such comparison and inferred the relative ranking of the two cases from other pairwise comparisons. Significant changes to the criteria during this stage included:
Mucocutaneous and musculoskeletal domains. SLE experts decided that observation by a clinician should be required for consistency with other clinical domains. The definition of clinician-observed was broadened to include physical examination or review of a photograph.
Neurological domain. Due to disagreement over whether seizure or cranial neuropathy was more specific for SLE (the SLICC19 and ACR 198220 manuscripts did not present the specificity of these individual items), and because the prevalence of cranial neuropathy is very low in SLE (and none of the 164 patient cases had cranial neuropathy), the group reached consensus to remove cranial neuropathy.
Renal domain. SLE experts decided that Class VI lupus nephritis was not specific for SLE based on clinical experience and lack of published data, and agreed on removing Class VI nephritis. Importantly, since historical manifestations are included in the scoring system, previous evidence of class II, III, IV or V lupus nephritis would be fully accounted for. These steps resulted in the updated definitions depicted in Box 1.
Face validity of the weights and initial upper threshold score
The additive score ranged 0–201 for the 164 cases. SLE experts reviewed the cases in order from highest to lowest score and reached consensus on classifying the 69 highest-scored cases as SLE. The group was unable to reach full consensus for a case with a score of 70; this patient had oral ulcers, leucopaenia, low C3 or C4 and positive anti-dsDNA. The last case for which experts reached consensus (17/17 votes) for classification as SLE had a score of 71, and an initial upper threshold score was set as >70.
Revising criteria weights and provisional upper threshold score
The experts reviewed cases scored 60–70. Many had arthritis and most experts had voted to classify them as SLE. Therefore, the group felt that the weight assigned to arthritis was too low. After reviewing the specific criteria present in these cases, the mucocutaneous domain was reorganised based on expert consensus: acute cutaneous lupus was assigned the most influential position because it is most specific, and subacute cutaneous lupus and discoid lupus were grouped together and less influential than acute cutaneous lupus. Anonymous voting was repeated for pairwise comparisons including arthritis and mucocutaneous criteria. 1000minds software recalculated relative weights for all criteria and rescored all cases using the revised weights.
After this second round of MCDA, arthritis received a greater weight than before, now identical to the weight of pleural or pericardial effusion. Acute cutaneous lupus was assigned the same weight as acute pericarditis and anti-dsDNA (table 1). The group repeated the anonymous voting exercise and reached consensus about the 82 highest-scored cases. Experts were unable to reach full consensus for the same case that determined the initial threshold. As that case now had a score of 83 using the revised weights, a 100% specific provisional consensus threshold was set as >83. Provisional criteria weights resulting from the MCDA exercise are shown in table 1.
Lower threshold score
SLE experts individually rated the 82 cases below the upper threshold score; the distribution of expert opinion is shown in figure 2. The score of the case for which ≥70% indicated ‘unlikely SLE’ was 27. Only 7 of 52 unique cases (13.5%) included in this exercise would be classified as ‘unlikely SLE’ based on this lower threshold, and the remaining 86.5% would potentially be candidates for inclusion into observational or preventive studies. Through a series of telephone calls and emails, it became clear that expert opinion varied considerably concerning the cases below the upper threshold. Additionally, the terms ‘probable’, ‘possible’ and ‘unlikely’ were not being uniformly interpreted. The SLE experts decided against assigning a lower threshold because it would exclude only a few cases from clinical studies.
In Phase 3 of this SLE classification criteria development project, we applied a consensus-based, data-driven MCDA approach to assign criteria weights and identify a threshold score for SLE classification among adults for clinical research. This exercise resulted in provisional criteria weights that have face validity and are additive, providing a continuous measure of increasing likelihood for SLE based on combinations of criteria. While full consensus of the 17 SLE experts was reached for cases scoring >83 points, it became evident that expert opinions varied for cases with mid-range or low scores. Many cases with scores just under 83 were still considered SLE by the majority of experts, but in an additional exercise focusing on cases below the threshold for definite SLE, very few were deemed ‘unlikely SLE’ by ≥70% of experts.
This stage was largely based on the items resulting from the Phase 2 nominal group technique exercise16 and evidence from our literature review of the sensitivity and specificity of the individual candidate criteria.1 These efforts followed rigorous data-driven and expert-guided criteria development methodology in order to ensure high face and content validity of the items, and high discriminant validity of the criteria set.21 22 However, our literature review also revealed knowledge gaps about the sensitivity and specificity of some of the newly proposed criteria, thus expert consensus opinion was critical for decision making.
Consistent with developing other systems of classification criteria,23 24 there were significant discrepancies in ranking 20 cases regarding likelihood of SLE classification. Discussions centred on two aspects: (1) the precision and thus specificity of clinical and serological manifestations and (2) attribution of manifestations to SLE versus other connective tissue diseases. Some experts expressed concern about misinterpretation of rosacea as acute cutaneous lupus and about false positive anti-dsDNA via ELISA, each of which would reduce the specificity of the proposed classification system. To address these concerns, SLE experts agreed to include detailed definitions for each criterion to mitigate the risk of misinterpreting clinical signs and symptoms. Because particular laboratory assays (eg, Farr method for anti-dsDNA) are not uniformly available in all clinical settings, SLE experts decided that the testing method would not be specified, enabling SLE classification in a wide range of clinics.
The attribution of manifestations to SLE was discussed at length. For some cases, SLE experts were uncertain about how to interpret particular findings when SLE and another disease, such as primary antiphospholipid syndrome or Sjögren’s syndrome, seemed equally likely. It became apparent that not all these decisions could be made with certainty and that SLE experts from different centres could reach opposing conclusions. The criteria system allows for SLE classification in patients with overlap syndromes (eg, SLE with secondary Sjögren’s) as long as manifestations are considered to be equally or more likely due to SLE than the other condition.
The decision to exclude Class VI lupus nephritis was unanimous, given the lack of specificity of this end-stage finding. The discussions leading to the consensus elimination of mononeuropathy and cranial neuropathy were of greater interest. It was first mentioned that the specificities of these entities differed and that mononeuropathy is not specific for SLE. The group reached full consensus to eliminate mononeuropathy; cranial neuropathy was initially retained. The group then discussed that cranial neuropathy is a very rare presenting sign in SLE25 and none of the 164 cases had cranial neuropathy. Experts reached a unanimous decision that the low prevalence of cranial neuropathy in SLE warranted its elimination.
Using a data-driven approach based on literature review1 combined with an expert-driven MCDA process based on real patient cases, this third phase of the SLE classification project has led to precisely defined criteria with individual weights derived through consensus decisions by 17 international SLE experts. The individual criteria weights have face validity, and taken together they depict current expert understanding of SLE. The provisional threshold sets a high bar for SLE classification (100% specificity), and Phase 4 will consider the appropriate balance between specificity and sensitivity before finalising the threshold. The provisional classification criteria and threshold resulting from Phase 3 are being refined and validated in a large, distinct set of patient cases to finalise the project.
EULAR and ACR jointly provided support for this effort. The Steering Committee includes Martin Aringer, Sindhu Johnson, Thomas Dorner, Dimitrios Boumpas, Marta Mosca, Josef Smolen, David Wofsy, Diane Kamen, Karen Costenbader, David Daikh, Rosalind Ramsey-Goldman, David Jayne. The authors would like to thank Amy Turner for her outstanding organisational support of this project and Alison Hendry for her expertise and assistance during the ACR 2016 meeting. This work was previously presented at the EULAR 2017 Annual Congress and the ACR 2017 Annual Meeting.
Handling editor David S Pisetsky
Contributors SRJ, DJ, JSS, RPN, MA and KHC were responsible for planning this work. SKT, SRJ, DTB, DD, TD, BD, SJ, DLK, WJM, MM, RR-G, GR-I, MS, MU, DW, JSS, RPN, MA and KHC conducted this work. SKT, SRJ, DTB, DD, TD, BD, SJ, DLK, WJM, MM, RR-G, GR-I, MS, MU, DW, JSS, RPN, MA and KHC contributed to manuscript preparation.
Funding This work was jointly supported by the European League Against Rheumatism and the American College of Rheumatology. SKT’s work on this project was supported in part by the Lupus Foundation of America Career Development Award and NIAMS L30 AR070514. SRJ was supported by a Canadian Institutes of Health Research New Investigator Award. SJ was supported by a grant from the Danish Rheumatism Association (A-3865).
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.