Supplement 1. Reasons for discrepancy in scoring cases using an online REDCap form.

·    Human error in data entry. For example, 4 of 17 experts did not score thrombocytopenia although the platelet count was <11,000/mm³.

·    Not following the instructions. Four SLE experts selected “dsDNA not present” when the case stated “anti-dsDNA was >400”. In discussion, these experts explained that, because the method of testing was not specified, they had been concerned about false-positive serologies and therefore did not score them, which was inconsistent with the instructions.

In other cases, some experts did not select the criterion most supportive of SLE when more than one criterion was present within a domain. Within the Renal domain, renal biopsy was positioned further down the list (i.e. more supportive of SLE) than proteinuria, yet some experts scored proteinuria when Class III lupus nephritis was also present.

The instructions indicated not to score arthritis if anti-CCP was >3x upper limit of normal, but some experts scored arthritis despite a high-positive result. The group discussed challenges with the proposed arthritis definition: anti-CCP and/or radiographs are not obtained for all patients; some centers may intentionally not perform these tests if a positive result decreases the likelihood of SLE classification; and including these items in the definition could create difficulty for classifying patients with overlap syndromes (e.g. “rhupus”) as SLE. Furthermore, specific instructions concerning anti-CCP were redundant with the overall instructions. The group reached consensus to remove mention of anti-CCP and radiographic erosions from the arthritis criterion.
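
The within-domain rule discussed above (when more than one criterion in a domain is present, score only the one most supportive of SLE) can be illustrated with a short sketch. This is not the logic of the REDCap form itself; the function name `score_domain` and the list `RENAL_DOMAIN` are hypothetical, and the only ordering taken from the text is that a renal biopsy finding ranks above proteinuria.

```python
# Hypothetical sketch of the within-domain rule described in the text.
# Criteria are listed from least to most supportive of SLE, mirroring a
# form in which items further down the list are more supportive.
# Only the relative order (biopsy finding above proteinuria) comes from the text.
RENAL_DOMAIN = ["proteinuria", "class III lupus nephritis on biopsy"]


def score_domain(ordered_criteria, present):
    """Return the single criterion to score for one domain, or None.

    ordered_criteria: criteria ordered from least to most supportive of SLE.
    present: set of criteria documented in the case.
    """
    selected = None
    for criterion in ordered_criteria:
        if criterion in present:
            # A later (more supportive) item replaces an earlier one.
            selected = criterion
    return selected


case_findings = {"proteinuria", "class III lupus nephritis on biopsy"}
scored = score_domain(RENAL_DOMAIN, case_findings)
```

Under these assumptions, a case with both proteinuria and Class III lupus nephritis is scored on the biopsy finding alone, which is the behavior the instructions asked of the experts.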

·    Variability in interpreting the candidate criteria based on context. Some experts believed that in a patient with malar rash and autoantibodies, unexplained fever was not necessarily attributable to SLE, but that in a patient with pericarditis, malar rash, and autoantibodies, unexplained fever would be attributable to SLE. The group reached consensus that the approach to scoring should be maintained as, “If in doubt and not thought more likely to be due to another cause, attribute it to SLE”.

·    Differing interpretations of criterion definitions. For some experts, the phrase “pleuritic chest pain” was not illustrative of chest pain typical for pericarditis and thus they did not score pericarditis. The group agreed to adopt the wording in the European Society of Cardiology 2015 Guidelines for acute pericarditis.[14]