Background Systemic Lupus Erythematosus (SLE) is a rare disease, which makes it challenging to achieve an adequate sample size for many research questions.
Objectives To use data mining techniques to derive an algorithm that identifies individuals with SLE in Swedish national register data.
Methods Clinically confirmed SLE cases were pooled from 4 major clinical centers in Sweden (Linköping, Lund, Stockholm and Uppsala), excluding cases that fulfilled fewer than 4 ACR criteria. Non-SLE comparators were randomly selected from the National Population Register and matched on age, sex and county to individuals with SLE from national registers. Both cases (N=940) and comparators (N=24,370) were restricted to adults (>16 years) living in Sweden on January 1, 2010. Demographics, comorbidities, prescriptions and family history of autoimmune disease were obtained from multiple registers and linked to the study population. Two methods, classification tree analysis and least absolute shrinkage and selection operator (LASSO) penalized regression, were each used to generate an algorithm distinguishing cases from non-cases, separately in men and women. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated. Sensitivity analyses used a second general population comparator group, and elastic net regression to accommodate collinear predictors.
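The four performance measures named in the Methods follow directly from a 2x2 table of algorithm result against true case status. The sketch below is illustrative only: the class name, function name and counts are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: algorithm result versus true case status."""
    tp: int  # true cases flagged by the algorithm
    fp: int  # comparators flagged by the algorithm
    fn: int  # true cases missed by the algorithm
    tn: int  # comparators correctly left unflagged


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts chosen for easy arithmetic, NOT study results.
example = ConfusionCounts(tp=90, fp=10, fn=10, tn=890)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the ratio of cases to comparators in the sample.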
Results Both the classification tree and LASSO methods identified a single variable as the best discriminator between cases and non-cases: ≥1 SLE ICD code in a specialized clinic (sensitivity=0.99, PPV=0.98 in women). Results in men were similar. Standardized PPVs were high. Sensitivity analyses added several predictors, such as inpatient and outpatient visits and medications.
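The abstract does not state how the PPVs were standardized. One common approach, assumed here purely for illustration, is to recompute PPV at a chosen reference prevalence using Bayes' theorem, since the PPV observed in a matched sample reflects the sampled case-to-comparator ratio rather than the population prevalence. The function name and input values below are hypothetical.

```python
def ppv_at_prevalence(sensitivity: float, specificity: float,
                      prevalence: float) -> float:
    """PPV implied by Bayes' theorem at a given prevalence.

    Assumption: this is one standard way to standardize PPV; it is not
    confirmed to be the method used in the study.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs showing how PPV falls as prevalence falls.
for prevalence in (0.5, 0.1, 0.01):
    ppv = ppv_at_prevalence(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence}: PPV={ppv:.3f}")
```

With sensitivity and specificity held fixed, a lower reference prevalence yields a lower PPV, which is why a PPV estimated in a sample enriched for cases may overstate performance in the general population.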
Conclusions An SLE ICD code in a specialized clinic distinguished cases from non-cases well. Additional algorithms were identified in sensitivity analyses. It is possible to identify individuals with SLE among thousands of people in Sweden; however, validation studies (currently underway) are necessary to assess the magnitude of potential misclassification.
Disclosure of Interest None declared