Background Electronic medical records (EMR) have emerged as a large-scale source of data for observational studies. These large data registries create new opportunities to study rheumatologic phenotypes and the real-life implications of rheumatic diseases. EMR usage also poses new challenges for quality management, such as how to reliably identify patients with the disease or phenotype of interest and which bioinformatic tools can handle the magnitude of the data.
Thus far, the algorithms used to identify cases are often either unvalidated and oversimplified, relying on a single financial code, or well validated but dependent on very specific information, which hampers their applicability to other datasets.
Objectives Aim I: Test the accuracy of the identification of patients with rheumatoid arthritis (RA) using the financial coding system.
Aim II: Develop a simple and precise algorithm to select patients with RA that is easy to implement at other centers.
Methods Aim I: From the 16,183 rheumatology patients in the Leiden outpatient EMR system, 400 charts were randomly selected and reviewed for the rheumatologic diagnosis. Next, the charts of 200 randomly selected patients who were labeled as RA in the financial system were reviewed.
Aim II: To enable generalizability, only codified data obtained at regular outpatient clinic visits were used. Lasso regression was applied to identify the most discriminative variables.
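To illustrate how lasso-based variable selection works in principle, the following is a minimal sketch, not the analysis performed in this study. It assumes a linear-model lasso fitted by coordinate descent on a 0/1 outcome (a simplification; a penalized logistic model would be the more usual choice for a binary diagnosis). The data are synthetic, and all variable names, prevalences and the penalty value are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def standardize(columns):
    """Center each column and scale it to unit (population) variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def lasso(columns, y, penalty, n_iter=200):
    """Coordinate-descent lasso on standardized columns and a centered outcome.

    Minimizes (1 / 2n) * ||y - Xb||^2 + penalty * ||b||_1.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    coefs = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own effect.
            rho = sum(c * (r + coefs[j] * c) for c, r in zip(col, resid)) / n
            new_coef = soft_threshold(rho, penalty)
            delta = new_coef - coefs[j]
            if delta:
                resid = [r - delta * c for c, r in zip(col, resid)]
            coefs[j] = new_coef
    return coefs


# Synthetic cohort (hypothetical numbers, not the Leiden data).
rng = random.Random(0)
n_patients = 1000
has_ra = [1 if rng.random() < 0.3 else 0 for _ in range(n_patients)]
anti_ccp = [1 if rng.random() < (0.7 if ra else 0.05) else 0 for ra in has_ra]
mtx = [1 if rng.random() < (0.8 if ra else 0.1) else 0 for ra in has_ra]
visits = [rng.gauss(12 if ra else 4, 3) for ra in has_ra]
noise = [rng.gauss(0, 1) for _ in range(n_patients)]  # unrelated to the diagnosis

names = ["anti_ccp", "mtx", "visits", "noise"]
coefs = lasso(standardize([anti_ccp, mtx, visits, noise]), has_ra, penalty=0.1)
selected = [name for name, coef in zip(names, coefs) if coef != 0.0]
print(selected)
```

The L1 penalty sets the coefficients of weakly informative variables exactly to zero, so the variables with nonzero coefficients form the selected set. A larger penalty selects fewer variables.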
Results Aim I: Since 2008, 16,183 patients have been enrolled in the EMR system of the Leiden rheumatology outpatient clinic. Of these, 2,845 patients were classified as having RA in the financial system. 63/400 (16.3%) of the reviewed charts concerned patients with RA. The majority (n=57) were registered as having RA in the financial system. Still, 33% of the patients with the financial code for RA did not have RA.
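The two chart reviews can be read as estimates of the sensitivity and the positive predictive value (PPV) of the financial code. The arithmetic below is a minimal sketch that takes the reported counts at face value. It assumes that the reported 33% corresponds to 66 of the 200 reviewed charts, a count the abstract does not state explicitly.

```python
def proportion(count, total):
    """Return count / total as a fraction."""
    return count / total


# Random sample of 400 charts: RA patients who also carry the financial code.
ra_charts = 63
ra_charts_with_code = 57
sensitivity = proportion(ra_charts_with_code, ra_charts)

# Review of 200 charts with the financial code for RA.
coded_charts_reviewed = 200
coded_without_ra = round(0.33 * coded_charts_reviewed)  # assumption: 33% = 66 charts
ppv = proportion(coded_charts_reviewed - coded_without_ra, coded_charts_reviewed)

print(f"sensitivity of the financial code: {sensitivity:.1%}")
print(f"PPV of the financial code: {ppv:.1%}")
```

Under these assumptions the code identifies most chart-confirmed RA patients, while its PPV matches the 67% reported as the starting accuracy for Aim II.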
Aim II: Using Least Absolute Shrinkage and Selection Operator (LASSO) regression, anti-CCP status, methotrexate (MTX) prescription and number of visits were identified as the most discriminative variables. Combining these with the presence of the financial code for RA improved the accuracy of the algorithm from 67% to 90%.
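The abstract does not specify how the selected variables were combined, so the following is only a hypothetical sketch of one possible rule. It assumes that a patient with the financial code is accepted as RA when at least two of three supporting criteria are met. The `Patient` fields, the visit threshold, the two-of-three rule and the toy patients are all illustrative assumptions and do not reproduce the reported 67% and 90%.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    ra_code: bool            # financial code for RA present
    anti_ccp_positive: bool
    mtx_prescribed: bool
    n_visits: int
    chart_ra: bool           # reference standard: diagnosis from chart review


def code_only(p):
    """Baseline rule: trust the financial code alone."""
    return p.ra_code


def combined(p, min_visits=5, min_support=2):
    """Hypothetical rule: financial code plus at least min_support criteria."""
    if not p.ra_code:
        return False
    support = sum([p.anti_ccp_positive, p.mtx_prescribed, p.n_visits >= min_visits])
    return support >= min_support


def accuracy(rule, patients):
    """Fraction of patients for whom the rule agrees with chart review."""
    return sum(rule(p) == p.chart_ra for p in patients) / len(patients)


# Toy patients, invented for illustration only.
patients = [
    Patient(True, True, True, 12, True),
    Patient(True, False, True, 8, True),
    Patient(True, False, False, 2, False),
    Patient(True, False, False, 6, False),
    Patient(True, True, False, 3, True),
    Patient(False, False, False, 1, False),
]

print(f"code only: {accuracy(code_only, patients):.2f}")
print(f"combined:  {accuracy(combined, patients):.2f}")
```

In this toy set the combined rule rejects two coded patients without RA, at the cost of missing one true RA patient with little supporting evidence. It shows the trade-off such an algorithm has to balance.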
Conclusions The vast majority of patients who are classified as having RA in clinical charts are registered as such in the financial system. However, a substantial number of patients who are registered as RA in the financial system are not classified as RA in the clinical charts. Using widely available data on anti-CCP status, MTX prescription and visit count improved the accuracy of RA patient selection from 67% to 90%. Because these variables are broadly registered in rheumatology clinics, their combination provides a widely applicable algorithm.
Subsequent replications are ongoing.
Disclosure of Interest None declared