Data quality and patient characteristics in European ANCA-associated vasculitis registries: data retrieval by federated querying

Objectives This study aims to describe the data structure and harmonisation process, explore data quality and define characteristics, treatment, and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries.

Methods Through creation of the vasculitis-specific Findable, Accessible, Interoperable, Reusable, VASCulitis (FAIRVASC) ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL Protocol and Resource Description Framework Query Language (SPARQL) and outcome rates were assessed through random effects meta-analysis.

Results A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness ranged from 49% to 100% and correctness from 60% to 100%. There were 2754 (52.1%) cases classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. The pattern of organ involvement included: lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%). Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%) and rituximab in 505 (17.7%); pulsed intravenous glucocorticoid use was highly variable (11%–91%). Overall mortality and incidence rates of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively.

Conclusions In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings.


Diagnosis stratification:
With the present round of the data quality (DQ) assessment, we aim at stratifying the output based on the diagnosis. Namely, the goal is to perform the whole analysis (detailed below) for each of the following cohorts:
• All AAV patients
• EGPA patients
• GPA patients
• MPA patients

1. First document the date the data for DQ analysis was extracted.
2. Uniqueness
   a. First report the total number of patient IDs (including duplicates). If the registry is encounter (or visit) based, also report the total number of visits (including duplicates). These will act as denominators.
   b. Report the number of duplicate entries for any patient identifier codes, as a raw number.
   c. If the registry is encounter/visit based, report the number of duplicate encounters, as a raw number.
   d. Report the number of patients who have been entered more than once with separate IDs. First identify possible cases by finding individuals who share both the same date of birth and gender. Then further compare these individuals, either by hand or using other variables (such as approximate date of diagnosis, date of death), to determine if a duplicate was entered. (Don't remove duplicates yet; this will be part of the next stage of data quality improvement.)
3. Consistency
   a. For each of the following 'core DQ' variables, please count the number of cases where the variable of interest is in the correct data type (e.g. characters, binary, numeric, integer, date). We presume there should be one value per patient for these variables.
      x. Date of end stage kidney disease (ESKD)
   b. Plausibility tests: return the number of cases for which the following statements are "true" (for 3b.i please also return the number of cases with available Date of Birth and Date of Death data, and similarly for 3b.ii)
      i. Is date of death >= date of birth (if patient deceased)
      ii. Is date of death >= date of diagnosis (if patient deceased)
      v. Is CRP (at diagnosis) within a plausible range? (e.g. 0–1000 mg/L)
4. Completeness
   a. Quantify missing data for the first eight core DQ variables. Please report as number of complete cases (i.e. number of cases with available data for the variable of interest). Please note: it might be the case that, in your registry, the absence of induction treatment data reflects the lack of any induction treatment (in the clinical history of the patient) rather than an actual missingness due to the data not entered. This might require a check of the clinical record.
   b. Quantify missing data for the last two of the core DQ variables as follows:
      i. How many 'dates of death' values are missing amongst deceased patients (and also report the number of deceased patients)
      ii. How many 'ESKD date' values are missing amongst ESKD patients (and also report the number of ESKD patients)
5. Correctness
   This is a measure of how much the entered data adheres to its source. Check whether the core DQ variables are correct for 10 real patients against a 'gold standard' source, e.g. clinical record; report as a percentage for each variable.

Specification on BVAS, creatinine and CRP at diagnosis: these variables are considered as values at diagnosis only if they were measured within a two-week timespan from the diagnosis date (i.e. within two weeks before diagnosis or two weeks after diagnosis).

Supplemental material. BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).

?creadiagnosis fvc:testValue ?crea . }

# Query 11a: Event-rate of ESKD. This query calculates the number of events of ESKD in four time intervals and the sum of total patient follow-up time in each time frame. End of follow up is date of ESKD or date of last visit, whichever occurs first. It assumes good data quality, no "negative" follow up time etc. All this is stratified per diagnosis. If no event is occurring in any of the timeframes, the diagnosis will not show. It assumes that end-of-follow-up is at date of death if occurring before.
PREFIX fvc: <http://w3id.org/FAIRVASC#>
PREFIX bvas: <http://w3id.org/BVAS#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
SELECT (SUM(?personTime_1YR) as ?totalpersonTime_1YR)
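The person-time logic described for Query 11a can be illustrated outside SPARQL. The following is a minimal Python sketch, not the registries' query: the interval boundaries (0 to 1, 1 to 3, 3 to 5 and more than 5 years after diagnosis), the record field names and the function names are assumptions made for illustration. The source states only that four time intervals are used, that follow-up ends at ESKD, death or last visit, whichever occurs first, and that results are stratified by diagnosis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical interval boundaries in years since diagnosis (an assumption;
# the source only says that four time intervals are used).
INTERVALS = [(0.0, 1.0), (1.0, 3.0), (3.0, 5.0), (5.0, float("inf"))]
DAYS_PER_YEAR = 365.25


def end_of_follow_up(last_visit, eskd=None, death=None):
    """Earliest of ESKD date, death date and last visit (missing dates are ignored)."""
    return min(d for d in (last_visit, eskd, death) if d is not None)


def eskd_event_rates(patients):
    """Return {(diagnosis, interval): (events, person_years, rate_per_1000_py)}."""
    events = defaultdict(int)
    person_years = defaultdict(float)
    for p in patients:
        end = end_of_follow_up(p["last_visit"], p.get("eskd"), p.get("death"))
        follow_up = (end - p["diagnosis_date"]).days / DAYS_PER_YEAR
        if follow_up < 0:
            # The query assumes no negative follow-up; this sketch skips such records.
            continue
        # ESKD counts as an event only if it is what ended follow-up.
        event_time = follow_up if p.get("eskd") == end else None
        for start, stop in INTERVALS:
            key = (p["diagnosis"], (start, stop))
            # Time at risk that this patient contributes to the interval.
            person_years[key] += max(0.0, min(follow_up, stop) - start)
            # Half-open on the left, so ESKD on the day of diagnosis
            # (zero time at risk) is not counted as an incident event.
            if event_time is not None and start < event_time <= stop:
                events[key] += 1
    return {
        key: (events[key], py, 1000 * events[key] / py)
        for key, py in person_years.items()
        if py > 0
    }


if __name__ == "__main__":
    # Invented example records, for illustration only.
    example = [
        {"diagnosis": "GPA", "diagnosis_date": date(2010, 1, 1),
         "last_visit": date(2012, 1, 1), "eskd": date(2010, 7, 2)},
        {"diagnosis": "GPA", "diagnosis_date": date(2010, 1, 1),
         "last_visit": date(2012, 1, 1)},
    ]
    for key, value in sorted(eskd_event_rates(example).items()):
        print(key, value)
```

Unlike the query as described, this sketch also reports intervals with person-time but no events, which makes the denominators visible when comparing registries.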

Table 2. Registry definitions of end-stage kidney disease