The last decade has seen the development or renewal of classification criteria in sundry rheumatic diseases, including systemic lupus erythematosus (SLE),1 rheumatoid arthritis2 and axial spondyloarthritis (axSpA).3 These criteria seek the laudable aim of standardising the populations included in clinical trials and observational cohorts for research purposes. But to what extent have the benefits of classification criteria been realised? Have there been unintended consequences as their profile has grown? And could we better use criteria to achieve the desired end of facilitating the implementation and interpretation of research findings to enable their translation into clinical practice?
We are all familiar with the refrain of key opinion leaders when they present their update on the management of a disease—‘These are classification criteria, not diagnostic criteria …’ But how often have we heard the same speaker move smoothly on to state, ‘… but I find them helpful in clinical practice too’? Indeed, the abstract of the original paper describing the validation and final selection of the classification criteria for axSpA concluded with the statement that ‘The new Assessment of SpondyloArthritis international Society (ASAS) classification criteria for axial spondyloarthritis (SpA) … may help rheumatologists in clinical practice in diagnosing axial SpA in those with chronic back pain’.3
However, the dangers of applying classification criteria to clinical practice for the purpose of diagnosis are easily demonstrated. The criteria are invariably developed and validated in specialist centres where there is a high prior (pretest) probability of the disease, and they are evaluated based on a clinical diagnosis of the disease in question made by experts in that condition, after alternative causes or explanations for the patient’s symptoms have been excluded. When the criteria are applied to a different population (for instance, in primary care) where there is a low prior (pretest) probability of the disease, they result in a large number of false-positive diagnoses. Take the example of axSpA: a systematic literature review and meta-analysis identified seven studies that examined the ASAS criteria’s performance, with a pooled sensitivity of 82%, specificity of 87% and positive likelihood ratio of 6.2.4 Applying these figures in a secondary care setting with a prior (pretest) probability of 0.3 (30%) results in a posterior (post-test) probability of 0.72 (72%) in those who fulfil the criteria. However, if the criteria were applied in a low-risk population, with a prior (pretest) probability of 0.05 (5%), then 75% of the people fulfilling the criteria would be given a false-positive diagnosis (figure 1).
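The arithmetic behind these figures can be sketched in a few lines. The example below is illustrative only: it assumes the standard odds form of Bayes' theorem (post-test odds = pretest odds × positive likelihood ratio), and the function name `post_test_probability` is ours rather than anything from the cited meta-analysis. With the pooled likelihood ratio of 6.2 it gives a post-test probability of about 0.73 for a 30% prior, close to the 0.72 quoted in the text, and about 0.25 for a 5% prior, so that roughly 75% of those fulfilling the criteria would not have the disease.

```python
def post_test_probability(prior: float, positive_lr: float) -> float:
    """Convert a pretest probability into a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pretest odds * positive likelihood ratio.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * positive_lr
    return posterior_odds / (1.0 + posterior_odds)


# Pooled positive likelihood ratio for the ASAS criteria, as quoted in the text.
POSITIVE_LR = 6.2

# The two settings discussed in the text (labels are illustrative).
SETTINGS = [("secondary care", 0.30), ("low-risk population", 0.05)]

for label, prior in SETTINGS:
    posterior = post_test_probability(prior, POSITIVE_LR)
    false_positive_share = 1.0 - posterior
    print(
        f"{label}: prior {prior:.2f} -> post-test {posterior:.2f}; "
        f"share of those classified who are false positives {false_positive_share:.2f}"
    )
```

The same criteria, with the same sensitivity and specificity, therefore yield very different proportions of false positives depending solely on the prior probability of disease in the population to which they are applied.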
This is not a theoretical concern that is without practical implications: we can see the dangers worked out in practice in the literature. For instance, the authors of a study of chronic low back pain in primary care began with the flawed premise that the new ASAS classification criteria ‘will improve early diagnosis’. They studied young patients with chronic low back pain and found that 86/363 (24%) patients fulfilled the ASAS classification criteria for axSpA. The authors concluded that chronic low back pain in young patients is ‘frequently caused by undiagnosed axial SpA’. Of note, only 17/86 (20%) of the identified cases were HLA-B27-positive: it is highly likely that most of those fulfilling the ASAS classification criteria were false-positive diagnoses based on the inappropriate use of the classification criteria.5
The pitfalls of the inappropriate use of classification criteria for the purpose of diagnosis are well known; but have the putative benefits of their use in research been realised? Let us use axSpA as an example again: like most of the inflammatory rheumatic conditions, axSpA is a spectrum that ranges from early/mild to severe/late disease, such that only some patients with severe disease ultimately exhibit the classical radiographic features of ankylosing spondylitis. Restricting recruitment into pivotal phase III studies to those who fulfilled the modified New York criteria for ankylosing spondylitis led to approval for tumour necrosis factor inhibitor (TNFi) therapy being restricted to this population. Recognition of the limitations of the modified New York criteria was part of the driving force behind the development of the ASAS classification criteria for axSpA. Additional studies of patients with non-radiographic axSpA were required by the Food and Drug Administration (FDA) before approval was granted for these patients. Yet these patients are part of the spectrum of the same condition6 and it makes no sense to require drug development programmes to study different subpopulations based on their classification status rather than their clinical diagnosis. The tail has started to wag the dog!
Moreover, clinicians are left with the conundrum of how to apply the results of clinical studies to their practice. Using the fulfilment of classification criteria as an inclusion criterion for a clinical study risks the unnecessary exclusion of patients from the study who in real life could benefit (eg, by limiting participation to those who meet the modified New York criteria) or the inappropriate inclusion of many patients without the disease (eg, if the ASAS criteria were used to identify suitable recruits in primary care populations). How are clinicians to decide whether the results of a study apply to their patient? In clinical practice, we (ought to) make decisions based on the patient’s clinical diagnosis (based on history, examination, investigations and the exclusion of alternative explanations), not on whether he/she fulfils the classification criteria. It follows that the most appropriate inclusion criterion for a clinical trial ought to be a clinical diagnosis of sufficient disease activity/severity to warrant the treatment being studied.
Updated European League against Rheumatism/American College of Rheumatology (EULAR/ACR) classification criteria for SLE have recently been published. The criteria use weighted scores for different clinical, immunological and pathological features in several domains; the accrual of 10 or more points leads to a classification of SLE. This creates a dichotomy of those who do/do not meet the classification criteria, but diagnosis is not binary. First, does it ensure the standardisation of a population of patients with ‘definite SLE’? We know that the disease is heterogeneous, and that ethnicity, type of organ involvement and immunology are all strongly associated with differing outcomes—patients with neuropsychiatric lupus are very different from patients with predominantly cutaneous disease, even if both fulfil the classification criteria. Second, does the dichotomy result in the exclusion of some patients with a clinical diagnosis of SLE from studies? Without doubt! This has led to the development of the concept of ‘incomplete lupus’ representing patients with mild or early disease, and those with a restricted number of clinical features. These patients also have significant morbidity that requires research and treatment but they are excluded from studies which use the fulfilment of classification criteria as an entry criterion. Hence we see randomised controlled trials being designed to test treatment efficacy in patients defined as having ‘incomplete lupus’.7
Classification criteria for RA were updated in 2010. These have been broadly welcomed as an improvement (in terms of sensitivity and applicability) in early disease. However, their use in clinical trials has led to some distortion of the populations who have been studied. For example, it is known that patients with seropositive RA have a worse long-term prognosis than those with seronegative disease; there is also plentiful evidence that higher disease activity over time is associated with poorer outcomes. Yet when we look at patients enrolled in trials in early RA we may observe a paradox—patients who are seronegative have higher disease activity (eg, in ARCTIC, seronegative patients had DAS28=3.9 at baseline, compared with DAS28=3.4 in seropositive patients).8 The reason may be quite simple—the presence of strong positive rheumatoid factor or anticitrullinated protein antibodies (ACPAs) contributes three points towards the six required to fulfil the classification criteria for ‘definite’ RA. Denied these three points, patients who are seronegative must have higher disease activity if they are to fulfil the criteria for RA. Put another way, two patients who have identical clinical features and acute phase response may be classified differently according to their autoantibody status, one thereby being eligible for a study and the other being excluded. Now it may be that ACPA-positive disease is fundamentally different from seronegative disease, but if that is true it would make more sense to include patients on the basis of their ACPA status rather than conflating this with the fulfilment of the classification criteria.
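The effect of the serology weighting can be illustrated with a deliberately simplified sketch. Only two facts are taken from the text: strongly positive serology contributes three points, and six points are required for ‘definite’ RA. Everything else is an assumption made for illustration: the `Patient` class, the lumping of all other domains into a single hypothetical `non_serology_points` value, and the treatment of serology as all-or-nothing. It is not an implementation of the full 2010 criteria.

```python
from dataclasses import dataclass

# Values stated in the text.
DEFINITE_RA_THRESHOLD = 6
STRONG_POSITIVE_SEROLOGY_POINTS = 3


@dataclass
class Patient:
    label: str
    non_serology_points: int  # hypothetical: points from all domains other than serology
    strongly_seropositive: bool


def total_score(patient: Patient) -> int:
    """Sum the non-serology points and the serology points (simplified)."""
    serology = STRONG_POSITIVE_SEROLOGY_POINTS if patient.strongly_seropositive else 0
    return patient.non_serology_points + serology


def fulfils_criteria(patient: Patient) -> bool:
    """True if the simplified score reaches the threshold for 'definite' RA."""
    return total_score(patient) >= DEFINITE_RA_THRESHOLD


# Two hypothetical patients with identical clinical features and acute phase response.
seropositive = Patient("seropositive", non_serology_points=4, strongly_seropositive=True)
seronegative = Patient("seronegative", non_serology_points=4, strongly_seropositive=False)

for patient in (seropositive, seronegative):
    print(patient.label, total_score(patient), fulfils_criteria(patient))
```

In this sketch the seronegative patient would need at least six points from the other domains to be classified, whereas the strongly seropositive patient needs only three, which is the selection effect described above.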
These examples demonstrate that the use of classification criteria in clinical practice to make a diagnosis is inappropriate. It risks overdiagnosis, over-referral and overtreatment in populations with a low prior (pretest) probability of disease. In research, the use of the fulfilment of classification criteria as the grounds for entry to a clinical trial leads to the distortion of the study population and inappropriate restrictions on the label when drugs are licensed. How then can classification criteria be used most effectively? They should be used as a lens through which the study population can be viewed, such that different studies can be compared and contrasted. The main entry criterion for a clinical trial ought to be a clinician diagnosis with sufficient disease activity (or severity) to justify the planned treatment.
At first sight, the proposal to revert to using a clinical diagnosis as an entry criterion to a clinical study risks a return to the ‘bad old days’ with the danger that investigators from different backgrounds with varying levels of expertise would recruit inappropriate patients into studies. This is to misunderstand the problem because, as we have shown, a high prior (pretest) probability of disease is a necessary prerequisite for the appropriate use of classification criteria. If the clinical diagnoses cannot be trusted, then the fulfilment of the classification criteria will not guarantee a high posterior probability of the disease either. As the authors of the ACR/EULAR SLE classification criteria point out, scoring is a process that requires ‘diligence and clinical experience’.1 The robustness of the diagnoses on a population level will be derived from describing the cohort in detail, including the proportion that fulfils the relevant classification criteria. This has several advantages:
(1) It will help to avoid imposing an artificial dichotomy on conditions that represent a continuous spectrum.
(2) If necessary, a priori subgroup analyses can be planned to compare those who fulfil the criteria with those who don’t.
(3) It will allow nuanced characterisation of research cohorts (eg, in SLE, describing the average number of points scored, and in which domains).
Much useful work has been undertaken during the development of classification criteria, and we should not ‘throw the baby out with the bath water’. But clinicians must resist the temptation to use the criteria in patient diagnosis, and the use of the criteria to define which patients are recruited into clinical trials needs to evolve.
Footnotes
Handling editor Josef S Smolen
Twitter @StefanSiebert1
Contributors All authors were involved in writing the article and in the decision to submit it for publication.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests NB: Research grants from Vifor, GSK, Pfizer, Novartis. Advisory board: Lilly, MSD. Speaking fees: Roche, Vifor, AbbVie. SS: GRAPPA global steering committee, ASAS member; research grants, speaker or consultancy fees from AbbVie, BMS, Boehringer-Ingelheim, Celgene, GSK, Janssen, Novartis, Pfizer and UCB.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.