Objective To compare the performance of the American–European Consensus Group (AECG) and the newly proposed American College of Rheumatology (ACR) classification criteria for Sjögren's Syndrome (SS) in a well-characterised sicca cohort, given ongoing efforts to resolve discrepancies and weaknesses in the systems.
Methods In a multidisciplinary clinic for the evaluation of sicca, we assessed features of salivary and lacrimal gland dysfunction and autoimmunity as defined by tests of both AECG and ACR criteria in 646 participants. Global gene expression profiles were compared in a subset of 180 participants.
Results Application of the AECG and ACR criteria resulted in classification of 279 and 268 participants with SS, respectively. Both criteria were met by 244 participants (81% of the 303 classified as SS by either set). In 26 of the 35 AECG+/ACR− participants (74%), the minor salivary gland biopsy focus score was ≥1, while the remaining nine (26%) had positive anti-Ro/La. The 24 AECG−/ACR+ participants met ACR criteria mainly because of differences in the scoring of ocular staining. All patients with SS, regardless of classification, had similar gene expression profiles, which were distinct from those of healthy controls.
Conclusions The two sets of classification criteria yield concordant results in the majority of cases and gene expression profiling suggests that patients meeting either set of criteria are more similar to other SS participants than to healthy controls. Thus, there is no clear evidence for increased value of the new ACR criteria over the old AECG criteria from the clinical or biological perspective. It is our contention, supported by this report, that improvements in diagnostic acumen will require a more fundamental understanding of the pathogenic mechanisms than is at present available.
- Sjögren's syndrome
Sjögren's Syndrome (SS) is a chronic, systemic disease that may be second only to rheumatoid arthritis in prevalence among the rheumatic autoimmune diseases.1 ,2 The principal manifestations of the disease are dry eyes and dry mouth resulting from immune-mediated damage and dysfunction of the lacrimal and salivary glands,3 ,4 which develop a characteristic lymphocytic infiltrate that can be objectively measured with a focus score.5 Approximately 67% of patients with SS have circulating autoantibodies to Ro (SSA) and/or La (SSB).6 Extraglandular manifestations, which include vasculitis, peripheral neuropathy, renal tubular acidosis, pulmonary involvement, lymphoproliferative disease and/or immunological abnormalities, are present in a subset of patients and are found most commonly among those with high levels of anti-Ro and anti-La autoantibodies.7 ,8
The diagnosis of SS commonly requires a multidisciplinary approach and may be difficult to establish: sicca symptoms are common and non-specific, and there is no gold standard diagnostic test. For research purposes, 11 sets of classification criteria have been proposed since the mid-1960s.9–19 The last of these, the 2002 revised American–European Consensus Group (AECG) Classification Criteria, have gained widespread acceptance in clinical and research studies of SS and have been cited >1500 times.20 They consist of six criteria, two subjective and four objective (table 1).19 In 2012, the American College of Rheumatology (ACR) endorsed a new set of preliminary criteria proposed by the Sjögren's International Collaborative Clinical Alliance (sicca).21 ,22 These criteria are centred on three objective features (table 1).
We undertook this study to compare the new ACR criteria with the revised AECG criteria in a cohort of participants with sicca symptoms who had been carefully evaluated for SS.
The participating individuals were evaluated in the Sjögren's Research Clinic at Oklahoma Medical Research Foundation or at a similar clinic at the University of Minnesota. Participants were self-referred or physician-referred. Each potential clinic participant was interviewed by telephone by trained personnel, who assessed ocular and oral symptoms using the six standardised and validated17 questions of the subjective criteria of the revised AECG Classification Criteria (table 1).19 To be eligible for a clinic appointment, a participant had to answer at least one ocular and one oral question affirmatively. The exclusion criteria for evaluation at the clinic were also based on the recommendations of the AECG (table 1).19 Additionally, we excluded individuals who were known to be pregnant or who were unable to provide informed consent.
With very few exceptions, participants were evaluated in a single morning clinic visit using standardised protocols. The oral examination consisted of measurement of stimulated salivary flow and timed whole unstimulated salivary flow (WUSF), a lip biopsy, and collection and storage of saliva. Participant evaluation did not include sialography or scintigraphy. The ocular specialist performed ocular surface staining with lissamine green and fluorescein, an unanaesthetised Schirmer's I test, and collection and storage of tears. The ocular vital dye score was determined using the quantitative dot-counting method23 rather than by descriptive features,24 and the score for each section was recorded independently before a final score was generated for each eye. Blood samples were collected for general laboratory tests and for extraction of DNA, RNA and serum. A physician completed a detailed history and physical examination, including general medical, rheumatological and neurological evaluations. If patients reported a past diagnosis of rheumatoid arthritis, mixed connective tissue disease, systemic sclerosis, myositis, primary biliary cirrhosis, multiple sclerosis or systemic lupus erythematosus, the classification criteria for these illnesses were specifically ascertained by history, medical record review and testing for the corresponding autoantibodies.
All procedures were approved by the Oklahoma Medical Research Foundation and University of Minnesota Institutional Review Boards. Each participant provided written informed consent prior to entering the study.
The dentist performed lip biopsies to obtain minor salivary glands from all patients, unless slides from a previous biopsy were available and contained sufficient tissue for re-examination by our pathologists. A portion of each specimen was formalin-fixed and paraffin-embedded, and sections were cut and stained with haematoxylin and eosin; other fragments were cryopreserved. Two dental pathologists reviewed the specimens independently, compared their results and generated a consensus reading. Lymphocytic infiltration of the glands was graded by focus score.5
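The focus score is conventionally the number of lymphocytic foci (aggregates of at least 50 lymphocytes) per 4 mm² of glandular tissue, and both criteria sets treat a score ≥1 as positive. A minimal Python sketch, assuming the pathologists have already counted the foci and measured the glandular area; the function names are hypothetical and not part of the study protocol:

```python
def focus_score(n_foci: int, gland_area_mm2: float) -> float:
    """Number of lymphocytic foci (>=50 lymphocytes each) per 4 mm^2 of gland."""
    if gland_area_mm2 <= 0:
        raise ValueError("glandular area must be positive")
    return n_foci * 4.0 / gland_area_mm2


def biopsy_positive(n_foci: int, gland_area_mm2: float, cutoff: float = 1.0) -> bool:
    """Hypothetical helper: positive when the focus score reaches the cut-off."""
    return focus_score(n_foci, gland_area_mm2) >= cutoff


print(focus_score(3, 8.0))        # 1.5
print(biopsy_positive(1, 10.0))   # False (focus score 0.4)
```

Normalising to 4 mm² is what makes biopsies of different sizes comparable; a single focus in a very small specimen can therefore yield a positive score.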
Clinical laboratory and serology
Anti-Ro/SSA and anti-La/SSB autoantibodies were determined by multiple methods. Additionally, all patients were tested for rheumatoid factor (RF), antinuclear antibodies (ANA), precipitins for autoantibodies associated with other connective tissue disorders, hepatitis C serology, complete blood count (CBC) with differential, immunoglobulin profile and urinalysis (see online supplementary text).
Each study participant was classified according to both the revised AECG criteria19 and the newly proposed ACR criteria.21 We excluded from analysis participants who lacked results for any feature of either classification system, with the exception of sialography and scintigraphy, which were not performed (table 1).
Peripheral blood mRNA transcript measurements
Global gene expression profiles comprising transcript levels for >15 000 loci were compared in a subset of 180 participants (see online supplementary text).
Performance of the tests was assessed via sensitivity, specificity, positive predictive value and negative predictive value, estimated by considering the AECG criteria as the ‘gold standard’ and summarised with exact binomial 95% CIs. McNemar's test for paired samples was used to assess whether the two sets of criteria differed significantly in the proportion of participants classified as SS. The κ statistic was used to quantify the degree of agreement between the new classification criteria and the AECG criteria. Details of the statistical analyses for the gene expression data are available in the online supplementary text.
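The diagnostic statistics described above can be computed from the paired 2×2 table. The sketch below assumes that "exact binomial" refers to the Clopper–Pearson interval, which it obtains by bisection on the binomial tail probabilities using only the Python standard library. The function names are illustrative, and the counts in the usage example come from the Results (244 positive by both, 35 AECG only, 24 ACR only, and 646 − 303 = 343 negative by both). We have not checked that it reproduces the published intervals to the last decimal.

```python
import math


def _binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(X = k), computed in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with 0 < p < 1."""
    return sum(_binom_pmf(k, n, p) for k in range(x + 1))


def _bisect(f, iters: int = 60) -> float:
    """Root of an increasing function f on (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for the proportion x / n."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def diagnostic_summary(tp: int, fp: int, fn: int, tn: int, alpha: float = 0.05):
    """Sensitivity, specificity, PPV and NPV of an index classification against
    a reference classification, each as (estimate, lower, upper)."""
    cells = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (x / n, *clopper_pearson(x, n, alpha))
            for name, (x, n) in cells.items()}


# ACR criteria as the index classification, AECG criteria as the reference.
summary = diagnostic_summary(tp=244, fp=24, fn=35, tn=343)
for name, (est, lo, hi) in summary.items():
    print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Working with log-gamma keeps the binomial terms finite even for the full cohort of 646; a library routine such as an exact binomial CI in a statistics package could replace the hand-written bisection.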
The initial cohort of participants evaluated at either the Sjögren's Research Clinic at Oklahoma Medical Research Foundation or the Sjögren's Clinic at the University of Minnesota comprised 837 individuals. Of these, 646 had all data points of both the AECG and ACR classification criteria and thus constitute the study cohort. The initial and study cohorts were comparable with respect to age, sex, race and ethnicity (see online supplementary table S1).
We tabulated the presence or absence of each of the six AECG classification criteria for SS and each of the three ACR criteria (summary in table 2; details in online supplementary table S2). Of the 646 study participants, 279 and 268 patients were classified as SS according to AECG and ACR criteria, respectively. Of the 303 participants classified by either system as SS, 244 (81%) individuals met both sets of criteria.
The new ACR classification criteria did not differ significantly from the AECG criteria (table 2; McNemar's test of paired samples: p=0.19), and agreement between the two was high (κ=0.81, 95% CI 0.77 to 0.86). When each set of criteria was evaluated using the other as the gold standard, sensitivity, specificity, positive predictive value and negative predictive value were similar for both classification systems. Using the AECG criteria as the gold standard, the ACR criteria had a sensitivity of 87.5% (95% CI 82.9 to 90.9) and a specificity of 93.4% (95% CI 90.3 to 95.7); the positive predictive value was 91.0% (95% CI 86.8 to 94.0) and the negative predictive value was 90.7% (95% CI 87.2 to 93.4). Thus, 12.5% (35 of 279) of participants classified as SS under the AECG criteria were not considered SS when evaluated by the ACR criteria; conversely, 8.9% (24 of 268) of those classified as SS by the ACR criteria met only the ACR criteria.
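The agreement statistics can be checked against the counts reported above. A minimal sketch, assuming Cohen's κ for two binary classifications and McNemar's χ² test with Edwards' continuity correction. The text does not state which McNemar variant was used; the corrected version gives p≈0.19, matching the reported value, whereas the uncorrected statistic gives roughly 0.15. The κ confidence interval is not reproduced here, and the function names are illustrative.

```python
import math


def cohen_kappa_2x2(both: int, only_1: int, only_2: int, neither: int) -> float:
    """Cohen's kappa for two binary classifications of the same participants."""
    n = both + only_1 + only_2 + neither
    p_obs = (both + neither) / n                 # observed agreement
    p1 = (both + only_1) / n                     # positive rate, classification 1
    p2 = (both + only_2) / n                     # positive rate, classification 2
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)        # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar(only_1: int, only_2: int, continuity: bool = True) -> tuple[float, float]:
    """McNemar chi-square (1 df) on the discordant counts; returns (statistic, p)."""
    diff = abs(only_1 - only_2)
    if continuity:
        diff = max(diff - 1, 0)                  # Edwards' continuity correction
    stat = diff ** 2 / (only_1 + only_2)
    # For 1 df, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


both, aecg_only, acr_only = 244, 35, 24
neither = 646 - (both + aecg_only + acr_only)    # 343
kappa = cohen_kappa_2x2(both, aecg_only, acr_only, neither)
stat, p = mcnemar(aecg_only, acr_only)
print(f"kappa = {kappa:.2f}, McNemar chi2 = {stat:.2f}, p = {p:.2f}")
# kappa = 0.81, McNemar chi2 = 1.69, p = 0.19
```

Only the 59 discordant participants enter McNemar's test, which is why a κ of 0.81 and a non-significant McNemar result can coexist: the two systems disagree on few people, and those disagreements run in both directions.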
The differences between how the two systems classified the sicca participants revolved around which objective measures of ocular and oral involvement were included in addition to histology and Ro/La serology. The AECG criteria use the van Bijsterveld (vBS) grading of ocular staining, Schirmer's I test and WUSF volume, whereas the ACR criteria use a different ocular staining score (the sicca ocular staining score, OSS) and accept positive ANA (≥1:320) plus positive RF as an alternative measure of serological activity.
We compared the performance of these measures in participants who met either, both or neither set of criteria. Among SS patients defined by AECG criteria who were excluded by ACR criteria (ACR−/AECG+ participants; table 3), the most striking feature was that 26 of the 35 (74.3%) had a minor salivary gland biopsy with a focus score ≥1, while the other 9 (25.7%) had positive Ro and/or La autoantibodies. These patients met AECG criteria by having, in addition to either the histopathological or the serological criterion, subjective ocular and oral symptoms plus an abnormal Schirmer's test and/or an abnormal WUSF test (table 3). They did not meet the ACR criteria because they fulfilled only one of the histopathology and serology criteria and did not have an abnormal ocular staining examination.
Conversely, 24 participants were classified as SS by ACR but not AECG criteria (ACR+/AECG− participants; table 3). Most of them (n=17) met criteria because of differences in the scoring of ocular staining: the ACR criteria use the OSS,23 which is abnormal at a score of ≥3 out of 12 possible points, while the AECG criteria use the vBS,24 which is abnormal at a score of ≥4 out of 9 possible points (figure 1). The remaining seven ACR+/AECG− participants met the ACR criteria but not the AECG criteria by having positive ANA plus RF.
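The discordant groups described above follow directly from the two decision rules. The sketch below encodes simplified versions of the 2002 AECG rule for primary SS (any four of the six items provided histopathology or Ro/La serology is positive, or any three of the four objective items) and the 2012 ACR rule (at least two of the three objective features). It uses the thresholds as we understand them from the original publications: Schirmer's I ≤5 mm in 5 min, vBS ≥4, focus score ≥1, WUSF ≤1.5 mL in 15 min, OSS ≥3, and ANA ≥1:320 together with positive RF. It is an illustration only: exclusion criteria, secondary SS, sialography and scintigraphy are omitted, each ocular measure is assumed to be summarised by the worse eye, and the field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical per-participant record; defaults are 'normal' results."""
    ocular_symptoms: bool = True     # AECG item I (all participants here had I and II)
    oral_symptoms: bool = True       # AECG item II
    schirmer_mm_5min: float = 10.0   # unanaesthetised Schirmer's I, worse eye
    vbs: int = 0                     # van Bijsterveld score, worse eye (0-9)
    oss: int = 0                     # ocular staining score, worse eye (0-12)
    focus_score: float = 0.0         # minor salivary gland biopsy
    wusf_ml_15min: float = 3.0       # whole unstimulated salivary flow
    anti_ro_or_la: bool = False
    ana_titre: int = 0               # reciprocal titre, e.g. 320 for 1:320
    rf_positive: bool = False


def aecg_primary(f: Findings) -> bool:
    """Simplified 2002 AECG rule for primary SS (no exclusions applied)."""
    iii = f.schirmer_mm_5min <= 5 or f.vbs >= 4      # ocular signs
    iv = f.focus_score >= 1                            # histopathology
    v = f.wusf_ml_15min <= 1.5                         # salivary involvement
    vi = f.anti_ro_or_la                               # autoantibodies
    n_items = sum([f.ocular_symptoms, f.oral_symptoms, iii, iv, v, vi])
    n_objective = sum([iii, iv, v, vi])
    return (n_items >= 4 and (iv or vi)) or n_objective >= 3


def acr_2012(f: Findings) -> bool:
    """Simplified 2012 ACR rule: at least two of three objective features."""
    serology = f.anti_ro_or_la or (f.ana_titre >= 320 and f.rf_positive)
    biopsy = f.focus_score >= 1
    kcs = f.oss >= 3
    return sum([serology, biopsy, kcs]) >= 2


cases = {
    "biopsy + low Schirmer": Findings(schirmer_mm_5min=3, focus_score=2.0),
    "biopsy + OSS 3, vBS 3": Findings(focus_score=1.5, oss=3, vbs=3),
    "ANA/RF + OSS 4": Findings(oss=4, ana_titre=640, rf_positive=True),
}
for label, f in cases.items():
    print(f"{label}: AECG={aecg_primary(f)}, ACR={acr_2012(f)}")
# biopsy + low Schirmer: AECG=True, ACR=False
# biopsy + OSS 3, vBS 3: AECG=False, ACR=True
# ANA/RF + OSS 4: AECG=False, ACR=True
```

The three illustrative cases mirror the three discordant patterns in table 3: a positive biopsy with an abnormal Schirmer's test satisfies AECG but not ACR, while an OSS of 3 with a vBS of 3, or ANA plus RF, satisfies ACR but not AECG.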
The performance of each individual test was assessed in three subsets of participants: (1) those classified by ACR criteria, (2) those classified by AECG criteria and (3) those classified as having SS by either or both sets of criteria (see online supplementary table S3). As expected, the tests performed consistently across all subsets. The Schirmer's I test had low sensitivity (range 0.49–0.54) with higher specificity (0.71–0.73), while the WUSF had both low sensitivity (0.59–0.65) and low specificity (0.52–0.57). In contrast, serology and histopathology performed well, with sensitivities of 0.63–0.64 and 0.84–0.86, respectively, and specificities of 0.94–0.96 and 0.89–0.95, respectively.
The most important difference in individual test performance was in the evaluation of keratoconjunctivitis sicca by ocular surface staining. The AECG vBS scoring system had a sensitivity of 0.57–0.61 and a specificity of 0.70–0.71. The ACR OSS markedly improved the sensitivity (0.80–0.90) but at the expense of specificity (0.45–0.51) (see online supplementary table S3). We compared the number of participants with a positive score by one method versus the other and found highly significant differences (table 4). Among patients classified as having SS by either or both sets of criteria (n=303), 23% of those with a positive OSS did not have a positive vBS (p<1×10⁻⁶); the corresponding figure was 24% (p<1×10⁻⁶) when all participants were included, irrespective of SS classification. An intermediate result was obtained when the OSS was considered abnormal at a cut-off of ≥4 rather than ≥3: 58 participants (13.5%) changed from OSS-positive to OSS-negative (p=0.001), and 53 of these 58 were AECG-negative.
When comparing gene expression profiles of participants meeting only one set of criteria, those meeting both sets and healthy controls, we found that participants who met criteria for SS by either or both sets tended to cluster together and were distinct from controls (figure 2). Furthermore, using low-stringency criteria designed to maximise detection of differences, we found no gene expression differences between participants meeting both sets of criteria and those meeting only one.
The pathophysiological mechanisms underlying SS are still poorly understood, and accurately determining who does and does not have SS is difficult.25 In the clinical setting, the diagnosis of SS relies on interpreting and integrating the patient's history, test results and the expert opinion of the clinician. For research purposes, many classification systems have been proposed in the last few decades, and the coexistence of more than one system may lead to heterogeneity and confusion in the interpretation of research studies. As Vitali et al20 have recently highlighted, the SS community should strive towards the common goal of a final agreement on classification criteria for the disease. The first steps in this direction are to evaluate the performance of the new criteria and compare them with the currently used AECG criteria in external cohorts of patients and controls. Such a comparison in our uniformly evaluated cohort of patients presenting with sicca helps serve this purpose.
This cohort has been evaluated in a homogeneous and standardised manner at two research clinics in Oklahoma and Minnesota by a multidisciplinary team of experts. The evaluation includes all the examinations and laboratory procedures detailed in the AECG and ACR classification criteria, including the subjective components of the AECG criteria and both ocular staining scoring systems.19 ,21 We did not assess the revised Japanese Ministry of Health criteria,12 ,26 because they are intended as an aid to clinical diagnosis rather than for research classification, which is the aim of our SS clinics. Additionally, they have not been tested in a non-Japanese population and include additional invasive procedures that we felt were not justified for our participants.
It is relevant to note that the enrolment strategy of our cohort differs from that of the sicca cohort. The most important difference is that, whereas only 79% of the participants in the sicca cohort had both subjective dry eyes and dry mouth,22 all of our participants answered affirmatively at least one ocular and one oral dryness question of the AECG criteria.
The two clinics have so far evaluated 837 individuals, but only the 646 for whom we had all data points for both sets of criteria were included in the current analysis. Of these, 303 participants were classified as having SS by either or both sets of criteria, and almost 20% of them (59 of 303) met only one set of criteria. This level of disagreement between the two classification systems is similar to that reported by Shiboski et al.21 These patients would have been excluded from any study based on only one of the classification methods; knowing their characteristics is therefore relevant to future research.
Among patients classified as having SS by the AECG criteria only (n=35), about three-quarters (26) had a minor salivary gland biopsy consistent with SS and the remaining quarter (9) had positive Ro/La serology. They did not meet ACR criteria because they did not have keratoconjunctivitis sicca by OSS and only one of the biopsy and serology criteria was positive (table 3). While we have not used formal expert consensus methodology, we believe that most experts would agree that SS is present in a person presenting with subjective dry eyes, subjective dry mouth, a confirmatory minor salivary gland biopsy or positive Ro/La serology, plus at least one additional objective measure of dryness, be it a positive Schirmer's I test or an abnormal WUSF test.
Conversely, the individuals who met only the ACR criteria (n=24) did so because of the alternative serology criterion or differences in the evaluation of keratoconjunctivitis sicca. Seven of them (29%) met criteria with positive ANA and RF, but negative anti-Ro/La, as one of their two positive criteria. Again, we have not performed formal expert testing, but it is unlikely that these individuals would be considered to have SS on expert opinion, especially without information about sicca symptoms. The remaining 71% met the keratoconjunctivitis sicca criterion by ACR but not by AECG scoring.
It has been proposed that it would be useful to know whether the OSS developed for the ACR criteria can be substituted by the AECG vBS.21 We are aware of few, if any, established cohorts that can currently compare the performance of the vBS and the OSS directly. In cohorts evaluated before the OSS was published in 2010, determining the OSS would require access to the breakdown of the scoring of each eye: individual scores for the medial and lateral bulbar conjunctiva and the cornea, plus documentation of patches of confluent staining, staining in the pupillary area and the presence of filaments (figure 1). The vBS does not take these last three features into consideration,24 whereas in the OSS they add up to three points to the score of each eye. The vBS is considered abnormal with a score of ≥4 out of 9 possible points,24 while the OSS is positive at ≥3 out of 12 points.23
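The relationship between the two ocular staining scores can be made explicit in a few lines. This sketch assumes, as in this cohort, that both scores are derived from the same per-section dot-count grades (0–3 each for the medial and lateral bulbar conjunctiva and for the cornea), with the OSS adding one point each for confluent corneal staining, pupillary-area staining and filaments. The class and function names are hypothetical, and the sketch does not reproduce how dot counts map to grades.

```python
from dataclasses import dataclass


@dataclass
class EyeStaining:
    """Hypothetical per-eye record of dot-count section grades (each 0-3)."""
    medial_conj: int                  # medial (nasal) bulbar conjunctiva
    lateral_conj: int                 # lateral (temporal) bulbar conjunctiva
    cornea: int                       # cornea
    confluent_patches: bool = False   # extra OSS point
    pupillary_staining: bool = False  # extra OSS point
    filaments: bool = False           # extra OSS point


def vbs(eye: EyeStaining) -> int:
    """van Bijsterveld-style score: three sections only, 0-9."""
    return eye.medial_conj + eye.lateral_conj + eye.cornea


def oss(eye: EyeStaining) -> int:
    """Ocular staining score: the same sections plus up to three points, 0-12."""
    return vbs(eye) + sum([eye.confluent_patches, eye.pupillary_staining, eye.filaments])


def abnormal(eyes, score, cutoff: int) -> bool:
    """True if either eye reaches the cut-off for the given score."""
    return any(score(e) >= cutoff for e in eyes)


right = EyeStaining(1, 1, 1)
left = EyeStaining(1, 0, 1, confluent_patches=True)
eyes = [right, left]
print("vBS >= 4:", abnormal(eyes, vbs, 4))   # False (scores 3 and 2)
print("OSS >= 3:", abnormal(eyes, oss, 3))   # True  (scores 3 and 3)
print("OSS >= 4:", abnormal(eyes, oss, 4))   # False
```

Under these assumptions the OSS can only add points to the vBS sections, and its cut-off is lower, so every eye with vBS ≥4 also has OSS ≥3. Discordance can therefore only run in the direction OSS-positive, vBS-negative, consistent with the pattern reported in the Results. Raising the OSS cut-off to ≥4 narrows, but does not remove, that gap.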
We are in the unique position of having recorded separately each of the 12 possible scoring points for each eye in all our cohort participants, so we were able to determine both vBS and OSS scores and compare the performance of each system. To reduce the interobserver and intraobserver variability inherent to the vBS scoring system,24 the vital dye score for each section of the ocular surface was determined using the sicca dot-counting method. While no studies have validated the conversion of this scoring method to the traditional vBS technique, the two are similar, and we felt that an objective scoring method would be more meaningful and reproducible in the context of multiple observers. As expected, participants were more likely to have an abnormal OSS than an abnormal vBS: about 25% of participants with a positive OSS had a negative vBS. This difference was highly significant both in all cohort participants and in patients classified as having SS by one or both sets of criteria (p<1×10⁻⁶). The OSS is better at capturing true positive cases but performs poorly at ruling out those who do not have SS (ie, it is very sensitive but has poor specificity); the opposite is the case for the vBS. It is pertinent to note that only a minor proportion of cases of keratoconjunctivitis sicca are due to SS.27 It remains to be seen how the two scoring systems compare in other prospective cohorts, which will help determine the optimal threshold. Notably, one of the main goals of the sicca consortium in developing new classification criteria was a system with high specificity, to avoid exposing unaffected individuals to the potentially serious adverse effects of novel investigational therapies.21
The two tests that performed best across all comparison groups were the minor salivary gland biopsy and anti-Ro/La serology, which performed similarly to previous reports.21 ,28 The Schirmer's and WUSF tests, while less useful in distinguishing true SS patients from participants with non-Sjögren's sicca, are easy to perform and non-invasive. It has recently been suggested that more emphasis should be given to tests that, in addition to identifying true cases and excluding unaffected individuals, can be done at early stages, repeatedly and with minimal distress to the participant.29 In the future, salivary gland ultrasonography may play this role in SS.30 With the current sets of criteria, however, there is an interesting difference in accessibility: while the ACR criteria require evaluation by an eye specialist and by a practitioner who can perform a lip biopsy, both the Schirmer's and the WUSF tests can be performed in a standard medical office without sophisticated equipment or medical specialists. Thus, a rheumatologist can assess patients for subjective dry eyes and dry mouth and for the presence of autoantibodies, along with Schirmer's and WUSF testing. If AECG criteria are not met with such an assessment, biopsy and eye examination can then be pursued. In some clinical care settings, or in research situations that do not expose the selected participants to the risk of significant adverse events (as some therapeutic trials do), such a stepwise approach may be useful and cost-effective.
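The stepwise approach suggested above can be sketched as a simple decision rule. This is an illustration under our own assumptions, not a validated algorithm: it applies the AECG rule for primary SS to the office-based items only (symptoms, Schirmer's I ≤5 mm in 5 min, WUSF ≤1.5 mL in 15 min and anti-Ro/La), treats the biopsy and ocular staining items as not yet available, and ignores exclusion criteria. The function name is hypothetical.

```python
def office_step(ocular_symptoms: bool, oral_symptoms: bool,
                schirmer_mm_5min: float, wusf_ml_15min: float,
                anti_ro_or_la: bool) -> str:
    """First step of a hypothetical stepwise work-up using office-based tests only."""
    iii = schirmer_mm_5min <= 5        # ocular signs (Schirmer's I only; no staining yet)
    v = wusf_ml_15min <= 1.5           # salivary involvement (WUSF)
    vi = anti_ro_or_la                 # autoantibodies
    # Biopsy (item IV) is not yet available, so only VI can satisfy the
    # "histopathology or serology" requirement of the 4-of-6 route.
    n_items = sum([ocular_symptoms, oral_symptoms, iii, v, vi])
    met = (n_items >= 4 and vi) or sum([iii, v, vi]) >= 3
    return "AECG met" if met else "refer for lip biopsy and ocular staining"


print(office_step(True, True, schirmer_mm_5min=3, wusf_ml_15min=3.0, anti_ro_or_la=True))
# AECG met
print(office_step(True, True, schirmer_mm_5min=3, wusf_ml_15min=1.0, anti_ro_or_la=False))
# refer for lip biopsy and ocular staining
```

One consequence of the AECG rule is visible in the code: without a biopsy, the criteria can only be met if anti-Ro/La is positive, so a seronegative patient would always proceed to the second step, however abnormal the Schirmer's and WUSF results.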
The comparison of the AECG criteria with the proposed ACR classification demonstrates that neither system is clearly superior to the other when classifying a patient with SS, a finding already reported in the initial publication of the ACR criteria.21 The lack of highly sensitive, specific and reproducible criteria may, in part, reflect our current limited understanding of SS pathophysiology; such knowledge would provide the most rational basis for disease classification. In the current setting, the ACR criteria may be best suited for stricter studies focused on high specificity to reduce the risk of drug-related toxicity, while the AECG criteria may be applicable to broader use, particularly in less risky medical research or in non-treatment clinical or translational research settings. Moreover, our finding of similar gene expression profiles across all patients who may be affected by SS, which differ from those of healthy controls, supports our view that modifying classification using only clinical criteria is unlikely to lead to consequential improvements in our ability to identify patients with SS. We believe that such improvements in diagnostic acumen will require a more fundamental understanding of the pathogenic mechanisms than is at present available.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Handling editor Tore K Kvien
Correction notice This article has been corrected since it was published Online First. Occurrences of ‘SICCA’ have been changed to lower case (‘sicca’).
Acknowledgements We are grateful to all the individuals with SS and those serving as healthy controls who participated in this study. We would like to thank the following individuals for their help in the collection and ascertainment of the samples used in this study: Erin Rothrock, Judy Harris, Sharon Johnson, Sarah Cioli, Nicole Weber, Dominique Williams, Wes Daniels, Cherilyn Pritchett-Frazee, Kylia Crouch, Laura Battiest, Justin Rodgers, James Robertson, Thuan Nguyen, Amanda Crosbie, Ellen James, Carolyn Meyer, Amber McElroy, Eshrat Emamian, Julie Ermer, Kristine Rohlf, Joanlise Leon, Anita Petersen, Danielle Hartle, Jill Novizke, Ward Ortman, Carl Espy, Beth Cobb, Gudlaug Kristjansdottir and Marianne Eidsheim. We would also like to thank Stuart Glenn and Jared Ning for their ongoing assistance in developing and maintaining the computational infrastructure used to perform this study.
Contributors All authors of the manuscript contributed to: conception and design, or analysis and interpretation of data; drafting the article or revising it critically for important intellectual content and final approval of the version to be published.
Funding This publication was made possible by grants 5R01 AR50782 (KLS), P50 AR0608040 (KLS, CJL, RHS, and ADF), 5U19 AI 082714 (KLS, CJL), 5R01 DE018209 (KLS, JBH), 5R37AI024717-22S1 (JBH, AR). The contents are the sole responsibility of the authors and do not necessarily represent the official views of the NIH. Additional funding was obtained from the Phileona Foundation (KLS) and the Oklahoma Medical Research Foundation (CJL and KLS). DUS received funding from an unrestricted grant from Research to Prevent Blindness to the University of Oklahoma Department of Ophthalmology.
Competing interests None.
Ethics approval Oklahoma Medical Research Foundation Institutional Review Board and University of Minnesota Institutional Review Board.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The authors would consider requests for the data generated in this project through collaborative arrangements.