2016 American College of Rheumatology/European League Against Rheumatism classification criteria for primary Sjögren's syndrome
Caroline H Shiboski¹, Stephen C Shiboski¹, Raphaèle Seror², Lindsey A Criswell¹, Marc Labetoulle², Thomas M Lietman¹, Astrid Rasmussen³, Hal Scofield⁴, Claudio Vitali⁵,⁶, Simon J Bowman⁷, Xavier Mariette², and the International Sjögren's Syndrome Criteria Working Group
1. University of California, San Francisco, California, USA
2. Université Paris-Sud, AP-HP, Hôpitaux Universitaires Paris-Sud, INSERM U1184, Paris, France
3. Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma, USA
4. Department of Veterans Affairs Medical Center, Oklahoma Medical Research Foundation, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
5. Istituto Villa San Giuseppe, Como, Italy
6. Casa di Cura di Lecco, Lecco, Italy
7. University Hospitals Birmingham, NHS Foundation Trust, Birmingham, UK
Correspondence to Caroline H Shiboski, Department of Orofacial Sciences, Box 0422, Room S612, 513 Parnassus Avenue, University of California San Francisco, San Francisco, CA 94143, USA; caroline.shiboski@ucsf.edu

## Abstract

Objectives To develop and validate an international set of classification criteria for primary Sjögren's syndrome (SS) using guidelines from the American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR). These criteria were developed for use in individuals with signs and/or symptoms suggestive of SS.

Methods We assigned preliminary importance weights to a consensus list of candidate criteria items, using multi-criteria decision analysis. We tested and adapted the resulting draft criteria using existing cohort data on primary SS cases and non-SS controls, with case/non-case status derived from expert clinical judgement. We then validated the performance of the classification criteria in a separate cohort of patients.

Results The final classification criteria are based on the weighted sum of five items: anti-SSA/Ro antibody positivity and focal lymphocytic sialadenitis with a focus score of ≥1 foci/4 mm², each scoring 3; an abnormal Ocular Staining Score of ≥5 (or van Bijsterveld score of ≥4), a Schirmer's test result of ≤5 mm/5 min and an unstimulated salivary flow rate of ≤0.1 mL/min, each scoring 1. Individuals with signs and/or symptoms suggestive of SS who have a total score of ≥4 for the above items meet the criteria for primary SS. Sensitivity and specificity against clinician-expert-derived case/non-case status in the final validation cohort were high: 96% (95% CI 92% to 98%) and 95% (95% CI 92% to 97%), respectively.

Conclusion Using methodology consistent with other recent ACR/EULAR-approved classification criteria, we developed a single set of data-driven consensus classification criteria for primary SS, which performed well in validation analyses and are well suited as criteria for enrolment in clinical trials.

• Sjögren's Syndrome
• Disease Activity
• Treatment

• This criteria set has been approved by the American College of Rheumatology (ACR) Board of Directors and the European League Against Rheumatism (EULAR) Executive Committee. This signifies that the criteria set has been quantitatively validated using patient data, and it has undergone validation based on an independent data set. All ACR/EULAR-approved criteria sets are expected to undergo intermittent updates.

• The ACR is an independent, professional, medical and scientific society that does not guarantee, warrant, or endorse any commercial product or service.

Sjögren's syndrome (SS) is a multisystem autoimmune disease characterised by hypofunction of salivary and lacrimal glands and possible systemic multi-organ manifestations. It is primarily managed by rheumatologists, in collaboration with ophthalmologists and oral medicine/pathology specialists.

None of the 11 classification/diagnostic criteria sets for SS published between 1965 and 2002 had been endorsed by the American College of Rheumatology (ACR) or the European League Against Rheumatism (EULAR).1–11 During the past decade, the most commonly used classification criteria have been the American-European Consensus Group (AECG) criteria,11 which have proven useful in research and clinical practice. In 2012, new classification criteria developed using the NIH-funded Sjögren's International Collaborative Clinical Alliance (SICCA) registry were published after being provisionally approved by the ACR.12 These criteria were designed to classify individuals for enrolment in clinical trials, and the target population used for their development and validation consisted of individuals with signs and symptoms suggestive of SS. Subsequent analyses comparing the ACR and AECG criteria in a cohort of patients at the Oklahoma Medical Research Foundation (OMRF) revealed a high level of concordance.13 Although both criteria sets include similar items, the AECG criteria allow some alternative items to be substituted and use symptoms of dry eyes and mouth in classifying patients. The provisional ACR criteria, by contrast, are based solely on objective tests; symptoms serve only as inclusion criteria defining the target population to whom the criteria apply.

While some treatments may improve symptoms and prevent complications of SS, there is currently no cure. However, the recent development of new therapeutic options for various autoimmune diseases is promising for patients with SS. Developing new therapies requires well-defined entry criteria and end points that allow the effect of new treatments to be measured. Outcome measures suitable as SS end points, namely the EULAR SS Patient Reported Index (ESSPRI) and the EULAR SS Disease Activity Index (ESSDAI), have recently been developed and validated by the EULAR Sjögren's Task Force.14–17 The SS scientific community has also recently recognised the need for international consensus on classification criteria.18 Such an international criteria set should be established using the considerations and approaches published by both the ACR and EULAR, in order to be approved by both organisations.19,20

In 2012, investigators from the SICCA team and the EULAR Sjögren's Task Force formed the International Sjögren’s Syndrome Criteria Working Group. The objective was to develop classification criteria for primary SS that combined features of the ACR and AECG criteria, using methods consistent with those recommended by the ACR and EULAR. We describe herein the development and validation of the resulting criteria, which have been approved by the ACR and EULAR. Consistent with our goal of producing criteria to aid in recruitment for clinical trials, we focused on primary rather than secondary SS. Patients with the latter would typically not be eligible for experimental treatments for SS.

## Methods

### Overview

Our methods rely on both data and expert clinical judgement and mirror those used for the development and validation of the 2010 ACR/EULAR criteria for rheumatoid arthritis21,22 and the 2013 ACR/EULAR criteria for systemic sclerosis.23,24 The approach is outlined schematically in figure 1 and described below.

1. We generated a preliminary list of candidate items based on the AECG and ACR criteria and guided by analyses of existing datasets (item generation). This list was finalised in two meetings of the International SS Criteria Working Group, held concurrently with the 2013 International Symposium on SS and the 2013 ACR Annual Meeting.

2. We used multi-criteria decision analysis (MCDA)25 to reduce the number of candidate criteria items, assign preliminary weights (item reduction and weight assignment) and help define a draft criteria set.

3. We tested and adapted the draft criteria using a development cohort with primary SS disease status, as determined by clinician-expert assessment of clinical vignettes.

4. We then tested the performance of the classification criteria in a similarly defined, but separate, validation cohort of patients.

5. We also tested the performance of the classification criteria in a subset of individuals whose SS case versus non-SS case status was difficult to determine (see below).

Figure 1

Overview of the methodology used for the definitive set of Sjögren's syndrome (SS) classification criteria, based on both data and expert clinical judgement. Item generation was derived from both the 2002 American-European Consensus Group (AECG) criteria and the 2012 American College of Rheumatology (ACR) criteria. * International SS Criteria Working Group meetings held during the 2013 International Symposium on Sjögren's Syndrome (ISSS) in Kyoto, Japan, and the 2013 ACR Annual Meeting in San Diego, California, USA. †The multi-criteria decision analysis (MCDA) survey was performed using 1000Minds software. ‡Disease case and non-case status in both the development and the validation cohorts were derived from expert clinical judgement based on clinical vignettes. ANA, antinuclear antibody; FS, focus score (computed from labial salivary gland biopsy in the presence of focal lymphocytic sialadenitis); OSS, Ocular Staining Score; RF, rheumatoid factor; UWS, unstimulated whole saliva flow rate; VB, van Bijsterveld.

### International Sjögren’s Syndrome Criteria Working Group

The working group (see appendix A) comprised 55 clinician-experts including 36 rheumatologists, 10 oral medicine/pathology specialists and nine ophthalmologists, as well as two patient advocates (from the USA and Europe). The methodology team consisted of a statistician (SCS) and two epidemiologists (CHS and RS). Approximately half of the clinician-experts were from Europe (Denmark, France, Greece, Italy, The Netherlands, Norway, Spain, Sweden and the UK) and, among the other half, most were from North and South America (the USA and Argentina), with the remainder from Japan.

### Item generation

Extensive statistical analyses were performed within the SICCA dataset, with input from the working group to better understand the similarities and differences between the AECG and ACR criteria sets. Concomitantly, statistical analyses comparing the ACR and the AECG criteria were performed within the OMRF cohort and a high level of concordance was identified (91% concordance among 646 OMRF participants, including 244 who met both sets of criteria and 343 who did not meet either).13
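The 91% figure follows directly from the counts above: concordance is the proportion of participants classified the same way by both criteria sets, that is, meeting both or meeting neither.

```latex
\text{concordance} = \frac{244 + 343}{646} = \frac{587}{646} \approx 0.909 \approx 91\%
```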

Given the high concordance between the AECG and ACR criteria and the partial overlap of their components, there was general agreement on many of the key items for inclusion. However, some tests were included in the AECG but not in the ACR criteria (Schirmer's test, unstimulated whole saliva (UWS) flow rate, sialography and salivary scintigraphy), and others were included in the ACR but not in the AECG criteria (antinuclear antibody (ANA) titre and rheumatoid factor (RF) status). Also, ocular dryness was assessed using the van Bijsterveld score (VBS)26 in the AECG criteria and the Ocular Staining Score (OSS)27 in the ACR criteria; both tests score ocular surface staining, the former with lissamine green and the latter with lissamine green for the conjunctiva and fluorescein for the cornea. The comparative analyses performed in both the SICCA and OMRF cohorts, and presented to the working group, guided the generation of a final list of candidate items. It was agreed that all items originally included in either the AECG or the ACR criteria, except ANA titre and RF status, would be initial candidate items. The decision to exclude ANA and RF was based on analyses showing that only a very small number of individuals who met the ACR criteria were negative for anti-SSA/SSB (anti-Ro/La) but positive for both ANA (titre ≥1:320) and RF.13

### Item reduction and weight assignment

Relative ranking of selected items reflecting clinician-expert opinions was based on a web-based MCDA survey administered using 1000Minds software.25,28 This approach, based on pairwise ranking of alternatives (each defined using selected criteria items), has been described previously.29 The resulting item weights were normalised as percentages and used to define an additive score (see below) reflecting the likelihood of assigning disease case status.
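To make the normalisation step concrete, here is a minimal Python sketch. It assumes each expert's MCDA output is available as a dictionary of raw, non-negative item weights; the 1000Minds output format is not described here, so the input shape and item names are hypothetical. Each expert's weights are rescaled to sum to 1, averaged item by item across experts, and the averages of the items present in an individual are summed to give the additive score (multiply by 100 for the percentage form).

```python
from statistics import mean
from typing import Dict, List

Weights = Dict[str, float]


def normalise(raw: Weights) -> Weights:
    """Rescale one expert's raw item weights so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw weights must have a positive sum")
    return {item: w / total for item, w in raw.items()}


def average_weights(experts: List[Weights]) -> Weights:
    """Average the normalised weights item by item across experts.

    Assumes every expert rated the same set of items.
    """
    if not experts:
        raise ValueError("need at least one expert")
    normalised = [normalise(e) for e in experts]
    return {item: mean(n[item] for n in normalised) for item in normalised[0]}


def additive_score(weights: Weights, present: Dict[str, bool]) -> float:
    """Sum the weights of the items marked as present for one individual."""
    return sum(w for item, w in weights.items() if present.get(item, False))
```

For two experts with raw weights {"a": 2, "b": 2} and {"a": 1, "b": 3}, the averaged weights are 0.375 and 0.625, which again sum to 1.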

### Development and validation patient cohorts

Three prospective cohorts of individuals with signs and/or symptoms suggestive of SS have been recruited over the past 10 years by teams of investigators who are now members of the International SS Criteria Working Group. These cohorts include (1) the SICCA cohort, comprising 3514 patients (including 1578 individuals who meet the ACR classification criteria for primary SS) recruited from Argentina, China, Denmark, India, Japan, the UK and the USA (co-principal investigators CHS and LAC), (2) the Paris-Sud cohort, which consists of 1011 patients (including 440 individuals who meet the AECG criteria for primary SS) recruited in Paris (principal investigator XM) and (3) the OMRF cohort, which includes 837 participants (including 279 individuals who meet the AECG criteria for primary SS) evaluated at either the Sjögren's Research Clinic at OMRF or the Sjögren's Clinic at the University of Minnesota (principal investigator K. Sivils, PhD (OMRF)).

These cohorts share several key characteristics that make them appropriate for criteria development: inclusion criteria required that participants have signs and/or symptoms suggestive of SS, warranting a comprehensive evaluation by a multidisciplinary team of SS clinicians. In addition to symptom-related data, objective tests with respect to oral, ocular and systemic/serologic end points had been performed using similar procedures, as described below.

#### Oral tests

Labial salivary gland (LSG) biopsy was performed to identify focal lymphocytic sialadenitis and obtain a focus score.30 UWS flow rates were measured using standard methods.31,32

#### Ocular tests

The OSS was obtained using lissamine green and fluorescein. Other ocular tests included Schirmer's test and measurement of tear breakup time. Ocular staining was assessed with the VBS in the Paris-Sud cohort, the OSS in the SICCA cohort and both methods in the OMRF cohort. The Paris-Sud cohort investigators also used fluorescein and collected data on the individual OSS components, so the OSS could be computed subsequently. Data from the Paris-Sud and OMRF cohorts could therefore be analysed to establish a conversion algorithm between the two scores: for lower scores (ie, scores of 1–3), the VBS was equal to the OSS, but VBS grades of 4, 5 and 6 were equivalent to OSS grades of 5, 6 and 7, respectively. For assessment of the clinical vignettes, ocular staining was expressed as the OSS, ranging from 0 to ≥7. A group of four ophthalmologists from France, the USA and the UK, including three of the authors, formed an ad hoc working group that interpreted the analyses performed on the Paris-Sud data (ML and TML) and on the OMRF data (AR). Together, they derived the conversion algorithm between the OSS and the VBS described above. In addition, since a VBS of 4 (the threshold previously used in the AECG criteria) was equivalent to an OSS of 5, the group agreed to raise the OSS threshold to 5 in the new criteria set. Subsequent analyses of the SICCA data have also shown this threshold to be more specific for diagnostic purposes than the previous threshold of 3 (data not shown).
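The OSS/VBS conversion described above amounts to a small lookup table. The Python sketch below encodes it together with the revised per-eye threshold. The function names are hypothetical; VBS values above 6 are deliberately left unmapped because the text states no equivalent, and treating a VBS of 0 as an OSS of 0 is an assumption.

```python
from typing import Optional

# Mapping stated in the text: VBS 1-3 equal the OSS; VBS 4, 5 and 6
# correspond to OSS 5, 6 and 7 (the OSS is reported as 0 to >=7).
_VBS_TO_OSS = {1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 6: 7}


def vbs_to_oss(vbs: int) -> int:
    """Convert a van Bijsterveld score (VBS) to its equivalent OSS grade."""
    if vbs == 0:
        # Assumption: no staining on either scale; the text gives no mapping for 0.
        return 0
    if vbs not in _VBS_TO_OSS:
        raise ValueError(f"no stated OSS equivalent for VBS {vbs}")
    return _VBS_TO_OSS[vbs]


def ocular_staining_abnormal(oss: Optional[int] = None,
                             vbs: Optional[int] = None) -> bool:
    """Revised per-eye threshold: OSS >= 5, or VBS >= 4 if only the VBS is available."""
    if oss is not None:
        return oss >= 5
    if vbs is not None:
        return vbs >= 4
    raise ValueError("an OSS or a VBS value is required")


# Under the mapping, VBS >= 4 and OSS >= 5 flag exactly the same scores.
assert all((v >= 4) == (vbs_to_oss(v) >= 5) for v in range(0, 7))
```

The final assertion is the point of the exercise: because VBS 4 maps to OSS 5, moving the OSS threshold to 5 keeps the two scales interchangeable at the cut-off.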

#### Serologic assays

Serologic studies included testing for anti-SSA/SSB (anti-Ro/La), ANA, RF, IgG and complement components C3 and C4.

Cohort PIs were each asked to provide a dataset that consisted of a random sample of 400 individuals, with equal numbers of primary SS cases and non-cases (using their own diagnostic definition) and case status not revealed in the dataset. The combined datasets thus comprised 1200 individuals with well-characterised data on the phenotypic features of SS. Clinical vignettes describing each individual's relevant features in text form were computer-generated using a program written in R (V.3.2).33 Vignettes described each individual with respect to age, sex, reported symptoms, clinical signs, test results including ANA titre, RF, IgG, C3, C4, anti-SSA/Ro and anti-SSB/La status, OSS for each eye, Schirmer's test result for each eye, whether the LSG biopsy revealed focal lymphocytic sialadenitis and focus score (see online supplementary figure S1). Ocular symptoms were defined according to the AECG definition, as a positive response to at least one of the following questions: (1) Have you had daily, persistent, troublesome dry eyes for more than 3 months? (2) Do you have a recurrent sensation of sand or gravel in the eyes? (3) Do you use tear substitutes more than three times a day? Oral symptoms were defined as a positive response to at least one of the following questions: (1) Have you had a daily feeling of dry mouth for more than 3 months? (2) Do you frequently drink liquids to aid in swallowing dry food?
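The symptom definitions above are simple any-of rules. The Python sketch below expresses them directly; the parameter names are hypothetical shorthand for the quoted questions, with each argument being a yes (True) or no (False) answer.

```python
def ocular_symptoms(dry_eyes_over_3_months: bool,
                    sand_or_gravel_sensation: bool,
                    tear_substitutes_over_3_per_day: bool) -> bool:
    """AECG ocular symptoms: a 'yes' to at least one of the three questions."""
    return (dry_eyes_over_3_months or sand_or_gravel_sensation
            or tear_substitutes_over_3_per_day)


def oral_symptoms(dry_mouth_over_3_months: bool,
                  drinks_to_swallow_dry_food: bool) -> bool:
    """Oral symptoms: a 'yes' to at least one of the two questions."""
    return dry_mouth_over_3_months or drinks_to_swallow_dry_food


def any_dryness_symptom(ocular_answers: tuple, oral_answers: tuple) -> bool:
    """True if either the ocular or the oral symptom definition is met."""
    return ocular_symptoms(*ocular_answers) or oral_symptoms(*oral_answers)
```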

### Assessment of SS case/control status

We excluded four vignettes selected randomly from the study population to obtain 1196 vignettes that were randomly distributed into 26 surveys, each containing 46 individual vignettes. Research Electronic Data Capture (REDCap)34 was used to administer each survey to two clinician-experts, under blinded conditions. Twenty-six pairs of clinician-experts participated in the first survey exercise and each pair completed one survey. They were instructed to review each vignette and asked if they thought the patient described had primary SS. Possible responses were ‘yes’, ‘no’ and ‘not sure’. Concordant yes/no responses were used to assign case/non-case status; concordant ‘not sure’ responses were interpreted as non-gradable vignettes. All vignettes with discordant answers (yes/no, yes/not sure or no/not sure) were included in a second round of surveys that were each sent to a third clinician-expert (nine clinician-experts contributed to the second round of surveys). Concordance was then defined as two concordant answers of the three, with a vignette defined as a primary SS case if there were two ‘yes’ answers and as a non-SS control if there were two ‘no’ answers. Vignettes that received three discordant answers (yes/no/not sure) were considered ‘difficult-to-classify cases’ and were combined into a third survey sent to eight clinician-experts, all of whom were members of the steering committee. These difficult-to-classify cases were defined as SS cases if the majority of clinician-experts (five or more out of eight) responded ‘yes’ to a vignette and as non-SS controls if the majority responded ‘no’.
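The three-round adjudication protocol can be summarised as a single decision function. The Python sketch below is an illustration, not the study's REDCap workflow: the function name and answer strings are hypothetical, and treating two 'not sure' answers in the second round as non-gradable is an assumption, since the text only specifies the two-'yes' and two-'no' outcomes.

```python
from collections import Counter
from typing import Optional, Sequence

ANSWERS = {"yes": "case", "no": "non-case", "not sure": "non-gradable"}


def _check(answers: Sequence[str]) -> None:
    for a in answers:
        if a not in ANSWERS:
            raise ValueError(f"unexpected answer: {a!r}")


def adjudicate(pair: Sequence[str],
               third: Optional[str] = None,
               panel: Optional[Sequence[str]] = None,
               majority: int = 5) -> str:
    """Return 'case', 'non-case', 'non-gradable' or 'unresolved' for one vignette."""
    _check(pair)
    # Round 1: two blinded experts; concordant answers settle the vignette.
    if pair[0] == pair[1]:
        return ANSWERS[pair[0]]
    # Round 2: a third expert; two matching answers out of three settle it.
    if third is None:
        raise ValueError("a discordant pair needs a third expert's answer")
    _check([third])
    label, count = Counter([*pair, third]).most_common(1)[0]
    if count >= 2:
        # Two 'not sure' answers are treated as non-gradable (assumption).
        return ANSWERS[label]
    # Round 3: yes/no/not sure discordance goes to the 8-member steering committee.
    if panel is None:
        raise ValueError("a three-way discordance needs the panel's answers")
    _check(panel)
    votes = Counter(panel)
    if votes["yes"] >= majority:
        return "case"
    if votes["no"] >= majority:
        return "non-case"
    return "unresolved"
```

With eight panel members and a threshold of five, a 4:4 split (or any split without five matching 'yes' or 'no' answers) leaves the vignette unresolved.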

### Randomisation of vignettes across development and validation cohorts

Each of the 1196 vignettes was assigned a unique identification number, and the vignettes were randomly divided into two groups of 598, one to be used as the development cohort and the other for validation. Clinician-experts who completed the surveys were blinded to the origin (development or validation set) of the clinical vignettes.
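The two randomisations (26 surveys of 46 vignettes each, and a 598/598 development/validation split) can be sketched with Python's standard `random` module. The software actually used for randomisation is not stated, and the seeds and IDs below are hypothetical. Because the two shuffles are independent, a survey will generally mix development and validation vignettes, which is what allows experts to stay blinded to a vignette's origin.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_in_half(ids: Sequence[int],
                  seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Randomly split vignette IDs into two equal-sized groups."""
    if len(ids) % 2:
        raise ValueError("expected an even number of vignettes")
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def into_surveys(ids: Sequence[int], n_surveys: int = 26, per_survey: int = 46,
                 seed: Optional[int] = None) -> List[List[int]]:
    """Randomly distribute vignettes into equally sized surveys (26 x 46 = 1196)."""
    if len(ids) != n_surveys * per_survey:
        raise ValueError("number of vignettes must equal n_surveys * per_survey")
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return [shuffled[i * per_survey:(i + 1) * per_survey] for i in range(n_surveys)]


vignette_ids = list(range(1, 1197))  # hypothetical IDs for the 1196 vignettes
development, validation = split_in_half(vignette_ids, seed=2016)
surveys = into_surveys(vignette_ids, seed=7)
```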

### Testing and adaptation of the draft criteria

We conducted exploratory analyses of the clinician-expert rankings derived from the MCDA survey to characterise distributions of item-specific weights. Results were summarised graphically and using summary statistics. We also performed analyses linking vignette items from the development cohort with corresponding clinician-expert outcome classifications, restricted to individuals with clinician-expert-assigned case/non-case outcomes. Conditional random forest classifiers35 were used to obtain variable importance rankings for (1) all vignette items and (2) binary indicators corresponding to the items used in the MCDA survey.

Based on results from exploratory analyses, we defined several candidate classification criteria, focusing on the items selected by clinician-experts for the MCDA survey. Criteria were defined using scores computed as weighted sums of binary indicators of the presence/absence of items, with weights reflecting relative importance. In addition to the MCDA-derived weights, we derived alternate weights from the item-specific coefficients of logistic regression models fitted to the development sample. Cut-off values for case designation were computed for each candidate criterion using receiver operating characteristic (ROC) methods applied to clinician-expert-defined outcomes in the development dataset. For each candidate score, two cut-off values were identified using a generalised Youden index:36 for the first, sensitivity and specificity were weighted as equally important; for the second, specificity was weighted as twice as important as sensitivity.
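The two-cut-off procedure can be sketched in Python. Reference 36 defines the generalised Youden index actually used; the objective below, sensitivity plus a weight times specificity, is one simple way to encode "specificity twice as important" and is an assumption, not necessarily the exact published formulation. A case is assumed to be designated when the score is at or above the cut-off.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], is_case: Sequence[bool],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' designates a case."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


def weighted_youden_cutoff(scores: Sequence[float], is_case: Sequence[bool],
                           spec_weight: float = 1.0) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising sens + spec_weight * spec.

    spec_weight=1 matches the ordinary Youden index (which subtracts a constant 1);
    spec_weight=2 treats specificity as twice as important. Ties keep the lowest cut-off.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, is_case, cutoff)
        objective = sens + spec_weight * spec
        if best is None or objective > best[0]:
            best = (objective, cutoff, sens, spec)
    return best[1], best[2], best[3]
```

On toy data where the equal-weight objective ties between two cut-offs, doubling the specificity weight selects the higher one, trading sensitivity for specificity.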

We held a final meeting of the International SS Criteria Working Group to present and discuss the results of testing and adapting the draft criteria. A summary report was subsequently sent to all members, including those who could not attend the meeting. A REDCap survey was then administered to the entire panel of clinician-experts, seeking consensus on the final draft criteria prior to validation.

### Criteria validation

Validation of candidate criteria was based on ROC analyses using the validation sample, restricted to individuals with clinician-expert-assigned case/non-case status. We separately assessed classification performance in the subset of difficult-to-classify cases. Performance was summarised using estimated sensitivity and specificity with accompanying 95% confidence intervals (95% CIs) and area under the curve (AUC) statistics.
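The text does not name the confidence interval method. The Python sketch below assumes an exact (Clopper-Pearson) binomial interval, computed with the standard library by bisection on the binomial CDF; sensitivity is `clopper_pearson(tp, tp + fn)` and specificity is `clopper_pearson(tn, tn + fp)`. As a check, 14 of 14 correctly classified non-cases gives a lower bound of 0.025^(1/14), about 0.768, consistent with the 77% lower limit reported for the difficult-to-classify subset.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_increasing(h: Callable[[float], float], target: float,
                      iters: int = 100) -> float:
    """Bisection for h(p) = target on [0, 1], assuming h increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Point estimate x/n with an exact (Clopper-Pearson) 1 - alpha interval."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n and n > 0")
    # Lower bound solves P(X >= x) = alpha/2; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x) = alpha/2, i.e. 1 - P(X <= x) = 1 - alpha/2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1 - alpha / 2)
    return x / n, lower, upper
```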

## Results

### Distribution of responses and item weights in the MCDA survey

Fifty-two clinician-experts participated in the MCDA survey. Table 1 shows the item weights for each of the seven items (note that weights are normalised to sum to 1, yielding a proportion interpretation). Figure 2 presents the distribution of item weights across experts. The curves in the figure are smoothed kernel density estimates that have a relative frequency interpretation similar to that used in histograms. The results indicate that an LSG biopsy showing focal lymphocytic sialadenitis with a focus score of ≥1 and anti-SSA/SSB (anti-Ro/La) positivity received the highest average weights, followed by OSS, UWS, Schirmer's test result, oral symptoms and ocular symptoms, respectively. Weight distributions for ocular/oral symptoms, Schirmer's test result/UWS and focus score/anti-SSA/SSB (anti-Ro/La) were remarkably similar in both mode and variability.

Table 1

Estimated weights for three alternate criterion scores, based on the development vignette data

Figure 2

Distributions of clinician-expert-assigned weights for seven items included in the multi-criteria decision analysis (MCDA) survey. Curves are smoothed kernel probability density estimates and the vertical scale can be interpreted similarly to relative frequency histograms. OSS, Ocular Staining Score; UWS, unstimulated whole saliva flow rate.

### Case status assessment in the development and validation cohorts

The first round of surveys yielded 819 vignettes with concordant responses and 377 with discordant responses (see online supplementary figure S2). The concordant responses provided 415 primary SS cases and 377 non-SS controls. The 377 vignettes with discordant responses were included in a second round of nine surveys assigned to nine clinician-experts, providing a third response to each discordant vignette. This yielded an additional 151 primary SS cases and 125 non-SS controls (with two of the three responses being concordant). After matching vignette identification numbers to their original random assignment, the first two rounds of surveys yielded 288 primary SS cases and 248 non-SS controls in the development cohort and 278 primary SS cases and 254 non-SS controls in the validation cohort.

The 72 vignettes that received three discordant responses in the second round were included in a third round of surveys administered to the eight members of the steering committee who were also clinician-experts. Of these, 49 difficult-to-classify vignettes received a majority of concordant responses (five or more out of eight) in the third round: 35 primary SS cases and 14 non-SS controls.
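The counts reported above can be cross-checked with simple bookkeeping. The Python sketch below uses only numbers stated in this section. The three remainders it derives are not reported directly; reading the first as concordant 'not sure' (non-gradable) vignettes is an interpretation based on the Methods, and the second is not described in the text.

```python
def reconcile_counts() -> dict:
    """Check the reported survey counts and derive the implied remainders."""
    total = 1196
    r1_concordant, r1_discordant = 819, 377
    r1_cases, r1_controls = 415, 377
    r2_cases, r2_controls = 151, 125
    r3_input, r3_cases, r3_controls = 72, 35, 14
    dev_cases, dev_controls = 288, 248
    val_cases, val_controls = 278, 254

    assert r1_concordant + r1_discordant == total
    # Rounds 1-2 cases and controls should equal the development + validation totals.
    assert r1_cases + r2_cases == dev_cases + val_cases == 566
    assert r1_controls + r2_controls == dev_controls + val_controls == 502

    return {
        # Presumably concordant 'not sure' pairs (non-gradable) in round 1.
        "round1_not_assigned": r1_concordant - r1_cases - r1_controls,
        # Round-2 vignettes neither assigned nor sent to round 3 (not described).
        "round2_not_assigned": r1_discordant - r2_cases - r2_controls - r3_input,
        # Round-3 vignettes without a five-of-eight majority.
        "round3_unresolved": r3_input - r3_cases - r3_controls,
    }
```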

### Criteria development

Random forest variable importance rankings based on the clinician-expert classifications of the development dataset vignettes are shown in figure 3. Results based on all vignette variables, as well as the binary indicators consistent with items included in the MCDA survey, are shown. Rankings corresponded well with results from the MCDA survey and clearly indicated the relatively greater importance of objective measures such as the LSG focus score and antibody results in expert classification decisions. Oral and ocular symptoms did not affect classification performance, reflecting the observation that >94% of individuals had at least one symptom.

Figure 3

Importance of variables for random forest classification of clinician-expert case/non-case designations in development data vignettes. Analyses based on all vignette variables (A) and restricted to binary indicators consistent with the multi-criteria decision analysis survey items (B) were performed. ANA, antinuclear antibody; OSS, Ocular Staining Score; RF, rheumatoid factor; UWS, unstimulated whole saliva flow rate.

An initial criteria score was developed as a weighted sum of the seven items in the MCDA survey, based on the average weights reported in table 1. We used logistic regression models to develop an alternate empirical criteria score for the development data, focusing on the items used in the MCDA survey but including indicators for anti-SSA/Ro and anti-SSB/La positivity as separate variables. Scores were computed using weights based on rescaled regression coefficients from a model in which items representing significant predictors of case status were retained.37 Oral and ocular symptoms and anti-SSB/La positivity were excluded because they did not affect classification performance based on the random forest variable importance rankings from the clinician-expert classifications of the development dataset vignettes (figure 3B). Furthermore, oral and/or ocular symptoms had been part of the inclusion criteria for participation in the three patient cohorts; therefore, a group decision was made that oral and/or ocular symptoms or suspicion of SS based on one of the domains of the ESSDAI would be preliminary requirements for applying the new SS classification criteria. The decision to exclude anti-SSB/La as an item was also based on group discussions and on a study demonstrating that the presence of anti-SSB/La without anti-SSA/Ro antibodies had no significant association with SS phenotypic features, relative to seronegative participants.38
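The text states only that score weights were based on rescaled regression coefficients (reference 37) without giving the rescaling rule. One common way to build an integer points score, shown here as an assumption rather than the authors' procedure, is to divide each retained positive coefficient by the smallest one and round. The function name is hypothetical, and the sketch does not reproduce the published weights, which also reflect the group's subsequent modification of the logistic score.

```python
from typing import Dict


def coefficients_to_points(coefs: Dict[str, float]) -> Dict[str, int]:
    """Turn logistic regression coefficients into integer item points.

    Hypothetical scheme: drop non-positive coefficients, divide the rest by
    the smallest positive coefficient and round to the nearest integer
    (Python's round() sends exact halves to the nearest even integer).
    """
    positive = {item: b for item, b in coefs.items() if b > 0}
    if not positive:
        raise ValueError("no positive coefficients to rescale")
    unit = min(positive.values())
    return {item: round(b / unit) for item, b in positive.items()}
```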

ROC analysis of the MCDA score yielded an AUC value of 0.96 and two alternate cut-offs for case classification (table 2). ROC analysis of the logistic score yielded an AUC value of 0.98 and two alternate cut-offs for case classification. We also considered a modified version of the logistic score that assigned equal weights to the OSS, Schirmer's test result and UWS items, reflecting clinician-expert opinions that UWS should be weighted similarly to the Schirmer's test result, and for greater consistency with the results of the MCDA survey (table 1). ROC analysis of this modified score yielded results similar to those for the logistic score (AUC 0.98) (table 2).
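The text does not say how the AUC values were computed. A standard empirical estimate, sketched below in Python, is the probability that a randomly chosen case scores higher than a randomly chosen non-case (the Mann-Whitney formulation). Ties are counted as one half, which matters here because integer criterion scores produce many ties. The function name is hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], is_case: Sequence[bool]) -> float:
    """Empirical AUC: P(case score > non-case score), ties counted as one half."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```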

Table 2

Cut-off values, sensitivity, specificity, κ-statistic, AUC values and agreement with existing AECG and ACR criteria sets, for three candidate criterion scores

Table 2 also presents κ-statistics measuring agreement between outcome classifications based on the three alternate criterion scores and classifications with the existing AECG and ACR criteria. Results indicate high levels of agreement, with the strongest values obtained from the logistic and modified logistic scores with a cut-off selected to weight sensitivity and specificity equally.
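The κ-statistic variant is not specified; the Python sketch below assumes unweighted Cohen's kappa for two binary classifications of the same individuals (for example, the new score at a given cut-off versus the AECG criteria). It compares observed agreement with the agreement expected by chance if the two classifications were independent. The function name is hypothetical.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Unweighted Cohen's kappa for two binary classifications of the same people."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty sequences of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance if the two classifications were independent.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```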

The REDCap survey, seeking consensus on the final draft criteria, yielded 98% clinician-expert consensus on use of the modified logistic score as the basis for final draft criteria, with case status based on a score of ≥4, and agreement to move forward with validation of these criteria. The final criteria definition is presented in table 3.

Table 3

American College of Rheumatology/European League Against Rheumatism classification criteria for primary Sjögren’s syndrome: The classification of primary Sjögren's syndrome (SS) applies to any individual who meets the inclusion criteria,* does not have any of the conditions listed as exclusion criteria,† and has a score of ≥4 when the weights from the five criteria items below are summed

### Validation of candidate criteria

We compared the validation and development data with respect to key variables, including their associations with outcome classification. Overall agreement was quite high, indicating no apparent major differences in the two datasets (see online supplementary table S1). Initial validation of the selected criteria was based on estimated sensitivity and specificity using the clinician-expert responses in the full validation dataset. Sensitivity was 96% (95% CI 92% to 98%) and specificity was 95% (95% CI 92% to 97%). Validation was also performed in the subset of 49 difficult-to-classify cases and non-cases, for which sensitivity was 83% (95% CI 66% to 93%) and specificity was 100% (95% CI 77% to 100%).

## Discussion

We present herein an international set of classification criteria for primary SS, developed and validated using approaches approved by both the ACR and EULAR committees that oversee classification criteria. These criteria apply to any patient with at least one symptom of ocular or oral dryness (based on the AECG questions)11 or with suspicion of SS arising from systemic features, defined as at least one positive domain item of the ESSDAI.16 The criteria do not apply to individuals with a prior diagnosis of a condition (from a pre-specified list) that would exclude participation in primary SS therapeutic trials because of overlapping clinical features or interference with criteria tests. The new classification criteria are based on five objective tests/items. Individuals are classified as having primary SS if their total score, the sum of the weights of each positive test/item, is ≥4. Focal lymphocytic sialadenitis with a focus score of ≥1 and anti-SSA/Ro positivity carry the highest weights (3 each); an OSS of ≥5 (or VBS of ≥4) in at least one eye, a Schirmer's test result of ≤5 mm/5 min in at least one eye and a UWS flow rate of ≤0.1 mL/min each carry a weight of 1. We found that the criteria performed very well when validated using vignettes describing patients whose primary SS status was defined by expert opinion. In a subset of 49 vignettes for which the case/non-case distinction was difficult, the criteria retained high specificity (100%) with somewhat lower sensitivity (83%).
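The scoring rule is simple enough to express directly. The Python sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, inclusion and exclusion status are passed in as booleans because the full exclusion list in table 3 is not reproduced here, and a test that was not performed contributes 0.

```python
from typing import Optional, Sequence


def acr_eular_pss_score(fls_focus_score: Optional[float] = None,
                        anti_ssa_ro: bool = False,
                        oss_per_eye: Sequence[int] = (),
                        vbs_per_eye: Sequence[int] = (),
                        schirmer_mm_per_eye: Sequence[float] = (),
                        uws_ml_per_min: Optional[float] = None) -> int:
    """Weighted sum of the five items; missing tests contribute 0."""
    score = 0
    # Focal lymphocytic sialadenitis with focus score >= 1 foci/4 mm^2: weight 3.
    # Pass None when sialadenitis is absent or no biopsy was done.
    if fls_focus_score is not None and fls_focus_score >= 1:
        score += 3
    # Anti-SSA/Ro positivity: weight 3 (anti-SSB/La alone is not an item).
    if anti_ssa_ro:
        score += 3
    # OSS >= 5 (or VBS >= 4) in at least one eye: weight 1.
    if any(v >= 5 for v in oss_per_eye) or any(v >= 4 for v in vbs_per_eye):
        score += 1
    # Schirmer's test <= 5 mm/5 min in at least one eye: weight 1.
    if any(v <= 5 for v in schirmer_mm_per_eye):
        score += 1
    # Unstimulated whole saliva flow rate <= 0.1 mL/min: weight 1.
    if uws_ml_per_min is not None and uws_ml_per_min <= 0.1:
        score += 1
    return score


def classified_as_primary_ss(score: int, meets_inclusion: bool,
                             has_exclusion_condition: bool) -> bool:
    """Score >= 4 in an eligible individual with no exclusion condition."""
    return meets_inclusion and not has_exclusion_condition and score >= 4
```

One consequence of the weights is visible in the code: the three weight-1 items can contribute at most 3 points, so classification always requires either the biopsy item or anti-SSA/Ro positivity.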

The form of the proposed criteria improves on previous criteria in that they are based on a weighted sum of items, with weights derived from consensus expert opinion and analyses of patient data. Also, positive serology for anti-SSB/La in the absence of anti-SSA/Ro is no longer considered a criterion item. For instance, in the validation cohort, 15 individuals were anti-SSB/La-positive but anti-SSA/Ro-negative and had no focal lymphocytic sialadenitis on LSG biopsy; they would therefore have been classified as non-SS using the new criteria. However, 12 of them would have been classified as having primary SS under both the AECG and the 2012 ACR criteria, which would very likely have been a misclassification.

Improvements over the 2012 ACR criteria include the addition of Schirmer's test and the UWS, a higher threshold for the OSS (≥5) and the optional use of the VBS as an alternative to the OSS when an ophthalmologist trained in the OSS is not available. In addition, high-titre ANA and positive RF have been removed as items. Improvements over the 2002 AECG criteria include treating oral and ocular symptoms as part of eligibility determination (ie, whether an individual is eligible to be assessed with the criteria) rather than as criteria items, including the OSS as an alternative to the VBS, and omitting sialography and salivary scintigraphy. Furthermore, the new criteria consider systemic signs and B-cell activation biomarkers (determined using the ESSDAI) in eligibility determination, which will allow identification of systemic and early forms of the disease before sicca features are present. Compared with the AECG criteria, the exclusionary conditions have also been updated. IgG4-related disease has been added, hepatitis C infection now requires confirmation by PCR, and pre-existing lymphoma is allowable, since SS is sometimes diagnosed after a prior lymphoma.
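The eligibility and exclusion steps that precede scoring can likewise be expressed as a simple gate. The Python below is again a hypothetical illustration: the `Patient` fields and function names are invented for this sketch. `EXCLUSIONS_NAMED_HERE` holds only the two exclusionary conditions mentioned in this section, not the full pre-specified list, which would have to be taken from the criteria themselves.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical, deliberately partial list: only the exclusions named in this
# section. Prior lymphoma is absent because it no longer excludes classification.
EXCLUSIONS_NAMED_HERE = {"IgG4-related disease", "hepatitis C (PCR-confirmed)"}


@dataclass
class Patient:
    dry_eye_or_mouth_symptom: bool = False  # at least one positive AECG ocular/oral question
    positive_essdai_domains: int = 0        # ESSDAI domains with at least one positive item
    prior_diagnoses: Set[str] = field(default_factory=set)


def is_eligible(p: Patient) -> bool:
    """Entry gate: a sicca symptom, or suspicion of SS from ESSDAI systemic features."""
    return p.dry_eye_or_mouth_symptom or p.positive_essdai_domains >= 1


def is_excluded(p: Patient, exclusions: Set[str] = EXCLUSIONS_NAMED_HERE) -> bool:
    """True if any prior diagnosis appears on the exclusion list."""
    return bool(p.prior_diagnoses & exclusions)


def can_be_classified(p: Patient) -> bool:
    """Only eligible, non-excluded individuals go on to be scored."""
    return is_eligible(p) and not is_excluded(p)
```

For example, a patient without sicca symptoms but with one positive ESSDAI domain and a history of lymphoma would still proceed to scoring, whereas a patient with dryness symptoms and IgG4-related disease would not.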

Strengths of our approach include the following: (1) assignment of criteria item weights combined consensus methods for quantifying expert opinion with confirmatory statistical analysis of real patient vignettes classified by clinician-experts; (2) the working group was international and represented a range of clinical specialties (65% rheumatologists, 18% oral medicine/pathology specialists and 16% ophthalmologists) and (3) our methods have been successfully applied in the development and validation of ACR/EULAR classification criteria for rheumatoid arthritis21,22 and systemic sclerosis.23,24 Another advantage of these methods is that they are adaptable to future modifications of the criteria that may arise with the adoption of new diagnostic tests, such as parotid ultrasonography or improved serologic assays. For example, some research suggests that it may be important to distinguish between monospecific antibody assays to Ro 60 or Ro 52,39–42 although further validation studies will be needed before they can be used for patient classification. A shared limitation, common to criteria for many rheumatic diseases, is the use of expert clinical judgement in the absence of an objective ‘gold standard’ for defining the disease and the associated effect of the resulting ‘circularity’ on measured performance of criteria sets.

The primary application of classification criteria is recruitment into clinical trials and studies. Although our study focused on classification of primary SS, the proposed criteria may be applicable to SS associated with other autoimmune diseases. However, further research is needed to confirm this.

The landscape of SS has changed in recent years, due to both the recently validated disease activity indices and the availability of new therapeutic agents. Using methodology consistent with other recent ACR/EULAR-approved classification criteria, we developed a single set of data-driven consensus classification criteria for primary SS, which performed well in validation and are well suited as entry criteria for clinical trials.

## Acknowledgments

The authors would like to express their appreciation to Steve Taylor and Kathy Hammitt (Sjögren's Syndrome Foundation) for hosting three of the meetings of the International SS Criteria Working Group, Dr Frédéric Desmoulins for his important work in preparation of the Paris-Sud cohort dataset and Mi Lam for her contribution in preparation of the Sjögren's International Collaborative Clinical Alliance dataset. They are very grateful to Paul Hansen and Franz Ombler, the developers and owners of the 1000Minds software (https://www.1000minds.com), who granted them an Academic Award, providing both access to and technical support for their software. They also express their greatest appreciation to all participants who enrolled in the three patient cohorts used for development and validation of the criteria and to the clinician-expert members of the international working group for attending meetings, providing valuable input as part of these meetings and responding to several rounds of surveys, including grading multiple vignettes.

## Appendix A: The International Sjögren's Syndrome Criteria Working Group

Members of the International Sjögren's Syndrome Criteria Working Group, in addition to the authors, were as follows: Drs AM Heidenreich, H Lanfranchi and C Vollenweider (Argentina); Dr M Schiødt (Denmark); Drs V Devauchelle, JE Gottenberg and A Saraux and patient representative Maggy Pincemin (France); Dr T Dörner (Germany); Dr A Tzioufas (Greece); Drs C Baldini, S Bombardieri and S De Vita (Italy); Drs K Kitagawa, T Sumida and H Umehara (Japan); Drs H Bootsma, AA Kruize, TR Radstake and A Vissink (the Netherlands); Dr R Jonsson (Norway); Dr M Ramos-Casals (Spain); Dr E Theander (Sweden); Drs S Challacombe, B Fisher, B Kirkham, G Larkin, F Ng and S Rauz (UK) and Drs E Akpek, J Atkinson, AN Baer, S Carsons, N Carteron, T Daniels, B Fox, J Greenspan, G Illei, D Nelson, A Parke, S Pillemer, B Segal, K Sivils, EW St Clair, D Stone, F Vivino and A Wu and patient representative Kathy Hammitt (USA).

## References


## Footnotes

• Handling editor Tore K Kvien

• Competing interests CHS has received consulting fees from the Pasteur Institute (less than $10 000). SCS has received textbook royalties from Springer Publishing (less than $10 000). ML has received consulting fees from Alcon, Allergan, MSD, Sanofi, Santen and Thea (less than $10 000 each). HS has received consulting fees from UCB and Eli Lilly (less than $10 000 each). SJB has received consulting fees from Celgene, Eli Lilly, Glenmark, GlaxoSmithKline, MedImmune, Novartis, Ono, Pfizer, Roche, Takeda and UCB (less than $10 000 each).