Systemic sclerosis (SSc; scleroderma) is a heterogeneous disease whose pathogenesis is characterised by three hallmarks: small vessel vasculopathy, production of autoantibodies, and fibroblast dysfunction leading to increased deposition of extracellular matrix.1 The clinical manifestations and the prognosis of SSc vary, with the majority of patients having skin thickening and variable involvement of internal organs. Subsets of SSc can be discerned, that is, limited cutaneous SSc, diffuse cutaneous SSc, and SSc without skin involvement.1
In the absence of a diagnostic test proving the absence or presence of SSc, several sets of classification criteria have been developed.2–6 The purpose of classification criteria is to enrol, in research studies, patients who in fact have the disease as determined using a uniform definition.7 Classification criteria are not synonymous with diagnostic criteria but will almost always mirror the list of criteria that are used for diagnosis.7 However, classification criteria generally are more standardised and less inclusive than physician diagnosis.
The current standard classification criteria for SSc are the 1980 preliminary criteria for the classification of systemic sclerosis (scleroderma), developed by the American College of Rheumatology (ACR).2–4 ,8 These classification criteria were developed using patients with longstanding SSc. As a consequence, patients with early SSc and ∼20% of patients with limited cutaneous disease do not meet the criteria and are excluded from clinical studies.1 ,9 ,10 Since the development of the 1980 criteria, knowledge regarding SSc-related autoantibodies has improved.11–13 In addition, characteristic nailfold capillary changes have been found to be associated with SSc, and nailfold capillaroscopy is widely accepted as a diagnostic tool.10 ,14–17 In 1988, LeRoy et al11 proposed new criteria that included clinical features, autoantibodies, and capillaroscopy results, highlighting the differences between the two main SSc subsets. In 2001, LeRoy and Medsger proposed to revise the classification criteria to include ‘early’ SSc, using nailfold capillary pattern and SSc-related autoantibodies.6 It also has been demonstrated that the addition of nailfold capillary abnormalities and telangiectasias to the ACR SSc criteria improves their sensitivity.9 ,18
Because of the insufficient sensitivity of the 1980 criteria and advances in knowledge about SSc, the ACR and the European League Against Rheumatism (EULAR) established a committee to provide a joint proposal for new classification criteria for SSc. The aims were to develop criteria that (1) encompass a broader spectrum of SSc including patients whose disease is in the early stage as well as those in the late stage; (2) include vascular, immunologic, and fibrotic manifestations; (3) are feasible to use in daily clinical practice; and (4) are in accordance with criteria used for diagnosis of SSc in clinical practice.7 These criteria are intended to be used by rheumatologists, researchers, national and international drug agencies, pharmaceutical companies, or any others involved in studies of SSc. Our objective was to develop a set of criteria that would enable identification of individuals with SSc for inclusion in clinical studies, being more sensitive and specific than previous criteria.
The development and testing of the classification system for SSc was based on both data and expert clinical judgment. First, candidate items for the classification criteria were generated using consensus methods and evaluated using existing databases.19 ,20 Second, multicriteria decision analysis was used to reduce the number of candidate criteria and assign preliminary weights.21 The classification system was repeatedly tested and adapted using prospectively collected SSc cases and non-SSc controls, and compared against expert clinical judgment. Third, the classification criteria were tested in a validation cohort and tested against preexisting criteria sets.
Item generation and reduction
One hundred sixty-eight candidate criteria were identified through two Delphi exercises. A 3-round Delphi exercise and a face-to-face consensus meeting using nominal group technique facilitated reduction of the 168 items to 23.19 Using a random sample of existing databases (SSc (n=783) and control patients with diseases similar to SSc (n=1071), all based on physician diagnosis), the candidate criteria were found to have good discriminative validity.20
Item reduction and weighting
Draft classification system
A face-to-face meeting of four European and four North American SSc experts was held to further reduce items and assign preliminary weights using multicriteria decision analysis. The number of experts was limited in advance to 8, and they were invited based on geographic representation, knowledge from a scientific and a practical diagnostic viewpoint, and availability. At the meeting, the experts determined by consensus to which cases the criteria should be and should not be applied, and which items are sufficient to allow classification of a patient as having SSc (sufficient criteria). They then participated in a multicriteria decision analysis to further reduce the 23 items and assign preliminary weights.21 The experts were presented hypothetical pairs of cases with 2 of the 23 items at a time (eg, Raynaud's phenomenon positive and abnormal nailfold capillaries absent vs Raynaud's phenomenon negative and abnormal nailfold capillaries present, all other manifestations being considered equal) and they were asked to individually vote electronically on which case of the pair was more likely to be SSc. The result of the votes was immediately presented. If there was no complete agreement among the experts, considerations were discussed and a second round of voting was conducted. As a result of the repeated choices between two alternative cases, items were ranked, and weights for the items were derived using 1000 Minds decision-making software.21 Additional details about the methods are available in ref. 22.
Initial threshold identification
The committee prepared summaries of 45 SSc cases, with a concentration of cases that were difficult to classify. These were presented to 22 SSc experts who classified the cases as definite SSc or not. The draft classification system derived from the multicriteria decision analysis was applied to the 45 cases, resulting in a score for each case. The ranking of cases by the SSc experts and the ranking of cases based on the scores provided with the draft scoring system were examined. Higher scores in the scoring system were expected to relate to a higher probability that the experts would classify the case as SSc. Using these results, an initial threshold score for SSc was identified.
Reduction and testing of iterative changes
In the next step, the committee reduced the number of items, simplified the weights, and modified the threshold score. First, data on the candidate items were prospectively collected at 13 SSc centres in North America and 10 in Europe, using standardised case record forms. Data from 368 consecutive patients with SSc (diagnosis based on physician opinion) were collected, of whom half were to have had SSc for a maximum of 2 years (based on the time from the first non-Raynaud's symptom) in order to include early SSc. Data from 237 consecutive control patients with a scleroderma-like disorder (eosinophilic fasciitis (also called Shulman's disease or diffuse fasciitis with eosinophilia), scleromyxedema, systemic lupus erythematosus, dermatomyositis, polymyositis, primary Raynaud's phenomenon, mixed connective tissue disease, undifferentiated connective tissue disease, generalised morphea, nephrogenic systemic sclerosis, and diabetic cheiroarthropathy) were also collected. From these 605 patients a random sample of 100 SSc cases and 100 controls (50% from North America and 50% from Europe) was selected to form the derivation sample. The remaining 268 cases and 137 controls formed the validation sample. Institutional research ethics board approval was obtained for the collection of patient data.
The committee then met and made iterative changes to the draft system, which they continually applied in real time to the derivation cohort described above. Using the derivation cohort, the scoring system was simplified by removing items that occurred with low frequency or were redundant, by aggregating similar items, and then by transforming the weights to obtain single digits. The preliminary score threshold was adjusted to account for the weight simplification. The impact of all proposed changes was evaluated by assessing changes to sensitivity and specificity of the criteria in the derivation cohort. The reference standard to test the sensitivity and specificity was the diagnosis by the SSc expert who submitted the case(s) and control(s).
At the same time, the changes in the classification system were also tested in 38 difficult-to-classify cases. Consequently, weights of some items were adjusted to align the scoring system with the reference standard formed by the opinions of the SSc experts as to which cases were to be classified as having SSc.
The final classification system was independently tested using the validation sample of SSc cases and controls. Sensitivity and specificity were calculated for the 1980 ACR preliminary classification criteria for SSc,3 the classification criteria proposed by LeRoy and Medsger in 2001,6 and the newly developed classification criteria. Exact binomial confidence limits were calculated for sensitivity and specificity. The ACR criteria and LeRoy/Medsger criteria were compared with the new criteria using 2×2 tables with McNemar's χ2 test and continuity correction. The criteria sets were also tested separately using only the subgroup of patients with a disease duration of ≤3 years. Further, the classification system was validated against the expert consensus on the set of 38 selected cases.
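The statistics named in this paragraph can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names are hypothetical, "exact binomial confidence limits" is assumed here to mean the Clopper-Pearson interval, and the counts in the usage lines are invented placeholders rather than study data.

```python
from math import comb, erfc, sqrt


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing, tol=1e-10):
    """Solve f(p) = target on [0, 1] for a monotone function f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson (exact) confidence limits for k successes out of n."""
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - _binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if k == n else _bisect(
        lambda p: _binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


def mcnemar_cc(b, c):
    """McNemar chi-square with continuity correction.

    b and c are the discordant counts of the paired 2x2 table, e.g. patients
    classified correctly by one criteria set but not by the other.
    Returns (chi2, p) with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p


# Placeholder counts for illustration only (not taken from the study).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
print(sens, spec, exact_binomial_ci(90, 100), mcnemar_cc(b=10, c=2))
```

McNemar's test uses only the discordant pairs because both criteria sets are applied to the same patients, so the comparison is paired rather than between independent groups.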
Draft classification system
The experts concluded that ‘skin thickening of the fingers of both hands extending proximal to the metacarpophalangeal joints’ was sufficient to classify a subject as having SSc. Further, patients with ‘skin thickening sparing the fingers’ are classified as not having SSc. It was agreed that the criteria should be applied to any patient considered for inclusion in an SSc study, without further specifications. Items with relatively low weights were deleted, and items considered to be from a similar domain were clustered (eg, fingertip lesions encompasses ulcers and pitting scars; lung involvement encompasses interstitial lung disease and pulmonary hypertension). Using conjoint analysis, the number of items was reduced from 23 to 14 and all items were assigned weights. The 14 resulting items (with weights) were as follows: bilateral skin thickening of the fingers (sclerodactyly) (weighted 14 if distal to a proximal interphalangeal joint only, 22 if whole finger), puffy fingers (weighted 5), fingertip lesions (weighted 16 if pitting scars or 9 if digital ulcers), finger flexion contractures (weighted 16), telangiectasia (weighted 10), abnormal nailfold capillaries (weighted 10), calcinosis (weighted 12), Raynaud's phenomenon (weighted 13), tendon or bursal friction rubs (weighted 21), interstitial lung disease/pulmonary fibrosis (weighted 14), pulmonary arterial hypertension (weighted 12), scleroderma renal crisis (weighted 11), oesophageal dilation (weighted 7), and SSc-related autoantibodies (weighted 15 if anticentromere antibody present, anticentromere pattern seen on antinuclear antibody testing, or anti–topoisomerase I (also called anti–Scl-70) or anti–RNA polymerase III present).
Initial threshold identification
Comparison of the case ranking from the scoring system and by the experts revealed that most of the experts (≥75%) considered the cases with a score of >55 (except for 1 case) to be SSc. Similarly, most experts (≥88%) considered cases with a score of <40 not to be SSc. With scores between 40 and 55 there was more diversity of opinion. Thus, it was concluded that the initial threshold would be a score of ≥56.
Reduction and testing of iterative changes
The 14 items in the scoring system were reduced to 9 while maintaining sensitivity and specificity in the derivation sample. The items deleted were finger flexion contractures, calcinosis, tendon or bursal friction rubs, renal crisis, and oesophageal dilation. Puffy fingers or sclerodactyly were combined into one item, and pulmonary arterial hypertension and interstitial lung disease were also combined into one item, resulting in 7 items for the scoring system. In the derivation sample, with reduction of the 14 items to 7, the sensitivity and specificity were 0.93 and 0.94, respectively. Weights were simplified by dividing each weight by five and rounding to the nearest integer. The threshold for this simplified scoring system was determined to be 9. The resulting sensitivity and specificity were 0.97 and 0.88, respectively.
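The weight simplification step (divide by five, round to the nearest integer) can be illustrated with the preliminary weights reported for the retained items. This is a sketch only: the dictionary keys and the function name are hypothetical, and the output is the intermediate result of rounding, not necessarily the final weights, which were subsequently adjusted against expert opinion.

```python
# Preliminary weights of the retained manifestations, as reported in the
# draft classification system (item labels are shortened for illustration).
PRELIMINARY_WEIGHTS = {
    "sclerodactyly, whole finger": 22,
    "sclerodactyly, distal to PIP only": 14,
    "puffy fingers": 5,
    "pitting scars": 16,
    "digital ulcers": 9,
    "telangiectasia": 10,
    "abnormal nailfold capillaries": 10,
    "Raynaud's phenomenon": 13,
    "interstitial lung disease": 14,
    "pulmonary arterial hypertension": 12,
    "SSc-related autoantibodies": 15,
}


def simplify_weights(weights, divisor=5):
    """Divide each weight by `divisor` and round to the nearest integer.

    Python's round() resolves exact .5 ties to the even neighbour, but an
    integer weight divided by 5 can never end in .5, so no tie occurs here.
    """
    return {item: round(w / divisor) for item, w in weights.items()}


for item, weight in simplify_weights(PRELIMINARY_WEIGHTS).items():
    print(f"{item}: {weight}")
```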
Weights were further adjusted to align the scoring system with the experts’ opinions (SSc or not SSc) on each of the 38 difficult-to-classify cases. To improve the specificity of the classification criteria, an exclusionary criterion was added: patients with a diagnosis that better explains their manifestations should not be classified as having SSc. These revisions resulted in the correct classification of all patient profiles judged to have SSc by the majority of experts.
The SSc classification criteria
The new classification criteria are presented in table 1. The table shows the one criterion that, if present, is sufficient for classification as SSc, the two exclusionary criteria, and the seven items with a combined threshold above which cases are classified as SSc. The classification criteria may be applied to patients who may have SSc and are being considered for inclusion in an SSc study. As noted above, they are not to be applied to patients who have an SSc-like disorder that better explains their manifestations; and patients with ‘skin thickening sparing the fingers’ are not classified as having SSc.
If a patient has skin thickening of the fingers of both hands that extends proximal to the metacarpophalangeal joints, the classification system assigns nine points for this one item alone, which is sufficient to classify the patient as having SSc with no further application of the point system needed. Otherwise, the point system is applied by adding the scores for manifestations that are ‘positive.’ The items are skin thickening of the fingers, fingertip lesions, telangiectasia, abnormal nailfold capillaries, pulmonary arterial hypertension and/or interstitial lung disease, Raynaud's phenomenon, and SSc-related autoantibodies. Two items, skin thickening of the fingers and fingertip lesions, include two different possible manifestations. If a patient has both manifestations, the score for the category is the higher score of the two manifestations. For example, in the item skin thickening of the fingers, if a patient has both manifestations, that is, puffy fingers (weighted 2) and sclerodactyly (weighted 4), the total score for the item would be 4. The maximum possible score is 19, and patients with a score of ≥9 are classified as having SSc. The definitions of the items used in the criteria are provided in table 2.
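The scoring rules described above can be sketched in a few lines. This is an illustrative sketch and not an official implementation: the function names, dictionary keys, and input format are hypothetical. Only the weights for puffy fingers (2), sclerodactyly (4), and the sufficient criterion (9) are quoted in this section; the remaining weights are taken from table 1 of the published criteria, which is not reproduced here, and they sum to the stated maximum of 19.

```python
# Item -> {manifestation: weight}. Within an item only the highest-weighted
# manifestation present is counted.
ITEM_WEIGHTS = {
    "skin_thickening_of_fingers": {"puffy_fingers": 2, "sclerodactyly": 4},
    "fingertip_lesions": {"digital_tip_ulcers": 2, "pitting_scars": 3},
    "telangiectasia": {"present": 2},
    "abnormal_nailfold_capillaries": {"present": 2},
    "pah_and_or_ild": {"pulmonary_arterial_hypertension": 2,
                       "interstitial_lung_disease": 2},
    "raynauds_phenomenon": {"present": 3},
    "ssc_related_autoantibodies": {"present": 3},
}
SUFFICIENT_CRITERION_SCORE = 9  # skin thickening proximal to the MCP joints
THRESHOLD = 9                   # score >= 9 is classified as SSc


def ssc_score(findings, proximal_to_mcp=False):
    """Return the total score.

    `findings` maps an item name to the set of manifestations present.
    """
    if proximal_to_mcp:
        return SUFFICIENT_CRITERION_SCORE
    total = 0
    for item, manifestations in ITEM_WEIGHTS.items():
        present = findings.get(item, set())
        total += max((w for m, w in manifestations.items() if m in present),
                     default=0)
    return total


def classify_ssc(findings, proximal_to_mcp=False,
                 skin_thickening_sparing_fingers=False,
                 better_explained_by_other_disorder=False):
    """Apply the two exclusionary criteria, then the threshold."""
    if skin_thickening_sparing_fingers or better_explained_by_other_disorder:
        return False
    return ssc_score(findings, proximal_to_mcp) >= THRESHOLD


# Raynaud's phenomenon, autoantibodies and abnormal capillaries only.
early_profile = {
    "raynauds_phenomenon": {"present"},
    "ssc_related_autoantibodies": {"present"},
    "abnormal_nailfold_capillaries": {"present"},
}
print(ssc_score(early_profile), classify_ssc(early_profile))
```

Taking the maximum within an item, rather than the sum, is what keeps a patient with both puffy fingers and sclerodactyly at 4 points for that item.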
Table 3 shows the characteristics of the validation sample (268 patients with SSc, 137 controls). The sensitivity and specificity of the new SSc classification criteria were compared with those of the 1980 ACR classification criteria and the classification criteria proposed by LeRoy and Medsger, and the results are shown in table 4. The sensitivity and specificity of the new SSc criteria were, respectively, 0.91 and 0.92 in the validation sample. The new criteria performed better than the two earlier classification schemes in terms of sensitivity and specificity (p=0.01 vs the 1980 ACR criteria, p=0.004 vs the LeRoy/Medsger criteria). The area under the receiver operating characteristic curve of the classification system tested against presence of SSc in the validation sample was 0.81 (95% CI 0.77 to 0.85). The performance of the new criteria in patients with disease of ≤3 years’ duration was similar to the performance in the group overall (table 4).
The classification system was additionally tested against expert opinion (n=16 experts), using the set of 38 selected cases (table 5). All of the cases scoring ≥9 were considered SSc, whereas cases scoring <9 were regarded as not being SSc or were controversial. With the proposed system all of these cases were classified in accordance with consensus-based expert opinion. All cases that were classified as SSc with the 1980 ACR criteria were also classified as SSc with the new criteria, as were several cases not classified as SSc with the 1980 ACR criteria.
A classification system for systemic sclerosis is needed to ensure that all patients categorised as having SSc for inclusion in studies do indeed have the disease, based on specific defined characteristics. The major reason to revise the 1980 ACR criteria was that those criteria lacked adequate sensitivity, especially in patients with early SSc or with limited cutaneous SSc.9 ,10 ,18 The proposed classification criteria have greater sensitivity and specificity than the 1980 criteria and the classification criteria proposed by LeRoy and Medsger. All profiles of patients who were considered to have SSc by a majority of experts were indeed classified as SSc with the new classification system, and the new system is more inclusive and also performs well in patients with early disease (≤3 years since diagnosis).
The newly developed classification system includes disease manifestations of the three hallmarks of SSc: fibrosis of the skin and/or internal organs, production of certain autoantibodies, and vasculopathy. The four items comprising the 1980 ACR classification criteria (scleroderma proximal to the metacarpophalangeal joints, sclerodactyly, digital pitting scars (not pulp loss), and bilateral basilar pulmonary fibrosis)3 are also included, as are the items in the criteria proposed in 2001 by LeRoy and Medsger (Raynaud's phenomenon, autoantibodies, nailfold capillaroscopy abnormalities, skin fibrosis).6
The new criteria include one criterion that alone is sufficient for classification as SSc: skin thickening of the fingers extending proximal to the metacarpophalangeal joints, which is similar to the 1980 criteria. If the single sufficient criterion is not fulfilled, the point system is applied and patients with a score of ≥9 are classified as having SSc. All items in the classification criteria represent measurements that are performed in routine clinical practice. The criteria are meant for inclusion of SSc patients in studies, not for SSc diagnosis. Although the list of items in the classification criteria mimics the list of items one usually uses for diagnosis, in practice the diagnosis of SSc may also be informed by items not included in the classification criteria, such as tendon friction rubs, calcinosis, and dysphagia. Consequently, patients classified as having SSc are a subset of patients being diagnosed as having SSc, with diagnosis being more sensitive. Ideally, there would be no difference between diagnosis and classification criteria.
As intended, the new classification system incorporates the considerable advances made in the diagnosis of SSc. It includes the concept of specific serum autoantibodies such as anti–topoisomerase I, anticentromere, and anti–RNA polymerase III.15 ,23 There is the possibility that testing for additional SSc autoantibodies, such as anti-Th/To, anti–U3 RNP, and others, may become more widely available. The criteria also acknowledge the value of magnified nailfold visualisation in the diagnosis of SSc.14 ,15 Although capillaroscopy can be performed with highly specialised equipment such as videocapillaroscopy cameras, simple in-office ophthalmoscopes or dermatoscopes suffice for distinguishing between normal and abnormal nailfold capillaries.24 ,25 Capillaroscopy is now widely used, and considering the value of magnified nailfold visualisation in the diagnosis and management of SSc, these new criteria may encourage acquisition of this skill by physicians caring for SSc patients. Likewise, criteria for pulmonary arterial hypertension have changed over the years. The ACR/EULAR committee recognises this, and the diagnosis of pulmonary arterial hypertension should be based on the most recent accepted criteria from right-sided heart catheterisation.
Several items that are useful for recognising SSc in clinical practice, such as calcinosis, flexion contractures of the fingers, tendon or bursal friction rubs, renal crisis, oesophageal dilation, and dysphagia, are not included in the criteria. These were considered but did not substantially improve sensitivity or specificity. For example, renal crisis is a strong indicator of SSc, but its low frequency of occurrence makes it less useful for the purpose of classification.20 The committee considered a non–point-based additive system,8 such as the ACR systemic lupus erythematosus criteria26 or the 1980 ACR SSc criteria. We concluded, however, that assigning weights yielded superior results for SSc classification. The weights were simplified to single-digit numbers to make the system easy to use even in the absence of a computing device. Similar weighted systems have been used for other rheumatic diseases.27 The committee also decided not to include ‘probable’ or ‘possible’ SSc in the classification.
Examples of profiles not captured by the 1980 ACR criteria that fulfilled the new classification criteria are combinations of skin thickening of the whole finger, SSc-related autoantibodies, and pulmonary arterial hypertension and/or Raynaud's phenomenon. A patient with Raynaud's phenomenon, autoantibodies, and abnormal nailfold capillaries is not classified as having SSc, although such a patient may develop SSc over subsequent years.6 ,15
Patients may have disease manifestations similar to those of SSc that are better explained by another well-defined disorder, such as nephrogenic sclerosing fibrosis, generalised morphea, eosinophilic fasciitis, scleredema diabeticorum, scleromyxedema, porphyria, lichen sclerosis, graft-versus-host disease, or diabetic cheiroarthropathy. We decided it was not necessary to develop criteria that differentiated SSc specifically from these conditions. Patients with some of these diseases were included in the validation cohort of patients with SSc-like disorders, and it is possible that specificity may have been slightly higher had they been excluded.
In developing the revised SSc classification criteria, we followed the recommendations and guidelines of the ACR and EULAR, which included (1) collaboration between clinical experts and clinical epidemiologists in criteria development, (2) evaluation of the psychometric properties of each candidate criterion, and (3) description of the test sample (origin of the patients and control subjects).7 ,28 Ideally, phases of criteria development should have a balance between expert opinion and data-driven methods; yet, there should be avoidance of circularity of reasoning (a bias that can occur when the same experts developing the criteria are the ones contributing cases and comparison patients).29 We included different experts at different steps in the development of the SSc criteria, to avoid circularity.
Testing and validating a classification system for SSc is difficult because there is no gold standard for defining a particular case as SSc; that is, there is no incontrovertible test or criterion. We relied on expert opinion for our gold standard, which is similar to the process used in the development of other criteria.8
In the absence of a gold standard, we developed and tested the proposed classification system against two standards of expert opinion: (1) the opinion of the clinician who selected cases for the North American and European derivation and validation cohorts, and (2) the combined opinion of a group of clinical experts in SSc. Both standards have strengths and weaknesses. Each individual clinician who selected cases had access to information that could have included aspects not captured by the forms, which were restricted to 23 particular manifestations. Data were obtained from several sites in Europe and North America, so this should improve generalisability and reduce selection bias. However, it is possible that other expert clinicians may have had a different opinion about particular cases. The consensus opinion of a group of experts who had the opportunity to discuss controversial cases strengthens the combined expert opinion. However, the group may not have been aware of some relevant information not included in the available data. It is also difficult for a group of experts to consider hundreds of cases in depth; however, this was managed by having the group consider in depth only those cases, or combinations of items, that appeared to be controversial. In this way, the expert group was able to form a consensus over the whole range of cases in the database. A key strength of the present work is the use of both standards for testing and validation of the proposed system.
The approach we used has other strengths and limitations as well. The methodology was state of the art, with validation by data and by expert opinion at every step. Various methods used in the development process have already been described.19 ,20 The criteria have face validity, because the items are routinely assessed in daily clinical practice and also were included in other important SSc classification criteria sets. The criteria allow for new developments in, for example, autoantibody testing availability and/or new identification of scleroderma-associated autoantibodies, or assessment of nailfold capillaries. Formal conjoint analysis to derive the weights associated with items improved the sensitivity and specificity of the items, as was found also in the development of the recent ACR/EULAR criteria for the classification of rheumatoid arthritis.30
The criteria have not been validated in ethnic groups that are not common in North America and Europe. This will require further studies. Regarding clinical use, the number of items and weights may not be easy to remember, but the criteria can be made widely available and (electronic) applications can be developed. The SSc classification criteria steering committee and the expert consultants agreed that the criteria could allow for classification of patients with another rheumatic disease as also having SSc (eg, having both systemic lupus erythematosus and SSc, or rheumatoid arthritis and SSc, etc.). Although this is a possible limitation, it permits individual researchers to decide whether to include subjects who fulfil criteria for more than one rheumatic disease in any particular study.
The ACR/EULAR classification criteria for SSc perform better than the 1980 ACR preliminary criteria in terms of both sensitivity and specificity. They are relatively simple to apply to individual subjects. These criteria may be endorsed as inclusion criteria for SSc studies. Validation in other populations is encouraged.