OBJECTIVE To develop criteria for disease activity in systemic sclerosis (SSc) that are valid, reliable, and easy to use.
METHODS Investigators from 19 European centres completed a standardised clinical chart for a consecutive number of patients with SSc. Three protocol management members blindly evaluated each chart and assigned a disease activity score on a semiquantitative scale of 0–10. Two of them, in addition, gave a blinded, qualitative evaluation of disease activity (“inactive to moderately active” or “active to very active” disease). Both these evaluations were found to be reliable. A final disease activity score and qualitative evaluation of disease activity were arrived at by consensus for each patient; the former represented the gold standard for subsequent analyses. The correlations between individual items in the chart and this gold standard were then analysed.
RESULTS A total of 290 patients with SSc (117 with diffuse SSc (dSSc) and 173 with limited SSc (lSSc)) were enrolled in the study. The items (including Δ-factors—that is, worsening according to the patient report) that were found to correlate with the gold standard on multiple regression were used to construct three separate 10-point indices of disease activity: (a) Δ-cardiopulmonary (4.0), Δ-skin (3.0), Δ-vascular (2.0), and Δ-articular/muscular (1.0) for patients with dSSc; (b) Δ-skin (2.5), erythrocyte sedimentation rate (ESR) >30 mm/1st h (2.5), Δ-cardiopulmonary (1.5), Δ-vascular (1.0), arthritis (1.0), hypocomplementaemia (1.0), and scleredema (0.5) for lSSc; (c) Δ-cardiopulmonary (2.0), Δ-skin (2.0), ESR >30 mm/1st h (1.5), total skin score >20 (1.0), hypocomplementaemia (1.0), scleredema (0.5), digital necrosis (0.5), Δ-vascular (0.5), arthritis (0.5), Tlco <80% (0.5) for all patients with SSc. The three indexes were validated by the jackknife technique. Finally, receiver operating characteristic curves were constructed in order to define the value of the index with the best discriminant capacity for “active to very active” patients.
CONCLUSIONS Three feasible, reliable, and valid preliminary indices to define disease activity in SSc were constructed.
- systemic sclerosis
- disease activity
- disease status criteria
Criteria for the connective tissue diseases may serve a variety of functions. Classification and subclassification criteria are used to distinguish between patients with or without a specific disease, and to differentiate subgroups of patients within a disease cluster. Disease status criteria can be separated into damage criteria and activity criteria; damage reflects irreversible lesions either induced by the disease itself or by treatment, whereas activity implies the potential reversibility of the lesions. Prognostic criteria, including severity criteria, are intended to separate subjects with a predicted good or favourable outcome from those with a poor predicted outcome. Finally, outcome criteria are intended to measure the overall impact of a disease.1 ,2
Systemic sclerosis (SSc) is a generalised disorder of the connective tissue characterised by widespread microvascular and vascular lesions and by the increased deposition of matrix components in the skin and internal organs, particularly the gut, lung, heart, and kidney.3-6 Classification and subclassification criteria were developed some time ago for SSc and are currently being used in clinical research to ensure that different centres are studying patients with the same clinical entity and to identify clinically, serologically, and prognostically distinct patients with SSc.7-9 In addition, various severity scores have been proposed10-12; the most recent one, developed by Medsger et al,13 has also been validated.
Disease activity, on the other hand, has only been roughly defined for SSc in a small number of studies based on either the clinical evolution during the period immediately preceding enrolment in the study or the extent of the disease, or changes in some of the laboratory parameters of immune-inflammatory activation.14-20 A reliable set of activity criteria has not yet been developed.
Defining disease activity in SSc is an issue of paramount importance, however. Such a definition would allow the clinician to distinguish patients requiring aggressive treatment from those in whom symptomatic treatment may be sufficient.21 Recently, two studies have been undertaken to define activity criteria for SSc. The first is being carried out by Furst and colleagues (personal communication) from the Scleroderma Clinical Trial Consortium, using a study design based on the Delphi technique—that is, a set of criteria drawn up by a team of experts is to be validated in prospective studies. The second is the present study set up by the European Scleroderma Study Group, in which activity criteria have been defined after the extensive evaluation of a large number of patients with SSc from different centres, using as the gold standard the assessment of disease activity blindly made by three experts based on the clinical charts. Here we present the results of the first part of our study—that is, the identification of disease activity variables, the development and the validation of preliminary activity indexes.
Materials and methods
Nineteen centres from 11 European countries agreed to participate in this multicentre prospective one year study. Investigators were asked to enrol a consecutive number of patients with SSc, all of whom satisfied American College of Rheumatology (ACR) criteria for the classification of SSc,7 and to fill out for each of them a standardised clinical chart in which epidemiological, clinical, laboratory, and other diagnostic data were to be recorded. Each participant was provided with guidelines in which the aim of the study, the protocol, and the symptoms, signs, and test results were carefully defined according to the criteria provided by the American Rheumatism Association (ARA, now the ACR).22 ,23
The chart consisted of four sections: section I for demographic and patient history data and sections II, III, and IV for data gathered at the time of enrolment, after six months, and after 12 months, respectively (table 1).
Section II was divided into 13 subsections containing 88 items (46 clinical, 31 laboratory, and 11 other diagnostic items). In addition, it included 11 Δ-factors designed to measure any change, as globally evaluated by the patient, in comparison with one month before enrolment, in the following SSc manifestations: generalised complaints (Δ-gen); articular/muscular (Δ-JM); cutaneous (Δ-skin); ocular (Δ-eye); cardiopulmonary (Δ-HL); vascular (Δ-vasc); gastrointestinal (Δ-gut); haematological (Δ-haem); renal (Δ-kid); neuropsychiatric manifestations (Δ-neur); and laboratory investigations (Δ-lab). Sections III and IV were analogous to section II, and were to be completed six and 12 months after the patient was enrolled, respectively. The Δ-factors in these two sections measured any change from the previous observation.
To develop the index, a gold standard for disease activity was established. Three members of the protocol management team examined sections I and II of the clinical charts under blinded conditions (that is, without any knowledge of the provenance of the charts or the drug regimens prescribed) and evaluated the disease activity for each patient on a semiquantitative scale (0 = no activity to 10 = maximal activity). The disease activity scores assigned by the three members were found to be the same in 11 patients, similar (that is, ±1) in 122 patients, and slightly different (that is, ±2) in 100 patients; thus there was no substantial difference between the three disease activity scores in 233/290 (80%) of the cases. The reliability of this scoring system was assessed by evaluating the intraclass correlation coefficient (ICC), which was found to be 0.684 (p<0.0001). The three evaluators then re-examined the clinical charts together in order to reach a consensus on the disease activity scores. These consensus scores were subsequently used as the “gold standard” to determine which chart items were most highly correlated with disease activity.2 ,25
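As an illustration of how an intraclass correlation coefficient can be obtained from a table of scores, the following Python sketch computes the one-way random-effects form, ICC(1,1). Which ICC form was used in the study is not stated here, so the choice of ICC(1,1) is an assumption, and the scores in the example are hypothetical rather than study data.

```python
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1) for a complete subjects x raters table."""
    n = len(ratings)        # number of subjects (charts)
    k = len(ratings[0])     # number of raters per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical 0-10 activity scores from three raters for five charts.
example_scores = [
    [2, 3, 2],
    [7, 6, 8],
    [4, 4, 5],
    [1, 0, 1],
    [9, 8, 8],
]
example_icc = icc_oneway(example_scores)
```

When all raters give identical scores the within-subject mean square is zero and the coefficient equals 1; disagreement between raters lowers it.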
In addition, two members of the protocol management team examined sections I and II of the clinical charts and evaluated disease activity on a qualitative scale for each patient (that is, “inactive to moderately active” or “active to very active” disease). This evaluation was designed to separate those patients considered to require only symptomatic treatment from those needing more aggressive measures. The reliability of this evaluation was assessed by Cohen's κ coefficient: in 224/290 (77%) cases there was complete concordance between the two evaluators (Cohen's κ=0.498). Therefore, this assessment may also be considered a reliable measure of disease activity. Table 2 shows the results of their assessment. It may be noted that an element of systematic bias was found in this evaluation, as in most of the discordant cases member 1 had assigned the patient to the higher activity category and member 2 to the lower. The actual agreement may therefore be considered higher than that demonstrated by the κ evaluation.26 The two members then jointly re-examined all the charts and reached a consensus about the qualitative assessment of disease activity in each patient.
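The relation between observed agreement and Cohen's coefficient can be sketched in Python for two raters and two categories. The cell counts below are hypothetical and are not the data of table 2; they were chosen only so that 224 of 290 charts are concordant and most disagreements fall in one direction.

```python
from typing import Tuple


def cohen_kappa_2x2(both_low: int, r1_high_r2_low: int,
                    r1_low_r2_high: int, both_high: int) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters, two categories."""
    n = both_low + r1_high_r2_low + r1_low_r2_high + both_high
    p_observed = (both_low + both_high) / n

    # Marginal proportion of the higher activity category for each rater.
    r1_high = (both_high + r1_high_r2_low) / n
    r2_high = (both_high + r1_low_r2_high) / n

    # Agreement expected by chance alone, given the marginals.
    p_expected = r1_high * r2_high + (1 - r1_high) * (1 - r2_high)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical counts (NOT table 2): 224 concordant charts out of 290.
p_obs, kappa = cohen_kappa_2x2(both_low=150, r1_high_r2_low=50,
                               r1_low_r2_high=16, both_high=74)
```

Because the coefficient subtracts the agreement expected by chance, it is always lower than the raw proportion of concordant charts unless agreement is perfect.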
The data recorded in sections I and II of the clinical charts were stored in a database, and a statistical program (Statview) was used for the subsequent analysis.
Univariate analysis was performed to select the single items (signs, symptoms, laboratory and other diagnostic tests, Δ-factors) that were significantly associated with the consensus disease activity score. Multiple linear regression analyses were carried out to evaluate the combined performance of different sets of criteria in predicting the consensus disease activity score and to define the relative weight of each variable in terms of regression coefficients for the multivariate models.
In our initial study design we had planned to identify a set of activity criteria based on the first 60% of cases enrolled, and to validate these criteria in the following 40%. However, the high number of missing values for some items and the different methods used for the detection of other parameters (see part I of the study) forced us to modify our plans by ruling out from the analysis all the items with either of these aspects and adopting the jackknife statistical procedure for the validation process.27
Receiver operating characteristic curves (ROCs) were finally constructed to assess the efficiency of the resulting index in distinguishing patients with “inactive to moderately active” disease from those with “active to very active” disease, as defined by the consensus qualitative evaluation.
Results
A total of 290 patients with SSc (244 female, 46 male; age 8–87 years, mean age 53) were enrolled in the study. The epidemiological and clinical features of this series are reported in part I of the study. Here, we briefly summarise the main findings.
As a result of the selection process, all the 290 patients satisfied the preliminary ACR criteria for the classification of SSc7; specifically, 165 satisfied the major criterion (scleroderma proximal to metacarpophalangeal joint), while 125 met at least two of the three minor criteria (pitting scars, sclerodactyly, or bibasilar lung fibrosis). One hundred and seventy three had the limited form (149 female, 24 male, age 21–87 years, mean age 55) and 117 the diffuse form (95 female, 22 male; age 8–86 years, mean age 49) of the disease, according to the criteria of Le Roy et al. 9
Table 3 shows the results of univariate analysis of the chart items that correlated with our gold standard for disease activity (the consensus disease activity scores). We ruled out from the analysis the items with a significant number of missing values (for example, pulmonary hypertension) and those that had been investigated by different methods (for example, lung interstitial involvement by x ray or high resolution computed tomography (HRCT)). Through multiple linear regression analysis we were then able to define three different sets of items that correlated with disease activity in all patients with SSc, in patients with dSSc, and in patients with lSSc, respectively (table 4). A satisfactory regression coefficient was obtained between each set of items and the gold standard in each patient group.
Our findings (summarised in table 5) allowed us to construct three weighted disease activity indices: one for the entire SSc population, one for the dSSc subset, and one for the lSSc subset. The weight of a given item was assigned on the basis of the regression coefficient (b) and adjusted so that the maximum value for each index was 10. The disease activity for a given patient could therefore be calculated by summing the weights of the criteria fulfilled.
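The calculation described above can be sketched in Python as a simple weighted sum. The weights are those listed in the abstract; the item names (for example `esr_gt_30`) and the function `activity_index` are hypothetical labels introduced only for this illustration.

```python
# Item weights as listed in the abstract; each set sums to 10.
WEIGHTS = {
    "dSSc": {
        "delta_cardiopulmonary": 4.0, "delta_skin": 3.0,
        "delta_vascular": 2.0, "delta_articular_muscular": 1.0,
    },
    "lSSc": {
        "delta_skin": 2.5, "esr_gt_30": 2.5, "delta_cardiopulmonary": 1.5,
        "delta_vascular": 1.0, "arthritis": 1.0,
        "hypocomplementaemia": 1.0, "scleredema": 0.5,
    },
    "all": {
        "delta_cardiopulmonary": 2.0, "delta_skin": 2.0, "esr_gt_30": 1.5,
        "total_skin_score_gt_20": 1.0, "hypocomplementaemia": 1.0,
        "scleredema": 0.5, "digital_necrosis": 0.5, "delta_vascular": 0.5,
        "arthritis": 0.5, "tlco_lt_80": 0.5,
    },
}


def activity_index(subset: str, fulfilled: set) -> float:
    """Sum the weights of the criteria a patient fulfils for the given index."""
    weights = WEIGHTS[subset]
    unknown = set(fulfilled) - set(weights)
    if unknown:
        raise ValueError(f"items not in the {subset} index: {sorted(unknown)}")
    return sum(weights[item] for item in fulfilled)


# Hypothetical lSSc patient: worsening skin, raised ESR, and arthritis.
score = activity_index("lSSc", {"delta_skin", "esr_gt_30", "arthritis"})  # 6.0
```

A patient fulfilling none of the criteria scores 0 and one fulfilling all of them scores 10, whichever of the three indices is used.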
We validated our criteria sets using the jackknife statistical approach. Leaving out one patient at a time, we repeatedly calculated by multiple linear regression analysis the regression coefficient between the consensus disease activity score and the disease activity items, and that between the calculated index and the same items. As far as the whole series is concerned, the regression coefficient between the score and the items (n=152) ranged from 0.831 to 0.846 (mean 0.837; SE 0.0002; confidence interval (CI) 0.836 to 0.837) and that between the index and the items ranged from 1.0 to 1.0. For lSSc, the regression coefficient between the score and the items (n=91) ranged from 0.763 to 0.787 (mean 0.778; SE 0.0004; CI 0.777 to 0.779) and that between the index and the items ranged from 1.0 to 1.0. Finally, for dSSc, the regression coefficient between the score and the items (n=74) ranged from 0.751 to 0.789 (mean 0.768; SE 0.0006; CI 0.766 to 0.769) and that between the index and the items from 1.0 to 1.0. In conclusion, all three indexes were validated.
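The leave-one-out principle of the jackknife can be sketched in Python. For brevity the statistic recomputed here is a Pearson correlation between two variables rather than a multiple regression coefficient, the data are synthetic, and the standard error uses the conventional jackknife formula, which may differ from the calculation behind the values reported above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jackknife(x, y, statistic):
    """Recompute `statistic` n times, leaving out one (x, y) pair each time."""
    n = len(x)
    replicates = [
        statistic(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)
    ]
    mean = sum(replicates) / n
    # Conventional jackknife estimate of the standard error.
    se = math.sqrt((n - 1) / n * sum((r - mean) ** 2 for r in replicates))
    return {"min": min(replicates), "max": max(replicates),
            "mean": mean, "se": se, "n": n}


# Synthetic data only: a noisy index score against a consensus score.
rng = random.Random(0)
consensus = [rng.uniform(0, 10) for _ in range(60)]
index = [c + rng.gauss(0, 1.5) for c in consensus]
summary = jackknife(index, consensus, pearson_r)
```

A narrow range between the smallest and largest leave-one-out estimates indicates that no single patient dominates the result.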
Predictably, a significant correlation was found between the activity index score calculated for each patient (by summing the weights of the criteria fulfilled) and the consensus disease activity score (previously assigned by the protocol management team) using Pearson's correlation coefficient (r=0.763, p<0.0001 for dSSc; r=0.763, p<0.0001 for lSSc; and r=0.830, p<0.0001 for the whole series). An activity score, however, constitutes a comparative ranking rather than an absolute number. It is therefore more appropriate to analyse the validity of this score on an ordinal scale and to test its efficacy by Spearman's rank correlation coefficient. This analysis also showed significant correlations between the index score and the consensus score (r s=0.760, p<0.0001 for dSSc; r s=0.787, p<0.0001 for lSSc; r s=0.835, p=0.0001 for the whole series); that is, the higher the index score, the higher the consensus score. These correlations were also validated by the jackknife technique. For the whole series, the correlation coefficient (r s) between the score and the index ranged from 0.832 to 0.844 (mean 0.835; SE 0.0002; CI 0.835 to 0.835). For lSSc, r s ranged from 0.752 to 0.804 (mean 0.787; SE 0.0006; CI 0.786 to 0.788). Finally, for dSSc, r s ranged from 0.749 to 0.791 (mean 0.760; SE 0.0008; CI 0.758 to 0.761).
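To make the distinction between the two coefficients concrete, the following Python sketch computes Spearman's coefficient as a Pearson correlation of ranks, with tied values given their mean rank. The function names and the example values are hypothetical and are not study data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical index scores and consensus scores for five patients.
rs_example = spearman_rs([0.5, 3.0, 2.5, 6.0, 3.0], [1, 4, 3, 8, 5])
```

Because only the ordering of the scores enters the calculation, any monotonic rescaling of the index leaves the coefficient unchanged.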
Finally, we compared the disease activity index scores with the qualitative assessments of disease activity (“inactive to moderately active” or “active to very active”) made by the two protocol management team members. Figure 1 (A–C) shows the ROC curves for the entire SSc series, and for the dSSc and lSSc subgroups, respectively. For all three conditions an index of three showed a quite high specificity and a fairly good sensitivity.
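How sensitivity and specificity follow from a chosen cut off can be sketched in Python. The sketch assumes, for illustration, that a patient is classed as active when the index score is equal to or greater than the cut off; the scores and labels are hypothetical.

```python
def sensitivity_specificity(index_scores, is_active, cutoff):
    """Classify a patient as active when the index score is >= cutoff."""
    pairs = list(zip(index_scores, is_active))
    tp = sum(1 for s, a in pairs if s >= cutoff and a)
    fn = sum(1 for s, a in pairs if s < cutoff and a)
    tn = sum(1 for s, a in pairs if s < cutoff and not a)
    fp = sum(1 for s, a in pairs if s >= cutoff and not a)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(index_scores, is_active):
    """(cutoff, 1 - specificity, sensitivity) at every observed cutoff."""
    points = []
    for cutoff in sorted(set(index_scores)):
        sens, spec = sensitivity_specificity(index_scores, is_active, cutoff)
        points.append((cutoff, 1 - spec, sens))
    return points


# Hypothetical index scores with the consensus qualitative label
# (True = "active to very active").
scores = [0.5, 1.0, 2.5, 3.0, 4.5, 6.0, 2.0, 3.5]
active = [False, False, False, True, True, True, True, False]
sens_at_3, spec_at_3 = sensitivity_specificity(scores, active, cutoff=3.0)
curve = roc_points(scores, active)
```

Plotting sensitivity against 1 − specificity for each cut off gives the ROC curve; the cut off with the best discriminant capacity is the one whose point lies closest to the upper left corner.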
Discussion
We have constructed three preliminary sets of criteria to calculate disease activity in patients with SSc as a whole, in patients with dSSc, and in patients with lSSc, respectively. Defining disease activity in SSc is much more difficult than it is for other rheumatic and connective tissue diseases, such as systemic lupus erythematosus and rheumatoid arthritis, in which inflammation has a key role and flares and quiescent phases can be easily recognised.28-31 Patients with SSc, particularly those with the limited form, do not present such a clear picture. Nevertheless, from a pathophysiological point of view, two distinct stages can be defined in SSc: firstly, a potentially reversible stage in which activated cells directly or indirectly activate or damage endothelial cells and stimulate fibroblasts to overexpress genes encoding the extracellular matrix components; and secondly, a definitely irreversible stage in which vascular occlusion and interstitial fibrosis occur.32-34 The first stage reflects activity and the second, damage. The first stage is not marked by clear episodes, however, and the two stages are not mutually exclusive, whether the disease affects a single organ or different organs in the same patient. Finally, the two stages are difficult to distinguish, especially in those patients with longlasting, indolent SSc in its limited form.
The European Scleroderma Study Group did its best to overcome the problem of the unclear symptomatology of SSc in its study design. The standardised clinical chart drawn up by the group contained all of the symptoms, laboratory and other diagnostic parameters most widely used by clinicians treating this disease (for a total of 88 items), carefully defined according to the most authoritative sources available. A gold standard was then set by assessing disease activity in the patient series both semiquantitatively and qualitatively. Both these measures of disease activity were analysed and found to be reliable. Indeed, the disease activity scores calculated by the three protocol management team members were found to be significantly correlated with one another (ICC=0.684; p<0.0001). A good correlation was also found between the qualitative evaluations assigned to the patients by the two protocol management members. It should be noted that the level of agreement for the qualitative evaluation could actually be considered greater than that demonstrated by the κ value, because we discovered a systematic bias (one member consistently gave a higher activity evaluation than the other) (table 2).26
The activity indexes developed by us have been derived from the charts of 290 patients with SSc recruited by 19 centres in 11 European countries. Because the prevalence of some disease aspects in the series from various centres showed a high variation (see part I of the study), it might be suggested that the variability can be explained by observer error and, consequently, that these aspects are not reliable. However, it should be emphasised that the observer error, if any, would probably have been random, causing significant misclassification and making it difficult to detect any association. Had there been substantial observer error, the measures would therefore have been unlikely to predict the consensus activity score. In addition, we must emphasise that all the participants in the study were experienced clinical investigators who had been provided with clear cut guidelines. Therefore, we believe that the variability in the prevalence of some items among various series must be ascribed to a different pattern of attendance at each centre as shown by differences in age, sex, and subset distribution.
It should be noted that all the items in the three indexes were searched for in a high percentage of the 290 patients investigated. The percentage of values missing ranged from 0 (for total skin score, scleredema, digital necrosis, and arthritis) to 6.2% (for Δ-vasc) and exceeded 15% only for hypocomplementaemia (19.7%). Because each of the indexes is made up of at least four items, we do not believe that the presence of one item with a slightly higher than acceptable percentage of missing values would seriously affect the results.
Of the items found to be related to disease activity, hypocomplementaemia is not commonly thought of as a laboratory parameter characteristic of SSc. However, Seibold et al 35 and Benbassat et al 36 found hypocomplementaemia in 12% and 22.5% respectively, that is, in percentages not different from that detected by us (14%).
For the Δ-factors, we chose to rely on patients' self reporting at enrolment and on both patients' and doctors' global assessment at six and 12 months' evaluation. Of the three sets of criteria, that for dSSc is based only on Δ-factors, whereas those for the whole series and for lSSc each contain three Δ-factors. Such an approach might be questioned because of lack of standardisation. However, it has been accepted as valid and important in assessing disease activity in rheumatoid arthritis.31 ,37
We have shown that the criteria identified in our study are correlated with disease activity. Significant correlations were found using both the Pearson and Spearman correlation coefficients between the disease activity indexes and the consensus activity score. Moreover, the construct validity of our indices was tested by jackknife statistical analysis29 but remains to be confirmed on separate groups of patients. ROC curves, constructed by plotting the value of the index score against the qualitative assessment of disease activity, were quite satisfactory. An index of three was found to identify, with a quite high specificity and fairly good sensitivity, those patients with active to very active disease in both disease subgroups and in the group of patients with SSc as a whole. Actually, an index of three would define active to very active disease with sensitivity ranging from 62 to 81% and specificity ranging from 86 to 93%. It will be important, therefore, to identify other parameters correlated with disease activity to improve the somewhat low sensitivity of the present, preliminary indices. At the beginning of the study we asked all participants to store aliquots of serum and plasma for each patient. In the next phase of this study we will analyse various laboratory parameters (such as circulating activation markers) in greater detail in order to improve the sensitivity of our indices.
It should also be noted that our disease activity indices may not be entirely comprehensive because no patients with SSc with renal crisis were enrolled. Defining activity criteria in such patients would appear to be straightforward, but still remains to be done on the basis of prospective studies. In addition, discriminant validity (that is, sensitivity to change) still needs to be tested.
At its current stage our study has certain limitations. Our activity indices were constructed on the basis of the correlations of a series of diagnostic parameters with the disease activity scores arrived at by three protocol team members (the “gold standard”). The activity scores subsequently calculated using the indices were then compared with the gold standard disease activity scores. Therefore our study design has an inherent element of circularity. In addition, although this study was based on the analysis of data from a large number of actual patients with SSc, we cannot claim to have confirmed and validated an already existing consensus about the criteria for disease activity. Our preliminary indices represent a tool which may be used for further research in order to reach a consensus. In this sense, our study has analogies with the continuing study by Furst et al using the Delphi technique.
Activity criteria for a given disease must be valid, reliable, and easily measurable in a typical clinical setting. The indices presented in this paper were developed using two reliable measures of disease activity as the gold standard, and all the criteria included are reasonably easy for any clinician specialising in connective tissue diseases to determine during a routine evaluation of their patients. Moreover, because our activity criteria are based on data from real patients, they reflect everyday clinical practice and may be universally applied. Their use would also facilitate the gathering of comparable data in studies conducted by different groups. The fact that some examinations which specifically measure the extent of internal organ involvement (HRCT, echocardiography, upper gastrointestinal series) were not included among the criteria might slightly lessen the accuracy of the indexes, but in compensation make it possible for the indexes to be used in any clinical setting.
In conclusion, we have developed three preliminary sets of disease activity criteria, one for patients with SSc as a whole, one for those with limited SSc, and one for those with diffuse SSc. These indices appear to be simple and reliable; we are currently carrying out further analyses to confirm their construct validity and assess their discriminant validity (that is, sensitivity to change).
European Scleroderma Study Group (EScSG)
G Valentini, S D'Angelo, A De Luca, E Tirri (Second University of Naples, Italy); S Bombardieri, A Della Rossa, W Bencivelli, C Ferri (University of Pisa, Italy); AJ Silman (University of Manchester, UK); M Cagnoni, M Matucci Cerinic, S Generini (University of Florence, Italy); JF Belch (Ninewells Hospital and Medical School, Dundee, UK); CM Black (Royal Free Academy Hospital, London, UK); P Bruhlmann, S Enderlin (University Hospital, Zurich, Switzerland); L Czirják (University of Pecs, Hungary); AA Drosos (University of Ioannina, Greece); G Danieli, A Gabrielli, P Sambo (University of Ancona, Italy); G Tonietti, R Giacomelli, P Cipriani (University of L'Aquila, Italy); O Meyer, G Hayem (University of Paris VII, Paris, France); M Inanc (University of Istanbul, Turkey); NJ McHugh (Royal National Hospital for Rheumatic Disease, Bath, UK); H Nielsen (Rheumatology Unit, Hervel Hospital, Hervel, Denmark); S Todesco, F Cozzi, M Rosada (University of Padova, Italy); R Scorza, S Bazzi, M Carroni (University of Milan, Italy); A Sysa (University of Lodz, Poland); J Stork, R Becvar (Charles University, Prague, Czech Republic); FHJ van den Hoogen (University Hospital, Nijmegen, The Netherlands); PG Vlachoyiannopoulos (National University of Athens, Greece).