Objectives: To develop a new index for disease activity in ankylosing spondylitis (ASDAS) that is truthful, discriminative and feasible, and includes domains/items that are considered relevant by patients and doctors.
Methods: Eleven candidate variables covering six domains of disease activity, selected by ASAS experts in a Delphi exercise, were tested in a three-step approach, similar to the methodology used for the disease activity score in rheumatoid arthritis. Data on 708 patients included in ISSAS (International Study on Starting tumour necrosis factor blocking agents in Ankylosing Spondylitis) were used. Cross validation was carried out in the OASIS cohort (Outcome in Ankylosing Spondylitis International Study).
Results: Principal component analysis disclosed three factors with eigenvalues >0.75: patient assessments, peripheral joint assessments and acute phase reactants. Discriminant function analysis resulted in a correct classification in ∼72% of the cases (prior probability ∼50%). Regression analysis resulted in an index with five variables (total back pain, patient global assessment, duration of morning stiffness, C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR)). Three additional candidate indices were designed using similar methodology while omitting either ESR or CRP or patient global assessment. All four scores correlated with the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI; r = 0.67–0.80), patient (0.58–0.75) and physician’s global assessment (0.41–0.48) of disease activity. All four candidate ASDAS indices performed better than BASDAI or single-item variables in discriminating between high and low disease activity state, according to doctors as well as patients in the OASIS cohort.
Conclusion: The first steps in the development of a new assessment tool of disease activity in AS derived four candidate indices with good face and construct validity, and high discriminant capacity.
Ankylosing spondylitis (AS) is a chronic inflammatory arthritis primarily affecting the axial skeleton, with a characteristic involvement of the spine and sacroiliac joints. Pain, stiffness due to inflammation and loss of physical function are hallmarks of the disease. Inflammation not only affects the spine but may also affect peripheral joints and entheses, heart, lungs, large bowel and eyes. The Assessment of SpondyloArthritis international Society (ASAS) has defined a core set of domains and instruments that covers the most important aspects of disease assessment in AS. Because the concept of disease activity encompasses such a wide range of measures or concepts, many experts in the field think that no available instrument appropriately reflects the status of disease activity in AS. Currently used single-variable parameters (eg, pain, stiffness, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), patient global assessment) or constructs/indices (eg, Bath Ankylosing Spondylitis Disease Activity Index (BASDAI)1) are unsatisfactory because they cover only part of disease activity, lack face and construct validity, are “too lenient”, are not sensitive to change, or are either fully patient or fully doctor oriented.
A disease activity index (score) can capture multiple important aspects of disease activity. Indices can be entirely expert based, including domains that have a high level of face validity. The BASDAI is an example of such an index, based on the opinion of experts (including patients); it comprises six questions referring to fatigue, back pain, peripheral joint pain and swelling, enthesitis, and severity and duration of morning stiffness. Such indices are widely accepted by clinicians and are easy to understand, but may not perform efficiently owing to variable redundancy (separate variables covering the same aspect of the disease, as shown by a high correlation). Moreover, the various instruments are simply summed without taking their relative importance and dependency into account. Indices can also be statistically derived. The statistical process underlying the development of such indices assures an optimal collection of items, including item weight if necessary, but complexity and lack of face validity may jeopardise the implementation in clinical practice. The disease activity score (DAS28) used in rheumatoid arthritis (RA) is a good example of an appropriate index, because it has been shown to perform well in clinical research and it has been implemented and accepted in clinical practice even though the DAS algorithm is rather complex.2 In general, and referring to the Outcome Measures in Rheumatology Clinical Trials (OMERACT) initiative, such indices should be truthful, discriminative and feasible.3
Here we present the development of a new disease activity score for patients with AS, making use of variables reflecting domains of disease activity that are considered important in the opinion of experts in the field of AS. A three-step statistical procedure is used to aggregate a weighted index that discriminates better than single item variables or existing indices between low and high disease activity.
Selection of items depicting disease activity in AS
To select relevant items that would thereafter be tested to derive the new disease activity score, a Delphi exercise was conducted in December 2005 to collect opinions of experts in the field of AS. An invitation to participate, including a link to a secure website hosting the survey, was sent by email to 85 ASAS members, including a number of patients, selected on the basis of their active interest in clinical research and care of patients with AS. After reading an introduction presenting the aim of the exercise and the procedure, experts were asked whether they considered the proposed disease domains and items relevant for assessing disease activity in a patient with AS. Invitations to the second and third rounds were sent only to experts who had completed the first round of the survey. Ten domains (pain, inflammation, acute phase reactants, global assessment, peripheral signs, fatigue, function, quality of life, plain radiography and spinal mobility) were tested in this exercise, each of them including one to six items (the total number of items was 29). A domain or item was selected if more than 80% of the responders thought it should be included in the subsequent analysis, and rejected if less than 20% considered it relevant. Questions with an intermediate level of agreement were proposed again in the next round. After the last round, all items with an agreement of at least 50% were considered selected. To increase consensus, aggregated results of the participants in the former round(s) of the Delphi exercise were presented to the experts before they answered the second and third rounds.
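As an illustration only, the selection rule described above can be sketched in Python. The function name, its parameters and the example agreement fractions are hypothetical and are not taken from the survey itself; the sketch also assumes that the 80% and 20% thresholds continue to apply in the last round before the 50% rule is used.

```python
def delphi_decision(agreement, final_round=False,
                    select_above=0.80, reject_below=0.20, final_cutoff=0.50):
    """Classify one item from the fraction of responders who judged it relevant."""
    if agreement > select_above:
        return "selected"
    if agreement < reject_below:
        return "rejected"
    if final_round:
        # After the last round, items with at least 50% agreement count as selected.
        return "selected" if agreement >= final_cutoff else "not selected"
    # Intermediate agreement: the question is proposed again in the next round.
    return "re-propose"


# Hypothetical agreement fractions, for illustration only.
example_round = {"item_a": 0.85, "item_b": 0.10, "item_c": 0.60}
for item, fraction in example_round.items():
    print(item, delphi_decision(fraction))
```

Calling the function with `final_round=True` applies the 50% rule to the items that were still undecided after the earlier rounds.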
Development of a disease activity score: principles
The methodology that was used for the development of the DAS in RA was followed.2 Based on a three-step statistical approach, this procedure aims at obtaining a limited set of single-variable parameters, optimally chosen and weighted, with satisfactory discriminatory ability (the ability to discriminate patients with low versus high disease activity).
Patients and data
The items selected in the Delphi exercise were further tested in the ISSAS (International Study on Starting tumour necrosis factor (TNF) blocking agents in Ankylosing Spondylitis) database.4 The ISSAS study has been described elsewhere in detail. In brief, this database includes demographic, clinical, metrological and laboratory data (collected by a research nurse or a doctor independently of the decision by the rheumatologist to start a TNF blocking agent) of more than 1200 patients from 10 countries worldwide who were judged by a rheumatologist for their theoretical need to start TNF blocking therapy (“yes” or “no”). Of these, only the 708 patients with complete data in all Delphi-selected variables were further used in the statistical process, since the chosen methodology does not allow for missing data. These 708 patients did not differ from the patients in whom at least one variable was missing with respect to age, sex and disease duration, or with respect to all disease activity variables that were tested (results not shown).
The underlying assumption for the current analysis was that a patient considered to be a candidate for treatment with anti-TNF had a sufficiently high level of disease activity. Each of the 145 involved rheumatologists agreed to include the first 10 consecutive outpatients with a diagnosis of AS, in order to preclude any selection bias. Overall, 49% of the included patients were considered to be candidates for a TNF blocking agent, and 51% were not. All ASAS core set measures of disease activity and severity (including patients’ self-assessments), joint counts, tender entheses count and acute-phase reactants scored, on average, higher in the anti-TNF candidates group. All variables selected in the Delphi exercise were available in ISSAS.
First, all variables were investigated for their suitability for parametric statistical analysis. Transformation of non-normally distributed variables was performed, using square root or logarithmic transformation in order to best fit a Gaussian distribution. Second, all variables were investigated with respect to covering the entire measurement range by comparing distributions, ranges, minimum and maximum values. All variables showed an appropriate representation of the entire scale range.
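The text does not state how the best-fitting transformation was chosen. A minimal sketch, assuming that the transformation with the smallest absolute skewness is preferred and that all values are non-negative (a +1 offset is added before taking logarithms so that zeros are allowed), could look as follows. The function names are illustrative and the input must not be constant.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def best_transform(values):
    """Return the name and values of the candidate closest to symmetry."""
    candidates = {
        "none": list(values),
        "sqrt": [math.sqrt(v) for v in values],
        "log": [math.log(v + 1) for v in values],  # +1 offset is an assumption
    }
    name = min(candidates, key=lambda k: abs(skewness(candidates[k])))
    return name, candidates[name]
```

Skewness is only one possible yardstick; a formal normality test or visual inspection of the distribution would be equally defensible choices.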
Data reduction: principal component analysis (PCA)
Different measures on the same patient may be highly correlated and may actually represent the same underlying construct (redundancy). Factor analysis examines interrelations among the variables, in order to distinguish factors reflecting the same construct. To identify sets of correlated variables, a principal components analysis (PCA) was performed on the selected variables. Varimax rotation was used to maximise the level of variance explained by each factor, and only factors with an eigenvalue >1 were used further (the eigenvalue reflects the variance accounted for by a factor). The factor loadings, a per-patient expression reflecting the values of the correlation variable, were saved for use in further analysis. The internal consistency of the resulting factors was evaluated by calculating the partial correlation between the item and the rotated factor and illustrates to what extent different variables measure the same underlying construct in each factor.
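The eigenvalue criterion can be illustrated with a short, self-contained sketch. It computes the eigenvalues of a correlation matrix with cyclic Jacobi rotations (a stand-in for whatever statistical package was actually used) and keeps the factors whose eigenvalue exceeds 1. The correlation matrix in the usage lines is invented for illustration, and the varimax rotation and the per-patient factor values are not reproduced here.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal element a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def retained_factors(corr, cutoff=1.0):
    """Eigenvalues above the cutoff and the share of total variance they explain."""
    eigenvalues = jacobi_eigenvalues(corr)
    kept = [ev for ev in eigenvalues if ev > cutoff]
    # For a correlation matrix the eigenvalues sum to the number of variables.
    return kept, sum(kept) / sum(eigenvalues)


# Invented correlation matrix: two pairs of correlated variables.
corr = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
kept, explained = retained_factors(corr)
```

In this invented example each pair of correlated variables collapses into one retained factor, which is the redundancy the data reduction step is meant to remove.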
Discriminant function analysis (DFA)
To investigate the contribution (weight) and the optimal aggregation of the elicited factors in discriminating between high and low disease activity, DFA was performed using the factor loadings. The per-patient individual discriminant score (IDS) (a linear combination of all included factors) was saved for use in further analysis.
Linear regression analysis
Because the discriminant function with factors does not directly illustrate which instruments are most contributory, linear regression analysis with stepwise forward selection (of all variables selected by the experts) was performed with the IDS as the dependent variable. Only those variables selected by the stepwise procedure that together explained more than 95% of variation in the IDS were reported and used in the final constructed score. Weighting of each of these variables was obtained by taking the regression coefficient in a final linear model for that variable. This latter model was obtained by entering only the previously selected variables.
Validation of the candidate indices
Cross validation of the candidate indices was applied in the independent OASIS cohort (Outcome in Ankylosing Spondylitis International Study; a continuing long-term international observational study of patients with AS5). To test concurrent validity of the indices, correlations (r value) of the four candidate indices with the most relevant variables were calculated in the ISSAS and OASIS databases. The discriminatory ability of the indices was compared using the approach of standardised mean difference between subgroups of patients with high versus low disease activity:6 a standardised mean difference quantifies the number of standard deviations by which the two groups differ, and allows a comparison of instruments that use different scales. In ISSAS the rheumatologists’ judgment that a patient required a TNF blocking drug was used as an external construct for high disease activity. In OASIS, a patient’s and a physician’s global assessment of disease activity of at least 6 on a 10 cm visual analogue scale was used as an external construct for high disease activity. To provide contrast, a visual analogue scale score of ⩽4 was considered low disease activity, and patients with values between 4 and 6 were omitted.
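A standardised mean difference for the OASIS-type comparison can be sketched as follows. The text does not specify the denominator; the sketch assumes the pooled standard deviation of the two subgroups, and the function names and the layout of the input records are hypothetical.

```python
import math
from statistics import mean, stdev


def split_by_global(records, high_cutoff=6.0, low_cutoff=4.0):
    """records: list of (global_assessment_vas, index_value) pairs.

    Patients scoring between the two cut-offs are omitted to provide contrast.
    """
    high = [value for vas, value in records if vas >= high_cutoff]
    low = [value for vas, value in records if vas <= low_cutoff]
    return high, low


def standardised_mean_difference(high, low):
    """Difference in means expressed in pooled standard deviations (assumed)."""
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * stdev(high) ** 2
                  + (n2 - 1) * stdev(low) ** 2) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / math.sqrt(pooled_var)
```

Because the result is expressed in standard deviations rather than in the units of the instrument, values obtained for an index, for the BASDAI and for single items such as CRP can be compared directly. With very small subgroups a single outlying value can shift the result considerably.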
Sixty of the 85 solicited experts completed the first round of the survey, and 55 and 48 (of the 60 invited) completed the second and third rounds, respectively. After three rounds, 12 items covering seven domains were selected to be included in further analysis (table 1). After formal discussion during a meeting with ASAS members, the Bath Ankylosing Spondylitis Functional Index (BASFI) was excluded from further analysis. The prevailing reason for exclusion was that physical function is a reflection of both disease activity and damage and should not be included in an instrument which measures disease activity.
The PCA identified three factors with an eigenvalue >1, cumulatively explaining about two-thirds of the total variance (table 2). These factors reflected “patient-reported outcomes” (factor 1), “peripheral activity” (factor 2) and “laboratory” (factor 3). As presented in table 2, these three underlying constructs were clearly discernable, with for example a high correlation between factor 3 (laboratory) and the values for CRP and ESR (both >0.85), while all remaining items correlated weakly at best (all <0.20).
Discriminant function analysis
The factor loadings of the three derived factors were used as independent variables in the DFAs. All DFAs resulted in correct classification of ∼72% of the cases (high versus low disease activity compared with the predicted group membership as given by the discriminant model), while the prior probability was 50.6% in these 708 analysed patients.
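The comparison between the classification rate and the prior probability can be made concrete with a small sketch. It assumes that the prior probability corresponds to the share of the larger group (the rate obtained by always predicting that group); the function name and the labels are illustrative and do not come from the study data.

```python
def classification_summary(actual, predicted):
    """Proportion correctly classified and the chance level for two groups."""
    n = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    share_high = sum(1 for a in actual if a == "high") / n
    return {
        "correct_rate": correct / n,
        # Assumption: chance level taken as the share of the larger group.
        "prior": max(share_high, 1 - share_high),
    }


# Made-up labels for eight patients, for illustration only.
actual = ["high", "high", "high", "high", "low", "low", "low", "low"]
predicted = ["high", "high", "high", "low", "low", "low", "low", "high"]
summary = classification_summary(actual, predicted)
```

A discriminant model is informative only to the extent that its rate of correct classification exceeds this chance level.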
The regression analysis with individual variables on the discriminant function scores identified the optimal composition of variables and weights, with an optimal number of five variables per index (tables 3–5). The best five-variable option included the patient’s assessment of back pain (BASDAI question 2), the patient’s global assessment of disease activity (Patient global) (Numerical Rating Scale), the duration of morning stiffness (BASDAI question 6), the CRP and the ESR.
Alternative candidate indices
The three-step process described above was performed four times, first with all selected variables included, and then three times with a set of variables lacking either CRP or ESR or patient global assessment of disease activity. In order to use a consistent methodology for the four developed scores, only the main three factors obtained in the PCA were used further, even though the cut-off point for the eigenvalue initially chosen (>1) was not always met. These additional analyses were done to meet criticism about feasibility (CRP and ESR in one index) and about the duplication introduced by an overall patient global assessment (“Patient global”) in combination with the other patient-reported items. Excluding CRP or ESR or “Patient global” consecutively resulted in three additional candidate indices, occasionally with different components: for instance, ASDAS B and C included an assessment of the involvement of peripheral joints (BASDAI question 3), while score D included the assessment of fatigue (BASDAI question 1) (tables 3–5).
For each of the four candidate scores, these five variables and the correlations with items in ISSAS and OASIS databases are shown in tables 3 and 4. Except for the swollen joint count, all aspects of AS disease activity were reflected by all four indices. Of note, all four indices showed high correlations (r>0.60) with patient global assessment of disease activity (physician’s global assessment was not recorded in ISSAS), and with both patient’s and physician’s global assessments in the independent OASIS database, while the correlation between patient’s and physician’s global assessment was only weak (r<0.35 in OASIS).
Discriminatory ability of the four candidate indices was compared with that of the BASDAI and of other variables (tables 6–8), in patients from the ISSAS and the OASIS databases:
Both in the ISSAS database and in the OASIS database (patient global >6), the candidate indices consistently showed better discriminatory ability (that is, higher standardised mean differences) than single-variable items such as acute phase reactants or patient’s assessments, and than the BASDAI. These higher values indicate that the developed indices are better able to distinguish between patients with different levels of disease activity. Comparison of this discriminatory ability in patients from the OASIS database was more difficult, since only a small proportion of patients had a high level of disease activity (defined as a value >6 on a 10 cm visual analogue scale). In the latter situation, standardised mean differences are spuriously biased towards higher values whenever a patient shows outlying values in an item (for example, an extremely high ESR of 74 mm/1st h was measured in one of these six patients, and four out of the six patients had an ESR >40 mm/1st h, which explains why ESR was found to have an especially high discriminatory ability in these circumstances). Therefore, we conducted an additional comparison with the cut-off level for high global disease activity set at 4, in order to obtain a better balance in patient numbers per subgroup (table 9). All candidate scores showed a better discriminatory ability than the separate variables, thus confirming the original subgroup analysis.
This work by ASAS describes the development of a new disease activity index in AS (the Ankylosing Spondylitis Disease Activity Score (ASDAS)) which performs well methodologically and has high face validity in clinical practice and research. To meet these aims, two approaches were combined. First, the items considered to be of most relevance were selected by consensus among experts in the field, in order to obtain a high face validity. Second, the three-step process underlying the index design, which was successfully applied in RA and resulted in the widely used DAS, assured an optimal methodological weighting of the most contributory variables. Validity and discriminatory ability of the derived scores could be confirmed in an independent dataset.7
Although the new scores were based on items entirely derived from the experts’ perspectives (Delphi exercise was answered by doctors only), all new indices correlated well both with doctor and patient perceptions of disease activity, in both cohorts tested. This observation confirms that symptoms related to AS (which are major determinants of the judgment about disease activity by the patient8) and assessments made by the doctor, are not necessarily reflecting the same construct, and that both perspectives should be included in a new index, without an obvious predominance of any construct (which is a commonly recognised weakness of the BASDAI).
Further evaluation of the performance of the four draft indices may help in choosing the most appropriate score. For example, indices A and D (which require measurements of both CRP and ESR) may be considered unfeasible, since ESR and CRP are rarely both collected in clinical practice. Exercises like this, however, may raise awareness of the fact that ESR and CRP, while considered as interchangeable acute phase reactants, may at least in part reflect different processes. Correlation between items is only approximately 0.5, as recognised here and in previous studies.8 9 Differences in variability across the measures as well as the rapidity of change may explain this rather low correlation.
Sensitivity to change as well as truth aspects of the draft indices need to be further evaluated. For example, the deliberate exclusion of spinal mobility assessments from the process at an early stage (in the Delphi exercise) avoids the potential entangling of reversible (inflammation) and irreversible (spinal damage) components in an index that supposedly reflects disease activity, but may raise concern in those who consider impairment of spinal mobility as part of disease activity.9–11
With regard to the inclusion of a measure of “peripheral” disease activity in the indices, it is remarkable that only two of the indices (scores B and C) include such an item (patient peripheral pain/swelling (BASDAI question 3)), while neither swollen nor tender joint count was retained by the statistical process. This absence is probably due to the infrequent involvement of peripheral joints in AS (only 20% of the patients in OASIS and 30% of the patients in ISSAS had at least one swollen peripheral joint), and to the fact that other variables associated with peripheral activity already capture the information (mean CRP, ESR and patient global assessment were all higher in patients with peripheral disease activity, in both cohorts of patients (data not shown)).
Another challenge will be to try to draw a parallel between the draft indices and what is considered the “real” level of disease activity of AS in an actual patient (truth of the instrument). However, there is not an appropriate “gold standard” for disease activity, and unlike the situation in RA in which disease activity predicts radiographic progression,12 13 the predictive relationship between disease activity and radiographic progression in AS is unclear. Recent publications failed to show any effect of the TNF blocking drugs etanercept and infliximab on the progression of syndesmophyte formation and growth, while these drugs suppress disease activity beyond any doubt, regardless of how disease activity was measured.14 15 So it seems as if there is no external construct against which the predictive validity of a disease activity index can be established in AS.
The final choice for one favoured index among the four that were developed should be made after additional examination of their respective performances in other available or new prospective cohorts of patients.
Competing interests: None.
See Editorial, p 1