Outcomes in ankylosing spondylitis: what makes the assessment of treatment effects in ankylosing spondylitis different?
M M Ward

Correspondence to: Dr M Ward, Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Building 10 CRC, Room 4-1339, 10 Center Drive, MSC 1468, Bethesda, MD 20892; wardm1{at}


There are four major challenges in the assessment of outcomes in patients with ankylosing spondylitis (AS) that are particularly relevant to the evaluation of new therapies. Firstly, measures of symptoms and impairment in AS are not specific for inflammatory processes; they also capture mechanical symptoms and fixed limitations. The non-specific nature of these measures may make them less responsive and therefore less useful in determining treatment efficacy. Secondly, acute phase reactants have limited value as measures of AS activity, and other surrogate markers have not yet been established. Thirdly, the assessment of the disease modifying potential of new therapies is hampered by the slow rate of spinal fusion. Fourthly, work disability has not been studied as an endpoint in clinical trials in AS, even though it is an important outcome in patients with AS. Research into ways to overcome these challenges in outcome measurement will help identify useful therapies and define the range of outcomes that they influence.

  • ankylosing spondylitis
  • work disability
  • spinal disease
  • surrogate marker

Health outcomes include all the consequences of an illness on a person’s life. These include impairments (alterations in normal body structure or function), measures of health status (symptoms and functioning), quality of life, costs of illness, and mortality1,2 (table 1). Measures of impairments and health status are the outcomes most often used as endpoints in clinical trials. In ankylosing spondylitis (AS), specific measures of impairment and health status have been recommended as a core set to be included as endpoints in clinical trials3,4 (table 1).

Table 1: Categories of health outcomes

The types of outcome measures used in AS are similar to those used for other types of inflammatory arthritis. However, AS presents special challenges in operationalising these outcomes into endpoints for clinical trials. These include:

  • non-specific nature of measures of symptoms and impairments

  • limited ability of common laboratory measures of inflammation to corroborate patient reported health status or clinician observed impairments

  • difficulty of testing disease modification due to the slow progression of spinal fusion

  • diversity of work disability that complicates testing in clinical trials.


Symptoms and signs

AS activity is reflected in patient reported symptom severity, limitations in functioning due to pain, stiffness, muscle spasm, or joint swelling, and signs on physical examination. Signs of AS activity include limited ease of movement due to pain and muscle spasm, local tenderness, and joint or tendon sheath swelling. In the absence of peripheral joint involvement, the assessment of AS activity can be difficult because substantial low back pain and stiffness can be present without notable local tenderness or spasm, and with little limitation in movement. Marked tenderness and spasm on physical examination tend to be specific markers of AS activity, but are likely not sensitive. Adding to the difficulty in using the physical examination to quantify AS activity is the lack of standardised measures of spinal tenderness or spasm, so that even when these signs are present, there is no accepted scheme to grade their severity. Proposals for measures such as the physician rated Spinal Pain Score need further study.5

Measures of spinal and chest wall flexibility (the modified Schober’s test, lateral lumbar flexion, chest expansion, neck flexion and rotation, and occiput to wall distance) have been used as surrogate measures of spinal pain and stiffness, on the premise that inflammation and spasm will cause restricted movement. Although limitations in these measures may be due to inflammation, they may also be a consequence of spinal or costovertebral fusion. In patients with early AS who do not have spinal fusion, one may be more certain that limitations in these measures represent AS activity. However, among patients with some degree of spinal fusion, the relative contributions of reversible AS activity and irreversible fusion to the limitations may be difficult to resolve. Limitations in these measures should be attributed to AS activity only after considering the stage of AS and any recent changes in symptoms. The dual aetiologies of limitations of spinal mobility may explain the reduced sensitivity to change of these measures; the irreversible limitations captured by these measures cannot change with treatment.6–9 Schober’s test may also be limited by ceiling effects because it only measures forward flexion rather than flexion and extension.10

Given these problems, the assessment of AS activity has come to rely on patient reported measures of symptom severity. The symptoms most commonly assessed are those most often reported as problems by patients: pain, stiffness, and tiredness. The Bath AS Disease Activity Index (BASDAI) aggregates measures of these symptoms in a single score, and has been endorsed by the ASsessment of Ankylosing Spondylitis working group as the currently preferred patient reported measure of symptom severity.11,12 However, recent studies have reported that patients with mechanical low back pain have BASDAI scores similar to those of patients with AS.13,14 These findings suggest that, despite its name, the BASDAI is not specifically measuring inflammatory symptoms in patients with AS, but is likely also measuring mechanical symptoms and stiffness due to spinal fusion. This lack of specificity may also account for the finding that the distribution of BASDAI scores remains relatively stable with the duration of AS, even over 40 years, when mechanical symptoms would be expected to supervene.15 Although patient reported measures are validated measures of symptoms, the attribution of these symptoms solely to disease activity may not always be correct. As was the case for the physical examination, patient reported measures should be interpreted in the context of the duration of AS and the degree of spinal fusion present. These points also highlight the importance of the physician assessment in the evaluation of patients for treatment with tumour necrosis factor (TNF) antagonists, as BASDAI scores alone may provide a false positive assessment of AS activity in patients with longstanding AS.12,16
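As an illustration of how the BASDAI aggregates symptoms into a single score, the sketch below uses the commonly published scoring rule, which is not spelled out in this article and is therefore an assumption here: six items each rated 0 to 10 (fatigue, spinal pain, peripheral joint pain or swelling, localised tenderness, and the severity and duration of morning stiffness), with the two stiffness items averaged before the five components are averaged. The function and parameter names are hypothetical and chosen only for this example.

```python
def basdai(fatigue, spinal_pain, joint_pain, tenderness,
           stiffness_severity, stiffness_duration):
    """Return a BASDAI score (0-10) from six item ratings, each on a 0-10 scale.

    Assumes the commonly published scoring rule: the two morning stiffness
    items are averaged, and that mean is averaged with the other four items.
    """
    items = (fatigue, spinal_pain, joint_pain, tenderness,
             stiffness_severity, stiffness_duration)
    for value in items:
        if not 0 <= value <= 10:
            raise ValueError("each BASDAI item must be between 0 and 10")

    # Average the two morning stiffness items into one component.
    stiffness = (stiffness_severity + stiffness_duration) / 2

    # Average the five components to obtain the final score.
    return (fatigue + spinal_pain + joint_pain + tenderness + stiffness) / 5


# Hypothetical ratings, for illustration only.
score = basdai(fatigue=4, spinal_pain=6, joint_pain=2, tenderness=4,
               stiffness_severity=8, stiffness_duration=6)
print(round(score, 1))
```

Nothing in this calculation distinguishes inflammatory from mechanical causes of pain or stiffness: identical item ratings give an identical score whatever their origin, which is consistent with the lack of specificity described above.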

Laboratory tests

The importance of some of the shortcomings of the physical examination and patient reported measures of AS activity could be lessened if laboratory tests were available that corroborated, in a reliable and valid way, the degree of AS activity that was present. Laboratory tests are most useful as surrogate markers of disease activity when the test:

  • is abnormal in most patients with active AS

  • changes in parallel with other measures of AS activity

  • improves with known effective treatment

  • is not influenced by processes other than AS activity.

Unfortunately, no laboratory tests have been found that satisfy these criteria for the majority of patients with AS.

The erythrocyte sedimentation rate (ESR) and serum C-reactive protein (CRP) concentration have been the laboratory measures studied most often. Elevated levels of either test have been reported in fewer than half of patients with AS, and the degree of elevation in most patients is modest.17 In trials of TNF antagonists, median values of the ESR at baseline have ranged from 25 mm/h to 34 mm/h, and median values of CRP from 1.2 mg/dL to 3.0 mg/dL.18–24 However, both measures have been found to be moderately sensitive to change in these trials. The extent to which peripheral arthritis, hip involvement, or inflammatory bowel disease contributes to the modest associations of the ESR and CRP with AS activity is unclear. Studies have not examined the performance of these measures separately in patients with purely axial AS, which is arguably the setting in which a laboratory marker of AS activity would be most useful.

Several new laboratory measures have been studied as potential biomarkers in AS25–38 (box 1). Some have been tested as diagnostic markers, and for others it is not clear if the marker was intended as a measure of AS activity or of structural damage. It would be unusual for a single measure to be both a good diagnostic marker and a good evaluative marker. Good diagnostic markers should be abnormal even in patients with low disease activity, which is not desirable for a measure used to evaluate disease activity.39 The selection of patients and the design of a study would differ substantially depending on the intended purpose of the biomarker.

Box 1: Laboratory tests recently studied as biomarkers in ankylosing spondylitis

  • Serum matrix metalloproteinase-3

  • Serum interleukin-6

  • Serum vascular endothelial growth factor

  • Serum cartilage glycoprotein-39

  • Serum C2C neoepitope

  • Serum aggrecan 846 epitope

  • Urine pyridinoline and deoxypyridinoline

  • Urine C-terminal telopeptide-1

In the search for new surrogate markers specifically intended to assess AS activity, consideration should be given to what the standard for comparison should be. As noted above, patient reported measures of symptom severity have limitations, but may be useful standards in carefully selected groups of patients. Physician global assessment may also be useful if the assessments are blinded to the laboratory results. Magnetic resonance imaging of the spine could be argued to be the most objective standard for comparison, but the expense and logistical difficulties of imaging limit its feasibility in the early phases of testing of surrogate markers. Markers that were promising in early studies using clinical reference standards might be selected for testing in later studies using magnetic resonance imaging as the standard. Longitudinal studies that demonstrated changes in the marker with flares of AS activity or decreases in the marker in response to effective treatment would provide the most convincing evidence of the value of a surrogate marker of AS activity.40


Disease modification

Demonstration that an intervention could retard structural damage in AS would represent a major advance. Improvement of an inflammatory spinal lesion detected by magnetic resonance imaging may suggest the potential for disease modification, but the relation between bone marrow oedema and subsequent bone fusion is unclear.23,24,26,41–44 The best evidence for disease modification would be slowing or stopping the progression of bone fusion. Options for assessing bone fusion include the sacroiliac joints and the spine. The sacroiliac joints are not the optimal joints to study because visualisation is difficult, and because fusion of these joints occurs early, limiting the pool of patients for study. In contrast, the spine is more readily visualised, is affected later in the course of AS, and provides a large anatomical range over which changes can occur. Fusion of the spine also has functional consequences, so demonstration of disease modification here would be more clinically relevant than it would be in the sacroiliac joints.

Several methods have been developed to quantify the degree of spinal fusion in AS. The two methods most commonly used are the Bath AS Radiology Index, which grades the lumbar and cervical spines globally, and the modified Stoke AS Spine Score, which details abnormalities at each lumbar and cervical disc space.45,46 However, spinal fusion progresses slowly in AS. Even the most sensitive radiographic scoring methods detect only small changes over two years.47 This limitation complicates the testing of disease modification in clinical trials, which rarely last two years or longer, and which would potentially assign patients with active AS to an inactive treatment for a prolonged period. Solutions to this problem include selecting for testing a subgroup of patients who have rapid progression, in whom treatment effects might be observed in a shorter time period, or using an observational study design with historical controls.48 Other imaging modalities might also provide a measure that is more sensitive to change than plain x rays.


Work disability

Work disability is an important outcome of AS because AS affects patients during their working years. Work disability accounts for 47–75% of the costs of AS to society, and is an important component to be considered in cost-effectiveness analyses of treatments.49–51 However, work disability has not been studied as a clinical trial endpoint. One difficulty is that the relevant endpoints differ with the stage of employment (table 2). Among those who are stably employed, the number of sick days, change in work hours or type of work, or need for help at work, could be tested as endpoints in controlled trials. These events are infrequent in non-selected patients with AS, with only 12.5% reporting a decrease in work hours and less than 4% reporting more than 10 sick days over a six month period.52 However, these events would be expected to be more common among patients with active AS, who would more likely be enrolled in clinical trials, and would likely be feasible as trial endpoints.

Table 2: Approaches to studying work disability as a clinical endpoint in ankylosing spondylitis

Persons with threatened work disability would be individuals at high risk for job loss due to the activity of AS or their occupation. These patients could be identified by their expressed concerns about ability to work, use of prolonged sick leaves, or referral to vocational rehabilitation. Among these patients, the endpoint would be maintained employment. Although best tested in a controlled trial, this may not be feasible or ethical in patients with active AS. Observational cohort studies would be subject to bias against a treatment effect, because people with more active AS might be more likely not only to stop working but also to receive aggressive treatment.

Among those who are recently work disabled, the endpoint would be return to work. This event would not likely be captured in a controlled trial. A cohort design might be useful, because bias due to adverse selection for treatment would be less. In this situation, those who are treated aggressively might be expected to be less likely to return to work, so any evidence of an increase in employment would argue for the effectiveness of the treatment. Direct intervention would likely not be feasible in those with long-term work disability. Time series methods that examined changes in rates of permanent work disability at the population level might be the best approach to determine if the introduction of new treatments influenced the risk of long-term work disability.



  • Competing interests: none declared