Treatment trials in ankylosing spondylitis: current and future considerations
D van der Heijde (1), J Braun (2), D McGonagle (3), J Siegel (4)

  1. Department of Internal Medicine, Division of Rheumatology, University Hospital Maastricht, Maastricht, The Netherlands
  2. Rheumazentrum Ruhrgebiet, Herne, Germany, Department of Gastroenterology and Rheumatology, Free University, Berlin, Germany
  3. Department of Rheumatology, The University of Leeds, Leeds, UK
  4. Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, Maryland, USA

Correspondence to: Dr D van der Heijde, Department of Internal Medicine, Division of Rheumatology, University Hospital Maastricht, PO Box 5800, 6202 AZ Maastricht, The Netherlands


Emerging treatment options in ankylosing spondylitis (AS) are giving new hope to patients with this chronic and potentially disabling disease. Clinical development of new treatments requires that rigorous and well controlled trials be conducted to demonstrate safety and efficacy. A number of classification systems have been developed in recent years as a result of enhanced understanding of the pathogenesis of AS. Although new outcome measures have been developed and a consensus has been reached on the use of assessment instruments in clinical trials, there is still need for improvement and implementation.

The ASsessments in Ankylosing Spondylitis (ASAS) Working Group has addressed some of these dilemmas by establishing a core set of domains for the evaluation of AS and by selecting specific assessment methods for each domain. The group has also published improvement criteria for assessing short term improvement with symptom modifying antirheumatic drugs and is presently developing response criteria for disease controlling antirheumatic treatment. Various experts are also currently examining discrepancies and inadequacies of classification systems for AS. Imaging studies, magnetic resonance imaging in particular, may provide better classification criteria in the near future.

In addition to consensus on outcome assessment and classification of AS, lessons learnt from clinical trials in rheumatoid arthritis (RA) may serve as a template for AS. Guidance provided by the United States Food and Drug Administration (FDA) for clinical trials in RA may be of particular use. The FDA has defined the claims that sponsors can receive for RA products and the clinical trial data that would be expected to be submitted to support such claims.

  • ankylosing spondylitis
  • outcome assessment
  • classification
  • clinical trials
  • AS, ankylosing spondylitis
  • ASAS, ASsessments in Ankylosing Spondylitis
  • BASDAI, Bath AS Disease Activity Index
  • BASFI, Bath AS Functional Index
  • BASMI, Bath AS Metrology Index
  • BASRI, Bath AS Radiology Index
  • CRP, C reactive protein
  • DC-ART, disease controlling antirheumatic treatment
  • DFI, Dougados Functional Index
  • ESR, erythrocyte sedimentation rate
  • FDA, Food and Drug Administration
  • MISS, MR imaging in seronegative SpA
  • MRI, magnetic resonance imaging
  • NSAID, non-steroidal anti-inflammatory drug
  • OMERACT, Outcome Measure in Rheumatoid Arthritis Clinical Trials group
  • RA, rheumatoid arthritis
  • SASSS, Stoke AS Spinal Score
  • SF-36, Short Form-36
  • SM-ARD, symptom modifying antirheumatic drug
  • SpA, spondyloarthropathy
  • VAS, visual analogue scale

The emergence of new treatment options in ankylosing spondylitis (AS), resulting from recent advances in immunology, biotechnology, and pharmaceutical science, has accelerated the need for universal standards to systematically assess treatment response and indications in AS. Clinical trials to evaluate the safety and efficacy of these new treatments should be conducted using agreed instruments for measuring short and long term outcome and agreed classification systems for differentiating patient populations based on disease activity, disease pattern, and previous treatment. Standardisation of these aspects will facilitate the collection of conclusive, reproducible, and comparable study results, which in turn might influence future treatment strategies. Although consensus has not yet been established on many aspects of disease outcome, classification, treatment indications, and clinical trial end points for medical treatments in AS, much progress has been made over the past 50 years as the aetiology and pathophysiology of AS have been further elucidated. Recent efforts of various individuals and groups towards devising comprehensive uniform outcome assessments and classification systems, and the impact of these efforts on the design and implementation of future treatment trials, are reviewed herein.


Numerous assessment methods are currently available in AS, including laboratory measurements, metrology, radiographs, and questionnaires. In their review of published work, Bakker et al1 and van der Heijde et al2 documented more than 100 methods. Although the abundance of approaches suggests significant progress, continued work is needed to improve existing methods and to fill gaps within the spectrum of relevant outcomes.3 In addition, consensus on the selection of methods to apply in clinical trials and on guidelines for their use is needed.

In 1995 an international working group was formed to address these needs. The group, known as the ASsessments in Ankylosing Spondylitis (ASAS) Working Group, established a core set of domains for the evaluation of AS and selected specific assessment methods (instruments) for each domain.2,4 They subsequently published improvement criteria for assessing short term improvement with symptom modifying antirheumatic drugs (SM-ARDs) using outcome data from placebo controlled trials of non-steroidal anti-inflammatory drugs (NSAIDs).5 Presently, work is in progress to develop response criteria for disease controlling antirheumatic treatment (DC-ART) based on results of studies with infliximab and etanercept. The ASAS core set of domains, selection of instruments, improvement criteria for assessing short term improvement with SM-ARDs, and development of response criteria for DC-ART are reviewed in the subsequent sections.


Treatment studies in AS have often employed inconsistent and excessive numbers of assessment methods, some of which are not validated. This can create multiple dilemmas. Establishing a uniform minimum core set of variables for inclusion in all research projects may help prevent these dilemmas by ensuring that:

  • Chance occurrences of statistically significant differences between groups are minimised

  • Investigators do not introduce bias by selectively publishing only favourable variables

  • Comparisons can be made between studies

  • Meta-analyses can be performed.6

The ASAS Working Group, composed of clinicians, researchers, industry representatives with expertise in AS, and patients with AS from more than 20 countries around the world, used a combination of expert consensus and statistical approaches to develop a core set of domains. The group defined core sets for three different settings:

  • The evaluation of DC-ART

  • The evaluation of SM-ARDs and physical therapy

  • Use in clinical record keeping.

Table 1 lists the core set of domains for each of these settings. It should be noted that a number of other settings were initially considered based on all potential points of interest in AS. The three settings chosen reflect the most pertinent aspects while eliminating redundancy.2

Table 1

Core set for studies on DC-ART, SM-ARDs/physical therapy, and clinical record keeping2

The ASAS core sets for assessing outcome in AS were endorsed by the Outcome Measure in Rheumatoid Arthritis Clinical Trials group (OMERACT) and by the International League Against Rheumatism (ILAR) in 1998 and have been known since as the ASAS/OMERACT/ILAR core sets.7 Validation of several instruments against two aspects of the OMERACT filter is continuing: truth (Is the measure truthful? Does it measure what is intended? Is the result unbiased?) and discrimination (Does the measure discriminate between situations of interest?).8 However, results of the ASAS Working Group’s combined expert consensus and statistical approach have shown considerable similarity with the results of the purely statistical approach of Calin et al9 to the selection of a core set and to recommendations given by Bellamy10 in his book on clinical metrology of musculoskeletal diseases. These similarities suggest the acceptability of the ASAS core set, although further validation is required.


After determining the core set of domains for the previously mentioned settings, the ASAS Working Group proceeded to select specific instruments for each domain. One hundred and five instruments identified in the literature were evaluated for feasibility and relevance. Thirty five instruments were deemed not feasible or not relevant, or both, and were eliminated. The remaining 70 were ranked and discussed by members of the group to determine the final selection of instruments presented in table 2.4

Table 2

Specific instruments for each domain in core sets for DC-ART, SM-ARDs/physical therapy, and clinical record keeping4

Although the group selected only single instruments for each domain, combined indices such as the Bath AS Disease Activity Index (BASDAI) and the Bath AS Metrology Index (BASMI) need to be evaluated further. Overall, these combined indices scored high in percentage of feasibility and relevance. However, in some cases, single components within these indices scored low.4 Moreover, a combined score similar to the disease activity score in rheumatoid arthritis (RA) is under development. The inclusion of ASAS core set domains that are absent from other combined indices might be an advantage. A description and rationale for the ASAS instruments within each domain are detailed in the following sections, and ASAS instruments are compared with other widely used instruments.

Physical function

The Bath AS Functional Index (BASFI) and Dougados Functional Index (DFI) were selected for evaluation of function. The BASFI, published in 1994, includes 10 items on ability to perform and cope with activities of daily living (table 3). Each activity in this questionnaire is scored on a 10 cm visual analogue scale (VAS). The mean of the 10 scales yields the total score.11 The DFI was first published in 1988 and consists of 20 Likert response items assessing the ability to perform distinct daily activities (table 3). A point score is assigned to each of the three possible answers (yes, with no difficulty=0; yes, but with difficulty=1; and no=2). The total score is calculated as a sum of the 20 item scores (range, 0–40) (table 3).12 Although not published as an official modification of the DFI, a five point Likert scale has since replaced the original three point scale. The questions are scored as either 0, 0.5, 1, 1.5, or 2, so that the total range is the same but allows for greater detection of change.
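The scoring rules described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the text (mean of 10 VAS items for the BASFI; sum of 20 item scores for the DFI, on either the three point or the five point scale). The function names and the input validation are assumptions made for the example and are not part of either published instrument.

```python
from statistics import mean

# Allowed item scores for the DFI, as described in the text.
DFI_THREE_POINT = {0, 1, 2}
DFI_FIVE_POINT = {0, 0.5, 1, 1.5, 2}


def basfi_score(vas_cm):
    """Total BASFI score: mean of the 10 VAS items (each 0-10 cm)."""
    if len(vas_cm) != 10:
        raise ValueError("BASFI needs exactly 10 VAS items")
    if any(not 0 <= v <= 10 for v in vas_cm):
        raise ValueError("each VAS item must lie between 0 and 10 cm")
    return mean(vas_cm)


def dfi_score(items, five_point=False):
    """Total DFI score: sum of the 20 item scores (range 0-40)."""
    allowed = DFI_FIVE_POINT if five_point else DFI_THREE_POINT
    if len(items) != 20:
        raise ValueError("DFI needs exactly 20 items")
    if any(item not in allowed for item in items):
        raise ValueError("item score not on the chosen Likert scale")
    return sum(items)
```

Both DFI scales give the same total range (0–40); the five point scale only adds intermediate steps.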

Table 3

The 10 BASFI items (the patients must indicate their level of ability for each of the 10 activities on a 10 cm VAS) and the 20 items of the DFI (all begin with “Can you . . .”) on a three point scale

Both self administered questionnaires have been shown to be valid and reliable measures of functional capacity in AS and are widely used. However, there are important differences between the two instruments, as well as limitations to their performance in specific situations. These are summarised in table 4.11–13 Presently, the BASFI is the most widely used instrument to assess physical function in AS.

Table 4

Differences and limitations of the BASFI and DFI based on literature review by Ruof and Stucki13


For the assessment of pain, two 10 cm VASs were selected: one for pain of the spine at night due to AS on average past week, and the other for pain of the spine (without time restraints) due to AS on average past week. The BASDAI, which has been used in a number of studies, contains three VAS items relating to pain and discomfort over the past week. The BASDAI items assess three locations of pain: overall pain in neck, back, or hip; overall level of pain/swelling in joints other than the neck, back, or hip; and overall discomfort from any areas tender to touch or pressure (table 5).14 The VAS on the peripheral joints combines pain and swelling; the last question is assumed to assess tenderness of the entheses. The swelling of the joints and the involvement of the entheses are assessed in separate domains in the ASAS core set. Although not included in the ASAS core set, there is also a pain measurement that is assessed by the doctor. This is, in fact, a combined assessment of tenderness on palpation in combination with limitation caused by pain and spasm. This gives a global grading from 0 to 4 for three areas in the spine: cervical, thoracic, and lumbar plus sacroiliac joints. Some think it is advantageous to have a doctor’s assessment of pain in addition to various measures assessed by the patient.

Table 5

BASDAI questionnaire items

Spinal mobility

The clinical relevance of spinal mobility in the assessment of AS depends upon two main factors: the stage of the disease and the method by which spinal mobility is measured. Thus, responses to questions about the relevance of assessing spinal mobility by ASAS members ranged widely, depending on how the questions were posed with respect to disease stage and method of measurement. Nevertheless, the majority of members agreed that over long periods of follow up, assessment of spinal mobility provides a sensitive measure of structural damage and disease activity. Chest expansion, modified Schober test, and occiput-to-wall distance were selected to represent the domain of spinal mobility. Whereas numerous modifications of the Schober test exist, the version recommended by the ASAS Working Group is performed as follows:

  • With the patient standing erect, make a mark on the back at the midpoint on an imaginary line joining the posterior superior iliac spines.

  • Make another mark 10 cm above the first.

  • Ask the patient to bend forward maximally, keeping the knees fully extended.

  • With the spine in fullest flexion, measure the distance between the two marks.

  • The normal distance is greater than 15 cm owing to stretching of the skin overlying the mobile lumbar spine.4

The BASMI is a combined measure to assess spinal mobility and hip function, and consists of the following items:

  • Tragus to wall

  • Lumbar flexion

  • Cervical rotation

  • Lumbar side flexion

  • Intermalleolar distance.15

Patient global

The patient global assessment, measured by a VAS on average for the past week, was selected. This differs from the Bath AS Global, which is a global assessment for one week and includes a VAS over six months.

Spinal stiffness

The selection for spinal stiffness was duration of morning stiffness of the spine past week. Two components of the BASDAI also assess spinal stiffness: overall level of morning stiffness from time of awakening past week and duration of morning stiffness from time of awakening past week (table 5). The average of these two questions represents the stiffness component of the BASDAI. In the analyses to develop the ASAS improvement criteria, this combined measure for spinal stiffness performed better than information on the duration of stiffness alone.

Peripheral joints and entheses

The ASAS Working Group selected a joint count for evaluation of peripheral joints (without grading or weighting of the joints). The 44 joints included in the count are:

  • Right and left sternoclavicular joints

  • Acromioclavicular joints

  • Shoulder joints

  • Elbows

  • Wrists

  • Knees

  • Ankles

  • 10 Metacarpophalangeal joints

  • 10 Proximal interphalangeal joints of the hands

  • 10 Metatarsophalangeal joints.
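The composition of the 44 joint count can be checked with a short sketch. It assumes that each of the seven named joint sites is counted on both the right and left side (the list states this explicitly only for the sternoclavicular joints), which is the reading that yields 44. The joint labels and function names are illustrative only.

```python
# Seven joint sites, each assumed to be counted right and left (14 joints).
PAIRED_SITES = [
    "sternoclavicular", "acromioclavicular", "shoulder",
    "elbow", "wrist", "knee", "ankle",
]
# Three groups of 10 small joints each (30 joints).
SMALL_JOINT_GROUPS = {
    "metacarpophalangeal": 10,
    "proximal interphalangeal (hand)": 10,
    "metatarsophalangeal": 10,
}


def build_joint_list():
    """Return the labels of all joints included in the count."""
    joints = [f"{side} {site}"
              for site in PAIRED_SITES for side in ("right", "left")]
    for group, n in SMALL_JOINT_GROUPS.items():
        joints += [f"{group} {i}" for i in range(1, n + 1)]
    return joints


def joint_count(involved):
    """Unweighted, ungraded count of involved joints."""
    unknown = set(involved) - set(build_joint_list())
    if unknown:
        raise ValueError(f"not in the 44 joint list: {sorted(unknown)}")
    return len(set(involved))
```

Because the count is neither graded nor weighted, every involved joint contributes exactly one point.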

Only one index for entheses, the Mander enthesis index, was found in the literature.16 However, because members of the ASAS Working Group deemed this measure not feasible owing to the extensiveness of the instrument, no selection for entheses was made. One of the BASDAI questions deals with pain from entheses. Several modifications of the Mander index have been applied in recent trials. However, these modifications have not been validated. Currently, a simplified index to assess entheses is being developed and validated.

Acute phase reactants

Two recent evaluations of the validity of the erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP), by Ruof and Stucki17 and Spoorenberg et al,18 respectively, concluded that acute phase reactants do not comprehensively represent the disease process in AS and that their worth in AS clinical trials is limited. In contrast with RA, the ESR and CRP values are lower in AS and generally do not vary as much with respect to the severity of the disease. However, patients with AS with peripheral joint involvement and/or inflammatory bowel disease tend to have higher levels than patients with only axial involvement. Neither the ESR nor the CRP level had higher validity than the other. The advantages of lower costs, ease of performance, standardised testing, and promptness of results led to the selection of the ESR by the ASAS Working Group to represent the acute phase reactant domain.4,17,18 Interestingly, more recent evaluations suggest that acute phase reactants may reflect responsiveness in the evaluation of DC-ART. Thus, the perception of the usefulness of acute phase reactants in the assessment of AS might change in the near future with the availability of new information on the effect of biological treatments in AS.

Radiographs of spine and hips

The spine and sacroiliac joints are predominantly affected in AS. However, large axial joints, such as the hips and shoulders, and peripheral joints can also be involved. Plain radiographs of the cervical spine, lumbar spine, and pelvis (hip is included with the pelvis) were selected, although no evaluation for the thoracic spine and no scoring method have been recommended. In contrast with RA, for which radiographic change is an important end point and several validated scoring methods are available, radiographic change in AS is not a well established measure of outcome, and the choice of scoring method varies according to the type of study performed (for example, study of the natural history, prognosis, or effectiveness of treatment in AS).

The situation is complicated further by uncertainty about whether various radiographic abnormalities in AS represent the inflammatory disease process or whether they indicate healing. For example, fluffy periostitis or erosions are thought to represent inflammation, whereas bridging syndesmophytes are due to spinal ossification and are clinically asymptomatic. These are often present in late disease, and may be regarded as due to a non-inflammatory process.19 Further elucidation of the relationship between inflammation and new bone formation in AS may provide answers. In the meantime, these and other issues need to be addressed before a scoring method can be chosen. Additional data obtained from other imaging modalities (for example, computed tomography, ultrasonography, magnetic resonance imaging (MRI), scintigraphy, and dual energy x ray absorptiometry), some of which are discussed below, are becoming available and may give more insight into the underlying pathophysiological processes in AS.20

Two scoring methods have been fully described in the literature to date: the Stoke AS Spinal Score (SASSS) published by Taylor et al in 199121 and the scoring method published by Kennedy et al in 1995,22 which was subsequently modified in 1999 and named the Bath AS Radiology Index (BASRI).23 Both the SASSS (table 6) and the BASRI (table 7) demonstrated feasibility and good reproducibility.20–26 In an unselected cohort of patients with AS, very little progression could be assessed by either method at a one and two year interval.26 More data are needed, especially from patients with a high likelihood of progression.

Table 6

SASSS scoring method

Table 7

BASRI scoring method


No specific instrument was selected for the assessment of fatigue because the ASAS members judged none of the four measures identified in the literature to be relevant. However, recent research has demonstrated the validity of the BASDAI VAS item on overall level of fatigue, with good reliability and sensitivity to change in the assessment of overall fatigue (table 5).27 If information is needed on several aspects of fatigue, the multifactorial fatigue index is a good alternative.28 The Short Form (SF)-36 method has recently been used to assess fatigue and other elements of physical function in patients with AS.29 Studies on the performance of other instruments to assess fatigue are in progress.


Symptom modifying antirheumatic drugs

Therapeutic studies in AS to date have not shown any retardation of structural damage. Thus, it is not possible to say whether drugs can be considered truly disease modifying. In recognition of this fact, investigators in AS have used terms such as symptom modifying antirheumatic drugs (SM-ARDs) and disease controlling antirheumatic treatment (DC-ART). Standard criteria for defining improvement in SM-ARD evaluation were developed based on groundwork laid down by the ASAS Working Group for assessing outcome in AS and on outcome data from five randomised controlled trials with NSAIDs. Measures representing at least four of the five AS core domains (physical function, pain, spinal mobility, spinal stiffness/inflammation, and the patient global assessment) were included in each of the five trials.12,30–33 Drawing from these data, relevant levels of change/improvement were defined within each of the five AS domains. A conceptual list of possible ways to define improvement in terms of some or all of the five core domains was then prepared. Clinical judgment, previous work in osteoarthritis and RA, and results of published clinical trials were applied to the development of the list, which consisted of 20 single item, multiple domain, and index definitions.5

The candidate definitions of improvement were tested and validated using χ2 tests. The sensitivity and specificity of each definition for the identification of actively treated patients was determined using a random two thirds of the clinical trial data. From these, a range of good performing candidates for response criteria was selected based on high χ2 test values and placebo response rates of 25% or less, and validated using the remaining one third of the data. Further validation was performed by examining the overlap between response rates and partial remission rates for each definition of response.
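The selection procedure described above can be illustrated with a minimal Python sketch. It assumes a Pearson χ2 statistic without continuity correction on the 2×2 table of treatment arm against responder status; the text does not specify the exact variant or the threshold for a "high" χ2 value, so `min_chi2`, the record layout, and all function names are hypothetical.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def split_two_thirds(records, seed=0):
    """Randomly split records into a two thirds and a one third subset."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def evaluate_definition(records, is_responder):
    """records: (on_active_treatment, patient) pairs.

    Returns the chi-square value and the placebo response rate
    for one candidate definition of improvement.
    """
    a = b = c = d = 0
    for on_active, patient in records:
        responder = is_responder(patient)
        if on_active and responder:
            a += 1          # active, responder
        elif on_active:
            b += 1          # active, non-responder
        elif responder:
            c += 1          # placebo, responder
        else:
            d += 1          # placebo, non-responder
    placebo_rate = c / (c + d) if (c + d) else 0.0
    return chi_square_2x2(a, b, c, d), placebo_rate


def is_candidate(chi2, placebo_rate, min_chi2):
    """Keep definitions with high chi-square and placebo response <= 25%."""
    return chi2 >= min_chi2 and placebo_rate <= 0.25
```

In this sketch a candidate definition would be screened on the two thirds subset and then re-evaluated on the remaining third, mirroring the development and validation steps in the text.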

The resulting preliminary definition of short term improvement in AS incorporates four outcome domains (all scored on a scale of 0–100):

  • Physical function (BASFI score)

  • Pain (VAS pain score)

  • Patient global assessment (VAS global assessment score)

  • Inflammation (mean of the two morning stiffness related BASDAI VAS scores (first choice) or by morning stiffness duration with a maximum of 120 minutes (second choice)).

Improvement of AS is defined as improvement of ≥20%, and net improvement of ≥10 units, on a scale of 0 to 100 in each of three domains with no worsening of ≥20% and no worsening of ≥10 units in the fourth, whereas the reverse defines worsening of the disease. A definition of partial remission was also provided that requires the patient to have a low level of disease activity (that is, <20 units on a scale of 0–100 in each of the four domains).5
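The definition can be expressed as a short Python sketch. It assumes that all four domains are scored 0–100 with lower values indicating a better state, that relative change is computed against the baseline value, and that a patient who improves in all four domains also counts as improved ("at least three" rather than "exactly three"). The domain keys and function names are illustrative and not part of the published criteria.

```python
DOMAINS = ("function", "pain", "patient_global", "inflammation")


def inflammation_score(stiffness_level, stiffness_duration):
    """First-choice inflammation measure: mean of the two morning
    stiffness related BASDAI VAS scores (each on a 0-100 scale)."""
    return (stiffness_level + stiffness_duration) / 2


def _improved(baseline, follow_up):
    drop = baseline - follow_up
    # >= 10 units absolute and >= 20% relative to baseline
    return drop >= 10 and drop * 100 >= 20 * baseline


def _worsened(baseline, follow_up):
    rise = follow_up - baseline
    # >= 10 units absolute and >= 20% relative to baseline
    return rise >= 10 and rise * 100 >= 20 * baseline


def asas_improvement(baseline, follow_up):
    """True if >= 3 domains improved and no domain worsened."""
    improved = sum(_improved(baseline[d], follow_up[d]) for d in DOMAINS)
    worsened = any(_worsened(baseline[d], follow_up[d]) for d in DOMAINS)
    return improved >= 3 and not worsened


def partial_remission(scores):
    """True if every domain is below 20 units on the 0-100 scale."""
    return all(scores[d] < 20 for d in DOMAINS)
```

For example, a patient whose four scores fall from 60 to 40, 45, 40, and 62 meets the definition: three domains improve by at least 10 units and 20%, and the fourth changes by only 2 units.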

Arguments in favour of a single domain for defining improvement can be made because single domain definitions for improvement performed well in tests, and improvement in each of the four outcome domains tended to be consistent. However, a multiple domain definition is preferable owing to greater content validity and reliability.5 As a follow up and for further validation, a Delphi exercise was held among the ASAS members to investigate the clinical relevance of the criteria for short term improvement. Those patients who fulfilled the ASAS improvement criteria had a clinically relevant improvement by the judgment of the members. However, there was also a group of patients who did not fulfil the criteria but showed a clinically relevant improvement according to the ASAS members, implying that the criteria were strict.34

Disease controlling antirheumatic treatments

A similar approach for defining improvement in DC-ART evaluation is currently being applied based on outcome data from trials with infliximab and etanercept. Owing to the difference in selection of domains for DC-ART it can be assumed that the domains included in the improvement criteria will also be different. More data will become available in the near future.


The classification of AS has gradually evolved as knowledge of the clinical features, natural history, genetic predisposition, and pathophysiology of the disease has unfolded. Disease classification is vital because it not only governs the development and implementation of clinical trials, which is the subject of this paper, but also guides many aspects of disease management and international communications.

Currently accepted classification systems consist of those that provide criteria for classifying the spondyloarthropathies (SpAs),35,36 of which AS is a subtype, and those that provide criteria for classifying AS as a distinct disease entity.37 Because patients with AS often have highly variable clinical presentations and outcomes, the current classification systems often lack sensitivity and specificity for certain populations and situations. For instance, considerable debate is currently taking place about the diagnostic and staging criteria for AS and about the overlapping relationship of these criteria with classification criteria and with outcome assessment in AS. This debate and a new proposal for disease staging in AS are discussed in “Staging of patients with ankylosing spondylitis: a preliminary proposal” within this supplement (p iii19).

The results of a questionnaire sent to 30 international experts from countries in Europe and North and Central America surveying the experts’ opinions on nomenclature, disease classification, and study design for future trials illustrate the widespread discrepancies in the terminology and criteria for diagnosis and classification of AS used by rheumatologists (see “Building consensus on nomenclature and disease classification for ankylosing spondylitis” within this supplement (p iii61)). The discrepancies underline the need for further discussion and resolution. Experts are currently searching for solutions in histological, immunological, and imaging studies of AS.

MRI for diagnosis and as outcome

Imaging studies with MRI may also provide helpful and sensitive classification criteria and clues as to where to look for key pathogenic mechanisms in AS. In diagnosis, MRI has been used to visualise acute sacroiliitis, spondylitis, and spondylodiscitis in patients with SpA in recent studies.38–40 In contrast with radiographic imaging, which may take several years to detect the effects of inflammation, bony changes, and ankylosis, MRI detects acute inflammation of the enthesis, bone, and synovium in addition to bony changes and ankylosis. This is best detected when applying contrast with gadolinium-DTPA or fat suppression techniques.

In relation to pathogenic concepts, MRI studies have shown that the earliest lesion in the sacroiliac joint in AS is subchondral osteitis,41 and that human leucocyte antigen (HLA)-B27 determines the severity of osteitis. MRI studies have also shown that osteitis or enthesitis is common at all diseased sites, including synovial joints, suggesting common unifying mechanisms for disease at disparate sites. Importantly, the reported MRI abnormalities have been validated as being representative of acute inflammatory processes within the bone.42

MRI has recently been used to investigate the impact of anti-tumour necrosis factor α treatment on axial, peripheral, and entheseal disease in patients with AS. In a study by Stone et al, improvement in axial MRI was seen in seven of eight patients after the first two infusions with infliximab.43 In an open label trial of etanercept in SpA,44 scoring was performed on paired MRI scans of entheseal lesions, osteitis in the sacroiliac joints, lumbar and cervical spine, and peripheral joints at baseline and week 24. Thirty eight of 44 lesions (86%) either resolved completely or improved after treatment, and no new lesions developed.

Grade II radiographic evidence of sacroiliitis, a hallmark of the SpAs, is currently a criterion for the diagnosis of AS based on the modified New York criteria.45 However, the difficulty of detecting early inflammation may be resolved with the use of MRI, and therefore improve current classification by providing earlier diagnosis and more accurate staging of AS. Dynamic MRI with gadolinium-DTPA enhancement has been shown to measure effectively the inflammatory process and its consequences. Bollow et al compared this method with inflammation quantified by cellular analysis of immunostained sacroiliac biopsy specimens, and found good correlation, showing that T cells and macrophages are frequent cells of early and active sacroiliitis in the SpAs.46

Realising the need for early diagnosis and intervention in SpA and for better methods of assessing bone abnormalities, an international multicentre collaborative of researchers has initiated the MR imaging in seronegative SpA (MISS) initiative to develop a scoring system for spinal disease and to study the use of MRI as an outcome measure.

As alluded to previously, radiographic scoring methods for AS and other SpAs are fraught with methodological issues, such as which abnormalities to score, which sites to include in the scoring method, which radiographic views to use, what order to score, and how to handle interobserver and/or intraobserver variation.20,47 Many of these issues, as they relate to MRI, will be examined by the MISS initiative. However, the overall advantages of MRI, including detection of early inflammation and better visualisation of lesions, especially cartilage and enthesitis, make it a useful assessment tool.7 Furthermore, the MRI changes in the bone are often quite marked, which may facilitate scoring. For example, κ values >0.8 for scoring plantar fasciitis have been reported.48
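As an illustration of how agreement between two readers is commonly summarised by a κ value, the following Python sketch computes Cohen's unweighted κ. The text does not state which κ variant was used in the cited work, so the choice of the unweighted form is an assumption, and the function name is illustrative.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's unweighted kappa for two readers scoring the same scans."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set")
    n = len(reader_a)
    # Observed proportion of scans on which the readers agree.
    observed = sum(x == y for x, y in zip(reader_a, reader_b)) / n
    # Agreement expected by chance from each reader's score frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so reported values above 0.8 suggest that readers score the lesions very consistently.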


Recent promising results in preliminary AS trials with infliximab and etanercept have raised hopes that symptom modifying, disease controlling, and disability reducing indications can be obtained for these treatments. Future trials must be designed to determine whether such claims can be made. Trial duration, patient population, efficacy end points, assessment methods, and data analyses are among the many issues that will need to be considered. Although the Food and Drug Administration (FDA) has not, as yet, endorsed any specific outcome measures for trials in AS, the contributions of the ASAS Working Group and others towards developing comprehensive standardised outcome assessments and classification systems will help guide the design and implementation of future trials. Conversely, new measures and classification systems may be validated by the clinical data obtained in future trials.

The United States FDA, in collaboration with members of the academic community, pharmaceutical industry, and the public, has, from time to time, issued guidance documents to define the claims that sponsors can receive for their products and the clinical trial data that would be expected to be submitted to support such claims. Although a guidance document has not yet been developed for AS products by the FDA or by any other government or private entity within or outside the US, a guidance document on the clinical development of programmes for drugs, devices, and biological products for the treatment of RA was issued in 1999 by the US Department of Health and Human Services, FDA, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), and Center for Devices and Radiological Health (CDRH).49 A number of issues were addressed in the guidance document which may have application for AS. However, it must be emphasised that RA and AS are distinct disease entities for which direct associations and comparisons cannot always be made.


The FDA guidance document for treatment trials in RA identifies six claims that can be considered for therapeutic agents based on a wide range of potentially achievable outcomes:

  • Reduction in signs and symptoms of RA

  • Major clinical response

  • Complete clinical response

  • Remission

  • Prevention of disability

  • Prevention of structural damage.

Although relief of signs and symptoms has been the central therapeutic effect of drugs marketed for RA since approximately 1997, the addition of claims for major clinical response, complete clinical response, and remission helps to distinguish products further by providing a means for demonstrating patient benefit of greater magnitude than is needed for a claim of symptomatic relief. The claim for prevention of structural damage does not in itself define a patient benefit and therefore should be combined with one of the other claims. The claim for prevention of disability reflects the potential for long term benefits in the course of the disease.

Table 8 lists the duration and end points for the six claims suggested in the guidance document. Trials lasting at least six months, and up to two years, are recommended. These relatively lengthy trials are desirable for a number of reasons:

  • RA is a disease of long duration

  • Interventions that provide only short term benefit are less valuable to patients than those that provide long term benefit

  • Products with the potential to elicit antibody formation should be assessed for durability because antibodies may develop to block effectiveness.

Table 8

FDA guidance on design and conduct for RA trials

A trial of three months’ duration for the claim of reduction in signs and symptoms, however, is acceptable for products belonging to an already established pharmacological class (for example, NSAIDs). Regardless of the trial duration, methods that evaluate response over time are preferable to methods that incorporate only the baseline value and the final observation.
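The distinction between the two kinds of method can be sketched with a small worked example. The code below compares a simple end point change (final minus baseline) with a time-averaged change from baseline computed by the trapezoidal rule. This is only one of several possible "response over time" summaries and is not prescribed by the guidance document; the visit schedule, the scores, and the function names are hypothetical.

```python
def endpoint_change(scores):
    """Change from baseline using only the first and last observations."""
    return scores[-1] - scores[0]


def time_averaged_change(weeks, scores):
    """Mean change from baseline over the whole trial (trapezoidal rule)."""
    if len(weeks) != len(scores) or len(weeks) < 2:
        raise ValueError("Need matching visit times and scores, at least two visits")
    baseline = scores[0]
    area = 0.0
    for i in range(1, len(weeks)):
        width = weeks[i] - weeks[i - 1]
        left = scores[i - 1] - baseline
        right = scores[i] - baseline
        # Area of the trapezoid between two consecutive visits.
        area += width * (left + right) / 2
    return area / (weeks[-1] - weeks[0])


# Hypothetical symptom scores (lower is better) at weeks 0, 4, 8, and 12.
weeks = [0, 4, 8, 12]
early_responder = [6, 3, 3, 4]
late_responder = [6, 6, 6, 4]

for label, scores in [("early", early_responder), ("late", late_responder)]:
    print(label, endpoint_change(scores), round(time_averaged_change(weeks, scores), 2))
```

Both hypothetical patients show the same end point change (−2), yet the early responder's time-averaged change is about −2.33 compared with about −0.33 for the late responder, a difference that a baseline-versus-final comparison cannot detect.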

In addition to identifying potential claims for RA products and advising on the duration and efficacy end points for trials to support various claims, the guidance document provides detailed discussions of special considerations for phase I–IV trials, safety analyses, biological products, medical devices, and juvenile RA. For efficacy trials (phase III work), the guidance document addresses global considerations, including patient selection, concomitant antirheumatic treatment and other concomitant treatments, stratification, blinding, and effects of dropouts and non-compliance; trial design considerations, including design of superiority, equivalence, and novel trials; and analytical issues, including handling dropouts, comparison with baseline outcome measures, and statistical considerations.

For biological products, characteristics and issues unique to biological agents as compared with traditional drugs are detailed, such as species specificity, toxicity response, and product homogeneity. The role of antibodies is also discussed. For instance, homogeneity of a biological agent often has a critical role in determining the activity and toxicity of a compound. Thus, biological agents should demonstrate consistency from lot to lot while under development and should be well characterised in order to be appropriately evaluated.

The comprehensive FDA guidance document for treatment trials in RA may serve as a model for AS by helping to identify key issues in the development of therapeutic drugs. Although some overlap exists between RA and AS, and a number of lessons can be learnt from experience in RA, issues unique to AS must also be identified and all issues must be addressed specifically for patients with AS. An FDA guidance document for AS is currently under development.


Until recently, treatment options for patients with AS have been limited. Despite the relatively high prevalence of the disease, no disease modifying agents have been available as they have been for RA. Promising preliminary results from studies with infliximab and etanercept have raised hopes that anti-tumour necrosis factor α agents may provide symptom control and disease modification for patients. However, universal standards for assessing treatment response and indications in AS are key to an evaluation of these agents. A number of challenges to standardisation have been identified and addressed, but much work still needs to be done.


Dr Jeffrey Siegel contributed to the “Design and conduct of future treatment trials” section and to the “Clinical trials in RA” section of this article. The United States FDA does not endorse any specific outcome measures or clinical trial designs for AS.

