
Reporting of long-term extension studies: lack of consistency calls for consensus
Maya H Buch (1, 2), Daniel Aletaha (3), Paul Emery (1, 2), Josef S Smolen (3, 4)
  1. Section of Musculoskeletal Disease, Leeds Institute of Molecular Medicine, University of Leeds, Leeds, UK
  2. NIHR Biomedical Research Unit, Leeds Teaching Hospitals NHS Trust, Leeds, UK
  3. Division of Rheumatology, Department of Medicine III, Medical University of Vienna, Vienna, Austria
  4. Second Department of Medicine, Hietzing Hospital, Vienna, Austria
Correspondence to Dr Maya H Buch, Section of Musculoskeletal Disease, Leeds Institute of Molecular Medicine, University of Leeds, 2nd Floor, Chapel Allerton Hospital, Chapeltown Road, Leeds LS7 4SA, UK; m.buch{at}


Double-blind, randomised controlled studies represent the gold-standard approach to determining the safety and efficacy of therapeutic interventions. In chronic conditions such as rheumatoid arthritis (RA), long-term data are vital to confirm maintenance of effect and to identify potential safety signals. The recent introduction of numerous biological therapies for RA has been followed by various long-term extension (LTE) studies. Although useful, such studies vary significantly in design and method of analysis, partly owing to their complexity. This viewpoint highlights general considerations needed when undertaking an LTE study and illustrates the lack of consistency among RA studies to date. It addresses issues of selection bias, patient discontinuation and missing data. Although LTE studies are used for safety reporting, their lack of adequate statistical power limits their value for this purpose. Ethical considerations and challenges are highlighted, including potential conflicts of interest. Finally, the authors suggest the need for consensus to ensure more reliable interpretation and application of data in clinical practice. Following the development of guidelines on the reporting of clinical trials in RA and, more recently, of registry data, a similar approach for LTE studies would be a useful endeavour.

The central objective of a clinical trial is to assess prospectively the therapeutic impact of an intervention on clinically measurable outcomes. Double-blind randomised controlled trials (RCT) are the gold-standard method for achieving this objective and generally provide the supporting evidence for licensing new drugs. For rheumatoid arthritis (RA), both the US Food and Drug Administration and the European Medicines Agency (EMA) have issued guidance on the main outcomes of RCT, including the duration of therapy required for agents to be considered for approval,1 2 enabling RCT to be statistically powered for efficacy and to detect common adverse events. However, as these trials are mostly of relatively short duration (usually 6–12 months), a number of questions remain unanswered: (1) Is short-term clinical efficacy sustained or increased over time? (2) Is functional and structural benefit sustained or changing over time? (3) Is short-term safety maintained with prolonged treatment exposure? (4) Is there a longer-term safety risk, including rare adverse events that emerge only over time?

Answering such questions relies on the accrual of long-term effectiveness and safety data, not only to confirm initial findings but also to demonstrate an acceptable risk–benefit ratio that supports a drug's continued use.

Over the past decade, several new agents have been licensed for the treatment of RA following various short-term trials,3–8 with many subsequently evaluated in long-term extension (LTE) studies.9–16 The designs of these studies vary profoundly, however, impeding reliable interpretation. While LTE studies are crucial to understanding the long-term effects of therapies, their reporting remains fraught with challenges that require specific consideration.

This viewpoint takes a generic perspective on issues we consider important and relevant for the interpretation of LTE study results.

The complexity of LTE studies

Figure 1 illustrates the complexity underlying the inclusion of patients from the double-blind RCT into the LTE study, and raises the question of which of the many populations should be, or are, analysed in LTE studies. The populations described often vary between reports. Responders (A), including patients with different levels of response, will be observed, as will responders who do not continue into the LTE study for several possible reasons, including suboptimal benefit or safety issues (B); non-responders who may or may not complete the RCT (C and D) are also seen. Further complexity arises from other features of current trial designs; for example, rescue medication may be available for non-responders, allowing some to become responders and subsequently enter an LTE study. Reporting for such patients needs to record both their original non-responder status and their subsequent responder status. The final population entering an LTE study (E) may therefore differ considerably from one study to another.

Figure 1

The complexity of long-term extension (LTE) studies. Patients in a double-blind randomised controlled trial (RCT) and subsequent LTE studies comprise several different groups: responders (A) will be observed, as will responders who do not continue into the LTE study for a variety of reasons, such as suboptimal benefit, safety issues or loss to follow-up (B). Non-responders, whether or not they complete the initial RCT (C and D), are also seen. A considerable proportion of patients therefore do not enter an LTE study, leading to biased patient inclusion (E). All these populations need to be considered when reporting LTE studies. DB, double-blind.

Issues related to reporting of efficacy

The aim of LTE studies is usually to confirm maintenance of the short-term response to the drug studied. One critical issue here is selection bias. The LTE study includes responders (in terms of efficacy) and tolerators (in terms of safety), but excludes patients who discontinued therapy during the initial RCT phase (usually owing to toxicity or inefficacy) and those who decided against continued therapy at the RCT endpoint (often because of insufficient efficacy). This automatically yields an LTE population enriched for patients who respond to and tolerate the drug, and usually gives the appearance of a higher proportion of responders and greater safety than in the originally randomised population.
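
The enrichment effect can be made concrete with simple arithmetic. The sketch below uses entirely hypothetical counts (not taken from any study) and a hypothetical `TrialFlow` helper; it compares the response rate among patients who entered the LTE with the rate obtained when every originally randomised patient stays in the denominator and non-entrants count as non-responders.

```python
from dataclasses import dataclass


@dataclass
class TrialFlow:
    """Hypothetical patient flow from a double-blind RCT into its LTE (illustrative counts)."""
    randomised: int                  # all patients randomised at the start of the RCT
    responders_entering_lte: int     # responders who enrolled in the LTE
    nonresponders_entering_lte: int  # non-responders who nevertheless enrolled

    def as_observed_rate(self) -> float:
        # Denominator: only the patients who entered the LTE (the enriched population).
        entrants = self.responders_entering_lte + self.nonresponders_entering_lte
        return self.responders_entering_lte / entrants

    def all_randomised_rate(self) -> float:
        # Denominator: everyone randomised; patients who never entered count as non-responders.
        return self.responders_entering_lte / self.randomised


flow = TrialFlow(randomised=200, responders_entering_lte=110, nonresponders_entering_lte=15)
print(f"{flow.as_observed_rate():.2f}")     # 0.88
print(f"{flow.all_randomised_rate():.2f}")  # 0.55
```

With the same 110 responders, the apparent response rate drops from 88% to 55% once patients who never entered the extension are kept in the denominator.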

How can the problems of patients discontinuing therapy (or not wishing to enrol in LTE studies) and of loss to follow-up be overcome? One method is to analyse the intention-to-treat (ITT) population, which analyses all patients according to the treatment group to which they were originally randomised, regardless of whether they completed treatment per protocol. Here, data for patients truly lost to follow-up must be imputed, for example using the commonly employed last observation carried forward (LOCF) method. Another method is non-responder imputation (NRI), which assumes that any patient who drops out of the study for any reason (including toxicity) is a non-responder. A sensitivity analysis using different imputation methods may enable better interpretation of the results. One way to test robustness across all plausible assumptions is to impute response for all missing data or patients in one analysis and failure in another. The range of results between these two extremes would allow a more informed assessment of the ‘truth’ behind the missing data.
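
As an illustration only, the sketch below applies LOCF, NRI and an optimistic "impute response" rule to the same hypothetical set of five patients. The function names and the coding of visits (`True` responder, `False` non-responder, `None` missing) are assumptions made for this example; real trial analyses follow their prespecified statistical analysis plans.

```python
from typing import Callable, List, Optional, Sequence

# Visit outcome: True = responder, False = non-responder, None = missing (e.g. after dropout).
Visit = Optional[bool]


def locf(visits: Sequence[Visit]) -> bool:
    """Last observation carried forward: use the last non-missing visit."""
    for status in reversed(visits):
        if status is not None:
            return status
    return False  # assumption: a patient with no observation at all counts as non-response


def nri(visits: Sequence[Visit]) -> bool:
    """Non-responder imputation: a missing final visit counts as non-response (worst case)."""
    final = visits[-1]
    return final if final is not None else False


def best_case(visits: Sequence[Visit]) -> bool:
    """Optimistic bound: a missing final visit counts as response."""
    final = visits[-1]
    return final if final is not None else True


def response_rate(patients: List[Sequence[Visit]], rule: Callable[[Sequence[Visit]], bool]) -> float:
    """Proportion of all patients classed as responders under an imputation rule."""
    return sum(rule(v) for v in patients) / len(patients)


# Hypothetical data: five patients, three visits each (illustrative only).
patients = [
    [True, True, True],    # completer, responder throughout
    [True, True, None],    # responder who dropped out before the last visit
    [False, None, None],   # early dropout, non-responder when last seen
    [True, False, False],  # completer who lost response
    [True, None, None],    # early dropout, responder when last seen
]

for name, rule in [("NRI (worst case)", nri), ("LOCF", locf), ("best case", best_case)]:
    print(f"{name}: {response_rate(patients, rule):.1f}")
# NRI (worst case): 0.2
# LOCF: 0.6
# best case: 0.8
```

The spread between the worst-case and best-case figures (20% to 80% here) shows how much of any reported response rate depends on assumptions about the missing data rather than on observed outcomes.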

Methods that do not impute data may also be valuable. Reporting absolute numbers of responders (rather than only percentages), together with the absolute number of patients remaining in the study at each time point, allows readers to make their own interpretation.15 Another reasonable approach would simply be to describe the responders and non-responders at the start of the LTE, and to report how many original responders maintained or lost their response and how many non-responders became responders. For a dichotomous variable, for example response (yes/no), weighting the response rate at a given time point by the proportion of patients still on treatment at that time would also account for withdrawals.
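
A minimal sketch of these non-imputing summaries follows, assuming per-patient responder status is available at the start of the LTE and at a later visit, with withdrawn patients simply absent from the later record. The helpers `transition_counts` and `withdrawal_weighted_rate` are hypothetical names, and the weighting shown is one reading of "as a factor of the percentage of patients still on treatment".

```python
from collections import Counter
from typing import Dict


def transition_counts(baseline: Dict[int, bool], follow_up: Dict[int, bool]) -> Counter:
    """Absolute counts of how LTE-baseline responders and non-responders fare.

    baseline: patient id -> responder status at LTE start.
    follow_up: patient id -> responder status at a later visit; withdrawn patients are absent.
    """
    counts: Counter = Counter()
    for pid, was_responder in baseline.items():
        if pid not in follow_up:
            counts["withdrawn"] += 1
        elif was_responder:
            counts["maintained response" if follow_up[pid] else "lost response"] += 1
        else:
            counts["gained response" if follow_up[pid] else "remained non-responder"] += 1
    return counts


def withdrawal_weighted_rate(n_responding: int, n_on_treatment: int, n_started: int) -> float:
    """Response rate among patients still treated, weighted by the fraction still treated."""
    return (n_responding / n_on_treatment) * (n_on_treatment / n_started)


# Hypothetical data for six LTE entrants (illustrative only).
baseline = {1: True, 2: True, 3: True, 4: False, 5: False, 6: True}
follow_up = {1: True, 2: False, 4: True, 5: False}  # patients 3 and 6 withdrew

counts = transition_counts(baseline, follow_up)
n_responding = sum(follow_up.values())
print(dict(counts))
print(f"{n_responding}/{len(follow_up)} responding on treatment; {len(baseline)} entered the LTE")
print(f"{withdrawal_weighted_rate(n_responding, len(follow_up), len(baseline)):.2f}")  # 0.33
```

Note that the weighted rate reduces algebraically to responders divided by patients who started, which is why it cannot be inflated by withdrawals the way the on-treatment rate (2/4 here) can.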

Regardless of the imputation method used, the ITT population should comprise all patients randomised at the start of the double-blind trial, with the analysis spanning the whole RCT plus LTE period. Although this approach also has drawbacks (some patients decline to enrol in an LTE for reasons other than safety or inefficacy), it would be a more valid solution.

Issues related to reporting of safety

Long-term reporting of drug data offers the opportunity to detect long-term adverse effects (such as cardiovascular events and malignancy). In LTE studies, however, this ability is constrained: almost all of the original trials that are followed by an LTE were not powered to detect safety signals, even when several trials are pooled into the extension phase, so the role of LTE studies in addressing safety questions is quite limited. Postmarketing surveillance, larger cohort studies and registries therefore play a crucial role in detecting the long-term consequences and side effects of drugs.
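
Why LTE cohorts are underpowered for rare events can be seen from elementary probability. The sketch below assumes, for illustration only, that each patient independently carries a constant risk of the event over the exposure period; it computes the chance of observing at least one event, 1 − (1 − p)^n, the number of patients needed for a 95% chance of seeing one, and the familiar "rule of three" upper bound when no events occur. The function names and the 1-in-1000 incidence are hypothetical.

```python
import math


def prob_at_least_one_event(incidence: float, n_patients: int) -> float:
    """P(>= 1 event), assuming independent patients with constant risk `incidence`."""
    return 1.0 - (1.0 - incidence) ** n_patients


def patients_needed(incidence: float, probability: float = 0.95) -> int:
    """Smallest n giving P(>= 1 event) >= probability, under the same assumptions."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - incidence))


def rule_of_three_upper_bound(n_patients: int) -> float:
    """Approximate 95% upper confidence limit on risk when zero events are observed."""
    return 3.0 / n_patients


# Hypothetical adverse event affecting 1 in 1000 exposed patients.
print(f"{prob_at_least_one_event(0.001, 500):.2f}")  # 0.39 with 500 patients
print(patients_needed(0.001))                        # 2995
print(rule_of_three_upper_bound(500))                # 0.006
```

Under these assumptions, an extension cohort of 500 patients has only about a 39% chance of recording even a single case of such an event, and observing none would still be compatible with a risk as high as 6 per 1000.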

Lack of consistency

Table 1 summarises the design and methods of several LTE studies of treatments for RA published over the past decade, illustrating the considerable inconsistency in the methods applied. For example, the long-term leflunomide study17 included patients from two previous phase III studies; the primary endpoint was not clearly stated and no information on imputation of missing data was provided. The 2-year LTE of the AIM (abatacept plus methotrexate vs placebo) study maintained blinding of the originally randomised treatment.18 The analysis used an ITT evaluation of the cohort starting at LTE baseline. Efficacy measures were assessed using either protocol-prespecified analyses (on the ITT population) or post hoc as-observed analyses within the LTE. Safety data compared the original double-blind RCT (AIM) with cumulative therapy (the double-blind and LTE periods combined). In other studies, the exact method of analysis is sometimes unclear.

Table 1

Summary of key features of LTE studies

Ethical considerations

Aside from the methodological and statistical challenges of LTE studies, several ethical issues also surround such initiatives. Ethics committees frequently request that new drug provision be extended for patients who have responded, because it is deemed unethical to withdraw an investigational drug that has shown benefit. The principal purpose of an LTE study can, however, be a source of controversy in itself, with marketing aims (facilitating continued drug use) potentially competing with the genuine research agenda. Long-term evaluation may mean continuing true placebo in the control arm, which would be unethical; including a rescue therapy arm and switching patients to the investigational agent at the start of the LTE phase addresses this issue and is now often favoured. The growing consensus on aiming for more ambitious endpoints that improve disease outcomes19 raises the important question of whether the degree of response achieved during the double-blind RCT should be considered before inclusion in a subsequent LTE study. For example, is a mere American College of Rheumatology (ACR) 20 (or even ACR50) response sufficient grounds to enter patients into an LTE? Indeed, the known flaws of response assessment using such composite tools only emphasise the inaccuracies they introduce, both in RCT generally and in the selection of patients for LTE studies.


The clear advantage of LTE studies is that patients can continue to receive an efficacious drug that is not yet available in clinical practice. LTE studies, however, require particular attention when deciding how to report the data, using the most unbiased and scientifically valid approach possible. Shortcomings in study design limit the accurate interpretation of data, and a lack of consistency between studies restricts the ability to undertake indirect comparisons. Together, such deficiencies compromise the application of this information to clinical practice.

Finally, LTE studies are increasingly undertaken in other diseases, such as ankylosing spondylitis. While this viewpoint focuses on RA, the principles underlying these comments are clearly relevant and can be applied to other disease groups.

Research agenda: a time for consensus

Recent EULAR recommendations have included EULAR/ACR collaborative guidelines on clinical trial reporting20 as well as guidance on reporting from registries.21 There is similarly a clear need to address the reporting of LTE clinical data to ensure greater consistency and robust statistical methodology. A future research agenda could include a formal systematic literature review of all LTE studies in RA to identify the principal methods of analysis and reporting employed to date. A specialist task force could then work towards evidence-based and consensus-based recommendations on key aspects of LTE data, advocating full data availability and providing guidance on appropriate methods of data imputation, analysis and presentation of results. Any such recommendations could be disseminated to the wider rheumatology community, industry and regulatory authorities. They could then be applied to future LTE studies, and the whole area revisited with an evaluation of subsequent LTE studies to determine whether better consistency and accuracy of reporting have been achieved.

Such an endeavour would ensure greater transparency for patients and physicians and provide better indirect comparability between trials,22 allowing more effective application of this information in clinical practice.



  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.