

Time-dependent biases in observational studies of comparative effectiveness research in rheumatology. A methodological review
  1. Michele Iudici1,
  2. Raphaël Porcher1,
  3. Carolina Riveros1,
  4. Philippe Ravaud1,2
  1. Methods of Therapeutic Evaluation of Chronic Diseases (METHODS) Team, INSERM, UMR 1153, Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), Paris, France
  2. Cochrane France, Paris, France
  1. Correspondence to Dr Michele Iudici, Methods of Therapeutic Evaluation of Chronic Diseases (METHODS) team, INSERM, UMR 1153, Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), Paris 75001, France; michele_iudici{at}


Objective To assess the extent to which time-dependent biases (ie, immortal time bias (ITB) and time-lag bias (TLB)) occur in recent observational studies in rheumatology, to describe their main mechanisms and to raise awareness of this topic.

Methods We searched PubMed for observational studies on rheumatic diseases published in leading medical journals in the last 5 years. Only studies with a time-to-event analysis exploring the association of one or more interventional strategies with an outcome were included. Each study was labelled as free from bias, at risk of TLB, at risk of misclassified ITB if the period of immortal time was incorrectly attributed to an intervention group, or at risk of excluded ITB if the immortal time was discarded from the analysis.

Results We included 78 papers. Most studies were performed in Europe or North America (46% each), were not industry funded (62%) and had a safety primary outcome (59%). In total, 13 (17%) studies were considered at risk of time-dependent biases. Of the 8 (10%) studies at risk of ITB, 5 (6%) wrongly attributed the waiting time before treatment to the treated group, indicating misclassified ITB. Five (6%) studies were at risk of TLB: patients on conventional synthetic disease-modifying antirheumatic drugs (DMARDs; first-line drugs) were compared with patients on biologic DMARDs (second-line or third-line drugs) without accounting for disease duration or prior medication use.

Conclusions One in six comparative effectiveness observational studies published in leading rheumatology journals is potentially flawed by time-dependent biases.

  • epidemiology
  • treatment
  • autoimmune diseases


Key messages

What is already known about this subject?

  • Time-dependent biases, that is, immortal time bias (ITB) and time-lag bias (TLB), are biases observed in time-to-event analyses. If present, they can distort study results by inflating the benefits of a drug in terms of higher efficacy or lower risk. These biases must be identified to avoid performing flawed analyses or wrongly interpreting the results of biased studies.

What does this study add?

  • We found ITB or TLB in about one in six observational comparative effectiveness studies of rheumatology published in leading journals. A description of the main mechanisms leading to these biases in the field and a summary of the key points useful to identify and avoid them are provided.

How might this impact on clinical practice or future developments?

  • A better recognition of such biases could help clinical researchers improve the quality of comparative effectiveness observational studies and clinicians critically appraise study results.


Introduction

Time-dependent biases are a group of biases that occur in time-to-event analyses of observational studies. They include immortal time bias (ITB), also referred to as ‘survivors treatment bias’1–4 or ‘guarantee-time bias’,5 and time-lag bias (TLB).6 7 Both biases tend to inflate the apparent benefits of a drug, in terms of higher efficacy or lower risks.6 7

ITB occurs when, in a time-fixed model, the period between the start of follow-up and the start of treatment, during which patients later classified as treated cannot by definition experience the outcome (the immortal time), is wrongly assigned to the treated group or excluded from the analysis.1 8 As an example, consider an observational study assessing whether total knee arthroplasty (TKA) in patients with advanced knee osteoarthritis is associated with better survival than medical therapy alone. Follow-up starts at the diagnosis of severe osteoarthritis, when chronic medical treatment is concomitantly prescribed, and all patients are eligible to undergo surgery thereafter. If patients are classified into ‘surgery’ or ‘medical therapy’ groups according to whether they ever received surgery, the time between diagnosis and surgery (figure 1, a) is immortal (‘free of events’) for the surgery group, in the sense that these patients must have survived that time to be classified in this group. If this time is wrongly attributed to the surgery group instead of the medical therapy group (figure 1, a+b vs c), patients in the surgery group will benefit from this immortal time. This first case is referred to as ‘misclassified ITB’.

Figure 1

Illustration of mechanisms of immortal time bias (ITB). In misclassified ITB, the immortal time (box a) is wrongly assigned to the ‘surgery’ group (a+b vs c) and in excluded ITB, it is excluded from the analysis (b vs c).

A second case would be to follow up surgery patients from the time they receive TKA (figure 1, b) and the medical therapy group from diagnosis (figure 1, c). In this case, the event-free (immortal) time before surgery (figure 1, a) is no longer attributed to surgery patients, which is correct, but neither is it attributed to medical therapy patients, which is not correct (figure 1, b vs c). The event rate will therefore be overestimated in the medical therapy group. This is an example of ‘excluded ITB’.
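The two mechanisms in figure 1 can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions not taken from any included study: death and treatment initiation occur at constant (exponential) rates per year, treatment has no effect on death, follow-up is censored at 10 years, and groups are compared with a crude rate ratio (events per person-time), which under constant hazards approximates a Cox hazard ratio. The function names are hypothetical.

```python
import random


def simulate_cohort(n=20_000, death_rate=0.1, treat_rate=0.2,
                    max_follow_up=10.0, seed=2019):
    """Simulate a cohort in which treatment has NO effect on death.

    Follow-up starts at diagnosis. Each patient has a death time
    D ~ Exp(death_rate) and a would-be treatment time T ~ Exp(treat_rate);
    treatment happens only if T falls before death and end of follow-up.
    Returns (end, died, treated_at) tuples; treated_at is None if never treated.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        death = rng.expovariate(death_rate)
        treat = rng.expovariate(treat_rate)
        end = min(death, max_follow_up)      # death or administrative censoring
        died = death <= max_follow_up
        treated_at = treat if treat < end else None
        cohort.append((end, died, treated_at))
    return cohort


def rate_ratios(cohort):
    """Treated vs untreated event-rate ratio under three person-time schemes."""
    # For each scheme and group: [events, person-time]
    acc = {name: {"trt": [0, 0.0], "ctl": [0, 0.0]}
           for name in ("misclassified", "excluded", "time_dependent")}
    for end, died, treated_at in cohort:
        event = int(died)
        if treated_at is None:
            # Never-treated patients are followed from diagnosis in all schemes.
            for groups in acc.values():
                groups["ctl"][0] += event
                groups["ctl"][1] += end
            continue
        post = end - treated_at              # person-time after treatment start (b)
        # Misclassified ITB: immortal time (a) credited to the treated group.
        acc["misclassified"]["trt"][0] += event
        acc["misclassified"]["trt"][1] += end
        # Excluded ITB: immortal time (a) dropped from the analysis.
        acc["excluded"]["trt"][0] += event
        acc["excluded"]["trt"][1] += post
        # Time-dependent exposure: (a) counted as untreated, (b) as treated.
        acc["time_dependent"]["ctl"][1] += treated_at
        acc["time_dependent"]["trt"][0] += event
        acc["time_dependent"]["trt"][1] += post
    return {name: (g["trt"][0] / g["trt"][1]) / (g["ctl"][0] / g["ctl"][1])
            for name, g in acc.items()}


for name, rr in rate_ratios(simulate_cohort()).items():
    print(f"{name:>15}: rate ratio {rr:.2f}")
```

With these settings, both biased schemes give rate ratios well below 1, misclassified ITB producing the larger spurious benefit, whereas counting the pre-treatment time as untreated person-time (time-dependent exposure) gives a ratio close to the true null value of 1.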

By contrast, TLB arises when patients at different disease stages are compared without taking disease duration into account, while disease duration itself may confound the occurrence of the outcome of interest. This bias can occur, for example, in studies comparing patients taking first-line drugs with those taking second-line or third-line drugs (figure 2).6 7
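Time-lag bias is, in essence, confounding by disease stage. The following sketch uses hypothetical expected counts (no random sampling) in which the outcome rate depends only on disease duration and the drug line has no effect. Because second-line users contribute most of their person-time at longer disease duration, the crude rate ratio is inflated; stratifying by disease duration with the Mantel-Haenszel rate ratio, one standard way to adjust that is chosen here for illustration, recovers the null. All rates, strata and person-years are assumptions.

```python
# Hypothetical event rates (per person-year) that depend ONLY on disease
# duration; the drug line itself has no effect on the outcome.
RATE = {"<5 years": 0.01, ">=5 years": 0.03}

# Hypothetical person-years by (drug line, disease-duration stratum).
PERSON_YEARS = {
    ("first_line", "<5 years"): 8000, ("first_line", ">=5 years"): 2000,
    ("second_line", "<5 years"): 2000, ("second_line", ">=5 years"): 8000,
}


def events(line, stratum):
    """Expected number of events in one cell."""
    return PERSON_YEARS[(line, stratum)] * RATE[stratum]


def crude_rate_ratio(exposed="second_line", reference="first_line"):
    """Rate ratio ignoring disease duration (the TLB-prone comparison)."""
    def rate(line):
        total_events = sum(events(line, s) for s in RATE)
        total_py = sum(PERSON_YEARS[(line, s)] for s in RATE)
        return total_events / total_py
    return rate(exposed) / rate(reference)


def mantel_haenszel_rate_ratio(exposed="second_line", reference="first_line"):
    """Rate ratio stratified by disease duration (Mantel-Haenszel)."""
    numerator = denominator = 0.0
    for s in RATE:
        a, t1 = events(exposed, s), PERSON_YEARS[(exposed, s)]
        b, t0 = events(reference, s), PERSON_YEARS[(reference, s)]
        numerator += a * t0 / (t1 + t0)
        denominator += b * t1 / (t1 + t0)
    return numerator / denominator


print(f"crude: {crude_rate_ratio():.2f}")                  # prints: crude: 1.86
print(f"stratified: {mantel_haenszel_rate_ratio():.2f}")   # prints: stratified: 1.00
```

Here the crude ratio is 260/140, about 1.86, even though first-line and second-line users have identical rates within each duration stratum.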

Figure 2

Illustration of the time-lag bias.

In 2004, van Walraven et al 9 found time-dependent biases in about 10% of cohort studies published in leading medical journals and showed that correcting the biases could have qualitatively changed study conclusions in more than half of the studies. Since then, despite an increasing number of reports highlighting how these biases can undermine study results,1 6 8 10–17 no study has attempted to quantify and describe the main features of these biases in a specific medical domain.

Given the increasing number of registry and large-database studies driving clinical decisions in rheumatic diseases,18 we aimed to assess the extent to which time-dependent biases occur in the recent rheumatology literature and to describe their main mechanisms in order to raise awareness of this topic.

Materials and methods

Search strategy

Because we considered that this methodological review lacked an outcome of direct clinical relevance, we did not register the protocol on PROSPERO. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.19 We searched MEDLINE via PubMed on 3 September 2017 to identify rheumatology papers published in the five journals with the highest impact factors (according to Journal Citation Reports, Clarivate Analytics) in each of two categories: rheumatology (Annals of the Rheumatic Diseases, Arthritis & Rheumatology, Rheumatology (Oxford), Seminars in Arthritis and Rheumatism, Osteoarthritis and Cartilage) and general and internal medicine (NEJM, Lancet, JAMA, BMJ, Annals of Internal Medicine). We used various terms referring to observational studies and survival analysis to maximise sensitivity. The full search strategy is provided in the online supplementary file.

Eligibility criteria

Types of studies. Observational studies exploring the effect of one or more interventional strategies on a time-to-event outcome. We excluded case–control studies, reviews, comments, editorials, letters, meta-analyses, network meta-analyses and studies with a survival analysis not comparing two exposure groups.

Types of participants. Participants of any age having a rheumatic disease.

Types of interventions. Pharmacological or non-pharmacological treatments.

Types of outcomes. We excluded studies with drug retention/discontinuation as an outcome.

Data collection

Two researchers (MI, CR) independently checked each title and abstract to exclude irrelevant articles and then independently examined the full-text articles to determine eligibility. Consensus was reached by discussion in case of disagreement. A third reviewer was available in case of unresolved disagreement. We documented the primary reason for exclusion of full-text articles.

Data extraction and management

One author (MI) extracted the data using a standardised form, and a second author (CR) checked the extracted data. Disagreements were discussed until consensus was reached. The complete list of extracted study characteristics is provided in the online supplementary file. We considered as primary outcome(s) those labelled ‘primary’ by the authors or, if none was labelled, the first outcome presented in the Results section. A study was considered industry funded if the sponsor or one of the collaborators was from industry.

Assessment of the risk of time-dependent biases

Each eligible full-text article was independently checked for the presence of time-dependent biases by two authors (MI, RP). The risk of ITB was evaluated according to the criteria proposed by Levesque et al,8 which consist of the following questions: (1) Was treatment status determined after the start of follow-up or defined using follow-up time? (2) Was the start of follow-up different for the treated and comparator group relative to the date of diagnosis? (3) Were the treatment groups identified hierarchically? (4) Were subjects excluded on the basis of treatment identified during follow-up? (5) Was a time-fixed analysis used?

Each study was then classified as free from ITB or at risk of ITB. Studies were classified as at risk of misclassified ITB if the period of immortal time was incorrectly attributed to the treated group by a time-fixed analysis, or as at risk of excluded ITB if the immortal time was excluded from the analysis (figure 1).8 9

A study was considered at risk of TLB if (1) two or more drugs prescribed at different disease stages were compared, (2) the study outcome was considered potentially confounded by disease duration and (3) the analysis was not adjusted for disease duration or prior medication use. A drug was considered second line, third line and so on according to the authors’ statements or, otherwise, if it is known to be prescribed after failure of first-line treatments.20–25

Studies were definitively classified as at risk of ITB/TLB only when consensus was reached. For the studies at risk of bias, we described their main features, the biased analysis performed and the direction in which the bias could have modified study conclusions.

Data analysis

Data are summarised as number (%) for qualitative variables and median (range) for continuous variables. Characteristics of biased and unbiased papers were compared by Fisher’s exact test or Mann-Whitney test, as appropriate.
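For readers less familiar with Fisher’s exact test, the sketch below computes the two-sided p-value for a 2×2 table (eg, at risk of bias or not, by a binary study characteristic) by conditioning on the margins and summing the hypergeometric probabilities of all tables no more likely than the one observed. The table in the example is hypothetical and not taken from this review; in practice a library routine such as `scipy.stats.fisher_exact` (and `scipy.stats.mannwhitneyu` for the Mann-Whitney test) would normally be used. Characteristics with more than two categories, such as continent, need an r×c extension that this sketch does not cover.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, cell `a` follows a hypergeometric distribution.
    The p-value is the total probability of all tables with the same
    margins that are no more likely than the observed table. Exact
    fractions avoid floating-point ties when comparing probabilities.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # P(top-left cell = x | margins)
        return Fraction(comb(row1, x) * comb(n - row1, col1 - x), total)

    lowest, highest = max(0, col1 - (n - row1)), min(row1, col1)
    p_observed = prob(a)
    return sum(p for p in map(prob, range(lowest, highest + 1)) if p <= p_observed)


# Hypothetical table, not data from this review.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(p, round(float(p), 3))   # prints: 17/35 0.486
```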


Results

Among the 1029 records retrieved, 78 articles were selected (figure 3). Each study included a median of 6393 (range 49–341 749) patients. The studies were mostly published in rheumatology journals (n=71; 91%), were not industry funded (n=48; 62%) and had safety primary outcomes (n=46; 59%) (table 1). Overall, 13 (17%) studies were considered at risk of time-dependent biases: 8 (10%) were at risk of ITB and 5 (6%) were at risk of TLB. Factors found to be associated with the risk of time-dependent biases were the continent of origin of the corresponding author (p=0.002), the impact factor of the journal (p=0.033) and the disease investigated (p=0.0002) (table 1).

Figure 3

Flow chart of study selection.

Table 1

Main characteristics of the included studies at risk or not of immortal time bias (ITB) or time-lag bias (TLB)

Studies at risk of ITB

Eight (10%) studies26–33 were considered at risk of misclassified ITB, and none were at risk of excluded ITB. The studies at risk of ITB were published in journals with a median (IQR) impact factor of 4.8 (4.6–9.8); most investigated treatment for systemic connective tissue diseases or vasculitis (n=7) and were not industry funded (n=7). The most commonly used statistical tools were time-fixed Cox regression and Kaplan-Meier curves (table 2).

Table 2

Main features of studies at risk of ‘misclassified’ immortal time bias

In studies with a hierarchical treatment exposure model (ie, treated vs untreated), misclassified ITB occurred when (1) the waiting time to receive treatment was counted in the treatment group (n=5 studies)26 27 29 30 33; (2) treatment status was defined by a cumulative dosage reached or a number of treatments received (n=2 studies)28 32; and (3) treatment status was defined as at least one prescription dispensed (n=1 study).31 In all these cases, the hazard of the outcome was likely underestimated in the exposure group and overestimated in the comparator group, which led to an exaggerated beneficial effect or an underestimated risk of harm for the treated group.4 Table 2 describes the mechanisms leading to bias and their potential effects on study results.

In an additional 12 studies, the authors investigated differences between a cohort of new users and a comparator cohort of prevalent users. In these studies, (1) the start of follow-up differed between the two groups (start of treatment for treated patients, entry in the cohort for comparators), (2) treated patients had received the comparator drug before entry in the study and (3) person-time on the comparator drug for treated patients was excluded from the analysis. We nevertheless did not classify these studies at risk of excluded ITB, because person-time was excluded in both the treatment and comparison cohorts: patients who initiated the comparator drug and experienced an event of interest before being included in the cohort never entered the study, which also led to underestimating the event rate in this group. We checked with numerical simulations that this design induced no bias (data not shown).

Studies at risk of TLB

We considered 5 (6%) studies at risk of TLB: these studies compared the risk of cancer34–37 or tuberculosis38 development in patients treated with conventional synthetic disease-modifying antirheumatic drugs (DMARD) versus biologic DMARDs. In four studies,34 35 37 38 the authors did not adjust for disease duration or past medication exposure, and in one study36 the two groups were propensity score matched but only for cumulative steroid dose. Table 3 summarises the main features of these studies.

Table 3

Main features of the studies at risk of time-lag bias


Discussion

We found ITB or TLB in about one in six observational comparative effectiveness studies in rheumatology published in leading journals. ITB was always due to ‘misclassification’ rather than ‘exclusion’ of the immortal time. TLB was observed when patients receiving conventional synthetic DMARDs were compared with those on biologic therapy without accounting for disease duration or previous drug intake.

Some examples taken from the papers included in the present review can help identify ITB in the published literature. Misclassification of the immortal time should be suspected in the following cases. (1) When there is a waiting time between the start of follow-up (ie, time of diagnosis, entry in the cohort) and treatment initiation, and this time is incorrectly attributed to the exposed group. For example, in one of the studies classified at risk of ITB,29 patients were defined as tacrolimus ‘treated’ if they had received tacrolimus as maintenance therapy within 28 days from the first immunosuppressant used to induce remission. Because follow-up started after the achievement of remission for both treated and untreated patients, the waiting time to receive tacrolimus, by definition ‘immortal’ for tacrolimus-treated patients, was wrongly attributed to this group, potentially conferring on tacrolimus a spurious protective effect against relapse. (2) When the exposure is handled as ‘ever’ or ‘never’ drug intake over follow-up. An example is one study30 investigating the impact of traditional Chinese medicine on survival of patients with systemic lupus. The authors classified patients as traditional Chinese medicine ‘users’ or ‘never users’ according to whether they had ever (or never) received such treatment within 3 years from study entry. As in the previous example, the wrong attribution of the immortal time (from the start of follow-up to the introduction of treatment for ‘ever-treated’ patients) likely led to underestimating the rate of deaths in ‘treated’ patients and overestimating it in ‘untreated’ patients. (3) When a given duration of drug use or a given cumulative dose is required for a participant to be classified as exposed. In one study,28 patients were classified according to their cumulative dose of hydroxychloroquine (<129 or ≥129 g) over follow-up. Receiving more hydroxychloroquine was associated with a 74% reduction in the risk of developing diabetes. Again, this apparently longer diabetes-free survival is at least in part an artefact of wrongly attributing the immortal time to patients with higher cumulative hydroxychloroquine intake.
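Case (3) can be avoided by treating exposure as time dependent: a patient contributes unexposed person-time until the cumulative dose threshold is reached and exposed person-time only afterwards. The hypothetical helper below splits one patient’s follow-up into start–stop rows, the counting-process format used for time-dependent Cox regression. It assumes each dispensed dose counts in full on its dispensing date; the 129 g threshold echoes the example above, but the dispensing history is invented.

```python
def split_by_cumulative_dose(end, died, dispensings, threshold):
    """Split one patient's follow-up into (start, stop, exposed, event) rows.

    Hypothetical helper. Follow-up runs from time 0 to `end`; `dispensings`
    is a list of (time, dose) pairs. Exposure begins only when the cumulative
    dispensed dose first reaches `threshold`, so the person-time needed to
    accumulate that dose stays unexposed instead of being credited to the
    exposed group. The event, if any, is assigned to the last interval.
    """
    cumulative = 0.0
    crossed_at = None
    for time, dose in sorted(dispensings):
        if time >= end:
            break                    # dispensings after follow-up are ignored
        cumulative += dose
        if cumulative >= threshold:
            crossed_at = time
            break
    if crossed_at is None:
        return [(0.0, end, False, died)]      # never exposed during follow-up
    if crossed_at == 0:
        return [(0.0, end, True, died)]       # exposed from the start
    return [(0.0, crossed_at, False, False),  # unexposed; survived by construction
            (crossed_at, end, True, died)]


# Hypothetical patient: 129 g threshold reached at year 2.5, death at year 6.
rows = split_by_cumulative_dose(6.0, True, [(0.0, 40), (1.0, 40), (2.5, 60)], 129)
print(rows)   # prints: [(0.0, 2.5, False, False), (2.5, 6.0, True, True)]
```

Rows in this format can be fitted with a time-varying Cox model, for example `Surv(start, stop, event)` in the R survival package or `CoxTimeVaryingFitter` in the Python lifelines package.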

Moreover, even though we did not find any case of excluded ITB, care should be taken not to discard the immortal time from the analysis (figure 1, b vs c).

Several methodological aspects should be considered when performing comparative effectiveness observational studies. Ideally, a treatment should be compared with another that has the same indications and could be used interchangeably.39 However, when drugs given at different disease stages are compared (eg, patients on first-line vs second-line drugs), some points should be considered to avoid time-related biases, in particular TLB. First, different disease durations can be associated with the outcome of interest. Second, if exposure to a first-line treatment is associated with the development of the outcome, even after a long period, attributing the event to the first-line or the second-line drug becomes challenging. In this setting, statistical analyses that take into account latency time windows and disease duration are needed.7

This study has some limitations. First, selecting high-impact journals may have led us to underestimate the frequency of time-related biases in the literature. Second, although we tried to maximise the sensitivity of our search, we could have missed studies whose title or abstract did not contain the keywords we used. Third, we could identify the direction in which the bias could have affected study results but could not quantify its impact on point estimates. Indeed, the potential impact of ITB on treatment effect estimates depends on the respective amounts of misclassified/excluded and correctly classified person-time, as well as on the number of events. Methods to quantify the magnitude of ITB require these data to be reported in detail.40 41 Unfortunately, none of the articles at risk of ITB presented all of that information. We would nonetheless expect the 3-year waiting time used to define treatment exposure in the study by Ma et al30 to have had a larger impact on the estimated treatment effect than the choice to classify patients as receiving or not receiving maintenance therapy within 28 days from the achievement of remission in the study by Kurita et al.29
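The dependence of ITB on person-time and event counts can be illustrated with a simple reallocation, assuming constant event rates and a known amount of immortal person-time. For misclassified ITB, the immortal time is removed from the treated group and added to the comparator; for excluded ITB, it is only added to the comparator. The function names and counts are hypothetical, and this sketch is not one of the published quantification methods cited above.

```python
def crude_rate_ratio(events_trt, pt_trt, events_ctl, pt_ctl):
    """Treated/comparator rate ratio as (possibly wrongly) analysed."""
    return (events_trt / pt_trt) / (events_ctl / pt_ctl)


def corrected_rate_ratio(events_trt, pt_trt, events_ctl, pt_ctl,
                         immortal_pt, misclassified=True):
    """Rate ratio after reallocating immortal person-time to the comparator.

    Sketch assuming constant event rates and a known amount of immortal
    person-time (during which, by construction, no events occurred).
    misclassified=True : immortal time was counted in the treated group,
                         so it is moved from treated to comparator.
    misclassified=False: immortal time was excluded altogether, so it is
                         only added to the comparator.
    """
    treated_pt = pt_trt - immortal_pt if misclassified else pt_trt
    return (events_trt / treated_pt) / (events_ctl / (pt_ctl + immortal_pt))


# Hypothetical counts: 50 events in 1000 treated person-years (300 of which
# are immortal) vs 50 events in 500 comparator person-years.
print(round(crude_rate_ratio(50, 1000, 50, 500), 2))           # prints: 0.5
print(round(corrected_rate_ratio(50, 1000, 50, 500, 300), 2))  # prints: 1.14
```

In this hypothetical example, an apparent halving of the event rate becomes a rate ratio of about 1.14 once 300 person-years of immortal time are reallocated; the size of the correction grows with the share of immortal person-time.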

In conclusion, time-related biases are common in the rheumatology literature and can be avoided by using appropriate study designs and statistical analyses, such as time-dependent Cox regression or landmark analysis.42 Researchers should avoid hierarchical treatment exposure definitions based, for example, on (1) ‘ever’ or ‘never’ intake of the drug, (2) a ‘treated’ designation if the drug has been received for at least a given period or (3) a ‘treated’ designation if a given cumulative dose of the drug has been reached. Moreover, comparisons of patients at different disease stages should account for differences in disease duration if these are believed to potentially confound the study outcome. Better recognition of such biases could help improve study designs and the interpretation of published observational studies.
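Landmark analysis fixes exposure using only information available at a chosen landmark time and starts follow-up there, so no person-time before treatment can be credited to the treated group. A minimal sketch on a hand-made cohort of (end of follow-up, died, treatment time) records follows; the 1-year landmark, the records and the function name are hypothetical, and a crude rate ratio stands in for the Cox model that would normally be fitted.

```python
def landmark_rate_ratio(cohort, landmark):
    """Crude treated/untreated rate ratio from a landmark analysis.

    `cohort` holds (end, died, treated_at) records with times measured from
    the start of follow-up; treated_at is None if the patient was never
    treated. Patients no longer followed at the landmark are dropped;
    exposure is frozen as treatment status at the landmark, and events and
    person-time are counted from the landmark onwards.
    """
    counts = {True: [0, 0.0], False: [0, 0.0]}   # exposed -> [events, person-time]
    for end, died, treated_at in cohort:
        if end <= landmark:
            continue                              # died or censored before landmark
        exposed = treated_at is not None and treated_at <= landmark
        counts[exposed][0] += int(died)
        counts[exposed][1] += end - landmark
    (e1, t1), (e0, t0) = counts[True], counts[False]
    return (e1 / t1) / (e0 / t0)


# Hypothetical records: (end of follow-up in years, died, time of treatment)
cohort = [
    (5.0, True, 0.5),    # treated before landmark, dies at 5
    (8.0, False, None),  # never treated, censored at 8
    (0.8, True, None),   # dies before the landmark: excluded
    (3.0, True, 2.0),    # treated after landmark: analysed as unexposed
    (10.0, False, 0.9),  # treated before landmark, censored at 10
]
print(round(landmark_rate_ratio(cohort, landmark=1.0), 3))   # prints: 0.692
```

Two design consequences are visible in the example: the patient who died before the landmark is excluded, and the patient treated after the landmark remains in the unexposed group. The choice of landmark time therefore matters: a later landmark classifies more treated patients as exposed but discards more early follow-up.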


Acknowledgements

The authors thank Laura Smales for proofreading the manuscript.



  • Handling editor Josef S Smolen

  • Contributors MI, RP and PR contributed to the conception and design of the work. MI and CR contributed to the acquisition of data. MI, RP, CR and PR contributed to the analysis and interpretation of data. MI, RP, CR and PR helped draft the paper and critically revised it for important intellectual content. All the authors gave final approval of the version submitted.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All the data collected are available upon request.
