
Extended report
Gene–environment interaction between HLA-DRB1 shared epitope and heavy cigarette smoking in predicting incident rheumatoid arthritis
  1. E W Karlson1,
  2. S-C Chang2,
  3. J Cui1,
  4. L B Chibnik1,
  5. P A Fraser1,3,4,
  6. I De Vivo2,5,
  7. K H Costenbader1
  1. Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
  2. Harvard School of Public Health, Boston, Massachusetts, USA
  3. Immune Disease Institute, Boston, Massachusetts, USA
  4. Genzyme Corporation, Boston, Massachusetts, USA
  5. Channing Laboratory, Brigham and Women’s Hospital, Boston, Massachusetts, USA

Correspondence to E W Karlson, Brigham and Women's Hospital, 75 Francis Street, Boston, Massachusetts 02115, USA; ekarlson{at}


Background: Previous studies have reported an interaction between ever cigarette smoking and the presence of the human leukocyte antigen (HLA)-DRB1 shared epitope (SE) genotype in rheumatoid arthritis (RA) risk. To address the effect of dosage, a case-control study nested within two prospective cohorts was conducted to determine the interaction between heavy smoking and the HLA-SE.

Methods: Blood was obtained from 32 826 women in the Nurses’ Health Study and 29 611 women in the Nurses’ Health Study II. Incident RA diagnoses were validated by chart review. Controls were matched for age, menopausal status and postmenopausal hormone use. High-resolution HLA-DRB1 genotyping was performed for SE alleles. Associations of HLA-SE, smoking and HLA-SE* smoking interactions with RA risk were assessed using conditional logistic regression models, adjusted for age and reproductive factors. Additive and multiplicative interactions were tested.

Results: In all, 439 Caucasian matched pairs were included. Mean age at RA diagnosis was 55.2 years; 62% of cases were seropositive. A modest additive interaction was observed between ever smoking and HLA-SE in seropositive RA risk. A strong additive interaction (attributable proportion due to interaction (AP) = 0.50; p<0.001) and significant multiplicative interaction (p = 0.05) were found between heavy smoking (>10 pack-years) and any HLA-SE in seropositive RA risk. The highest risk was in heavy smokers with double copy HLA-SE (odds ratio (OR) 7.47, 95% CI 2.77 to 20.11).

Conclusions: A strong gene–environment interaction was observed between HLA-SE and smoking when stratifying by pack-years of smoking rather than by ever smoking. Future studies should assess cumulative exposure to cigarette smoke when testing for gene–smoking interactions.


Rheumatoid arthritis (RA), an autoimmune disease of unknown aetiology, affects approximately 1% of the adult population.1 Genetic and environmental factors are thought to interact in RA development. Epidemiological research has demonstrated a strong association between cigarette smoking and RA risk.2 3 4 5 6 7 8 9 10 11 12 13 14 In the Nurses’ Health Study (NHS), we have found that RA risk is significantly elevated among women with >10 pack-years of smoking, and a strong dose-response exists.12 14

The strongest genetic risk factor for RA is found within the human leukocyte antigen (HLA) complex (or major histocompatibility complex antigen (MHC)). Within the HLA class II region, multiple HLA-DRB1 alleles are associated with RA.15 16 17 In individuals of European ancestry, the associated HLA-DRB1 alleles share a region of sequence similarity or “shared epitope” (SE) at amino acid positions 70–74 in the third hypervariable region of the HLA-DRB1 molecule.18 Smoking and HLA shared epitope (HLA-SE) genotypes interact to increase risk of seropositive but not seronegative RA in several studies.13 19 20 21 However, the dose effect aspects of this interaction have not been studied. One study of three North American RA cohorts22 did not demonstrate a significant interaction between ever smoking and HLA-SE in predicting anti-cyclic citrullinated protein (CCP) antibodies or rheumatoid factor (RF) among RA cases.19

We studied the interaction between HLA-SE alleles and smoking dose among women in a case-control study nested within two large prospective cohort studies, the Nurses’ Health Studies. We aimed to determine whether heavier smoking was associated with a stronger gene–environment interaction than was ever smoking.


Study population

The NHS is a prospective cohort of 121 700 female nurses, aged 30–55 years in 1976. From 1989 to 1990, 32 826 (27%) NHS participants aged 43–70 provided blood samples. The Nurses’ Health Study II (NHSII) is a similar cohort, with 116 608 female nurses aged 25–42 in 1989. Between 1996 and 1999, 29 611 (25%) of the women participating in the NHSII cohort, aged 32–52 at that time, provided blood samples. The demographics and exposure characteristics of participants who provided blood samples are similar to those of the overall cohorts.23 24 All aspects of this study were approved by the Partners’ HealthCare Institutional Review Board.

Identification of rheumatoid arthritis

As previously described,14 we confirmed self-reports of RA based on the presence of RA symptoms on a connective tissue disease screening questionnaire (CSQ)25 and medical record review for four or more of the seven American College of Rheumatology (ACR) classification criteria for RA.26 We also included a small number of subjects (n = 14) with agreement by two rheumatologist reviewers on diagnosis of RA, three documented ACR criteria for RA and a diagnosis of RA by their doctor. The response rate to requests for the CSQ among RA self-reports was 77%, and the response rate to requests for medical records was 96%.

Population for analysis

For cases and controls, we excluded women who reported any cancer (except non-melanoma skin cancer) at baseline or during follow-up. Each participant with RA was matched by year of birth, race/ethnicity, menopausal status and postmenopausal hormone use to a single healthy woman in the same cohort without RA. To minimise potential population stratification, we limited the analyses to Caucasian women.

DNA extraction and amplification

DNA was extracted from buffy coats and processed via the QIAamp (Qiagen, Chatsworth, California, USA) 96-spin blood kit protocol as previously described.27 An aliquot of each genomic DNA sample was put through a whole genome amplification protocol using the GenomiPhi DNA amplification kit (GE Healthcare, Piscataway, New Jersey, USA) to yield high quality DNA sufficient for HLA genotyping.

Seropositive phenotyping

We collected information on RF from medical records reviewed from the date of RA diagnosis. We did not have records from later in the disease course, or information on CCP as cases were diagnosed prior to its widespread use. For a subset of 180 NHS and 41 NHSII RA cases, plasma samples were collected in 1989 and a second set of plasma samples collected in 2000. In all, 98 samples were collected before RA diagnosis (incident samples) and 123 were collected after diagnosis (prevalent samples). We used the DIASTAT CCP (Axis-Shield Diagnostics, Dundee, UK) second-generation test, a semiquantitative/qualitative ELISA for detection of IgG CCP antibodies. A CCP antibody titre >5 U/ml was considered positive according to the manufacturer’s established threshold. Since prior work from Sweden has demonstrated gene–environment interactions between HLA-SE and smoking for RF-positive13 and CCP-positive RA,19 we created a combination phenotype based on RF results from the medical record supplemented by CCP results from plasma samples where available, as “ever seropositive” versus “never seropositive”.

HLA-SE determination

Low-resolution HLA-DRB1 genotyping was performed by polymerase chain reaction with sequence specific primers (PCR-SSP) using OLERUP SSP kits (Qiagen, West Chester, Pennsylvania, USA). We used primers to amplify DNA samples that contained sequences for HLA-DRB1*04, *01, *10 and *14, along with consensus primers and appropriate positive and negative control samples. For samples with positive two-digit HLA signals, sequence specific primers were used for high-resolution four-digit shared epitope allele detection of DRB1*0401, *0404, *0405, *0101, *0102, *1402 and *1001. OLERUP SSP computer software (Qiagen) was used to determine four-digit HLA types.


Information was collected via prospective biennial subject questionnaires regarding diseases, lifestyle and health practices. Reproductive covariates were chosen based on associations between reproductive factors and RA risk in this cohort.28 Lifetime smoking history was collected at baseline, and updated data concerning current smoking and number of cigarettes smoked a day were collected every 2 years. Data on smoking, parity, total duration of breast feeding, menopausal status and postmenopausal hormone use were selected from the cycle prior to the RA diagnosis date (or index date in controls). Smoking was categorised as: (1) never versus ever and (2) pack-years of smoking (product of years of smoking and packs of cigarettes per day). Pack-years were dichotomised as never or light smoking (⩽10 pack-years) versus heavy smoking (>10 pack-years), based on epidemiological data from this cohort that demonstrate increased RA risk with >10 pack-years of smoking.14 We further investigated three smoking categories (never, past, current) and three categories of pack-years (⩽10, 10–20 and >20 pack-years).
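The pack-year definition and the two categorisations described above can be sketched as follows. This is an illustrative sketch only, not the study's SAS code: the function names are hypothetical, and the conversion of 20 cigarettes per pack is an assumption that the text does not state.

```python
CIGARETTES_PER_PACK = 20  # assumption: conventional pack size, not stated in the text


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs of cigarettes per day x years of smoking."""
    return (cigarettes_per_day / CIGARETTES_PER_PACK) * years_smoked


def dichotomise(py: float) -> str:
    """Never or light smoking (<=10 pack-years) versus heavy smoking (>10)."""
    return "heavy" if py > 10 else "never/light"


def three_level(py: float) -> str:
    """Three pack-year strata: <=10, >10 to 20, and >20."""
    if py <= 10:
        return "<=10"
    if py <= 20:
        return ">10-20"
    return ">20"


if __name__ == "__main__":
    # Hypothetical participant: 30 cigarettes a day for 10 years.
    py = pack_years(30, 10)
    print(py, dichotomise(py), three_level(py))
```

Note that a participant with exactly 10 pack-years falls in the never/light group, matching the ⩽10 versus >10 cut point.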

Statistical methods

We verified the Hardy–Weinberg equilibrium for each genotype among controls in each nested case-control dataset. We calculated means with standard deviation and medians with range for continuous covariates stratified by cohort and case/control status. For categorical covariates, we calculated frequencies and percentages. SAS V.9.1 was used for all analyses (SAS Institute, Cary, North Carolina, USA). Distributions for HLA-SE among cases and controls were compared using the χ2 test of independence. Conditional logistic regression analyses, conditioning on matching factors and adjusting for age at menarche, menstrual regularity, parity, breastfeeding duration, menopausal status and postmenopausal hormone use, tested the association between HLA-SE alleles and RA risk in a general model and in a dominant model. Unconditional logistic regression analyses, adjusting for matching factors and covariates, were used to examine the risks of seropositive and seronegative RA.
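For readers unfamiliar with the Hardy–Weinberg check mentioned above, a minimal sketch follows. It assumes HLA-SE is treated as a biallelic locus (SE versus non-SE allele) and uses a one-degree-of-freedom χ2 goodness-of-fit test; the text does not specify which test was run in SAS, and the genotype counts in the example are hypothetical, not study data.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed counts of the three genotypes
    (e.g. two, one and zero copies of the SE allele among controls).
    Returns (chi-square statistic, p value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control genotype counts, for illustration only.
    chi2, p_value = hwe_chi_square(25, 50, 25)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p value gives no evidence of deviation from equilibrium, which is the result reported for the controls in this study.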

Analyses of interaction

We used an additive model of interaction based on disease rates connected to the “pie model”.29 Rothman showed that independent risk factors adhere to an additive model and that biological interaction results in departure from additivity of the disease rates (see supplementary material). To test for additive interactions we followed the methods outlined by Lundberg30 and Andersson,31 using a 2×2 factorial design to calculate the attributable proportion due to interaction (AP), the relative excess risk due to interaction (RERI) and the synergy index (SI). A p value of <0.05 for AP was considered as evidence for departure from an additive model of association. For models where HLA-SE, pack-years of smoking and smoking status were categorised into three categories, we calculated indices of additive interaction for each stratum of exposure compared to the referent category of non-exposure. The 95% confidence intervals (CIs) were calculated using the delta method as described previously,32 which is a straightforward Taylor expansion of the variances and covariances to derive a probability distribution. Multiplicative interaction was assessed by adding an interaction variable (HLA-SE* smoking) to the regression models. A p value of <0.05 was considered as evidence for departure from a multiplicative model of association.
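The three additive-interaction indices can be illustrated with a short sketch using their standard definitions for a 2×2 factorial design, with odds ratios standing in for relative risks. The function and variable names are hypothetical, the example odds ratios are invented for illustration and are not taken from the study tables, and the delta-method confidence intervals are omitted.

```python
from dataclasses import dataclass


@dataclass
class AdditiveInteraction:
    reri: float  # relative excess risk due to interaction
    ap: float    # attributable proportion due to interaction
    si: float    # synergy index


def additive_interaction(or10: float, or01: float, or11: float) -> AdditiveInteraction:
    """Indices of additive interaction from three odds ratios.

    or10: exposed to factor A only (e.g. HLA-SE without heavy smoking)
    or01: exposed to factor B only (e.g. heavy smoking without HLA-SE)
    or11: exposed to both; all relative to the doubly unexposed referent.
    """
    reri = or11 - or10 - or01 + 1.0
    ap = reri / or11
    denom = (or10 - 1.0) + (or01 - 1.0)
    si = (or11 - 1.0) / denom if denom != 0 else float("inf")
    return AdditiveInteraction(reri=reri, ap=ap, si=si)


def multiplicative_ratio(or10: float, or01: float, or11: float) -> float:
    """Ratio of the joint OR to the product of the single-exposure ORs.

    A value of 1 corresponds to no departure from a multiplicative model.
    """
    return or11 / (or10 * or01)


if __name__ == "__main__":
    # Hypothetical odds ratios, for illustration only.
    result = additive_interaction(or10=1.5, or01=1.5, or11=4.0)
    print(result)
    print(multiplicative_ratio(1.5, 1.5, 4.0))
```

Under exact additivity RERI and AP equal 0 and SI equals 1, so values above these indicate a joint effect larger than the sum of the separate effects.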


A total of 439 pairs of Caucasian women, each pair being 1 RA case and a matched control, were included. The cases in the NHS had a mean (SD) age of 56.7 (9.4) years, compared to 43.1 (5.1) years in the younger NHSII cohort, due to the different ages targeted for enrolment in each of the cohorts (table 1). Otherwise, the cases were similar in terms of RA characteristics with 61% seropositive RA in NHS and 65% seropositive in NHSII.

Table 1

Characteristics of rheumatoid arthritis (RA) cases and matched controls in the Nurses’ Health Study (NHS; 1976–2002) and the Nurses’ Health Study II (NHSII; 1989–2003)

Table 1 shows the distribution of covariates for the RA cases and their matched controls at the time of RA diagnosis (or index date for the controls). A higher proportion of RA cases and controls were postmenopausal at RA diagnosis in NHS compared to NHSII cohorts. In NHSII a slightly higher percentage of women with RA were parous compared to their matched controls (93.9% and 85.7%), but not in the NHS cohort (91.0% of RA cases and 94.4% of controls).

HLA-SE genotype distributions did not deviate from Hardy–Weinberg equilibrium. Overall, genotyping call rates were 98.5% for HLA-SE. The frequency of the HLA-SE was significantly higher among RA cases than controls (χ2 with 1 degree of freedom, p<0.001 for pooled NHS/NHSII cohorts). In all, 49 (12.8%) NHS cases had two copies of the HLA-SE allele as compared to 24 (6.3%) controls (p<0.001). Similar results were seen in NHSII participants, with nine (18.4%) cases having two copies of the HLA-SE allele versus one (2.1%) control (p = 0.03). The most common HLA-SE alleles in RA cases were 0401 (13.9%), 0404 (5.4%) and 0101 (9%).

Table 2 includes the results of conditional logistic regression analyses for RA risk associated with HLA-SE for all RA and from unconditional logistic regression analyses stratified by seropositivity. The adjusted model includes pack-years of cigarette smoking, age at menarche, regularity of menses, parity, breast feeding, menopausal status and postmenopausal hormone use. RA risk associated with a single copy of HLA-SE was elevated (odds ratio (OR) 1.60, 95% CI 1.16 to 2.22) and with a double copy was markedly elevated (OR 3.78, 95% CI 2.13 to 6.71). These effects of HLA-SE were limited to seropositive RA (double HLA-SE OR 4.41, 95% CI 2.53 to 7.68) with no significant association with seronegative RA.

Table 2

Association between HLA-SE and rheumatoid arthritis (RA) risk in the Nurses’ Health Studies with stratification by serological status

Interaction results

Table 3 shows the results of analyses in which we tested for additive and multiplicative interactions between HLA-SE and smoking, categorised as ever/never smoking or dichotomised at ⩽10 or >10 pack-years of smoking. There was a 2.14-fold increased risk of RA (95% CI 1.39 to 3.29) for ever smokers who carried any HLA-SE compared to the referent group of never smokers with no HLA-SE; however, there was no evidence for gene–environment interaction for RA overall. There was a modest additive but not multiplicative gene–environment interaction between ever smoking and the presence of the HLA-SE allele, with the proportion of risk due to additive interaction (AP) of 0.38 (95% CI 0.05 to 0.70, p = 0.02) for seropositive RA. In contrast, a 2.75-fold elevated risk of RA (95% CI 1.75 to 4.31) and a 3.6-fold elevated risk of seropositive RA (95% CI 2.26 to 5.78) were observed among heavy smokers (>10 pack-years) with any HLA-SE compared to the referent group (⩽10 pack-years without HLA-SE). We observed a significant additive, but not multiplicative, interaction between heavy cigarette smoking and the presence of the HLA-SE allele, with the proportion of risk due to additive interaction (AP) of 0.39 (95% CI 0.08 to 0.69, p = 0.01) for RA. A stronger additive interaction between heavy cigarette smoking and the HLA-SE allele was observed for seropositive RA, with AP of 0.50 (95% CI 0.24 to 0.77, p<0.001), and a significant multiplicative interaction term (p = 0.05).
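As a rough check on how AP relates to the reported odds ratios, the standard definitions (AP = RERI/OR11 and RERI = OR11 − OR10 − OR01 + 1) can be applied to the rounded values quoted above for seropositive RA. The subscript notation is an assumption introduced here: OR11 denotes heavy smokers with any HLA-SE, and OR10 and OR01 denote the two singly exposed groups. Because the inputs are rounded and come from adjusted models, the result is approximate only.

```latex
\begin{align*}
\mathrm{RERI} &= \mathrm{AP} \times \mathrm{OR}_{11} \approx 0.50 \times 3.6 = 1.8,\\
\mathrm{OR}_{10} + \mathrm{OR}_{01} - 1 &= \mathrm{OR}_{11} - \mathrm{RERI} \approx 3.6 - 1.8 = 1.8.
\end{align*}
```

In other words, under exact additivity the doubly exposed group would be expected to have an OR of roughly 1.8 rather than the observed 3.6.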

Table 3

Gene–environment interaction of human leukocyte antigen shared epitope (HLA-SE) and smoking in the Nurses’ Health Studies

When stratified by number of copies of HLA-SE, there was strong evidence for increasing risk of RA with each copy of HLA-SE among heavy smokers, with a 6.6-fold increased risk of RA (95% CI 2.49 to 17.46) for heavy smokers with two copies of HLA-SE as compared to the reference group, with evidence for additive interaction (table 4). The strongest evidence for additive interaction was for HLA-SE with heavy smoking in seropositive RA (OR 7.47, 95% CI 2.77 to 20.11), with a borderline multiplicative interaction (p = 0.07). Among HLA-SE subtypes, the only interaction for all RA was between the 0401 allele and heavy smoking, with evidence for additive interaction (p = 0.005). The HLA-SE subtypes of 0401 and 0101 demonstrated significant additive interaction with heavy smoking (p = 0.003 and 0.01, respectively) for seropositive RA. There was no evidence for multiplicative interactions between HLA-SE subtypes and heavy smoking (data not shown).

Table 4

Gene–environment interaction between HLA-SE (none, single and double copies) and heavy smoking in all rheumatoid arthritis (RA) groups and in groups stratified by serological status in the Nurses’ Health Studies

When stratifying pack-years into three categories, the highest odds for RA were in the 10–20 pack-year and HLA-SE group in all RA and seropositive RA analyses (table 5). Comparing each stratum to the referent demonstrated significant additive interactions for the 10–20 pack-year stratum (p = 0.008) for all RA, and for the 10–20 pack-year (p = 0.003) and >20 pack-year strata (p = 0.002) for seropositive RA, with a borderline multiplicative interaction (p = 0.09). When stratifying smoking status into never, past, or current (table 6), there was little evidence for interactions except a modest additive interaction between past smoking and HLA-SE for seropositive RA (p = 0.04).

Table 5

Gene–environment interaction between HLA-SE (none, any) and pack-years of smoking (⩽10, 10–20, >20) in all rheumatoid arthritis (RA) groups and in groups stratified by serological status in the Nurses’ Health Studies

Table 6

Gene–environment interaction between HLA-SE (none, any) and smoking status (never, past, current) in all rheumatoid arthritis (RA) groups and in groups stratified by serological status in the Nurses’ Health Studies


In this nested case-control study of women, we demonstrate significant additive and multiplicative interaction between the HLA-DRB1 shared epitope and heavy cigarette smoking of more than 10 pack-years. The observed interaction between HLA-SE and smoking was strongest for seropositive RA, with little evidence for association with seronegative RA. The interaction was less evident when smoking status was analysed as never/ever smoking or never/past/current, suggesting the importance of considering cumulative “dose” of smoking when testing for gene–environment interaction in RA.

Interactions between HLA-SE alleles and smoking in RA risk have been demonstrated in several large epidemiological studies. In the Epidemiologic Investigations in Rheumatoid Arthritis (EIRA) study, a strong additive interaction between HLA-SE and smoking was demonstrated for RF-positive and anti-CCP-positive RA, but not for seronegative RA13 19; a 21-fold increased risk of CCP-positive RA was observed among smokers carrying a double copy of HLA-SE.19 This finding suggests that these two important risk factors may interact along one or more biological pathways.29 The statistical interaction between smoking and HLA-SE alleles in CCP-positive RA is consistent with the hypothesis that cigarette smoking modulates the immunogenicity of citrulline and related peptides in individuals with specific HLA alleles. This hypothesis has been strengthened by demonstration that smoking can cause citrullination of peptides in lung macrophages19 and is associated with an increased expression of peptidyl arginine deiminase 2 (PADI2) in bronchoalveolar cells.33 In HLA-DRB1 0401 transgenic mice, citrullination of certain peptides increases binding to HLA class II molecules with the SE, triggering immune responses to citrullinated peptides.34

Evidence for gene–environment interaction between HLA-SE and smoking in seropositive RA risk was seen in patients with undifferentiated arthritis at the Leiden Early Arthritis Clinic. Among the participants who were HLA-SE positive, ever smoking significantly increased the OR for development of CCP-positive RA from 3.3 to 8.0 (p = 0.002 for multiplicative interaction)21; HLA-SE and ever smoking also increased the OR of CCP antibodies, with evidence for additive but not multiplicative interaction.20 35 A Danish case-control study of HLA-SE interactions with RA epidemiological risk factors did not demonstrate any significant multiplicative interaction for HLA-SE* smoking; however, HLA-SE homozygotes had a 52-fold increased risk of CCP-positive RA compared to non-carrier/never smokers. Testing for additive interaction was not performed. In contrast, a significant HLA-SE* smoking interaction was demonstrated in the Iowa Women’s Health Study, an older Caucasian female cohort in which smoking was associated with increased risk of RA only among subjects who were HLA-SE negative, but not among subjects who were HLA-SE positive.36 The reasons for this discrepancy are unknown, although it may relate to the older age at RA onset or the small sample size (116 cases) in that cohort. A case-only analysis combining data from three large US populations was unable to confirm an interaction between HLA-SE alleles and cigarette smoking in relation to presence of CCP antibodies, but smoking status was defined only as never/ever smoking.22 In one cohort with information on pack-years of smoking, there was an independent effect of smoking on the presence of CCP among heavy smokers (>20 pack-years).

Strengths of this study include the prospective collection of exposure information prior to the onset of RA, the detailed smoking data collected every 2 years and the availability of high-resolution HLA genotyping. Limitations include the lack of data on CCP antibody status, as most cases were diagnosed prior to the widespread usage of this test, and the absence of plasma samples for CCP testing in about half the cases. However, RF status was available from medical record reviews, and other gene–environment interaction studies demonstrate similar relationships for RF-positive and CCP-positive phenotypes.13 19 The rate of seropositive RA in this study (60%) is similar to that reported from a large US registry study, the National Databank (n = 14 000), with patients recruited from rheumatology practices across the US.37 Limited generalisability of the NHS is a concern, as the NHS cohorts are composed of middle to older aged women with high educational levels and with primarily Caucasian heritage. However, an advantage of similar ethnic background in genetic studies is a lower potential for population stratification.

This study of gene–environment interactions in RA in a cohort of Caucasian US women demonstrates a significant additive and multiplicative interaction between the strongest genetic risk factor for RA, the HLA-SE, and heavy cigarette smoking >10 pack-years, the strongest environmental risk factor for RA, for seropositive but not seronegative RA. We demonstrate only additive interaction between smoking categorised as never/ever smoked or as three categories (never, past, current) with the HLA-SE and seropositive RA; however, if smoking is classified by dose (as pack-years), we demonstrate additive and multiplicative interaction for seropositive RA. This illustrates the importance of considering the dose effects of environmental and genetic factors in gene–environment interaction studies. Additionally, it lends evidence to the theory that seropositive and seronegative RA have different risk factors and pathogenic pathways.


The authors gratefully acknowledge the participants in the NHS for their continuing participation. The authors also thank Gideon Aweh, Karen Corsano, Wei-Zi Ding, Lingsheng Dong and Brendan Keenan for their technical assistance.




  • ▸ Additional data (supplementary information) are published online only at

  • Funding Supported by NIH grants R01 AR49880, CA87969, P60 AR047782, K24 AR0524-01 and BIRCWH K12 HD051959 (supported by NIMH, NIAID, NICHD and OD). KHC is the recipient of an Arthritis Foundation/American College of Rheumatology Arthritis Investigator Award and a Katherine Swan Ginsburg Memorial Award.

  • Competing interests None declared.

  • Ethics approval Ethics approval was granted by the Partners Human Subjects Committee.

  • Provenance and Peer review Not commissioned; externally peer reviewed.