Identification of the NF-κB activating protein-like locus as a risk locus for rheumatoid arthritis
- Gang Xie1,3,
- Yue Lu2,
- Ye Sun1,3,
- Steven Shiyang Zhang1,
- Edward Clark Keystone3,
- Peter K Gregersen4,
- Robert M Plenge5,
- Christopher I Amos2,
- Katherine A Siminovitch1,3,6
- 1Mount Sinai Hospital Samuel Lunenfeld Research Institute and Toronto General Research Institute, Toronto, Ontario, Canada
- 2Department of Epidemiology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
- 3Rebecca MacDonald Centre for Arthritis, Department of Medicine, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
- 4Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore-Long Island Jewish Health System, Manhasset, New York, USA
- 5Division of Rheumatology, Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, USA
- 6Departments of Immunology and Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Correspondence to Dr Katherine A Siminovitch, Mount Sinai Hospital, Lunenfeld Research Institute and Toronto General Research Institute, 600 University Ave, Room 778D, Toronto, Ontario, Canada M5G 1X5
- Received 23 May 2012
- Accepted 21 October 2012
- Published Online First 6 December 2012
Objective To fine-map the NF-κB activating protein-like (NKAPL) locus identified in a prior genome-wide study as a possible rheumatoid arthritis (RA) risk locus and thereby delineate additional variants with stronger and/or independent disease association.
Methods Genotypes for 101 SNPs across the NKAPL locus on chromosome 6p22.1 were obtained on 1368 Canadian RA cases and 1471 controls. Single marker associations were examined using logistic regression and the most strongly associated NKAPL locus SNPs then typed in another Canadian and a US-based RA case/control cohort.
Results Fine-mapping analyses identified six NKAPL locus variants in a single haplotype block showing association with p≤5.6×10−8 in the combined Canadian cohort. Among these SNPs, rs35656932 in the zinc finger 193 gene and rs13208096 in the NKAPL gene remained significant after conditional logistic regression, contributed independently to risk for disease, and were replicated in the US cohort (Pcomb=4.24×10−10 and 2.44×10−9, respectively). These associations remained significant after conditioning on SNPs tagging the HLA-shared epitope (SE) DRB1*0401 allele and were significantly stronger in the HLA-SE negative versus positive subgroup, with a significant negative interaction apparent between HLA-DRB1 SE and NKAPL risk alleles.
Conclusions By illuminating additional NKAPL variants with highly significant effects on risk that are distinct from, but interactive with those arising from the HLA-DRB1 locus, our data conclusively identify NKAPL as an RA susceptibility locus.
Rheumatoid arthritis (RA) is a chronic autoimmune disease primarily associated with inflammation of the synovial joints and affecting up to 1% of the population worldwide. Although the complex interplay of genetic and environmental factors underpinning RA is not well understood, major inroads have been made in mapping gene loci associated with risk for this disease.1 In addition to the HLA-DRB1 locus, over 35 non-major histocompatibility complex (MHC) RA risk loci have emerged from genome-wide association studies (GWAS) and subsequent meta-analyses of the GWAS datasets.2–10
Through a genome-wide scan of 2418 RA patients and 4504 healthy controls ascertained in Canada and the USA, we previously identified an association of RA with the REL NF-κB transcription factor locus and also confirmed already identified disease associations with the PTPN22, CTLA4, TNFAIP3, BLK and TRAF1/C5 genes.9 Our data also showed strongly suggestive signals (PGWAS values between 8.2×10−7 and 5.28×10−8) emanating from a cluster of single nucleotide polymorphisms (SNPs) across a 70 kb region on chromosome 6p22.1 encompassing the NF-κB activating protein-like gene (NKAPL) as well as three zinc finger protein transcription factor genes, ZNF193, ZNF307 and ZNF187. A follow-up GWAS meta-analysis of 5505 RA cases and 22 603 controls of European descent10 also revealed 13 genotyped or imputed SNPs across a 150 kb region encompassing the NKAPL locus to be strongly associated (Pmeta between 1×10−10 and 1×10−13) with RA (R Plenge, personal communication). Although NKAPL functions are unknown, its protein product shows 90% sequence similarity to NF-κB activating protein (NKAP), a protein implicated in NF-κB-mediated transcriptional activation of TNF and IL-1.11 As data from our group and others have also implicated other genes from the NF-κB signalling pathway (eg, REL, CD40, TRAF1, TNFAIP3, PRKCQ and TNFRSF14) in RA susceptibility,9,12,13 the NKAPL gene represents a compelling candidate gene for RA. We therefore undertook fine-mapping studies of the NKAPL locus aimed at confirming this association, identifying those variants providing the strongest association signal and defining whether such variants act together or independently of one another and/or the HLA-DRB1 locus in conferring risk for RA.
Study cohorts (see online supplementary methods) include: 3979 subjects of European origin (2078 RA patients and 1901 healthy controls) recruited independently from two clinical centres in Canada, Toronto (1368 cases and 1471 healthy controls) and Halifax (710 cases and 430 healthy controls) and a third cohort including 2064 subjects of European ancestry ascertained in the USA as part of the Brigham Rheumatoid Arthritis Sequential Study and used here for replication analysis.
SNPs from a 372 kb interval across the NKAPL locus on chromosome 6p22.1 were selected primarily based on at least one of the following criteria: (1) HapMap phase III data identifying the SNP as a tag SNP with minor allele frequency >0.01 and r2 threshold of 0.8 or (2) localisation within 150 kb upstream or downstream of SNPs most significantly associated with RA in our GWAS. Other SNPs studied were: the autoimmune disease-associated PTPN22 rs2476601 SNP14 and two SNPs (rs660895 and rs6910071) that tag the HLA-DRB1*0401 allele on chromosome 6p21.3.15
Hardy–Weinberg equilibrium, allelic association and conditional logistic regression analyses were performed using PLINK software V1.07 (http://pngu.mgh.harvard.edu/purcell/plink/). For the allelic association tests, the threshold for declaring significance was assigned according to Benjamini and Hochberg's False Discovery Rate method and set at p<5.00×10−4 (0.05/101). Cochran-Mantel Haenszel χ2 analysis was used to combine p values and calculate OR from the Canadian and US cohorts and an R-script (http://www.rproject.org/) was used to generate figures. Haplotype block structure, depicted using Haploview software V4.1 (http://www.broad.mit.edu/mpg/haploview), was defined according to the criteria established by Gabriel16 and the pairwise estimates of standardised Lewontin's disequilibrium coefficient (D′), whereas the linkage disequilibrium (LD) among pairs of SNPs was characterised according to the square of the correlation coefficient (r2). Conditional logistic regression analyses of multiple markers were performed using SAS V9.13 (SAS Institute Inc., Cary, North Carolina, USA). Gene–gene interaction analysis was performed by case-only interaction analysis in which a logistic regression model was used to test for an association of shared epitope (SE) positivity with NKAPL risk alleles (coded in an additive fashion as −1, 0 or 1 for no, 1 or 2 risk alleles, respectively). For multinomial logistic regression modelling,17 controls were considered as the lowest risk outcome, SE negative cases as the intermediate risk outcome and SE positive cases as the highest risk outcome, and these multiple outcomes were then assessed according to number of NKAPL risk alleles. The statistical power for this study was evaluated using CaTS software (http://www.sph.umich.edu/csg/abecasis/CaTS/) with the following parameters: disease prevalence 0.01, disease allele frequency 0.2, α=0.0005 (0.05/101). Power to detect associations with relative risk of 1.5 was estimated to be 99.4%.
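As a rough illustration of the single-marker test and significance threshold described above, the following Python sketch runs a Pearson χ2 test (1 df) on a 2×2 table of allele counts and compares the resulting p value with 0.05/101. The allele counts are hypothetical, chosen only to match the Toronto sample size (1368 cases and 1471 controls); the function name is illustrative, and the calculation is not claimed to reproduce the PLINK output.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns the statistic, its p value and the allelic odds ratio.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Threshold quoted in the text: 0.05 divided by the 101 SNPs tested.
N_TESTS = 101
THRESHOLD = 0.05 / N_TESTS

# HYPOTHETICAL allele counts (2 alleles per subject), not data from the study.
chi2, p_value, odds_ratio = allelic_chi_square(600, 2136, 500, 2442)
print(f"chi2={chi2:.2f} p={p_value:.2e} OR={odds_ratio:.2f} "
      f"significant={p_value < THRESHOLD}")
```

With these made-up counts the marker would pass the threshold; a real analysis would also apply the quality control and Hardy–Weinberg checks described in the text before testing.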
Fine-mapping of the RA-associated NKAPL locus at 6p22.1
To identify risk allele(s) at the NKAPL locus, we genotyped 1368 RA cases and 1471 controls from Toronto for 105 SNPs across a 372 kb genomic region encompassing the NKAPL gene. Characteristics of the study design and subjects are outlined in online supplementary figure S1. Among the 101 SNPs that passed quality control, 16 achieved the set significance threshold of p<5.00×10−4 with the top six markers showing associations with disease (p<6.00×10−7) that remained highly significant (p values 1.80×10−6–8.60×10−6) after False Discovery Rate correction (table 1 and online supplementary table S1). Haploview analysis of pairwise LD among the 101 SNPs revealed that these six most strongly associated SNPs map within a 70 kb segment representing the middle of three haplotype blocks across this region and containing the NKAPL gene and three zinc finger transcription factor genes, ZNF193, ZNF307 and ZNF187 (figure 1 and see online supplementary figure S2). The strongest association signal (p=2.48×10−8) came from a ZNF187 intronic SNP (rs67998226) at the distal end of this haplotype block, but these variants were all in strong LD with one another, the pairwise LD (r2) between rs67998226 and either the three SNPs across the more proximal ZNF193 gene (rs13195291, rs35656932 and rs13204012) or the NKAPL promoter region SNP (rs13208096) being >0.94 and 0.83, respectively, and between rs35656932 and rs13208096 being 0.87.
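The two LD measures used here, D′ and r2, can be sketched as follows for a single SNP pair. This minimal Python example assumes the haplotype frequency is already known; in practice software such as Haploview estimates it from unphased genotypes. The frequencies below are hypothetical and the function name is illustrative.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# HYPOTHETICAL frequencies, not taken from the study data.
d_prime, r2 = ld_stats(p_ab=0.19, p_a=0.20, p_b=0.20)
print(f"D'={d_prime:.3f} r2={r2:.3f}")
```

The example shows why the two measures are reported separately: D′ can be close to 1 while r2 is noticeably lower, because r2 also depends on how well the allele frequencies at the two SNPs match.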
To further examine effects of this locus on RA susceptibility, the six most significant SNPs were also typed in a Halifax-derived cohort including 710 RA patients and 430 controls. Four of the six associations (rs13195291, rs35656932, rs13208096 and rs67998226) were replicated in this cohort, with combined (Toronto and Halifax) association signals (PCAN) ranging from 5.60×10−8 to 2.22×10−9 (table 1).
The extent to which the signals observed in the combined analysis were independent of one another was next examined using stepwise logistic regression analysis wherein variables were iteratively added into an empty model. This analysis identified the ZNF193/NKAPL locus rs35656932 SNP, which is in strong LD (r2=0.96) with the ZNF187 rs67998226 SNP, as the variant most strongly associated with risk for RA, but also suggested that both NKAPL rs13208096 and rs35656932 SNP alleles influence risk for RA (table 2). Additional conditional analyses of the six SNPs revealed that both rs35656932 and rs13208096 remain significantly associated with disease after conditioning on each marker (see online supplementary table S2). Because these data suggest independent contributions of alleles of rs35656932 and rs13208096 to risk, associations of these two variants with disease were next explored in a third (US) cohort including 863 cases and 1201 controls. The two SNPs both replicated in this cohort (table 3), with combined analyses of all 2941 cases and 3102 controls typed for these SNPs (table 3) yielding highly significant Pcomb values (4.24×10−10 for rs35656932 and 2.44×10−9 for rs13208096).
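The way evidence is pooled across the Toronto, Halifax and US cohorts can be illustrated with a small Cochran-Mantel-Haenszel calculation, in which each cohort is one stratum. The Python sketch below is a textbook version without continuity correction (an assumption; the exact PLINK settings are not stated in the text), and all allele counts are hypothetical, sized only to match the cohort totals.

```python
import math


def cmh(tables):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables.

    Each table is (case_risk, case_other, ctrl_risk, ctrl_other) allele counts.
    Returns (chi2, p_value, mantel_haenszel_odds_ratio); no continuity correction.
    """
    diff_sum = var_sum = or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1 degree of freedom
    return chi2, p_value, or_num / or_den


# HYPOTHETICAL allele counts per cohort, not data from the study.
strata = [
    (600, 2136, 500, 2442),  # Toronto-sized stratum
    (310, 1110, 150, 710),   # Halifax-sized stratum
    (370, 1356, 420, 1982),  # US-sized stratum
]
chi2_cmh, p_cmh, or_mh = cmh(strata)
print(f"CMH chi2={chi2_cmh:.2f} p={p_cmh:.2e} OR_MH={or_mh:.2f}")
```

Stratifying by cohort, rather than simply summing the counts, guards against a spurious signal caused by differences in allele frequency between the recruitment sites.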
Variants of the NKAPL locus and the HLA region jointly contribute to risk for RA
Because the NKAPL locus maps to chromosome 6p22.1, evaluation of its disease association may be confounded by effects of the HLA-DRB1 locus at 6p21.3, a locus that encodes the RA-predisposing SE, confers much of the genetic risk for RA, and maps in a region of extensive LD.15,18–23 Although the NKAPL gene lies 4306 kb upstream of the RA risk-related HLA-DRB1 gene, the extent to which its association with RA may reflect LD with HLA-DRB1 risk alleles was investigated by genotyping the Canadian cases (2078) and controls (1901) for two SNPs (rs660895 and rs6910071) that tag one of the most common SE encoding alleles, HLA-DRB1*0401, and then reassessing the NKAPL–disease association by logistic regression analysis with conditioning on the HLA-DRB1*0401 SNPs. This analysis confirmed the strong association of RA with both tag SNPs, the signals reaching p=2.04×10−75 and p=2.39×10−63 for rs660895 and rs6910071, respectively (see online supplementary table S3). Importantly, the association signals from each of the six NKAPL locus SNPs remained highly significant (p<1×10−10) after conditioning on the DRB1 tag SNPs (see online supplementary table S4). Analyses of the pairwise LD between the two tag SNPs and each of the 55 SNPs with nominal evidence (p<0.05) for disease association in the initial analysis of Toronto controls also revealed no significant LD (r2<0.01 for each SNP pair) between any of these SNPs and the HLA-DRB1 tag SNPs (data not shown). These findings imply that the association signal at the NKAPL locus represents an effect on risk that is not attributable to LD with the HLA-DRB1*0401 allele.
To further evaluate the relationship between NKAPL and HLA locus effects on risk, we additionally used data from a large panel of British subjects genotyped for HLA alleles to impute, in the subset of the Canadian cohort included in our previous GWAS,9 137 HLA alleles encoding classic HLA-A, B, C, DRB1, DQA and DQB molecules. Association tests were performed in a dataset combining the imputed alleles, and the GWAS-derived HLA region as well as the fine mapping-derived SNP genotypes. Single SNP association tests performed using this combined dataset and assuming an additive model (implemented in PLINK software) identified 379 SNPs (data not shown) or HLA genotypes with p values less than 1.0×10−4 (see online supplementary table S5). These variants together with the six candidate SNPs identified by fine mapping were subjected to stepwise logistic regression analysis using SAS and a p value set at <0.01 as the criterion to enter and remain in the model. From this analysis, seven SNPs (including rs35656932), but no classical HLA alleles were retained in the model, suggesting that risk for disease at this locus is better explained by effects of SNPs rather than HLA alleles (see online supplementary table S6A). Stepwise logistic regression was then repeated with HLA-DRB1*0401 forced into the model. After this analysis, five variants, including HLA-DRB1*0401 (p=0.04), were retained in the model, although this HLA allele did not reach our significance criterion of p<0.01 (see online supplementary table S6B). These data provide further evidence that NKAPL variant(s) per se contribute to risk for RA.
Relevance of HLA genotypes to the NKAPL locus effect on risk for RA was also explored by assessing the extent to which the NKAPL region rs35656932 and rs13208096 SNPs associate with disease in cases stratified based on presence or absence of the SE alleles or of anti-cyclic citrullinated peptide (anti-CCP) antibody, an autoantibody strongly associated with SE alleles.24 This analysis revealed the disease associations for both risk alleles to be much stronger in the SE negative (p=1.20×10−12 for rs35656932 and 3.50×10−11 for rs13208096) than in the SE positive (p=4.70×10−5 and 8.10×10−5, respectively) subgroup (table 4). By contrast, the association signals from these loci were essentially the same in the anti-CCP positive and negative subsets. Because a disparity between SE and anti-CCP status effects on this association was not expected, the stratified subgroups were also genotyped for the PTPN22 gene rs2476601 SNP for which an association with RA susceptibility is well established and thought to correlate with SE and anti-CCP positivity.25–27 This analysis confirmed the strong association of RA with the rs2476601 variant (p=2.72×10−15; see online supplementary table S3) as well as the positive effects of the SE (p=1.40×10−16 in SE positive vs 2.40×10−3 in SE negative cases) and of anti-CCP antibody (p=1.50×10−11 in anti-CCP positive vs 5.10×10−2 in anti-CCP negative cases) on this association. Thus, the PTPN22 risk allele is associated with SE positivity in an RA population in which NKAPL effects on risk are primarily observed in SE negative disease.
Effects of the NKAPL locus on risk were the strongest in the SE negative patients and increased when conditioning on HLA-DRB1*0401 tag SNPs, suggesting interaction between NKAPL and the HLA-DRB1 risk alleles. This possibility was directly examined using case-only logistic regression models wherein SE positivity was considered the outcome and NKAPL genotype an additive effect. This analysis (table 5) revealed significant interactions between risks conferred by HLA-DRB1 alleles and the NKAPL rs35656932 and rs13208096 SNPs, with the interaction effect being negative for both of the latter SNPs (OR=0.67, CI 0.54 to 0.83 and OR=0.69, CI 0.55 to 0.86, respectively). By comparison, a positive interaction effect was apparent between the RA-associated PTPN22 rs2476601 variant and SE positivity (OR=1.34, CI 1.08 to 1.65). Results of multinomial logistic regression analyses (see online supplementary table S7) further confirmed strong interaction of the HLA-DRB1 SE alleles with each of the NKAPL (p=4.35×10−10 and 2.51×10−11) and the PTPN22 (p=2.89×10−15) risk alleles and again revealed the ORs associated with either of the two NKAPL risk alleles to be lower, but for the PTPN22 risk allele to be higher, in a comparison of SE positive cases to controls versus SE negative cases to controls.
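The logic of the case-only design can be illustrated with a simplified 2×2 version: among cases only, an OR below 1 between SE positivity and carriage of the NKAPL risk allele indicates a negative (less than multiplicative) interaction, provided the two loci are independent in the source population. The Python sketch below uses carrier status and a Woolf confidence interval rather than the additive logistic regression model fitted in the study, and the counts are hypothetical.

```python
import math


def case_only_interaction(se_pos_carrier, se_pos_noncarrier,
                          se_neg_carrier, se_neg_noncarrier, z=1.96):
    """Case-only interaction OR with a Woolf (log-based) confidence interval.

    All four counts are numbers of cases; z=1.96 gives an approximate 95% CI.
    """
    a, b, c, d = (se_pos_carrier, se_pos_noncarrier,
                  se_neg_carrier, se_neg_noncarrier)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# HYPOTHETICAL case counts, not data from the study.
or_int, ci_low, ci_high = case_only_interaction(300, 1000, 250, 528)
print(f"interaction OR={or_int:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A confidence interval lying entirely below 1, as in this made-up example, corresponds to the kind of negative interaction reported for the NKAPL SNPs; an interval entirely above 1 would correspond to the positive interaction reported for PTPN22.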
The NKAPL region emerged as a candidate RA risk locus in the context of our prior GWAS data providing strong evidence for association of SNPs across this locus with RA in a Canadian and US study population. Because the association was supported in subsequent meta-analysis incorporating this and four more GWAS datasets,10 we undertook fine mapping and conditional analyses to screen for risk variants with stronger and/or independent signals of disease association. By genotyping our Canadian RA case/control cohort for 101 SNPs across the locus, we have identified six variants for which association with RA reaches a conservative level of genome-wide significance (p<5.7×10−8). Haplotype analysis reveals that these six variants all lie within the middle of three haplotype blocks, in a region encompassing the NKAPL and three zinc finger transcription factor genes. These variants are in strong LD with one another, but results of stepwise and conditional logistic regression analyses indicate that both ZNF193 rs35656932 and NKAPL rs13208096 SNPs contribute to risk for RA: associations for these two markers remained significant after conditioning on each of the other associated SNPs, and results of stepwise logistic regression also suggested independent effects of these SNPs on risk. These two associations were also replicated, albeit at modest levels of significance, in an independent US-based cohort.
Interpreting effects of the NKAPL locus on RA risk is complicated by the location of this locus in a chromosomal region (6p22.1) upstream of the HLA class II genes. While the NKAPL locus maps about 4306 kb away from the HLA-DRB1 gene and 1386 kb upstream from the telomeric end of the HLA region, the extensive LD across the region raises the possibility that the NKAPL association signals reflect LD with HLA-DRB1 SE alleles. However, effects of NKAPL as well as DRB1 SNPs on disease risk were revealed here by the logistic regression analyses conditioning on either of two HLA-DRB1*0401 tag SNPs. Analysis of pairwise LD between each of the most strongly associated NKAPL SNPs and the HLA-DRB1 tag SNPs also provided no evidence for LD (r2<0.01) between SNPs at these respective loci, and a stepwise logistic regression analysis combining the top six candidate SNPs from the fine-mapping study, 379 HLA region SNPs from the GWAS and the imputed HLA alleles further supported a contribution of the NKAPL locus to risk for RA. These findings are consistent with other data suggesting that the MHC locus contains loci in addition to HLA-DRB1 that confer risk for RA.28 Importantly, primary association of this locus with SE negative disease also implies interaction between the NKAPL and HLA-DRB1 risk alleles, a possibility supported by the results of case-only and multinomial logistic analyses showing very significant negative interaction effects between HLA-DRB1 and each of two NKAPL risk alleles. While both the NKAPL and the observed PTPN22-HLA-DRB1 interaction effects on risk require further evaluation in relation to their biological significance, these data provide new insights into the complex effector interactions that may link risk genotypes to disease pathogenesis in RA.
The current data identify two SNPs, rs13208096 and rs35656932, as the major drivers of the association signal at the NKAPL locus. Among these, rs13208096 maps 1787 bp upstream of the NKAPL gene, which is expressed in many tissues, including most immune cell populations.29 The gene encodes a 402 amino acid nuclear protein highly homologous to NKAP, a protein that functions as a transcriptional repressor of NOTCH signalling in thymocytes and is required for haemopoietic stem cell maintenance and survival.30,31 NKAPL is not functionally characterised, but shares with NKAP a domain critical to NKAP roles in transcriptional repression,30 and recent data from genome-wide annotation of transcriptional regulators (ENCODE; UCSC genome browser: http://genome.ucsc.edu/) reveal the rs13208096 SNP to be located in a region containing putative transcriptional regulatory histone marks.
The second SNP that drives the association signal at the NKAPL locus, rs35656932, maps 1741 bp upstream of the ZNF193 gene encoding the zinc finger transcription protein, ZNF193. Little is known about the functions of ZNF193 or the ZNF187 and ZNF307 proteins encoded by the nearby genes. However, ZNF307 has transcriptional repressor activity and appears to target NF-κB, raising the intriguing, albeit highly speculative, possibility that several genes at the NKAPL locus influence RA risk via effects on NF-κB signalling.32,33
Thus, while the current data do not identify the disease-causal allele at the NKAPL locus, our findings provide compelling evidence that this locus confers risk for RA and that the variants accounting for this association signal emanate from highly plausible candidate genes likely to influence NF-κB as well as HLA-DRB1-modulated immune cell responses already implicated in RA risk and pathogenesis.
This work was supported by grants from the Canadian Institutes for Health Research (MOP74621) and Ontario Research Fund (RE-01-061). KAS holds the Sherman Family Chair in Genomic Medicine and a Tier 1 Canada Research Chair. PKG, CIA and YL were partially supported by US NIH Grant AR44422.
Contributors GX planned and performed experiments, aided in writing the paper. Yue Lu and YS carried out statistical work, aided in preparing table. SSZ carried out DNA preparation and PCR reactions. ECK and RMP provided patient samples and data, aided in writing the paper. PKG provided data, aided in writing the paper. CIA planned the paper, guided statistical analyses, aided in data interpretation and paper writing. KAS led the planning and performance of experiments, aided interpretation of data and writing of the paper.
Competing interests None.
Patient consent Obtained.
Ethics approval Mount Sinai Hospital Institutional Review Board.
Provenance and peer review Not commissioned; externally peer reviewed.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/3.0/ and http://creativecommons.org/licenses/by-nc/3.0/legalcode