Abstract
Objectives In recent years, genome-wide association studies (GWASs) have identified a number of common genetic risk factors for rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE). However, the genetic overlap between these two immune-mediated diseases has not been thoroughly examined so far. The aim of the present study was to identify additional risk loci shared between RA and SLE.
Methods We performed a large-scale meta-analysis of GWAS data from RA (3911 cases and 4083 controls) and SLE (2237 cases and 6315 controls). The top-associated polymorphisms in the discovery phase were selected for replication in additional datasets comprising 13 641 RA cases and 31 921 controls and 1957 patients with SLE and 4588 controls.
Results The rs9603612 genetic variant, located near the COG6 gene, an established susceptibility locus for RA, reached genome-wide significance in the combined analysis including both discovery and replication sets (p value=2.95E−13). In silico expression quantitative trait locus analysis revealed that the associated polymorphism acts as a regulatory variant influencing COG6 expression. Moreover, protein–protein interaction and gene ontology enrichment analyses suggested the existence of overlap with specific biological processes, especially the type I interferon signalling pathway. Finally, genetic correlation and polygenic risk score analyses showed cross-phenotype associations between RA and SLE.
Conclusions We have identified a new risk locus shared between RA and SLE through a meta-analysis including GWAS datasets of both diseases. This study represents the first comprehensive large-scale analysis of the genetic overlap between these two complex disorders.
- Rheumatoid Arthritis
- Systemic Lupus Erythematosus
- Gene Polymorphism
Introduction
Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases with a complex aetiology in which both genetic and environmental factors are implicated.1 ,2 RA is characterised by chronic inflammation of the synovial joints, leading to damage of the articular cartilage and the underlying bone,1 while the main event in SLE is the production of antibodies against self-components of the cell nucleus, which results in a variety of clinical manifestations.2
Although the two diseases present different phenotypes, several lines of evidence point to a shared genetic component between them. Familial aggregation of RA and SLE has been described.3 In addition, genome-wide association studies (GWASs) performed in recent years have shown a genetic overlap between them, with a considerable number of loci implicated in both RA and SLE susceptibility.4 ,5 Moreover, gene expression studies have revealed common molecular mechanisms involved in the pathogenesis of these two conditions. In this regard, an interferon (IFN) signature (expression of genes inducible by type I IFNs), which is a major feature of SLE, has been described in groups of patients with RA;6 furthermore, a percentage of patients with SLE have been found to have extensive joint damage known as rhupus syndrome.7
One of the main limitations of the association studies in autoimmunity is the difficulty in identifying genetic risk variants with modest effects, given the large sample size required and the relatively low prevalence of these diseases in the general population. This limitation has been partially overcome by combining GWAS data from different pathologies as a single phenotype, thus providing the statistical power lacking in GWAS datasets of a specific disease. This approach has already been successfully applied in the study of several autoimmune diseases with common genetic backgrounds, such as Crohn's disease (CD) and coeliac disease (CeD),8 CD and psoriasis,9 CeD and RA10 ,11 and SLE and systemic sclerosis.12 During the last few years, several known risk loci for SLE have been tested for association with RA, and vice versa, through candidate gene studies;13–15 however, no comprehensive large-scale analysis of the genetic overlap between RA and SLE has been performed so far.
In order to identify novel shared risk loci between RA and SLE, we performed a combined meta-analysis including previously published GWAS datasets of both diseases to increase the statistical power of the study.
Methods
Study population
A total of 17 552 patients with RA, 4194 SLE cases and 46 907 controls of European origin were enrolled in the study. Figure 1 and online supplementary table S1 detail the cohorts included in the different stages of the study.
SLE GWAS dataset. In the discovery phase, we included GWAS data from 2237 SLE cases and 6315 controls from Germany, Italy, Spain, the Netherlands and the USA, all of them included in previously published GWASs16 ,17 (see online supplementary table S1).
RA GWAS dataset. The RA discovery cohort was composed of 3911 cases and 4083 controls from Sweden and the UK,18 obtained from the epidemiological investigation of RA project (http://www.eirasweden.se) and the Wellcome Trust Case Control Consortium (WTCCC) data repositories (http://www.wtccc.org.uk/), respectively (see online supplementary table S1).
Replication cohorts. The replication phase of the study comprised 1957 patients with SLE and 4588 controls, and 13 641 RA cases and 31 921 controls (see online supplementary table S1). The SLE replication cohort included case/control sets from the UK and Spain. Genotyping data from the UK patients with SLE came from a published GWAS,16 while the control individuals were obtained from the WTCCC2 (only those not overlapping with the WTCCC controls) (http://www.wtccc.org.uk/). Spanish patients with SLE and controls came from two different cohorts; one of them was genotyped using multiplex assays based on SNaPshot single-base extension technology (Applied Biosystems) while the other one was genotyped by TaqMan assays. These cohorts have been used and characterised in previous association studies.19 ,20 For the RA replication, Spanish patients and controls were genotyped using TaqMan assays. Summary statistics data from six additional European case/control collections were obtained from a published GWAS meta-analysis in RA.21
Quality control and imputation
Data quality control was performed for each sample set separately prior to imputation. Single-nucleotide polymorphisms (SNPs) and subjects with success call rates lower than 95% were removed. SNPs with minor allele frequencies lower than 0.01 and those showing a deviation from Hardy–Weinberg equilibrium (HWE; p<0.001) were excluded.
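The per-SNP filters described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline (which used PLINK): genotypes are assumed to be coded as counts of one allele (0, 1 or 2, with `None` for a failed call), the HWE p value is assumed to be computed elsewhere, and all function names are hypothetical.

```python
from typing import Optional, Sequence

# Thresholds taken from the text above.
MIN_CALL_RATE = 0.95  # call rates lower than 95% are removed
MIN_MAF = 0.01        # minor allele frequencies lower than 0.01 are removed
MIN_HWE_P = 0.001     # deviation from HWE at p<0.001 is removed

Genotype = Optional[int]  # assumed coding: 0, 1 or 2 allele copies; None = no call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a successful genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def snp_passes_qc(genotypes: Sequence[Genotype], hwe_p: float) -> bool:
    """Apply the three SNP-level filters; hwe_p is supplied by the caller."""
    return (
        call_rate(genotypes) >= MIN_CALL_RATE
        and minor_allele_frequency(genotypes) >= MIN_MAF
        and hwe_p >= MIN_HWE_P
    )
```

The same call-rate threshold applies to subjects; that check would run over each individual's genotypes across SNPs rather than over each SNP's genotypes across individuals.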
IMPUTE2 software was used to perform imputation as described in Howie et al,22 using as reference panels the CEU (Utah residents with Northern and Western European ancestry) +TSI (Toscani in Italy) HapMap phase III data (UCSC (University of California Santa Cruz) hg18/NCBI Build 36) with 410 phased haplotypes encompassing 1 440 616 SNPs (http://hapmap.ncbi.nlm.nih.gov/; http://genome.ucsc.edu/). All included GWAS data were imputed as described above, except those from Okada et al21 (used in the replication phase), which came from data imputed using the 1000 genomes phase I reference panel for European ancestry. SNP imputation showed an accuracy of 98% in the combined European cohort. Imputed data were subsequently subjected to stringent quality filters in PLINK V.1.07,23 that is, individuals with genotyping rates <90% were removed, and SNPs with call rates <90% and those that deviated from HWE in controls (p<0.001) were also discarded. The first five principal components (PCs) were estimated and individuals showing more than four SDs from the cluster centroids were excluded as outliers. Duplicates and first-degree relatives were also removed.
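The PC-based outlier exclusion can be illustrated with a short sketch. It assumes a single cluster whose centroid is the per-component mean and flags an individual who lies more than four SDs from that mean on any component; the text does not specify how clusters were defined, so this simplification and the function name are assumptions.

```python
import statistics
from typing import List, Sequence


def pca_outliers(pcs: Sequence[Sequence[float]], max_sd: float = 4.0) -> List[int]:
    """Return indices of individuals more than max_sd SDs from the mean on any PC.

    pcs holds one row per individual and one column per principal component.
    """
    outliers = set()
    for k in range(len(pcs[0])):
        values = [row[k] for row in pcs]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # no spread on this component, so nothing to flag
        for i, value in enumerate(values):
            if abs(value - mean) > max_sd * sd:
                outliers.add(i)
    return sorted(outliers)
```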
Statistical analysis
Statistical analyses were performed with PLINK V.1.07.
GWAS meta-analysis. First, each GWAS case/control cohort was independently analysed by logistic regression assuming an additive model with the first five PCs as covariates, to rule out any population stratification effect. Subsequently, disease-specific meta-analyses were performed combining RA datasets, on the one hand, and SLE datasets, on the other, by an inverse variance-weighted method. Given that associations of human leucocyte antigen (HLA) alleles with both RA and SLE have been studied in depth, polymorphisms within this region were excluded from the subsequent analyses (chr 6: 20–40 Mb). Sex chromosomes were also excluded. In order to detect non-HLA genetic variants showing the same effect in both diseases (risk or protection), a combined RA–SLE meta-analysis was conducted. Those SNPs with p values lower than 1×10⁻⁵ in this combined meta-GWAS and p values lower than 0.01 in each disease meta-analysis were selected for the replication phase. In addition, to identify common signals with opposite effects in both diseases, the direction of association was flipped in the RA dataset (1/OR instead of OR) before performing the RA–SLE meta-analysis. Again, SNPs were selected for replication according to the above criteria. Genetic variants showing significant heterogeneity (Cochran's Q test p value <0.05) in the RA or SLE meta-analysis were not considered for the validation step.
Replication analysis. Replication cohorts were also analysed by logistic regression. For the previously selected SNPs, combined analysis of the RA and SLE replication and discovery cohorts was performed using the inverse variance method. After the replication stage, polymorphisms with p values lower than 5×10⁻⁸ in the RA–SLE meta-analysis (discovery and replication cohorts) and disease-specific p values lower than 0.05 in the replication phase were considered statistically significant.
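The combination step can be sketched as a fixed-effect inverse variance-weighted meta-analysis on the log(OR) scale. The analysis itself was run in PLINK; the code below is only an illustration with hypothetical function names, and it assumes a normal approximation for the two-sided p value. Cochran's Q is returned as a statistic only; turning it into a p value requires a chi-square distribution with (number of studies − 1) degrees of freedom, which is left out here.

```python
import math
from typing import Sequence, Tuple


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def inverse_variance_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Pool log(OR) estimates; returns (beta, se, p value, Cochran's Q)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, two_sided_p(beta / se), q


def flip_direction(beta: float) -> float:
    """Using 1/OR instead of OR is a sign change on the log scale."""
    return -beta


def selected_for_replication(p_combined: float, p_ra: float, p_sle: float) -> bool:
    """Selection criteria stated in the text for the replication phase."""
    return p_combined < 1e-5 and p_ra < 0.01 and p_sle < 0.01
```

Flipping the RA effect before pooling means that a variant that is protective in one disease and a risk factor in the other contributes concordant signs, so opposite-effect signals are not cancelled out in the combined estimate.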
Protein–protein interaction and gene set enrichment analyses
To assess interaction among proteins encoded by SLE and RA common risk loci, a protein–protein network was constructed using the STRING database V.10.0,24 which builds protein–protein interaction (PPI) networks based on direct (physical) and indirect (functional) associations.
In addition, gene ontology (GO) (http://www.geneontology.org)25 was applied to perform an enrichment analysis in order to determine whether certain biological processes are over-represented (or under-represented) in the common RA–SLE gene set.
Genetic pleiotropy analysis
After excluding markers within the HLA region, pleiotropy between both diseases was estimated using two different approaches:
Bivariate analysis. GCTA V.1.25.0 (http://cnsgenomics.com/software/gcta/) was used to create a genetic relationship matrix (GRM) file containing IBD (identity by descent) relationship calculations for all pair-wise sets of individuals. Genetic correlation (rG) between both diseases was calculated by GCTA bivariate REML (restricted maximum likelihood) analysis26 using the GRM with the first five PCs as covariates. A likelihood ratio test (LRT) was applied to determine the statistical significance of this genetic correlation.
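For reference, the quantities involved can be written out as below. This uses the standard definition of a genetic correlation from a bivariate variance-components model, in which the additive genetic covariance between the two traits is scaled by their additive genetic variances; the notation is ours and not taken from the GCTA output, and the reference distribution used for the LRT is not stated in the text.

```latex
r_G = \frac{\sigma_{g_{\mathrm{RA}},\,g_{\mathrm{SLE}}}}
           {\sqrt{\sigma^{2}_{g_{\mathrm{RA}}}\;\sigma^{2}_{g_{\mathrm{SLE}}}}},
\qquad
\mathrm{LRT} = 2\left[\ln L(\hat{r}_G) - \ln L(r_G = 0)\right]
```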
Polygenic risk score (PRS) analysis. We used PRS to assess the genetic overlap between RA and SLE, as previously described.27 First, we selected a filtered set of SNPs from the results of disease-specific meta-GWASs. We used the --clump algorithm in PLINK to select polymorphisms with r²<0.20 within 500 kb windows and at a range of significance levels; specifically, we evaluated three different P thresholds (PT): PT<1×10⁻⁴, <1×10⁻³ and <1×10⁻². Then, for each individual from a disease-specific dataset (SLE or RA), we calculated the number of score alleles they possessed, each weighted by the log of the OR of the other disease. With the scores generated, we performed logistic regression analysis to test the relationship between the computed scores and disease status. The variance in case/control status explained by the scores was estimated as the difference in Nagelkerke's pseudo-R² between a null generalised linear model, including the first five PCs and the country of origin as covariates, and an alternative model, including the same covariates and the risk scores. The significance level was estimated by means of an LRT.
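The scoring and variance-explained steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: SNP identifiers and function names are hypothetical, the log-likelihoods are assumed to come from logistic regression models fitted elsewhere, and Nagelkerke's pseudo-R² is computed with its usual formula relative to an intercept-only model.

```python
import math
from typing import Mapping


def polygenic_risk_score(
    allele_counts: Mapping[str, float], odds_ratios: Mapping[str, float]
) -> float:
    """Sum of score-allele counts, each weighted by log(OR) from the other disease."""
    return sum(
        allele_counts[snp] * math.log(odds_ratio)
        for snp, odds_ratio in odds_ratios.items()
        if snp in allele_counts
    )


def nagelkerke_r2(loglik_model: float, loglik_intercept: float, n: int) -> float:
    """Nagelkerke's pseudo-R2 of a model relative to the intercept-only model."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_intercept - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_intercept / n)
    return cox_snell / max_cox_snell


def variance_explained(
    loglik_covariates: float, loglik_with_score: float, loglik_intercept: float, n: int
) -> float:
    """Difference in pseudo-R2 between the score model and the covariates-only model."""
    return nagelkerke_r2(loglik_with_score, loglik_intercept, n) - nagelkerke_r2(
        loglik_covariates, loglik_intercept, n
    )
```

For example, an individual carrying two copies of a score allele with OR 1.2 and one copy of an allele with OR 0.9 would receive a score of 2·ln(1.2) + ln(0.9).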
Results
Meta-GWAS and replication
We performed a meta-analysis considering both diseases as a single phenotype. After quality control, the discovery cohort comprised 3808 RA cases, 2104 patients with SLE and 10 157 controls. A total of 309 839 genetic variants outside the HLA region overlapped between the different GWAS datasets and were included in the meta-analysis.
When we combined disease-specific meta-analyses, assuming that alleles had the same effect in both diseases, 105 SNPs reached the significance threshold fixed for the combined meta-GWAS (p value <1×10⁻⁵). Eighty-nine of these genetic variants were located in loci already established as risk factors for both diseases (PTPN22, TNFAIP3, IRF5, BLK, ATG5, UBE2L3 and ICAM3/TYK2; figure 2 and online supplementary table S2) and were therefore excluded from subsequent analyses. Regarding the remaining signals, nine met the selection criteria for the replication phase (p value <1×10⁻⁵ in the RA–SLE meta-GWAS and p value <0.01 in each disease-specific meta-analysis). Of these nine polymorphisms, three were located next to two loci previously associated with RA (two of them close to COG6 and another one near NFKBIE), one was located in an intergenic region close to PTTG1 (a locus previously associated with SLE), and the rest lay within regions not associated with RA or SLE so far (three near TGFA, one close to SYPL1 and another one within RUNDC1).
When we performed the analysis under the assumption that alleles have opposite directions of effect in the two diseases, only two polymorphisms within the NMNAT2 gene, an established SLE risk locus, met the selection criteria for the replication stage (figure 2 and online supplementary table S3).
To confirm that these seven loci were associated with both diseases, the strongest associated SNP within each locus was selected for validation in additional sample sets (table 1). According to the significance criteria established, the rs9603612 polymorphism, near COG6, achieved genome-wide significance in the combined analysis including both discovery and replication sets (PDiscovery+Replication=2.95E−13), and also statistical significance in each disease-specific replication analysis (PSLE_Replication=0.045; PRA_Replication=3.58E−08).
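As a worked check, the significance criteria defined in the Methods can be applied to the p values reported above for rs9603612. The function name is hypothetical; the thresholds and p values are those given in the text.

```python
GENOME_WIDE_P = 5e-8   # threshold for the combined discovery + replication analysis
REPLICATION_P = 0.05   # threshold for each disease-specific replication analysis


def is_shared_risk_locus(
    p_combined: float, p_ra_replication: float, p_sle_replication: float
) -> bool:
    """True when a SNP meets all three significance criteria."""
    return (
        p_combined < GENOME_WIDE_P
        and p_ra_replication < REPLICATION_P
        and p_sle_replication < REPLICATION_P
    )


# p values reported in the text for rs9603612
rs9603612_significant = is_shared_risk_locus(
    p_combined=2.95e-13, p_ra_replication=3.58e-08, p_sle_replication=0.045
)
```

All three conditions hold for rs9603612, although the SLE replication p value (0.045) lies close to the 0.05 threshold.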
Effect on gene expression
Given that rs9603612 is a non-coding variant, we assessed its potential regulatory function by means of in silico expression quantitative trait locus (eQTL) analysis using RegulomeDB (http://www.regulomedb.org/).28 This database includes high-throughput experimental datasets, as well as computational predictions and manual annotations to identify functional variants. According to RegulomeDB, rs9603612 showed a score of 1b, indicating that this SNP likely affects transcription factor binding and is linked to expression of a gene target; specifically, it may act as a cis-eQTL regulating COG6 expression in monocytes.
PPI and pathway enrichment analyses
Subsequently, we evaluated connectivity at the PPI level between the 18 common risk loci for RA and SLE, including COG6. One PPI network involving 14 of the 18 common proteins was formed (figure 3). Compared with randomly selected protein datasets, a significant network connectivity was found (p value <0.05). The RA–SLE common proteins were more likely to be connected to each other than expected by chance, with 32 interactions as compared with only two observed for random datasets. COG6 did not appear linked to any protein.
Then, we performed the analysis considering all the established risk factors for SLE (n=44) and RA (n=64) (obtained from Bentham et al16 and Okada et al,21 respectively) (see online supplementary figure S1). When SLE-associated genes were input, three different networks were formed, one of them involving 37 proteins. Again, COG6 was not connected to any protein (see online supplementary figure S1A). In the case of RA, a single PPI network comprising 52 proteins was evident, with COG6 linked to it through the AFF3 protein (see online supplementary figure S1B).
On the other hand, according to the GO enrichment analysis, several biological processes appeared over-represented among the RA–SLE common gene set. The most significantly over-represented pathways are related to the immune response (table 2), especially to the type I IFN signalling pathway (p value=2.61E−03).
Genetic overlap between SLE and RA
After bivariate REML analysis, a significant genetic correlation between RA and SLE was evident (rG=0.31, SE=0.061, p value=2.00E−07).
Similarly, the PRS analysis showed significant differences in the score distribution between both case groups and controls (figure 4). For each of the established scoring SNP sets, the mean score was significantly higher in patients compared with controls (figure 4A–F), thus indicating that SLE cases had a significant enrichment of RA risk alleles, and vice versa. For both diseases, the most significant differences were observed when scores were calculated after applying the most stringent SNP inclusion cut-off (SLE: p valueLRT=8.93E−04; RA: p valueLRT=8.93E−12), explaining 0.19% and 0.78% of the variance (Nagelkerke's pseudo-R²) in disease status for SLE and RA (figure 4G), respectively.
Discussion
In the present study, we have identified a new risk locus shared between RA and SLE through a meta-GWAS considering both disorders as a single phenotype. This study represents the first comprehensive large-scale analysis, including almost 22 000 cases and 47 000 controls, carried out to reach a deeper understanding of the genetic overlap between these two diseases.
According to our data, COG6 (component of oligomeric Golgi complex 6), an established susceptibility locus for RA and psoriasis,21 ,29 also represents a genetic risk factor influencing predisposition to SLE. This locus is located on chromosome 13q14.11, within a 250-kb block of linkage disequilibrium (LD) that includes only this gene. COG6 encodes a subunit of the conserved oligomeric Golgi complex crucial for the normal structure and function of the Golgi apparatus, influencing processes such as protein sorting and glycosylation.30 Although some hypotheses on the possible implication of COG6 in autoimmunity have been proposed, its role in immune-mediated disorders remains unknown. It has been described that deficiency of this gene may lead to a clinical phenotype including inflammatory bowel disease and neutrophil and B and T cell dysfunction.31
A main advantage of combining GWAS data from related diseases is to increase the statistical power in order to capture association signals that may have been undetected in previous disease-specific studies. Indeed, the COG6 polymorphism associated with RA and SLE, rs9603612, showed a similar effect size in both disorders (OR=0.91). However, as mentioned above, this gene was previously associated with RA but not with SLE, probably because of the larger sample sizes of the RA GWASs published so far. This would also explain why, in our study, the rs9603612 genetic variant reached genome-wide significance level in the RA meta-analysis, but not in the analysis of the SLE dataset. Rs9603612 shows tight LD with the COG6 SNPs implicated in RA (rs9603616, r²=0.98)21 and psoriasis (rs7993214, r²=0.97).29 Interestingly, whereas these show minimal evidence of acting as regulatory polymorphisms according to RegulomeDB (score 5 and 4, respectively), rs9603612 seems to influence COG6 expression in monocytes, thus representing a better candidate to be the causal variant involved in the genetic predisposition to these three autoimmune conditions. Indeed, a very recent study focused on identifying functional variants for disease-associated loci has confirmed this regulatory role of the rs9603612 polymorphism.32 Specifically, the minor allele (G) was found to increase COG6 expression compared with the major allele, thus supporting that rs9603612 influences the development of SLE, RA and psoriasis by regulating COG6 levels.
The PPI analysis revealed high connectivity among proteins involved in RA and SLE, which indicates the existence of overlap between specific biological pathways implicated in these two diseases. In this sense, the type I IFN signalling pathway emerged as the most significantly over-represented biological process among the RA–SLE common risk loci set. This is consistent with the increased expression of type I IFN regulated genes observed in both disorders6 ,33 and points to a role of this IFN signature, a major feature of patients with SLE, in the pathogenesis of RA. Indeed, a recent gene expression meta-analysis including Sjögren's syndrome, RA and SLE showed the IFN signature as a major over-represented shared gene profile.34 Regarding COG6, no connection with any of the remaining overlapping proteins was initially evident. However, when considering the RA-associated gene set, COG6 was linked to the main network (which included most of the proteins common to both diseases) through AFF3. It should be noted that a nominal association of the RA-associated AFF3 polymorphism with SLE was reported in a candidate gene study performed in Europeans.14 This association was subsequently replicated in Chinese patients with SLE.35 Taking this into account, AFF3 could represent the nexus between COG6 and the rest of the RA–SLE common pathways. The fact that no association signals were detected within AFF3 in our study could be due to a low SNP coverage in the region. Indeed, the reported risk variant for RA and SLE, as well as those in high LD, were lacking in our GWAS dataset.
Finally, we aimed to quantify the genetic overlap between both conditions, since it has not been systematically examined using genome-wide data. As expected, genetic factors for RA and SLE were positively correlated. Similarly, PRS analysis showed that RA and SLE polymorphisms present a cross-phenotype effect, which was higher when scores were calculated from the strongest associated SNP set (PT=1×10⁻⁴). Genetic variants with larger effect on SLE susceptibility explained a higher percentage of the RA variance, and vice versa, thus indicating that most of the genetic component shared between both disorders is driven by their main risk alleles. This is consistent with previous findings showing a genetic overlap between both diseases,13 ,18 as well as a significant enrichment in carriage of SLE alleles in patients with RA compared with controls.13
In summary, the present study adds COG6 to the list of risk factors shared between RA and SLE. Our results highlight the existence of a relevant genetic correlation between both diseases as well as the influence of common molecular mechanisms in their pathophysiology. Since common genetic pathways are implicated in RA and SLE, a reclassification of patients from a genetic point of view will lead to more specific and effective therapeutic procedures.
Acknowledgments
We thank Sofia Vargas, Sonia Garcia and Gema Robledo for their excellent technical assistance, and all the patients and healthy controls for their essential collaboration.
References
Footnotes
Handling editor Tore K Kvien
Contributors AM and JM were involved in the conception and design of the study and contributed to the analysis and interpretation of data. AM drafted the manuscript. LV-B, LR-R, MAG-G, AB, IG-A, PC, NO-C, MMA-G, FJG-H, MFG-E, JMS, CT, AS, AG, LP, JW, TV and MEA-R collected samples and participated in analysis and interpretation of data. LV-B, LR-R, MAG-G, AB, IG-A, PC, NO-C, MMA-G, FJG-H, MFG-E, JMS, CT, AS, AG, LP, JW, TV, MEA-R and JM critically revised the manuscript draft. All authors approved the final version of the manuscript.
Funding This work was supported by the following grants: SAF2012-34435 from the Spanish Ministry of Economy and Competitiveness, P12-BIO-1395 from Consejería de Innovación, Ciencia y Tecnología, Junta de Andalucía (Spain), the European IMI BTCure Program, the EU/EFPIA Innovative Medicines Initiative Joint Undertaking PRECISESADS (ref: 115565), the Cooperative Research Thematic Network (RETICS) programme, RD12/0009/0004 (RIER), from Instituto de Salud Carlos III (ISCIII, Health Ministry, Madrid, Spain) and PI12/02558 from Instituto de Salud Carlos III (ISCIII, Health Ministry, Madrid, Spain). AM is the recipient of a Rio Hortega fellowship (CM13/00314) from the Ministry of Economy and Competitiveness through the Instituto de Salud Carlos III (ISCIII, Health Ministry, Madrid, Spain).
Competing interests None declared.
Patient consent Obtained.
Ethics approval Comité de Bioética del Consejo Superior de Investigaciones Científicas and the local ethical committees of the different participating centres.
Provenance and peer review Not commissioned; externally peer reviewed.