Combined genetic analysis of juvenile idiopathic arthritis clinical subtypes identifies novel risk loci, target genes and key regulatory mechanisms

Objectives Juvenile idiopathic arthritis (JIA) is the most prevalent form of juvenile rheumatic disease. Our understanding of the genetic risk factors for this disease is limited due to low disease prevalence and extensive clinical heterogeneity. The objective of this research is to identify novel JIA susceptibility variants and link these variants to target genes, which is essential to facilitate the translation of genetic discoveries to clinical benefit. Methods We performed a genome-wide association study (GWAS) in 3305 patients and 9196 healthy controls, and used a Bayesian model selection approach to systematically investigate specificity and sharing of associated loci across JIA clinical subtypes. Suggestive signals were followed-up for meta-analysis with a previous GWAS (2751 cases/15 886 controls). We tested for enrichment of association signals in a broad range of functional annotations, and integrated statistical fine-mapping and experimental data to identify target genes. Results Our analysis provides evidence to support joint analysis of all JIA subtypes with the identification of five novel significant loci. Fine-mapping nominated causal single nucleotide polymorphisms with posterior inclusion probabilities ≥50% in five JIA loci. Enrichment analysis identified RELA and EBF1 as key transcription factors contributing to disease risk. Our integrative approach provided compelling evidence to prioritise target genes at six loci, highlighting mechanistic insights for the disease biology and IL6ST as a potential drug target. Conclusions In a large JIA GWAS, we identify five novel risk loci and describe potential function of JIA association signals that will be informative for future experimental works and therapeutic strategies.


INTRODUCTION
The contribution of large-scale genetic studies to the understanding of pathogenesis and management of complex traits has been widely documented over the last decade with the identification of thousands of genetic associations and their subsequent implications for biological pathways, drug discovery and repurposing. 1 2 However, progress in low-prevalence diseases has not been as rapid owing to hindrances in the recruitment of well-powered cohorts. This is well illustrated by considering the distinct and heterogeneous forms of childhood arthritis that are clinically encompassed under the term of juvenile idiopathic arthritis (JIA). JIA comprises childhood rheumatic conditions characterised by inflammatory arthritis of unknown origin that persists for at least 6 weeks and begins before the age of 16 years. 3 The International League of Associations for Rheumatology (ILAR) distinguishes seven JIA subtypes: oligoarticular arthritis (oligoJIA); rheumatoid factor (RF)-negative polyarthritis (RF-polyJIA); RF-positive polyarthritis (RF+polyJIA); juvenile psoriatic arthritis (JPsA); enthesitis-related arthritis childhood spondyloarthropathy (ERA); systemic arthritis (sJIA); and undifferentiated arthritis. 4 To date genetic studies in JIA susceptibility have identified 17 genome-wide significant associations

Key messages
What is already known about this subject? ► Juvenile idiopathic arthritis (JIA) is the most common form of childhood arthritis. However, our understanding of the genetic basis of JIA is hampered by low disease prevalence and extensive clinical heterogeneity represented by seven disease subtypes, with only 17 known susceptibility loci to date.

What does this study add?
► Although JIA is a heterogeneous disease, we show that most susceptibility loci are shared across multiple clinical subtypes, enabling joint analysis of clinically related subtypes, both for this study and future projects, increasing the power of our study leading to the identification of five novel susceptibility loci in the largest genome-wide genetic study to date. ► By linking susceptibility genetic variants to target genes, integrating functional annotations, statistical fine mapping, expression data from 15 immunological cell types and chromatin interaction data (HiChIP and Hi-C) from human T and B cell types, we identify putative causal (i) single nucleotide polymorphisms; (ii) genes and (iii) cell types while also highlighting key regulatory mechanisms underlying disease.

Paediatric rheumatology
highlighting a number of key findings [5][6][7] including, first, that there is overlap of susceptibility loci between two of the most common JIA subtypes, oligoJIA and RF-polyJIA; specifically, in the human leucocyte antigen (HLA) region, these two subtypes share the presence of a glycine at amino acid position 13 of HLA-DRB1 as their highest risk factor, resembling the findings in adult seronegative rheumatoid arthritis (RA). 6 7 Together these two JIA subtypes define a genetically homogeneous cluster sometimes referred to with the term 'polygo'. [7][8][9] Second, this high genetic correlation is not as evident in the remaining subtypes, especially considering their divergent associations observed across the HLA region. For example, the presence of a histidine at the same HLA-DRB1 position confers the highest risk for RF+polyJIA, consistent with the association reported in adult seropositive RA. 7 In addition, the amino acid at position 58 of HLA-DRB1 has been shown to be a specific risk factor for sJIA. 10 The clinical heterogeneity of JIA remains a challenging issue in deciphering its genetic architecture by balancing the need to focus on more clinically/genetically homogeneous subtypes against potentially sacrificing sample size. As a result, there has been a tendency to address the genetics of JIA in a subtype-based manner. [6][7][8][9] However, multinomial approaches have recently been developed to overcome the heterogeneity problem by allowing exploration of the genetic relationships between multiphenotype categories. 11 In this study, we hypothesised that a genome-wide association study (GWAS) combining all JIA subtypes would optimise the success rate in locus discovery. We, therefore, performed a new genome-wide scan of ~7.5 million single nucleotide polymorphisms (SNPs) in the largest JIA GWAS cohort recruited to date, and implemented a novel approach to systematically investigate specificity and sharing of associated loci across ILAR subtypes to support our strategy.

Study cohort and GWAS quality control
A total of 4520 UK JIA samples and 9965 healthy individuals were recruited for the present study. JIA DNA samples were genotyped on the Illumina Infinium CoreExome and Infinium OmniExpress genotyping arrays. Sample-level quality control (QC) was applied based on the following exclusion criteria: call rate <0.98 and discrepancy between genetically inferred sex and database records. SNPs that were non-autosomal, had a call rate <0.98 or a minor allele frequency (MAF) <0.01 were excluded. Healthy controls were genotyped using the Illumina Infinium CoreExome genotyping array. QC was consistent with that described above for JIA samples.
Identity-by-descent was used to identify related individuals across all study samples. For each related pair, the sample with the highest call rate was retained. Outliers were identified and excluded based on ancestry using principal component (PC) analysis performed with the flashpca software package (V.2.0) where outliers were identified using aberrant R library (V.1.0). 12 13 The total number of individuals that remained in the final QC-filtered data set was 12 501 (3305 cases and 9196 healthy controls) (online supplemental table 1).

Imputation
The QC-filtered GWAS data set was subjected to whole-genome genotype imputation. Haplotype phasing and imputation were performed in the Michigan Imputation server using SHAPEIT2 14 and Minimac3, 15 respectively, and the Haplotype Reference Consortium reference panel. Following imputation, SNPs were excluded based on MAF <0.01 and imputation quality (r 2 ) <0.4.

Association testing and meta-analysis
Case-control association testing was performed by SNPTEST software package (V.2.5.2). Three PCs were included as covariates to account for any residual population substructure. Any SNP with a p value <5 x 10 -6 was selected for validation in GWAS summary statistics from an independent data set of 2751 JIA cases (oligoJIA and RF-polyJIA) and 15 886 controls of European ancestry. 8 An inverse variance weighted fixed effects metaanalysis was performed using the software package GWAMA (V.2.2.2). 16 The presence of heterogeneity of ORs across data sets was evaluated with the test statistics I 2 and Q.

Clinical subtype specificity
The specificity and sharing of JIA susceptibility SNPs across ILAR subtypes was interrogated using Bayesian multinomial logistic regression assuming an additive model implemented in the software package Trinculo (V.0.96). 11 Model selection for specificity or sharing was based on comparison of log-Bayes factors (logBFs) where a positive logBF was interpreted as evidence that a particular association is specific to an ILAR subtypes, and vice versa.

Statistical fine-mapping of JIA-associated loci
Statistical fine-mapping of the association signal within each locus was performed using the FINEMAP software package (V.1.3.1). 17 The method estimates the posterior inclusion probabilities (PIPs) for SNPs to be causal, which in turn were used to generate 95% credible SNP sets for each locus (the smallest list of variants that jointly have a probability of including the causal variant ≥95%).

Functional annotation enrichment analysis
Summary statistics from the GWAS including all ILAR subtypes were tested for enrichment in four categories of annotations based on experimental genomic data including gene structure (coding sequence (CDS), 3'UTR and 5'UTR) from the GENCODE Project, binding sites for 165 transcription factors (TFs) from the ENCODE Project, and enhancers and active promoters for 98 cell types derived from the Roadmap Epigenomics Project. [18][19][20] Enrichment of JIA associations were tested separately in each annotation using fgwas (V.0.3.6). 21 A joint model of independent enrichments was further identified using the cross-validation likelihood option implemented in fgwas.

Gene prioritisation
Expression quantitative trait locus (eQTL) data for 15 immune cell types was downloaded from the DICE (Database of Immune Cell Expression, eQTLs and Epigenomics) project website. 22

Key messages
How might this impact on clinical practice or future developments? ► The results of this study demonstrate that clinically heterogeneous subtypes can be analysed in a combined approach to identify novel shared susceptibility loci which is an approach that will be informative for genetic studies of other clinically heterogeneous diseases. ► We identify causal genes at JIA susceptibility loci which is an essential step in the translational of genetic discoveries to clinical benefit by highlighting potential therapeutic targets.
Correlation of susceptibility association signals and gene expression were identified by selecting the top eQTL SNP for each gene and retaining those that were also present in the combined list of all credible SNPs. This analysis was further supported by statistical colocalisation of association and eQTL signals. The identification of the target genes of JIA-associated regions was further complemented by the interrogation of high-resolution maps of chromatin interactions for SNPs correlated with eQTL signals using H3K27ac HiChIP data in B and T cells. 23 We also explored chromatin interaction maps obtained by capture Hi-C. 24 Additional details of the Methods are available in the online supplemental material.

Five novel susceptibility loci for JIA
We performed a JIA GWAS comprising 12 501 individuals (3305 cases and 9196 healthy controls) and a high-density SNP panel with 7 461 861 variants. The combined analysis of all available JIA cases identified eight loci reaching genome-wide significance (p≤5 x 10 -8 ), of which seven have previously been reported and recognised by the notable gene at each locus: 21), and TYK2 (19p13.2) (figure 1 and table 1). 6 The strongest association was found to SNPs within the extended MHC region (chr6: 28 477 797 to 33 448 354). An in-depth analysis of this region has previously been reported for JIA, including a subset of samples from the current study; therefore this study will focus on non-MHC associations. 7 The novel genome-wide significant association was represented by the lead SNP rs497523 (p=7.12 x 10 -9 ), which is intronic to CCDC101 (16p11.2), also known as SGF29. A further 37 lead SNPs from independent loci, based on linkage disequilibrium, reached the suggestive significance threshold (p≤5 x 10 -6 ). This included previously reported and potentially novel JIA loci (table 1). Summary statistics for 22 of these variants were available from a previously published JIA GWAS comprising 2751 cases and 15 886 controls 8 and meta-analysed with the current GWAS data. The meta-analysis identified a further four novel SNPs exceeding genome-wide significance in the proximity of the genes AHI1 (6q23.3), CCR3 (3p21.31), TNFSF11 (13q14.11) and FOXP1 (3p13) (table 2 and online  supplemental table 2). Hence, a total of five new signals associated with JIA were identified. In addition, three SNPs showed evidence for replication in the independent data set although the meta-analysis test statistic did not exceed genome-wide significance (table 2): TNFSF8 (rs7043505) ( meta p=8.27 x 10 -8 ), AFF3 (rs11692867) ( meta p=9.26 x 10 -8 ), and RUNX3 (rs72657048) ( meta p=3.51 x 10 -7 ). There was no evidence of between-study heterogeneity at any of these loci.

Evidence for shared non-HLA loci across JIA clinical subtypes
We systematically addressed the genetic relationship across ILAR subtypes in a Bayesian framework. We performed a Bayesian model selection between the best subgroup-specific model and the best sharing model and estimated the logBFs for specificity of effects at each locus. The analysis included the 44 non-HLA index SNPs passing study suggestive significance threshold (5×10 -6 ) (table 1). The results revealed evidence for sharing of JIA susceptibility loci across multiple ILAR subtypes since most of the analysed SNPs showed negative logBFs for specificity. This pattern was also evident for previously reported JIA susceptibility SNPs based on a combined cohort of oligoJIA and RF-polyJIA subtypes (online supplemental table 3, online supplemental figure 1). Moreover, the vast majority of the strongest logBFs (values between −4.5 and −9) were observed for the sharing model that comprised all JIA subtypes. Only seven loci (16%) showed weak evidence in favour of them being specific to the polygo subgroup (logBF of 0.06 to 0.5). Overall, these findings support our approach of performing a joint analysis of all available JIA cases to maximise power to detect novel susceptibility loci.

Enrichment of JIA susceptibility SNPs in TFBS and cell-type specific regulatory regions
We investigated the over-representation of JIA susceptibility SNPs in functional categories including gene structure (CDS, 3'UTR and 5'UTR), transcription factor binding sites (TFBSs) and enhancers and active promoters in 98 cell/tissue types. Our results showed no evidence for significant enrichment of JIA susceptibility SNPs in any of the gene structure annotations (p values>0.1) (figure 2). The most significant enrichment was found to binding sites for the TF RELA (p value=2.66 x 10 -8 ) (online supplemental table 4 and online supplemental figure  2). Additionally 52 out of the 165 TFBSs interrogated showed significant over-representation, including EBF1 (p value=6.00 x Given the expected correlation between the analysed annotations, we then proceeded to perform a stepwise selection process to select a subset of non-redundant annotations. A combined model derived from all categories of annotations consisted of binding sites for RELA and EBF1, and enhancers in primary T helper memory cells and the T cell leukaemia cell line DND-41. The maximum likelihood of this cross-category model exceeded that from any of the single-annotation models thus identifying the most statistically relevant regulatory elements Paediatric rheumatology (figure 2). No annotations in the final model were excluded with cross-validation.

Prioritising potential causal SNPs
Using our high-density SNP panel, we aimed to identify the putative causal SNPs driving the association signals. For this purpose, we applied a Bayesian fine-mapping approach 17 to define the PIP of each variant being causal given all other variants in the region. We fine-mapped each of the five newly discovered loci and 12 previously reported non-MHC susceptibility loci (p value<5 x 10 -6 in the present study) to identify 95% credible SNP sets. There was no evidence to support multiple distinct association signals at any locus. For 5 (29%) and 10 (59%) of the 17 loci, fine-mapping resolved the association signal to 95% credible sets of ≤10 and ≤30 causal variants, respectively (online supplemental tables 7 and 8). Moreover, we identified five SNPs with PIPs of at least 0.5 for the following loci: RSBN1-PTPN22 (1p13.2; rs6679677), FOXP1 (3p13; rs7647909), CCR3 (3p21.31; rs79815064), ANKRD55 (5q11.2; rs7731626) and TYK2 (19p13.2; rs34536443) (online supplemental table  7). Interestingly, the method was able to identify rs34536443, a well-characterised non-synonymous variant in autoimmunity, 25 as the likely causal variant for TYK2 locus with a PIP of 80%.

Prioritising target genes
The identification of the target genes of the disease-associated variants is a crucial step towards describing the biological impact of a statistical association. To address this question, we first used eQTL data derived from 15 disease relevant immune cell types to correlate the identified credible SNPs with genes in each locus. The credible SNP sets captured the lead eQTL SNP for 15 genes (eGenes) at nine loci (figure 3 and online supplemental table 9). These observations were supported by statistical colocalisation (online supplemental table 10). Subsequently, we complemented the identification of the putative target genes of JIA SNPs by analysing high-resolution maps of enhancer-promoter interactions in human B and T cells. We observed HiChIP interactions for the promoters of 6 out of the 15 JIA eGenes: IL2RA, CLN3, ATP2A1, IL6ST, CCDC101 (SGF29) and ERAP2 (online supplemental table 11). In addition, SULT1A2, SULT1A1, ACTA2, FAS and AHI1 promoters were located within 1 kb windows of JIA credible SNPs that overlapped an H3K27ac peak as identified from HiChIP data. We also observed promoter interactions for JIA credible SNPs and the promoters of IL2RA, CLN3, IL6ST, CCDC101 and ERAP2 through chromatin interaction maps obtained by capture Hi-C experiments (online supplemental table 12).

Paediatric rheumatology
Interestingly, our analysis allowed us to refine the target gene of the association signal at 5q11.2 to IL6ST, since the credible SNP (rs7731626) showed chromatin contacts to the promoter of this gene but we did not observe interactions to the classically reported gene ANKRD55 ( figure 4). This exemplifies the potential of integrative analyses in deciphering the plausible mechanistic effect of association signals.
In total, we found 11 JIA targets genes showing both significant eQTL and H3K27ac HiChIP evidence.

DISCUSSION
We used a Bayesian model selection approach to demonstrate extensive sharing of JIA susceptibility loci across the ILAR subtypes and subsequent joint analysis of subtypes led to the identification of five novel risk loci, bringing the total of genomewide significant regions for JIA to 22. We were able to prioritise causal genes at six loci integrating Bayesian fine-mapped credible SNPs, transcriptomics and chromatin interaction maps derived from disease-relevant cells.
A key challenge for studies investigating JIA susceptibility is how to account for the clinical heterogeneity across the ILAR clinical subtypes. Previous studies have focussed on the more frequent ILAR subtypes in an attempt to mitigate the loss of power due a non-specific phenotype definition. 26 However, in the present study this would have resulted in the exclusion of 30% of the available cases. Guided by the Bayesian model selection, we chose to perform a combined analysis across all ILAR subtypes, which we show maximises power to detect novel loci. However, it is important to recognise that this approach will only increase power to detect loci that underlie biological pathways shared by multiple ILAR subtypes and does not exclude the existence of subtype specific risk factors, which are known to exist. 7 9 27 Enrichment of JIA susceptibility loci in functional annotations highlighted that most association signals affect disease risk through regulatory effects on gene expression and in a cell-type specific manner. Our analysis pointed to the TFBS of RELA and EBF1 as two main non-redundant regulatory elements suggesting a crucial contribution of them in JIA risk. Interestingly, RELA and EBF1 are known to regulate Treg-induced tolerance 28 and B cell specification and commitment, 29 respectively.
Identifying target genes of the association signals is a crucial step to translate statistical findings to biological meaning and, in turn, for the development of new therapeutic strategies. Applying , four adaptive immune cell types that have not encountered cognate antigen in the periphery (naive B cells, naive CD4 + T cells, naive CD8 + T cells and naive regulatory T cells (Treg)), six CD4 + memory or more differentiated T cell subsets (Th1, Th1/17, Th17, Th2, follicular helper T cell (Tfh) and memory Treg), and two activated cell types (naive CD4 + and CD8 + T cells that were stimulated ex vivo). The p value for significant correlations are reported in each cell for credible SNPs that capture the most significant eQTL. Beta coefficients to illustrate direction and magnitude as determined by risk allele. eQTL, expression quantitative trait locus; SNPs,single nucleotide polymorphisms.
an integrative approach, we provide robust evidence to nominate target genes at the novel locus at 16p11.2. This is a known susceptibility locus for multiple chronic inflammatory diseases and includes attractive biological candidate genes such as IL27. However, complementary evidence from eQTL and chromatin data implicates the genes CLN3 and SULT1A2. CLN3 encodes a protein that is involved in lysosomal function suggesting a role for lysosome-mediated degradative pathways via autophagy and phagocytosis. Interestingly, Peeters et al reported that synovial fluid T cells derived from JIA patients showed enhanced autophagy. 30 SULT1A2 encodes a catalytic enzyme that sulfonates different molecular components like thyroid hormones. Therefore, this target gene may establish a link for the comorbidity observed between rheumatic conditions and thyroid disorders. A second example of successful refinement is the association signal at 5q11.2 to IL6ST, instead of the classically reported gene ANKRD55. 6 We found that the ANKRD55 intronic SNP, rs7731626, interacts with the promoter of IL6ST, and that its risk allele increases the expression of the gene. IL6ST is the interleukin 6 (IL-6) signal transducer and is the drug target of satralizumab, a biological drug that is currently in Phase III of a clinical trial for neuromyelitis optica, a rare autoimmune disease of the nervous system. 31 Considering that other biological drugs targeting the IL-6 pathway, such as tocilizumab, are currently in use for the treatment of JIA, our findings provide genetic support for the study of satralizumab as a new therapeutic target for JIA.
In conclusion, our results highlight the utility of joint analysis considering all JIA subtypes to maximise discovery, shifting the classical paradigm on which previous JIA genetic studies were based, and illustrate the potential of integrative approaches to gain further insights into the genetic susceptibility of the disease, which may in turn inform future therapeutic drug targets and pathways.
Paediatric rheumatology the assistance given by IT Services and the use of the Computational Shared Facility at The University of Manchester. Understanding Society: The UK Household Longitudinal Study is led by the Institute for Social and Economic Research at the University of Essex and funded by the Economic and Social Research Council. The survey was conducted by NatCen and the genome-wide scan data were analysed and deposited by the Wellcome Trust Sanger Institute.
Contributors JB and WT contributed to the conception and study design. JB, EL-I, SS, AY, MCM, MS, SP, CDL and SDT contributed to Data collection and/or QC and imputation. EL-I, JB, APM, CS, PM and VPG contributed to data analysis. JB, EL-I and WT were involved in drafting the manuscript. All co-authors made substantial contributions to data acquisition, data interpretation and revised the work critically for important intellectual content. Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available in a public, open access repository. All data relevant to the study are included in the article or uploaded as supplementary information. Summary statistics of the GWAS analysed in the current study will be available through the NHGRI-EBI GWAS Catalog (https:// www. ebi. ac. uk/ gwas/ downloads/ summary-statistics) (Study Accession Code GCST90010715). Healthy controls data was obtained from the UK Household Longitudinal Study (https://www. understandingsociety. ac. uk/). Information on how to access the data can be found on the Understanding Society website https://www. understandingsociety. ac. uk/.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Open access
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https:// creativecommons. org/ licenses/ by/ 4. 0/.