Fine-mapping the MHC locus in juvenile idiopathic arthritis (JIA) reveals genetic heterogeneity corresponding to distinct adult inflammatory arthritic diseases

Objectives Juvenile idiopathic arthritis (JIA) is a heterogeneous group of diseases, comprising seven categories. Genetic data could potentially be used to help redefine JIA categories and improve the current classification system. The human leucocyte antigen (HLA) region is strongly associated with JIA. Fine-mapping of the region was performed to look for similarities and differences in HLA associations between the JIA categories and define correspondences with adult inflammatory arthritides.
Methods Dense genotype data from the HLA region, from the Immunochip array for 5043 JIA cases and 14 390 controls, were used to impute single-nucleotide polymorphisms, HLA classical alleles and amino acids. Bivariate analysis was performed to investigate genetic correlation between the JIA categories. Conditional analysis was used to identify additional effects within the region. Comparison of the findings with those in adult inflammatory arthritic diseases was performed.
Results We identified category-specific associations and have demonstrated for the first time that rheumatoid factor (RF)-negative polyarticular JIA and oligoarticular JIA are genetically similar in their HLA associations. We also observe that each JIA category potentially has an adult counterpart. The RF-positive polyarthritis association at HLA-DRB1 amino acid at position 13 mirrors the association in adult seropositive rheumatoid arthritis (RA). Interestingly, the combined oligoarthritis and RF-negative polyarthritis dataset shares the same association with adult seronegative RA.
Conclusions The findings suggest the value of using genetic data in helping to classify the categories of this heterogeneous disease. Mapping JIA categories to adult counterparts could enable shared knowledge of disease pathogenesis and aetiology and facilitate transition from paediatric to adult services.


INTRODUCTION
Juvenile idiopathic arthritis (JIA), the most common arthritic disease of childhood, is a heterogeneous group of diseases. The current International League of Associations for Rheumatology (ILAR) classification system defines seven categories based on clinical features, including an undifferentiated category for cases that do not fall into one of the defined categories. 1 Genetic data could be used to help define JIA categories and improve the current classification system. Prior studies of the best-established genetic risk factor for JIA, the major histocompatibility complex (MHC) region on chromosome 6, have used modest sample sizes. 2 3 The development of methods for imputation of classical human leucocyte antigen (HLA) alleles and amino acids 4 from genotyping array data enables a comprehensive and cost-effective approach for generating HLA typing on much larger JIA cohorts. We sought to use this powerful approach to dissect and refine the HLA associations of the heterogeneous JIA categories.
While there are considerable clinical similarities between some JIA categories and adult inflammatory arthritides, there is also substantial heterogeneity. Hence, we sought to compare the associations across the MHC region observed in JIA cohorts with those observed in adult inflammatory arthritides, such as rheumatoid arthritis (RA). 5 6 Furthermore, although some categories of JIA have obvious adult counterparts (eg, enthesitis-related arthritis (ERA) and adult ankylosing spondylitis (AS), or juvenile psoriatic arthritis (jPsA) and psoriatic arthritis), the most common categories of JIA, oligoarthritis and rheumatoid factor (RF)-negative polyarthritis, do not appear to map to any adult form of disease. Mapping each of the JIA categories to RA and other adult inflammatory arthritic diseases could have many benefits: it could enhance understanding of the genetic basis and etiopathogenesis of inflammatory arthritis in general, allow extrapolation of results from clinical trials in adult inflammatory arthritis to paediatric counterparts to improve the therapy of JIA, and facilitate smooth transition of paediatric patients to adult care providers with consistent clinical designations. The goals of this study were threefold: to use comprehensive MHC fine-mapping genetic data to refine HLA associations across each JIA category, to assess correspondences between the JIA categories, and finally to compare associations with adult inflammatory arthritic diseases.

METHODS
Subjects
All cohorts comprised individuals from populations of European descent from the USA, UK, Canada, Norway and Germany. Descriptions of the datasets can be found in the online supplementary information. Before quality control, the total dataset comprised all JIA categories and included 5737 patients with JIA and 16 403 controls genotyped for 191 494 markers.

Genotyping and quality controls
Samples were genotyped using ImmunoChip, a custom-made Illumina Infinium array, described previously. 7 The ImmunoChip includes dense coverage of the HLA region and 186 additional non-HLA loci. Genotyping was performed according to Illumina's protocols at labs in Hinxton, UK, Manchester, UK, Cincinnati, USA, Utah, USA, Charlottesville, USA, New York, USA, Brisbane, Australia and Toronto, Canada. The Illumina GenomeStudio GenTrain V.2.0 algorithm was used to recluster all 22 140 samples together.
Single-nucleotide polymorphisms (SNPs) were initially excluded if they had a call rate <98% and a cluster separation score of <0.4. A SNP was subsequently removed from the primary analysis if it exhibited significant differential missingness between cases and controls (p<0.05), had significant departure from Hardy-Weinberg equilibrium (p<0.000001 in cases or p<0.01 in controls), or had a minor allele frequency (MAF) <0.01. Based on the SNPs that passed the above quality control thresholds, samples were then excluded for call rate <98%, or if there were inconsistencies between recorded and genotype-inferred gender or excess heterozygosity on the autosomes. Duplicates and first-degree or second-degree relatives were removed based on identity-by-descent statistics computed using the programme KING. 8 Admixture estimates were computed on the remaining samples while including the HapMap phase III individuals (CEU, YRI and CHB) as reference populations using the software ADMIXTURE. 9 The admixture estimates were then used to identify and remove genetic outliers. Three of these admixture estimates were included as covariates in the logistic regression (association) analysis to account for within-sample variation.
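The SNP-level thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `SnpStats` record and the `passes_snp_qc` function are hypothetical names, and the per-SNP statistics are assumed to have been computed elsewhere. The initial exclusion is coded literally as written (call rate <98% and cluster separation <0.4 together); treating either condition alone as a failure would be a stricter reading.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical container for per-SNP summary statistics."""
    call_rate: float        # fraction of samples with a genotype call
    cluster_sep: float      # cluster separation score
    p_diff_missing: float   # case vs control differential missingness p value
    p_hwe_cases: float      # Hardy-Weinberg p value in cases
    p_hwe_controls: float   # Hardy-Weinberg p value in controls
    maf: float              # minor allele frequency


def passes_snp_qc(s: SnpStats) -> bool:
    # Initial exclusion, read literally from the text (both conditions).
    if s.call_rate < 0.98 and s.cluster_sep < 0.4:
        return False
    # Differential missingness between cases and controls.
    if s.p_diff_missing < 0.05:
        return False
    # Departure from Hardy-Weinberg equilibrium.
    if s.p_hwe_cases < 0.000001 or s.p_hwe_controls < 0.01:
        return False
    # Minor allele frequency threshold.
    if s.maf < 0.01:
        return False
    return True
```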

HLA imputation
The markers spanning 29-34 Mb (hg build19) on chromosome 6, which encompasses the HLA region, were extracted from the post-QC Immunochip dataset. Cases and controls were imputed together using SNP2HLA (V.1.0) (http://www.broadinstitute.org/mpg/snp2hla/). 4 This is a robust approach which enables imputation of classical HLA alleles as well as specific amino acid positions within HLA alleles, which may play an important functional role. The method uses a large reference dataset collected by the type 1 diabetes genetics consortium 10 (n=5225). This dataset has gold-standard HLA typing and high SNP density. Linkage disequilibrium patterns around SNPs and classical HLA alleles can therefore be used to infer classical HLA alleles, amino acids and SNPs across the region from the SNP data generated by Immunochip, an approach successfully used by a number of researchers. 5 6 11 12 Post-imputation QC included removing variants with a MAF <0.01 and variants with an r² <0.8. The dosage output, which accounts for imputation uncertainty, was used for the association analyses.
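As an illustration of the post-imputation filter, the sketch below keeps a variant only if its MAF is at least 0.01 and its imputation r² is at least 0.8. The function names are hypothetical, and estimating the allele frequency as half the mean dosage is an assumption of this sketch rather than a documented step of the study.

```python
def allele_frequency(dosages):
    """Estimate allele frequency from per-sample dosages in the range 0 to 2."""
    return sum(dosages) / (2 * len(dosages))


def keep_variant(dosages, r2, min_maf=0.01, min_r2=0.8):
    """Return True if the variant passes both post-imputation thresholds."""
    freq = allele_frequency(dosages)
    maf = min(freq, 1 - freq)  # frequency of the less common allele
    return maf >= min_maf and r2 >= min_r2
```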
To assess the quality of the imputation, we used the subset of UK and US JIA cases with two-digit and four-digit HLA-DRB1 typing available (n=1562), performed using a semiautomated, reverse dot-blot method, 2 3 to calculate the proportion of accurately imputed two-digit and four-digit HLA-DRB1 alleles. In addition, the DRB1 two-digit and four-digit allele frequencies were compared between genotyped and imputed HLA allele calls.
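One plausible way to compute the proportion of accurately imputed alleles is sketched below. The text does not specify the exact definition used, so this is an assumption: each individual contributes two alleles, and the typed and imputed pairs are compared without regard to order. The function name `allele_concordance` is hypothetical.

```python
from collections import Counter


def allele_concordance(typed, imputed):
    """Fraction of typed alleles matched by the imputed calls.

    typed, imputed: lists of (allele1, allele2) tuples, one per individual,
    in the same sample order.
    """
    matched = 0
    total = 0
    for typed_pair, imputed_pair in zip(typed, imputed):
        # Multiset intersection counts matches irrespective of allele order.
        overlap = Counter(typed_pair) & Counter(imputed_pair)
        matched += sum(overlap.values())
        total += 2
    return matched / total
```

The same function applies to two-digit or four-digit calls, depending on the resolution of the allele labels passed in.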

Association analysis of HLA alleles and amino acid polymorphisms
To compare the differences and similarities of HLA associations across the different JIA categories, genetic correlation of the MHC region between the categories was calculated using bivariate analysis 13 implemented using GCTA. 14 This analysis first calculates the genetic variance (heritability) of each category and then calculates the genetic correlation between the categories across the HLA region. High correlation suggests similarities or pleiotropy between the two categories compared. This analysis requires independent controls for the two categories being compared and therefore the controls were randomly assigned to the two categories, splitting equally, taking into account the proportions of controls from each population.
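The random assignment of controls to the two categories, balanced within each population, might look like the following sketch. The function name, the input layout and the fixed seed are illustrative assumptions; the study does not describe its implementation.

```python
import random
from collections import defaultdict


def split_controls(controls, seed=0):
    """Split controls into two equal halves within each population.

    controls: list of (sample_id, population) tuples.
    Returns two lists of sample ids, one per JIA category being compared.
    """
    rng = random.Random(seed)
    by_population = defaultdict(list)
    for sample_id, population in controls:
        by_population[population].append(sample_id)

    group_a, group_b = [], []
    for population in sorted(by_population):
        ids = by_population[population]
        rng.shuffle(ids)           # random assignment within the population
        half = len(ids) // 2
        group_a.extend(ids[:half])
        group_b.extend(ids[half:])  # receives the extra sample if the count is odd
    return group_a, group_b
```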
HLA variants were binary coded (presence or absence) and included SNPs and two-digit and four-digit HLA alleles. Association analysis was performed using logistic regression in R, using dosage data (genotype probabilities), which take into account imputation uncertainty. For the analysis of each JIA category, the total control dataset was used for each comparison. HLA amino acid polymorphisms have multiple residues at each position and were analysed using the omnibus test. This is a log-likelihood ratio test comparing the likelihood of the null model against the likelihood of the fitted model, giving a p value that assesses the improvement in fit of the model. The deviance (−2×the log likelihood ratio) is calculated and follows a χ² distribution with m−1 degrees of freedom (where m is the number of HLA variant alleles). 5 Three of the admixture estimates were included as covariates to account for potential population stratification.
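Written out, the omnibus test statistic takes the following form. The symbols L_0 (likelihood of the null model, containing only the covariates) and L_1 (likelihood of the fitted model, which adds m−1 residue indicator terms for a position with m residues) are notation introduced here for illustration.

```latex
D = -2 \ln\!\left(\frac{L_0}{L_1}\right)
  = 2\left(\ln L_1 - \ln L_0\right),
\qquad
D \sim \chi^2_{m-1} \ \text{under the null hypothesis.}
```

The omnibus p value is then the upper tail probability of the χ² distribution with m−1 degrees of freedom evaluated at D.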
To look for independent effects across the HLA region, conditional analysis was performed. Logistic regression, as described above, was performed to identify the most associated marker. Then this marker was used as a covariate in the model and logistic regression repeated. This analysis was continued sequentially in a forward stepwise approach until no variant satisfied the genome-wide significance threshold (conditioned p<5×10⁻⁸). When the covariate was an amino acid, all multi-allelic variants of the amino acid were included as covariates, excluding the most frequent variant. To look for additional effects outside HLA-DRB1, we included all two-digit and four-digit HLA-DRB1 alleles within the model and looked for any residual effects.
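The forward stepwise procedure can be sketched in a few lines of Python. This is a schematic outline only: `test_fn` is a hypothetical callback standing in for the logistic regression (or omnibus) test, returning the p value of a variant given the covariates already selected.

```python
def forward_stepwise(variants, test_fn, threshold=5e-8):
    """Select independently associated variants by forward stepwise conditioning.

    variants: iterable of variant identifiers.
    test_fn(variant, covariates) -> p value of `variant` conditioned on the
        tuple `covariates` (assumed to be supplied by the caller).
    """
    selected = []
    remaining = set(variants)
    while remaining:
        # Test every remaining variant conditional on those already selected.
        pvalues = {v: test_fn(v, tuple(selected)) for v in remaining}
        best = min(pvalues, key=pvalues.get)
        # Stop when no variant reaches the genome-wide threshold.
        if pvalues[best] >= threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The returned list gives the variants in the order in which they entered the model.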
To confirm the results of the conditional analysis and to check that there were no other combinations of variants that better fitted the models derived from the forward stepwise approach, described above, we exhaustively tested all possible combinations of 2, 3 and 4 amino acid positions, including the three admixture estimates as covariates. For each combination we calculated deviation from the null hypothesis, which included only the admixture covariates. To assess the improvement in the model fit we also calculated the improvement in the Akaike information criterion (ΔAIC), and also the improvement in the Bayesian information criterion (ΔBIC).
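The exhaustive search and the model comparison criteria can be illustrated as follows. The AIC and BIC formulas are the standard definitions (k parameters, n samples, maximised log likelihood); the function names are hypothetical, and defining the improvement as the null model's criterion minus the fitted model's is an assumed sign convention.

```python
from itertools import combinations
from math import log


def candidate_models(positions, sizes=(2, 3, 4)):
    """Yield every combination of 2, 3 and 4 amino acid positions."""
    for size in sizes:
        yield from combinations(positions, size)


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n samples."""
    return k * log(n) - 2 * log_likelihood


def improvement(null_value, model_value):
    """Delta AIC or delta BIC; positive values favour the fitted model."""
    return null_value - model_value
```

With five candidate positions, for example, `candidate_models` yields 10 pairs, 10 triples and 5 quadruples.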
We used a disease prevalence of 0.001 to estimate the variance explained (h²) by the HLA region and some of the independent effects and compared with the estimate calculated for all Immunochip, implemented using GCTA. 14

RESULTS
HLA imputation
Post-QC data was available for 6920 SNPs, 335 amino acids and 171 HLA alleles in 5043 JIA cases and 14 390 healthy controls (see online supplementary table S1). A detailed breakdown of the JIA cases by ILAR category is shown in table 1, and by population and gender in online supplementary table S2.
For a proportion of the UK and US JIA cases (n=1562), two-digit and four-digit HLA-DRB1 classical typing was available. The accuracy of the imputed data was calculated as 97.9% for two-digit and 89.3% for four-digit alleles, which is similar to the accuracies calculated in previous studies in RA. 5 A detailed analysis strategy is shown in online supplementary figure S1.

Bivariate analysis to look for genetic correlation between the JIA categories
We performed bivariate analysis to calculate the estimated HLA region genetic correlation between each pair of JIA categories (figure 1). The heritability for each category estimated from the bivariate analysis was similar to that estimated from univariate analyses performed in the total dataset (see online supplementary tables S3 and S4). The estimates of correlation between each pair of JIA categories showed a surprisingly strong correlation between the most common categories of JIA: RF-negative polyarthritis, persistent and extended oligoarthritis (rG>0.88). In contrast, the correlations between these three categories and the other categories of JIA were lower (figure 1).

Association analysis of HLA markers
After conducting primary association analysis of all 7426 variants in each of the seven JIA categories (table 2), we observed that for oligoarthritis and both RF-positive and RF-negative polyarthritis the strongest association was with HLA-DRB1 amino acid position 13. However, for oligoarthritis and RF-negative polyarthritis, the most common categories of JIA, glycine13 confers the strongest risk; serine13 also confers a risk effect but histidine13 is protective. By contrast, in RF-positive polyarthritis, it is histidine13 that confers the strongest risk and serine13 confers a strong protective effect (see figure 2, online supplementary table S5 and supplementary figure S2). When the effect estimates for the histidine13 residue in the associated JIA categories were compared using multinomial logistic regression, strong protective effects were observed in persistent and extended oligoarthritis, with no significant difference in the effect estimates (p=0.63). There was a slightly weaker, protective effect for RF-negative polyarthritis compared with that for persistent and extended oligoarthritis (p<0.05). Importantly, there was a significantly different risk effect in RF-positive polyarthritis compared with RF-negative polyarthritis, persistent and extended oligoarthritis. The remaining JIA categories had distinct HLA associations from these common categories. The most significant association in systemic JIA (sJIA) was for HLA-DRB1*11 and for the ERA category was HLA-B*27. For jPsA, no associations reached genome-wide level of significance (p<5×10⁻⁸).

Investigation of multiple effects within the region
Observing that oligoarthritis and RF-negative polyarthritis showed similar HLA associations and evidence for pleiotropy from the bivariate analysis in GCTA, 13 14 these categories were combined to increase power for further analyses (total sample size=3934). To look for independent genetic effects across the HLA region, we conditioned on the most associated marker, HLA-DRB1 amino acid 13, and detected a second independent effect within HLA-DRB1 at amino acid position 67 (omnibus p=7.01×10⁻⁸³). Further conditioning revealed separate effects at amino acid positions 181 (omnibus p=3.33×10⁻²²) and 71

Variance explained by the HLA region
We calculated the proportion of variance explained by the independent HLA variants in the combined oligoarthritis and RF-negative polyarthritis dataset (see table 3 and online supplementary table S4) and found that the total HLA region explained 8% of the total phenotypic variance, with the HLA-DRB1 region, driven by the amino acid at position 13, contributing 50% of variance explained by the HLA region.

Comparison with adult inflammatory arthritic diseases
We compared our HLA association findings across JIA categories with those of adult inflammatory arthritic diseases (see online supplementary table S7). In seropositive RA, Raychaudhuri et al showed multiple independent associations within the HLA-DRB1 gene at three amino acid positions (11, 71 and 74) and also independent associations at amino acid position 9 in HLA-B and amino acid position 9 in HLA-DPB1. 5 The DRB1 amino acid at position 11 is in strong linkage disequilibrium with the amino acid at position 13, which makes it difficult to assign causality to one or the other. In this study, oligoarthritis and polyarthritis each showed association with HLA-DRB1 amino acid at position 13. If the ORs of the residues at HLA-DRB1 amino acid position 13 for paediatric and adult arthritic diseases are compared, the combined oligoarthritis and RF-negative polyarthritis dataset shows similar ORs to those seen in seronegative RA 6 (see online supplementary figure S4), suggesting that these JIA categories could potentially have an adult counterpart. Likewise, in RF-positive polyarthritis, the histidine residue at position 13 at HLA-DRB1 confers the greatest risk for disease and, unsurprisingly, this mirrors the association in seropositive RA 5 (see online supplementary figure S4). For the ERA category, as expected, the most significant association was for HLA-B*27, the same HLA allele found in AS.

DISCUSSION
This is the largest investigation of association of the HLA region with JIA and its categories to date. Exploiting novel imputation strategies, we have observed differences and similarities between HLA associations for the different categories. The most common and also the most clinically homogeneous categories of JIA, oligoarthritis and RF-negative polyarthritis, showed strong genetic correlation across the HLA region, supporting our previous approaches of combining these categories for genetic studies. 15 Combined analysis of these categories shows that they share association across the HLA region, with strong association for HLA-DRB1 amino acid position 13. The results for these combined categories are consistent with previous findings investigating association of classical HLA alleles in JIA. For example, there is previous evidence for association of HLA-DRB1*08 and the HLA alleles that lie on this haplotype, with oligoarthritis and RF-negative polyarthritis. 2 3 At HLA-DRB1 amino acid position 13, the glycine residue lies on the HLA-DRB1*08 haplotype. The association with the amino acids is much stronger than that for the classical HLA allele (see online supplementary figure S3A). These combined categories also show multiple independent effects across the region, at HLA-DRB1 amino acid   Previous studies have failed to demonstrate an association with HLA-B, which is apparent only with the additional samples available for this study.
A striking finding has been the shared association of HLA-DRB1 amino acid position 13 for both paediatric and adult diseases. It is known that amino acid position 13 is involved in shaping the peptide-binding pocket 4 of HLA-DRB1. 16 We find that the association in the combined oligoarthritis and RF-negative polyarthritis dataset mirrors the findings seen in seronegative RA and similarly, in RF-positive polyarthritis, the findings correspond to the association in seropositive RA. Interestingly, the magnitudes of associations are stronger in the paediatric diseases compared with adult, suggesting the paediatric disease is more genetically driven.
We then further compared the associations seen in each of the other JIA categories with those of their proposed adult counterparts. Based on clinical features, it is likely that sJIA would map to adult Still's disease, but there is currently no HLA genetic data to support or refute this. The most significant association in sJIA was for HLA-DRB1*11, consistent with recent findings from a large genome-wide association study for sJIA, which used an overlapping set of samples. 17 Previous studies of an HLA association with sJIA had yielded conflicting results, but there is now clear evidence for association of the HLA region with this category of JIA. Data from the current study also show that the association is distinct from that seen in the other categories. This supports emerging evidence that sJIA is a distinct disease, with less of an autoimmune phenotype and displaying autoinflammatory features, 18 and builds on previous genetic evidence, which reported no association with another well-established JIA susceptibility gene, PTPN22, in sJIA. 19 Unsurprisingly, the strongest association seen in ERA, HLA-B*27, is the same as in adult AS. 20 Although no associations reaching genome-wide level of significance (p<5×10⁻⁸) were seen in jPsA, the most significant HLA alleles were HLA-DQA1*0401 (p=0.0001), HLA-DRB1*08 (p=0.0003) and HLA-DQB1*0402 (p=0.0008), which all lie on the same haplotype. The established HLA association in adult-onset PsA, HLA-C*0602, 21 which is also the primary HLA association in psoriasis, 22 was also modestly associated in this study (p=0.008). There was also evidence in jPsA for association with HLA-B*27 (p=0.003), the HLA allele that is the most significant in ERA. The mixed HLA associations in jPsA may suggest some misclassification such that the jPsA samples may contain some individuals from oligoarthritis, RF-negative polyarthritis or ERA categories. This is perhaps not surprising given that jPsA is difficult to classify, and that some of the jPsA classification criteria have been disputed. 23
The results of this study have important implications for understanding disease pathogenesis, aetiology and potential future therapeutic strategies for JIA categories. Despite the development of a classification system, heterogeneity still exists within the ILAR categories. This heterogeneity of JIA remains a key challenge to paediatric rheumatologists; however, these results may inform the debate on classification and help define a more biological-driven and molecular-driven classification system. We show clear differences among many of the categories in terms of their HLA associations, but here we have also shown that the most common categories of JIA, oligoarthritis and RF-negative polyarthritis, are genetically similar and also notably similar to adult-onset seronegative RA. It is only relatively recently that the heterogeneous nature of adult RA has been recognised, with seronegative RA less common than seropositive RA. 24 25 There are no specific therapeutic strategies for seronegative RA at this time, but given the rarity of this subphenotype of RA and the JIA categories individually, this study suggests that further comparisons of genetic studies for these diseases could help identify novel pathways and targets for therapy for both adult-onset and childhood-onset forms of inflammatory arthritis.
Table 3 Heritability estimates for HLA and various alleles and All Immunochip in the combined oligoarthritis and RF-negative polyarthritis dataset (n=3934)