Objectives We sought to investigate whether genetic effects on response to TNF inhibitors (TNFi) in rheumatoid arthritis (RA) could be localised by considering known genetic susceptibility loci for relevant traits and to evaluate the usefulness of these genetic loci for stratifying drug response.
Methods We studied the relation of TNFi response, quantified by change in swollen joint counts ( Δ SJC) and erythrocyte sedimentation rate ( Δ ESR) with locus-specific scores constructed from genome-wide assocation study summary statistics in 2938 genotyped individuals: 37 scores for RA; scores for 19 immune cell traits; scores for expression or methylation of 93 genes with previously reported associations between transcript level and drug response. Multivariate associations were evaluated in penalised regression models by cross-validation.
Results We detected a statistically significant association between Δ SJC and the RA score at the CD40 locus (p=0.0004) and an inverse association between Δ SJC and the score for expression of CD39 on CD4 T cells (p=0.00005). A previously reported association between CD39 expression on regulatory T cells and response to methotrexate was in the opposite direction. In stratified analysis by concomitant methotrexate treatment, the inverse association was stronger in the combination therapy group and dissipated in the TNFi monotherapy group. Overall, ability to predict TNFi response from genotypic scores was limited, with models explaining less than 1% of phenotypic variance.
Conclusions The association with the CD39 trait is difficult to interpret because patients with RA are often prescribed TNFi after failing to respond to methotrexate. The CD39 and CD40 pathways could be relevant for targeting drug therapy.
- rheumatoid arthritis
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
Statistics from Altmetric.com
What is already known about this subject?
To date, no strong associations of individual genetic loci with response to tumour necrosis factor inhibitors (TNFi) in rheumatoid arthritis (RA) have been identified, despite recent large efforts based on conventional genome-wide association studies and a crowdsourcing initiative.
What does this study add?
We introduced a new methodological approach for localising genetic effects by using genotypic risk scores based on known genetic loci for related traits and likely biomarkers.
We identified two genetic loci strongly associated with TNFi response in RA and demonstrated that the genetic determinants of TNFi response are different to the known susceptibility loci for RA.
How might this impact on clinical practice or future developments?
Measurements of expression of CD40 and CD39 and their corresponding pathways could be relevant for targeting drug therapy in RA.
Our new methodological approach could be useful for localising genetic effects in traits for which assembling large sample sizes is not feasible, such as drug response.
Biologic therapies have transformed the outlook for rheumatoid arthritis (RA). However, for the most commonly used class of agent, tumour necrosis factor inhibitors (TNFi), there is substantial variability in response to treatment among patients with RA.1 This has spurred efforts to discover predictors of response and more generally to understand how to subtype this heterogeneous disease to predict which therapies will work.2 3
Genome-wide association studies (GWAS) of response to TNFi have shown that common single nucleotide polymorphisms (SNPs) explain an estimated 40% and 50% of the variance of change in swollen joint counts (SJC) and erythrocyte sedimentation rate (ESR), respectively; however, no strong associations with individual SNPs have been detected.4 Thus, as with many complex phenotypes, the genetic architecture of response to TNFi is likely to be polygenic with many small genetic effects.5 In this situation, the sample size required to learn a predictive model is very large—up to 10 cases per variable6—and it may not be feasible to assemble such large sample sizes for studying response to a single drug or drug class.
It has been suggested that improving prediction of complex clinical outcomes may be possible by incorporating information about the genetics of relevant traits in the prediction model.7 8 One such approach is to use publicly available summary GWAS results of relevant traits to compute genotypic scores, which can then be used as variables (‘features’) from which to build predictive models. By harnessing the genetic profiles of intermediate traits, these scores aggregate the effects of individual SNPs into larger regional or whole-genome effects. Relevant traits can include diseases, biomarkers and gene transcription levels. For polygenic traits such as RA, for which multiple genetic susceptibility loci have been identified, we can construct locus-specific scores allowing us to examine the extent to which drug response is related to genetic heterogeneity of the disease.
In the current study, we incorporated available genetic information on susceptibility to RA,9 immune cell traits from a publicly available bioresource10 and expression or methylation of genes implicated in response to TNFi treatment in RA.11 The genotypic scores associated with these intermediate traits were then tested for association with response to TNFi; by reducing the number of hypotheses being explored, the thresholds for claiming statistical significance are relaxed, which could help identify useful predictors.
Materials and methods
We used a sample of 2938 individuals of European ancestry for whom complete clinical and GWAS data were available. This sample comprised individuals from a pre-existing international collaboration formed to study the genetics of response to TNFi agents12 and individuals recruited to the Biologics in Rheumatoid Arthritis Genetics and Genomics Study Syndicate (BRAGGSS) after 2013.4
Table 1 shows sample sizes, phenotypes and clinical variables for each of the data collections used in this study. All participants provided informed consent, and institutional review board/ethics approvals were in place as described in Cui et al 12 and Massey et al.4
Definition of response to TNFi treatment
In RA, response to treatment is quantified by change in the Disease Activity Score (DAS), which depends on four measurements: ESR, SJC, tender joint count (TJC) and patient global health assessment rated on a visual analogue scale (GHVAS). Previous work has shown that only the SJC and ESR measurements have evidence of non-zero heritability4 and correlate significantly with synovitis quantified by ultrasound or MRI.13 14 Since TNFi were developed to control synovitis, we used the two objective components of the DAS (ESR and SJC) as primary outcomes for evaluating genetic effects and the two subjective components (TJC and GHVAS) and the composite score (DAS28-ESR4) as secondary outcomes. For each outcome, a baseline measurement was taken before initiation of TNFi treatment, and a follow-up measurement was taken between 3 and 6 months after initiation of TNFi treatment. The measurements for each component were transformed in accordance to the DAS28-ESR4 formula (see online supplementary methods). Response was modelled as the difference between the baseline and the follow-up measurement.
Genotypic risk scores
We used the GENOSCORES platform (https://pm2.phs.ed.ac.uk/genoscores/) to compute genotypic risk scores for the intermediate traits. GENOSCORES is a database of published SNP to trait associations from a large number of well-powered GWAS, including GWAS of disease traits, biomarkers, gene expression and methylation. The database is accompanied by a software package implemented in R that can be used to compute genotypic risk scores and run downstream statistical analyses in cohorts with SNP data.
We queried the GENOSCORES database for genetic associations with RA risk,9 149 heritable immune cell traits reported by Roederer et al,10 and whole-blood expression and methylation for 93 genes reported in a recent meta-analysis11 as differentially expressed before treatment between responder and non-responder patients with RA treated with TNFi. GWAS summary statistics reported by Westra et al 15 and Gusev et al 16 were used for expression quantitative trait loci (eQTLs). GWAS summary statistics reported by Gaunt et al 17 were used for methylation quantitative trait loci (mQTLs).
GWAS summary statistics for each intermediate trait were filtered at p value< . SNPs were then split into trait-associated regions, with regions defined as genomic loci containing at least one SNP with p value< . Only 19 of the immune cell traits had a corresponding trait-associated region. SNPs not assigned to a region were discarded. For each trait-associated region, a genotypic score was computed as a sum of SNP genotypes, g , weighted by the effect size estimates, β (log OR for binary traits, regression slope for quantitative traits) and adjusted for linkage disequilibrium. The regional score, , for an individual i , trait t and region r was computed as: , where denotes the SNP–SNP correlation matrix in genomic region r .
Additional details about the GENOSCORES platform, the score computation and the specific regional scores used in this study are given in online supplementary materials (see online supplementary methods, online supplementary tables S1–S5, online supplementary figure S1).
To evaluate genetic prediction of response to TNFi, we compared a model with clinical covariates only to a model with clinical covariates and genotypic scores for each type of intermediate trait. To avoid numerical instabilities, we removed highly correlated scores prior to fitting a model (see online supplementary methods). The number of filtered regional scores for each type of intermediate trait is shown in table 2.
We expected that only a subset of genotypic scores would be relevant for prediction of response to TNFi, and thus used a hierarchical shrinkage prior for the score coefficients. We implemented the prediction models in STAN18 using a horseshoe prior distribution and performed inference with Markov chain Monte Carlo sampling.19 20 To rank the importance of genotypic scores in a model, we applied projection predictive variable selection, an approach that projects posterior draws from the high-dimensional model to lower dimensional subspaces.21
We used a statistical model with the following clinical covariates: measurements for the four DAS components before initiation of TNFi treatment, gender, whether the patient was concomitantly treated with any non-biologic disease-modifying antirheumatic drugs (DMARDs), cohort (which is also a proxy for country), genotyping array and the 10 first principal components computed from the genotypic data of the full data set. We used individuals with complete measurements in the statistical analyses of each TNFi response outcome (see online supplementary table S6).
Evaluation of prediction
We used two measures to quantify improvement in prediction: the difference in log-likelihood between a model with clinical covariates and genotypic scores and a model with clinical covariates only (measured in natural log units (nats)); and the per cent of residual variance explained by the genotypic scores. Both measures were computed on the testing data from 10-fold cross-validation on the full data set.
For readers who prefer a frequentist interpretation, the asymptotic equivalence of model choice by cross-validation and Akaike’s information criterion (AIC)22 implies that a p value of 0.01 for comparison of nested models is equivalent to a difference in test log-likelihood of 2.3 nats (likelihood ratio of 10) for models differing by one extra parameter. The large sample size of this study means that small robust increments in predictive performance can be detected.
For models with a test log-likelihood difference of at least two nats, we further examined the univariate associations between genotypic scores and the TNFi response outcomes. We used the full data set to test the univariate association and included the same clinical covariates as in the multivariate prediction, in addition to a score.
For genotypic scores significantly associated with TNFi response at the Bonferroni-corrected p value threshold, we compared the estimated effects among groups receiving different TNFi agents. We considered etanercept, adalimumab and infliximab, where a large enough sample size was available. Additionally, we tested if the associations held when we adjusted for additional covariates: anticitrullinated protein antibodies (ACPA) status and smoking status. These covariates have been reported to influence TNFi response but were only available for a third of the samples, and thus were not included in the full models.
A diagram of the statistical analysis pipeline is given in online supplementary figure S2.
In the following sections, we focus on the results for the primary TNFi response outcomes. The results for secondary outcomes are discussed in the online supplementary results (see online supplementary table S7).
RA genotypic scores
Prediction of response to TNFi as quantified by Δ SJC improved by including the regional genotypic scores for RA risk in a penalised regression model (table 2). The test log-likelihood increased by 5.3 nats, suggesting that some of the genetic drivers for RA are also influencing response to TNFi; however, the absolute improvement in prediction was small, with less than 1% of phenotypic variance being explained by the RA genotypic scores.
The regional score at the CD40 locus had the highest explanatory power for both response phenotypes (figure 1). The direction of the effect was consistent for response as measured by both phenotypes, with higher RA load at the CD40 locus being associated with better TNFi response. The univariate association of the RA score at the CD40 locus with Δ SJC passed the p value threshold corrected for the number of RA scores and two response phenotypes (table 3).
The regional score for RA at the CD40 locus was correlated with a cis-acting eQTL score for CD40 expression in whole blood (correlation=0.65) and a cis-acting mQTL score for methylation of CD40 in whole blood (correlation=−0.70). The strongest association between response to TNFi and genotypic scores at the CD40 locus was with the score for RA (table 3). The estimated effect did not change when we stratified by TNFi agent and when we adjusted for ACPA and smoking status (see online supplementary tables S8 and S9).
Genotypic scores for immune cell traits
Prediction of response to TNFi as measured by Δ ESR improved by adding penalised genotypic scores for immune cell traits in the model (table 2). In univariate analyses, the regional scores for a number of immune cell traits at the ENTPD1 locus (which codes for CD39) had suggestive associations with Δ SJC (table 3). The association of Δ SJC with the genotypic score for ‘CD39 on CD4 T cells’ at the ENTPD1 locus passed the p value threshold corrected for the number of immune cell trait scores tested (470) and two response phenotypes. This score was correlated with the cis-acting eQTL score for ENTPD1 (correlation=0.65). Higher score for ‘CD39 on CD4 T cells’ at the ENTPD1 locus was associated with worse TNFi response as quantified by Δ SJC. There was no association between Δ ESR and genotypic scores at the ENTPD1 locus (table 3) and no association between either Δ ESR or Δ SJC and a genotypic score for cell subset frequency of CD73, which is the second ectonucleotidase involved in adenosine production in regulatory T cells (see online supplementary results).
Previously, Peres et al 23 showed that low expression of CD39 on peripheral regulatory T cells was associated with worse response to methotrexate (MTX) in patients with RA. We note that the direction of the effect is reversed, but this is not unlikely since we considered response to TNFi. To further investigate the effect of ‘CD39 on CD4 T cells’ expression on TNFi response, we performed two analyses stratified by concomitant treatment. Information on whether a patient was receiving a concomitant non-biologic DMARD was available for all samples, while information on whether a patient was specifically receiving concomitant MTX treatment was available for a subset of patients from the BRAGGSS cohort. Table 4 shows the effect of the genotypic score for expression of ‘CD39 on CD4 T cells’ on Δ SJC for patients receiving TNFi treatment stratified by concomitant treatment with either any non-biologic DMARD (top) or specifically with MTX (bottom). The effect became stronger in the groups receiving concomitant treatment and attenuated in the group receiving TNFi monotherapy; the CIs among all groups overlapped. Similarly, we did not detect statistically significant differences when we stratified by TNFi agent and when we adjusted for ACPA status (see online supplementary tables S8 and S9).
eQTL and mQTL scores
Prediction of response to TNFi as quantified by both Δ SJC and Δ ESR improved by adding eQTL and mQTL scores of implicated genes in a penalised regression model (table 2). Of the 93 genes reported in Kim et al 11 as differentially expressed between responders and non-responders to TNFi in RA, 54 genes had at least one eQTL, 54 genes had at least one mQTL and 36 had both. The test log-likelihood increased by 3.4 and 2.9 nats for Δ SJC and Δ ESR, respectively, by adding the eQTL scores in the model. We did not see a further improvement by adding genotypic scores for mQTLs.
In the largest international study of TNFi response to date, we have shown how using methods that leverage information from relevant intermediate traits can identify predictors of TNFi response. In a recent crowdsourced effort to use machine learning to construct a predictor of response to TNFi in RA, including SNP genotypes did not improve prediction beyond that obtained with clinical covariates alone.24 Genotypic prediction of psychiatric disorders and related phenotypes has been shown to improve by exploiting genetic correlations among multiple related traits,7 25 and methods have been extended to incorporate polygenic scores for multiple traits.8
In the current study, we have combined these approaches and implemented them in a newly developed platform, called GENOSCORES, which contains GWAS data for multiple traits and automates construction of genotypic scores. For polygenic traits with multiple trait-associated loci, such as RA, locus-specific scores can be constructed to examine how genetic heterogeneity of the intermediate trait can influence the trait of interest. Our approach reduces the dimensionality of the prediction task from about 2 million common SNPs to a few hundred or a few thousand genotypic scores, depending on how relevant traits are selected. The score constructions are a type of feature engineering, a task commonly used in machine learning applications.
Understanding the pathogenic mechanisms that initiate and perpetuate RA could give rise to informative biomarkers of prognosis, therapeutic response and toxicity.26 However, in agreement with earlier studies,27 we did not find strong predictors of TNFi response among alleles linked to the development of RA. A strength of the current study is the large sample size which allowed us to detect small robust increments in predictive performance. Our methodological approach was to first establish the predictive value for a set of genetic markers using a multivariate model and then to examine univariate associations between each marker and the outcome. Using this approach, we showed that a model including RA scores led to a small robust improvement in prediction, with the regional score at the CD40 locus driving the predictive signal.
Higher RA risk at the CD40 locus, higher CD40 transcription and lower CD40 methylation were associated with better TNFi response. CD40 is a transmembrane protein which belongs to the TNF receptor superfamily, critically important in modulating immune-(auto-immune) responses.28 CD40 is expressed by B cells and antigen-presenting cells (APCs), whereas CD40L is induced on CD4+ T cells following T-cell antigen receptor (TCR) with major histocompatability complex (MHC) molecule interaction. Engagement of the CD40–CD40L axis leads to B cell activation, proliferation and (auto)-antibody production, while activation of APCs by CD40L on CD4+ T cells induces upregulation of CD80, CD86, MHC class I and MHC class II, as well as secretion of proinflammatory cytokines such as interleukin (IL)-12, IL-23 and TNF- α .29 30
The risk allele associated with RA is associated with elevated CD40 expression in whole blood.31 As high CD40 expression is associated with elevated TNF- α production and CD40 and CD40L transcripts are increased in the disease tissue in both early and established disease,32 it is not surprising that patients with the CD40 risk allele respond better to TNFi therapies.
Overall, if there are genetic loci that predict TNFi response, these are mostly different to the known RA risk loci. However, we note that patients who receive TNFi therapy are likely to have more severe disease and to have failed on other treatments. It is therefore possible that our study sample has been selected with respect to genetic load for RA, thus limiting the heterogeneity in genetic RA risk profiles compared with a sample of newly diagnosed cases.
The genotypic score for the expression of the ectonucleotidase CD39 on CD4 T cells was inversely associated with TNFi response. The SNPs contributing to this score are in the ENTPD1 gene which encodes CD39. In stratified analyses, the inverse association with response was stronger in the groups receiving TNFi concomitantly with MTX or another non-biologic DMARD compared with the group receiving TNFi monotherapy and was stronger in the group receiving infliximab compared with the groups receiving adalimumab or etanercept. The CIs of the estimated effects among all groups overlapped. This effect on response to TNFi agents is in the opposite direction to the association reported between low expression of CD39 on regulatory T cells and resistance to MTX in RA.23
Interpreting the association between drug response and the CD39 trait is difficult both epidemiologically and mechanistically. The reasons for this being that in the UK and most European countries TNFi are usually prescribed only after patients have had a poor response to MTX and, unless of intolerance, always in combination with MTX. Therefore, these patients are likely to represent a selected group; this, together with the almost universal use of combination MTX/TNFi therapy, hinders progress in dissecting the potential mechanisms responsible for the divergence. Nonetheless, as RA is a highly heterogeneous disease, it is plausible to speculate that different cellular and molecular networks may be involved in driving diverse immune/inflammatory responses in different patients to different drugs. For example, as mentioned above, the poor response to MTX has been associated with low CD39 expression by regulatory T cells, while increased CD39 expression has been reported to be important in the expansion of Th17 cells driven by IL-6 and TGF-β via Stat3 and Gfi-1 transcription factors.33 In turn, the expansion of Th17 cells has been reported to be associated with incomplete response to TNFi.34 It remains to be established whether measurements of CD39/ENTPD1 expression or genotype may be useful in the choice of MTX, TNFi agent or concomitant treatment as first-line therapy for patients who need a DMARD.
Using previously reported associations between transcripts and TNFi response to select relevant genes, we have shown evidence that eQTL scores for these genes contain information that predicts TNFi response, even though the proportion of variance explained was low and no single genes associated with response could be identified.
Improved genomic prediction of treatment response requires measuring response more precisely to capture the molecular in addition to the clinical phenotype. The detected associations between genotypic scores and TNFi response were not the same for the two measures of response—change in SJC and change in ESR—suggesting that the two measures reflect different aspects of disease activity affected by TNFi. To derive refined measures of drug response, large data sets with multiple inflammatory biomarkers, joint imaging and clinical variables before and after treatment are needed.
We thank all the patients who have contributed to this research and clinical staff who supported patient recruitment. We thank the Medical Research Council and Arthritis Research UK (ARUK) for their joint funding of the Maximising Therapeutic Utility in Rheumatoid Arthritis (MATURA) study. We thank MATURA consortium members for their input and feedback during the design of this project, and we would like to give particular mention to Felix Agakov, Jennifer H Barrett and Heather J Cordell. The list of MATURA collaborators and affiliations is available in the online supplementary information (table S12). We thank Peter K Gregersen for providing access to clinical and genetic data from the ABCoN cohort. We thank Stephanie Ling for collating ACPA data from the BRAGGSS cohort into a usable format. We thank the anonymous reviewers for their constructive comments and suggestions.
Handling editor Josef S Smolen
Correction notice This article has been corrected since it first published. Table 1 data aligmment has been corrected.
Collaborators MATURA Consortium.
Contributors Substantial contributions to conception or design of the study: AS, PM, AB, CP. Substantial contributions to drafting the manuscript: AS, PM. Substantial contributions to data acquisition: JC, MJHC, KI, HY, SS, LP, SLB, RPK, YO, PLCMvR, GW, IEvdH-B, NdV, PPT, KO, HC, H-JG, TWJH, LAC, SR, MW, AGW, XM, JDI, AWM, AB. Substantial contributions to data analysis or interpretation: AS, PM, MC, DP, NN, CP. All authors contributed to revising the manuscript critically for important intellectual content and approved the final manuscript. The funding agencies had no part in writing or reviewing the manuscript.
Funding This study was supported by Medical Research Council (MRC) MR/K015346/1 MATURA study. ARUK 20670 MATURA study.
Competing interests None declared.
Patient and public involvement statement This study was conducted as part of Work Stream 2 of the MATURA Consortium (http://www.matura.whri.qmul.ac.uk/what_is_matura.php). A patient advisory group was established in 2014 when the MATURA project commenced. The group meets regularly to:
Ensure MATURA strategy is maintaining relevance, accountability and direction by embedding patients and members of the public within the decision making processes.
Determine what level of confidence in tests, and what type of tests, would be acceptable to patients for treatment decisions.
Maximise patient recruitment to research studies by increasing awareness through patient groups.
Readily obtain patients’ perspective on grant applications related to stratified medicines for RA.
Facilitate the dissemination of the results from MATURA research, for instance by producing lay summaries of papers in conjunction with the researchers (http://www.matura.whri.qmul.ac.uk/news.php).
Patient consent for publication Not required.
Ethics approval This study used anonymised data for human subjects from an international collaboration of 13 studies originally published in PLOS Genetics 2013;9:e1003394. All participants provided informed consent and institutional review board and ethics approvals were in place for each of the studies and described in the original publication.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data are available upon reasonable request.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.