Lifetime risk of rheumatoid arthritis-associated interstitial lung disease in MUC5B mutation carriers

Objectives To estimate lifetime risk of developing rheumatoid arthritis-associated interstitial lung disease (RA-ILD) with respect to the strongest known risk factor for pulmonary fibrosis, a MUC5B promoter variant. Methods FinnGen is a collection of epidemiological cohorts and hospital biobank samples, integrating genetic data with up to 50 years of follow-up within nationwide registries in Finland. Patients with RA and ILD were identified from the Finnish national hospital discharge, medication reimbursement and cause-of-death registries. We estimated lifetime risks of ILD by age 80 with respect to the common variant rs35705950, a MUC5B promoter variant. Results Out of 293 972 individuals, 1965 (0.7%) developed ILD by age 80. Among all individuals in the dataset, MUC5B increased the risk of ILD with a HR of 2.44 (95% CI: 2.22 to 2.68). Out of 6869 patients diagnosed with RA, 247 (3.6%) developed ILD. In patients with RA, MUC5B was a strong risk factor of ILD with a HR similar to the full dataset (HR: 2.27, 95% CI: 1.75 to 2.95). In patients with RA, lifetime risks of ILD were 16.8% (95% CI: 13.1% to 20.2%) for MUC5B carriers and 6.1% (95% CI: 5.0% to 7.2%) for MUC5B non-carriers. The difference between risks started to emerge at age 65, with a higher risk among men. Conclusion Our findings provide estimates of lifetime risk of RA-ILD based on MUC5B mutation carrier status, demonstrating the potential of genomics for risk stratification of RA-ILD.


INTRODUCTION
Interstitial lung disease (ILD) is one of the most common extra-articular manifestations of rheumatoid arthritis (RA). 1 The cumulative risk of developing clinical ILD during the RA disease course has varied in different studies, ranging from 5.0% to 7.7% in long-term follow-up studies of RA cohorts [1][2][3] to up to 10% in a study using death records. 4 Even higher estimates for subclinical radiographic findings consistent with ILD have been observed in patients with RA, ranging from 19% to 33%. [5][6][7] Although the RA-ILD course can vary, the disease is associated with significantly increased mortality compared with patients with RA without ILD. 3 4 8 Clinical risk factors for RA-ILD include older age, male gender, tobacco smoking, high levels of anticitrullinated protein antibodies and disease activity. 2 9 The strongest known genetic risk factor for idiopathic pulmonary fibrosis (IPF) is the common variant rs35705950, a promoter variant near the MUC5B gene. 10 A recent case-control study has demonstrated that the MUC5B promoter variation is associated with an increased risk of ILD among patients with RA. 11 The aim of this study was to evaluate the lifetime risk of ILD in patients with RA, comparing the risk to the population, and estimate how the MUC5B promoter variant modifies these risks in the real-world setting.

METHODS
FinnGen is a collection of prospective epidemiological and disease-based cohorts, and hospital biobank samples. The unique personal identification number links the genotypes to multiple nationwide registries, and cases were identified through the national hospital discharge registry (starting from 1968) including both inpatient and outpatient data, the national death registry (1969-) and the medication reimbursement registry (1964-).
RA was defined as having a medication reimbursement for inflammatory rheumatic diseases (code 202), with the additional requirement of two healthcare contacts recorded with International Classification of Diseases, Tenth Revision (ICD-10) codes beginning with M05 (seropositive RA) or M06 (seronegative RA). In our recent validation study of RA diagnoses in Finnish biobank patients (unpublished), this combination resulted in a positive predictive value of 0.87 compared with chart review. The negative predictive value for any RA diagnosis was 1.0. Individuals without RA who had other inflammatory rheumatic diseases or inflammatory bowel disease were excluded.
ILD cases were identified with J84, M05.1/J99.0 (ICD-10), 515, 516 (ICD-9) or 484.99 or 517.01 (ICD-8), using the following criteria: (1) the first and only record was in the death registry or (2) after the initial diagnosis, a second contact (or death due to ILD) was required within 5 years; that is, we excluded individuals with no further ILD-related healthcare contacts within 5 years. No exclusions were made based on the temporality of RA and ILD. For both RA and ILD, age at onset was defined as age at first registered diagnosis.
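The two confirmation criteria for ILD can be read as a small decision rule. The sketch below is only an illustration of that rule, not the study's actual pipeline: the record layout (a date paired with a source label of "hospital" or "death"), the function name and the day-based approximation of the 5-year window are all assumptions introduced here.

```python
from datetime import date

# Assumption: the 5-year window is approximated in days.
FIVE_YEARS_IN_DAYS = 5 * 365.25


def is_ild_case(records):
    """Apply the two ILD confirmation criteria to one individual.

    records: list of (date, source) tuples for ILD-coded entries, where
    source is "hospital" or "death" (hypothetical labels).
    """
    if not records:
        return False
    ordered = sorted(records)
    first_date, first_source = ordered[0]
    # Criterion 1: the first and only record is in the death registry.
    if len(ordered) == 1:
        return first_source == "death"
    # Criterion 2: a second contact (or death due to ILD) within 5 years.
    second_date = ordered[1][0]
    return (second_date - first_date).days <= FIVE_YEARS_IN_DAYS
```

Under this reading, a single hospital contact with no later ILD record is not counted as a case, whereas a single death-registry record is.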
For MUC5B (mucin 5B, oligomeric mucus/gel-forming), we studied carriers of the minor allele for the promoter variant rs35705950 (G>T) with minor allele frequency 0.1 (no enrichment compared with non-Finnish Europeans 12 ) and mean INFO 0.948 indicating high imputation quality. Individuals homozygous for the variant were analysed jointly with the heterozygotes.
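Analysing homozygotes jointly with heterozygotes amounts to a dominant coding of the T allele. The sketch below illustrates that coding and, as a rough plausibility check, the carrier fraction expected for a minor allele frequency of 0.1. Integer allele counts and Hardy-Weinberg equilibrium are assumptions made here for illustration; this section does not describe how imputed genotypes were converted to carrier status.

```python
def is_carrier(t_allele_count):
    """Dominant coding: one or two copies of the minor (T) allele count as carrier.

    Assumption: genotypes are available as integer allele counts (0, 1 or 2).
    """
    return t_allele_count >= 1


def expected_carrier_fraction(maf):
    """Expected fraction of carriers under Hardy-Weinberg equilibrium (assumed)."""
    non_carrier_fraction = (1 - maf) ** 2  # homozygous for the major allele
    return 1 - non_carrier_fraction


# With a minor allele frequency of 0.1, about 19% of individuals are expected
# to carry at least one copy.
carrier_fraction = expected_carrier_fraction(0.1)
```

An expected carrier fraction of about 19% is in line with the carrier proportions of 19.3% and 20.9% reported in the Results.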
Start of follow-up was set at birth, with follow-up ending at the first record of the endpoint of interest, death, or the end of follow-up on 31 December 2019, whichever came first. Using the Cox proportional hazards model, we estimated adjusted HRs and 95% CIs. With age as the time scale, all regression models were stratified by sex and adjusted for 10 principal components of ancestry, FinnGen genotyping array and cohort. We report cumulative incidences with 95% CIs by age 80. We used R V.3.6.3. Detailed information on genotyping, disease definitions and analyses is provided in the online supplemental methods.
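The follow-up definition and the notion of cumulative incidence by age 80 can be sketched as follows. This is a simplified illustration, not the authors' analysis, which was run in R and is specified in the supplemental methods. The sketch uses the complement of a Kaplan-Meier estimate with age as the time scale and treats death as censoring; both choices, and all function names, are assumptions.

```python
def exit_age(age_event, age_death, age_admin_end):
    """Return (age at exit, event indicator); None means "did not happen".

    Follow-up starts at birth and ends at the earliest of the endpoint,
    death, or the administrative end of follow-up.
    """
    candidates = [(age_admin_end, 0)]
    if age_death is not None:
        candidates.append((age_death, 0))
    if age_event is not None:
        candidates.append((age_event, 1))
    # The earliest age wins; on ties the event takes precedence (assumption).
    return min(candidates, key=lambda c: (c[0], -c[1]))


def cumulative_incidence(exits, horizon=80.0):
    """1 minus the Kaplan-Meier survival at `horizon`, from (age, event) pairs."""
    survival = 1.0
    event_ages = sorted({age for age, ev in exits if ev == 1 and age <= horizon})
    for t in event_ages:
        at_risk = sum(1 for age, _ in exits if age >= t)
        events = sum(1 for age, ev in exits if age == t and ev == 1)
        survival *= 1 - events / at_risk
    return 1 - survival


# Toy data: (age at exit, event indicator) for four hypothetical individuals.
toy_exits = [(60, 1), (70, 0), (75, 1), (85, 0)]
toy_risk = cumulative_incidence(toy_exits)
```

Because everyone enters at birth, no left truncation is needed in this toy version. An estimator that handles death as a competing risk would give somewhat lower values than this complement of Kaplan-Meier.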

Patient and public involvement
This study was carried out without direct patient and public involvement.

RESULTS
Among 293 972 individuals (mean age at the end of follow-up: 59.8, SD: 17.3, 56.4% women), we identified 1965 patients (1172 men, 793 women) diagnosed with ILD by the end of follow-up. Out of 6869 patients with RA (mean age at onset: 49.4, SD: 14.9, 71.1% women), 247 (3.6%) had been diagnosed with ILD. Out of these 247 individuals, 20 (8.1%) had been diagnosed with ILD >1 year before the earliest record of RA, 36 (14.6%) within a year prior to or after the earliest record of RA and 191 (77.3%) >1 year after. MUC5B carriers made up 19.3% of patients without RA and 20.9% of patients with RA. Among all individuals in the dataset, the MUC5B promoter variant rs35705950 was associated with ILD with a HR of 2.44 (95% CI: 2.22 to 2.68, p=3.87×10⁻⁷⁷), and among patients with RA, with a HR of 2.27 (95% CI: 1.75 to 2.95, p=8.15×10⁻¹⁰). In a formal test for interaction, performed by introducing an interaction term in the regression model, we found no evidence of an interaction between MUC5B and RA (p=0.16). This result is consistent with the effect of MUC5B being similar in the population and in patients with RA.
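As a reader-side check, the reported HRs, CIs and p-values can be compared for internal consistency. The sketch below back-calculates the standard error of the log HR from the published 95% CI and derives an approximate two-sided Wald p-value. It assumes a normal approximation on the log scale and works from rounded figures, so it can only reproduce the order of magnitude of the p-values from the fitted models; the function name is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_from_ci(hr, lower, upper):
    """Approximate SE of log(HR), z statistic and two-sided p from a 95% CI."""
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = log_hr / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Patients with RA: HR 2.27 (1.75 to 2.95)
se_ra, z_ra, p_ra = wald_from_ci(2.27, 1.75, 2.95)
# Full dataset: HR 2.44 (2.22 to 2.68)
se_all, z_all, p_all = wald_from_ci(2.44, 2.22, 2.68)
```

Under these assumptions the back-calculated p-values fall in the same order of magnitude as the reported ones. The much narrower CI in the full dataset reflects the larger number of ILD cases (1965 vs 247).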

DISCUSSION
In this large observational cohort study, we demonstrate that a combination of RA and MUC5B variation confers a 10-fold elevated risk of ILD compared with the population. Every sixth patient with RA carrying the MUC5B risk allele was diagnosed with ILD by age 80, and the risk rapidly increased after age 65. A case-control study by Juge and colleagues recently demonstrated enrichment of MUC5B carriers in patients with RA-ILD, with supporting evidence from gene expression in lung parenchyma and high-resolution imaging. 11 Using large-scale biobank data, we now show how this finding translates to lifetime risks and demonstrate the potential of genomics for risk stratification of RA-ILD and early identification of patients.
Prevalence of RA-ILD shows high variability in the literature depending on the population, diagnostic methods and disease definitions used. 13 Our lifetime risks compare well with previous estimates of clinically significant disease, reported to occur in up to 5%-10% of patients with RA. [2][3][4] We show that the effect of MUC5B is similar in the population and in patients with RA, but as both MUC5B and RA are important risk factors of ILD, patients with RA who are MUC5B carriers are at a much higher risk of ILD than MUC5B carriers without RA.
The common variant rs35705950 in the MUC5B promoter is strongly associated with upregulation of MUC5B expression in the lungs, and the general association between the variant and ILD has been widely replicated. 10 11 14 In addition, evidence from fine-mapping indicates that rs35705950 might be a causal variant: Bayesian fine-mapping analyses of genome-wide association study (GWAS) results can be used for defining variant sets (credible sets) that with high probability contain one or several causal variants. Several sources report rs35705950 as the only variant in the credible sets for the locus in GWASs on ILD and IPF. 15 16
We were unable to account for some important risk factors, such as smoking and disease activity, and did not consider other common or rare genetic risk factors, 14 17 all of which are likely to further contribute to the risk. We did not have information about histological or radiological patterns of ILD. The study was limited to individuals of European ancestry, but MUC5B may also be a relevant risk factor in other populations, 11 although many have allele frequencies that are much lower. 12 With a prevalence of 2.3% for RA and 0.7% for ILD, our sample is slightly enriched in cases, which may affect our estimates. Although ILD was identified through healthcare registries, recurring healthcare encounters were required to reduce the proportion of false positives in our study, and the long-term risk of ILD in patients with RA was in line with previous studies. [1][2][3][4] Patients with RA might be exposed to more chest imaging as part of their standard care and due to increased awareness of the risk of ILD, particularly during recent years, which could lead to overestimation of the risk difference between patients with and without RA.
We also observed a modest association between MUC5B and RA, which was replicated in UK Biobank. This association was not detected in a previous study with a smaller sample size by Juge and colleagues. 11 This tentative finding, which was clearer in men, requires further replication with consideration of other important risk factors, such as smoking. As the effects remained similar when excluding all patients with ILD, we propose that the temporal sequence of ILD and RA is unlikely to impact the association.
In conclusion, the MUC5B promoter variant is a common risk factor for ILD in patients with RA and confers a significantly elevated lifetime risk of ILD. This study demonstrates the potential of genomics for risk stratification of RA-ILD and highlights the importance of genetic predisposition in the development of RA-ILD. Further studies are needed to investigate the interaction of clinical and genetic risk factors in the development of RA-ILD, and the impact of MUC5B on outcomes of RA-ILD.