Low copy numbers of complement C4 and C4A deficiency are risk factors for myositis, its subgroups and autoantibodies

Background Idiopathic inflammatory myopathies (IIM) are a group of autoimmune diseases characterised by myositis-related autoantibodies and infiltration of leucocytes into muscles and/or the skin, leading to destruction of blood vessels and muscle fibres, chronic weakness and fatigue. While complement-mediated destruction of capillary endothelia is implicated in paediatric and adult dermatomyositis, the role of the complex diversity of complement C4 in IIM pathology was unknown. Methods We elucidated the gene copy number (GCN) variations of total C4 (C4T), C4A and C4B, and of long and short C4 genes, in 1644 Caucasian patients with IIM and 3526 matched healthy controls using real-time PCR or Southern blot analyses. Plasma complement levels were determined by single radial immunodiffusion. Results The large study populations helped establish the distribution patterns of the various C4 GCN groups. Low GCN of C4T (C4T=2+3) and C4A deficiency (C4A=0+1) were strongly associated with increased risk of IIM, with ORs of 2.58 (2.28–2.91; p=5.0×10⁻⁵³) for low C4T and 2.82 (2.48–3.21; p=7.0×10⁻⁵⁷) for C4A deficiency. Contingency and regression analyses showed that among patients with C4A deficiency, HLA-DR3 was no longer a significant risk factor for IIM except for inclusion body myositis (IBM), in which 98.2% of patients had HLA-DR3 (OR 11.02; 1.44–84.4). Intragroup analyses of C4 protein levels and IIM-related autoantibodies showed that patients with anti-Jo-1 or anti-PM/Scl had significantly lower plasma C4 concentrations than those without these autoantibodies. Conclusions C4A deficiency is relevant in dermatomyositis, HLA-DRB1*03 is important in IBM, and C4A deficiency and HLA-DRB1*03 contribute interactively to the risk of polymyositis.


1.
First, IIM patients were segregated into groups based on C4A deficiency, and the effects of HLA-DRB1*03 were investigated. In the presence of C4A deficiency, DRB1*03 did not significantly increase the risks of JDM, DM and PM, but its influence on IBM was substantial (Table 4). Second, we asked whether DRB1*03 was a significant risk factor when IIM patients were segregated based on C4 gene length. Among patients with C4L=0+1+2, DRB1*03 was a prominent risk factor for IBM, as 97.1% had DRB1*03, with an OR of 20.9 (4.99-87.8; p=1.3×10⁻¹⁰).

Effects of C4A deficiency and low GCNs of C4L under the presence and absence of HLA-DRB1*03.

IIM patients were then segregated based on HLA-DRB1*03 status, and we asked whether C4A deficiency or low GCN of C4L significantly increased the risk of IIM and its subgroups. In the presence of DRB1*03, C4A deficiency was a significant risk factor for JDM and DM, with ORs of 2.65 (1.24-5.68) and 2.15 (1.27-3.65), respectively; this did not hold for PM and IBM. In the absence of DRB1*03, C4A deficiency was a highly significant risk factor for DM (p=4.5×10⁻⁵) and PM (p=0.0009), with ORs of 4.05 (2.11-7.80) and 3.18 (1.63-6.20), respectively. However, C4A deficiency did not contribute significantly to JDM or IBM risk. Similar contingency analyses were performed for IIM patients with and without low C4L GCN in defined DRB1*03 backgrounds. The results were similar to those observed for C4A deficiency (Table 3). Neither C4A deficiency nor low C4L copy number was a significant risk factor for IBM, irrespective of DRB1*03 status.

BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).
Parallel contingency analyses revealed that when DRB1*03 was present, C4B deficiency was a protective factor for JDM. When DRB1*03 was absent, C4B deficiency was a risk factor for IBM, with an OR of 2.35 (1.08-5.14) (Table S3). These findings advance our understanding of how alleles of complement C4 and HLA-DRB1 jointly influence the risk of an autoimmune disease.
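The stratified contingency analyses above compare how often a factor (for example, C4A deficiency) occurs in patients versus controls within each DRB1*03 stratum. A minimal Python sketch of that comparison follows, assuming a standard 2×2 odds ratio with a Woolf (log-scale) 95% confidence interval; the text does not state which interval method was used, and the function name and counts below are hypothetical, not study data.

```python
import math


def odds_ratio_ci(case_y: int, case_n: int, ctrl_y: int, ctrl_n: int, z: float = 1.96):
    """Odds ratio for a factor (Y = present, N = absent) in cases vs controls,
    with a Woolf (log-scale) confidence interval. Assumes no zero cells."""
    or_ = (case_y * ctrl_n) / (case_n * ctrl_y)
    # Standard error of log(OR) under the Woolf approximation
    se = math.sqrt(1 / case_y + 1 / case_n + 1 / ctrl_y + 1 / ctrl_n)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts, stratified by DRB1*03 status:
# (cases with factor, cases without, controls with factor, controls without)
strata = {
    "DRB1*03 present": (40, 20, 30, 40),
    "DRB1*03 absent": (25, 75, 20, 180),
}
for label, counts in strata.items():
    or_, lo, hi = odds_ratio_ci(*counts)
    print(f"{label}: OR={or_:.2f} ({lo:.2f}-{hi:.2f})")
```

Computing the OR separately inside each stratum is what allows a factor to appear significant in one DRB1*03 background and not in the other.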
We determined complement C4 and C3 plasma protein levels in EDTA-plasma from IIM patients. Each additional C4 gene copy increased the mean plasma C4 protein level by 66.9 mg/L; the regression formula was C4 protein (mg/L) = 59.1 + 66.9 × GCN of C4T. The net yield of plasma C4 protein per gene copy (C4P/G) decreased slightly as the GCN of C4T or of long genes increased [regression formulae: C4P/G (mg/L) = 104.6 - 5.6 × GCN of C4T; or C4P/G (mg/L) = 108.4 - 8.9 × GCN of C4L]. It is worth pointing out that C4 GCN variations are integral determinants of plasma C4 protein levels, and that C3 is downstream of C4 in two complement activation pathways (the classical and lectin pathways).
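The fitted formulae above can be evaluated directly. The Python sketch below simply plugs gene copy numbers into the three regression equations as reported (the coefficients are the authors'; the function names are illustrative). Dividing the first equation by GCN gives C4 protein per copy = 66.9 + 59.1/GCN, which falls as GCN rises; this is consistent with the separately fitted C4P/G formulae (for example, about 81.7 versus 82.2 mg/L at C4T=4).

```python
def predicted_c4_protein(c4t_gcn: int) -> float:
    """Mean plasma C4 (mg/L) from total C4 gene copy number: 59.1 + 66.9*GCN."""
    return 59.1 + 66.9 * c4t_gcn


def c4_yield_per_copy_c4t(c4t_gcn: int) -> float:
    """Net plasma C4 per gene copy (mg/L), fitted against C4T GCN."""
    return 104.6 - 5.6 * c4t_gcn


def c4_yield_per_copy_c4l(c4l_gcn: int) -> float:
    """Net plasma C4 per gene copy (mg/L), fitted against C4L GCN."""
    return 108.4 - 8.9 * c4l_gcn


# Columns: GCN, predicted total C4, total/GCN, fitted per-copy yield (C4T model)
for gcn in range(2, 7):
    total = predicted_c4_protein(gcn)
    print(gcn, round(total, 1), round(total / gcn, 1), round(c4_yield_per_copy_c4t(gcn), 1))
```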
*OR, odds ratio between "Y" and "N" of individual alleles in CTL (controls) or IIM (idiopathic inflammatory myopathies). N, no or absence; Y, yes or presence; no., number.
Descriptions for Table S4.
Specific alleles of HLA genes are in strong linkage disequilibrium (LD) and form ancestral haplotypes (AH) that undergo recombination only rarely between generations. Recombination between DRB1*03, the short C4B1 gene and C4A deficiency, which would dissociate these alleles on a haplotype, would need to be demonstrated by multigenerational genetic studies.
It is worth pointing out that HLA-DRB1*03 (DR3), C4A deficiency (C4A=0+1), a low copy number of long C4 genes, which carry the endogenous retrovirus HERV-K(C4) (C4L=0+1+2), C4B≥2 with the presence of short genes (C4S≠0), HLA-DQA1*05 and HLA-DQB1*02 together form the ancestral haplotype (AH) 8.1. Our C4 data revealed that extremely tight associations among these alleles persisted in both healthy controls and IIM patients, albeit with considerable diversity and some dissociation among them. These dissociations were likely the cumulative result of historical genetic recombination. Nevertheless, the degree of association between specific HLA alleles and complement C4 gene copy number variants remains extremely striking.

Gene copy number (GCN) variations of human complement C4, with inherent diversity in the GCNs of C4A and C4B and of long (C4L) and short (C4S) genes, were determined experimentally by Southern blot analysis and/or TaqMan-based real-time PCR in healthy subjects of European descent from the US and European countries, to characterise the variation and frequencies of the GCN groups.
Copy number variations of total C4, C4A, C4B, long genes and short genes in healthy controls.
Gene copy numbers (GCN) for total C4 (C4T), C4A, C4B, long genes (C4L) and short genes (C4S) were determined in 3526 healthy control subjects (Fig. S1). The copy number of C4T genes varied continuously from 2 to 8 among the control subjects studied, with a mean GCN (± SD) of 3.833 ± 0.762. Categorically, the most common C4T GCN group was 4, at a frequency of 55.9%. Healthy subjects had 3 and 2 copies of C4T at frequencies of 26.5% and 3.29%, respectively; combined, 29.8% of healthy subjects had low copy numbers (C4T=2+3). Control subjects with 5, 6, 7 and 8 copies of C4T occurred at frequencies of 12.4%, 1.74%, 0.14% and 0.03%, respectively; together, high copy numbers (C4T=5 to 8) constituted 14.3% of the healthy subjects. The low (C4T=2+3), medium (C4T=4) and high (C4T=5 to 8) total C4 GCN groups therefore had a distribution of 0.30, 0.56 and 0.14, respectively. There was an inherent bias towards low C4T copy number variants and a concomitant decrease in the frequency of high copy number variants within the white population.
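As a consistency check, the summary statistics above can be recovered from the reported frequency distribution. The Python sketch below recomputes the mean, SD and low/medium/high group fractions from the rounded percentages; because it uses grouped, rounded frequencies rather than individual genotypes, agreement is expected only to about the third decimal place.

```python
import math

# Frequencies (%) of total C4 (C4T) gene copy numbers among healthy controls, as reported above
c4t_freq = {2: 3.29, 3: 26.5, 4: 55.9, 5: 12.4, 6: 1.74, 7: 0.14, 8: 0.03}

total = sum(c4t_freq.values())  # ~100; re-normalise to absorb rounding
p = {gcn: f / total for gcn, f in c4t_freq.items()}

mean = sum(gcn * pi for gcn, pi in p.items())
sd = math.sqrt(sum((gcn - mean) ** 2 * pi for gcn, pi in p.items()))

groups = {
    "low (2-3)": sum(p[g] for g in (2, 3)),
    "medium (4)": p[4],
    "high (5-8)": sum(p[g] for g in range(5, 9)),
}
print(f"mean={mean:.3f}, SD={sd:.3f}")
for name, frac in groups.items():
    print(f"{name}: {frac:.2f}")
```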
Among total C4 genes, approximately 54% coded for C4A and 46% coded for C4B (Fig. 2). There was a very strong positive relationship between the GCNs of C4T and C4A (R²=0.499, p=4.2×10⁻³²²). Although highly significant, the relationship between the GCNs of C4T and C4B was weak and negative (R²=0.061, p=3.93×10⁻⁷²). Similarly, the copy numbers of C4A and C4B were negatively correlated (R²=0.222, p=3.5×10⁻²⁸¹).
Long genes and short genes. The copy number of C4L varied between 0 and 8, with three copies being the most prevalent with a frequency of 34.4% among our healthy subjects. The frequencies for the low and high copy number groups of C4L were quite evenly distributed, with a total of 33.4% for C4L=0, 1, or 2 and 32.2% for C4L=4, 5, 6, 7, and 8.
The copy number of C4S varied from 0 to 4 but skewed heavily towards the low end among our healthy subjects: 32.3% had zero copies and 47.7% had a single copy of C4S. Those with 2, 3, and 4 copies of short genes constituted 17.7%, 2.01%, and 0.20% of all controls, respectively.
Overall, three quarters (75.4%) of C4T genes were C4L and 24.6% were C4S. There were direct, linear and very strong relationships between the copy number of long genes and those of C4T and C4A (R²=0.485 and 0.502, respectively; p=4.2×10⁻³²²). The correlation between the copy numbers of long genes and C4B was weak (R²=0.0115), with a tendency towards an inverse relationship (p=2.3×10⁻¹⁰).
In brief, our data on this relatively large population of healthy control subjects strengthen our previous observations²⁴,³⁰ on the continuous gene copy number variation of total C4, C4A and C4B, as well as the size dichotomy between long and short C4 genes. Among our control subjects, the mean gene copy numbers were 3.83 for C4T, 2.10 for C4A and 1.73 for C4B. The distributions of C4T and C4B were skewed towards lower GCN, whereas that of C4A was skewed towards slightly higher copy number. Three quarters of total C4 genes were long and the remainder were short. There were strong positive correlations between C4T GCN and both C4A and C4L, with R² values between 0.48 and 0.52. In contrast, the copy numbers of C4B and C4S were both inversely correlated with C4A, and their relationships with C4T were unremarkable. While plasma C4 protein levels increased with C4 GCN, the net C4 protein yield per copy of C4 gene decreased with increasing GCNs of C4T or C4L. This may be due to less efficient transcription of the longer genes, or to the promoter activity of the 3' LTR of the endogenous retrovirus HERV-K(C4) inserted into C4L, which might generate antisense transcripts that modulate C4 biosynthesis, as proposed previously.²⁶,²⁷

While gene copy number variants were elucidated for almost all of the study population (>3500 controls and >1500 IIM patients), four-digit HLA-DQA1, DQB1 and DRB1 data were available to us for 1204 British healthy controls and 810 IIM patients (750 from the UK, 60 from the NIH). Moreover, two-digit DRB1 data were generated for an additional 625 healthy controls in the US.

METHODS.
For each subgroup, significant risk factors were identified by "Fit Y by X" analyses in the JMP16 software, using ANOVA for continuous data and χ² analyses for categorical data. For each HLA-DRB1, DQA1 and DQB1 genetic variant, absence was coded 0, and heterozygous and homozygous presence were coded 1 and 2, respectively. A separate column was created with "N" for absence and "Y" for presence of the variant (i.e. for those coded "1" or "2").
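The 0/1/2 and Y/N codings, and a χ² test on the Y/N coding, can be sketched in Python as follows. This illustrates a standard Pearson χ² test for a 2×2 table (1 degree of freedom, no continuity correction), assuming that is the comparison intended; it is not JMP's implementation, and the function names and dosages are hypothetical.

```python
import math


def encode_dosage(n_copies: int) -> tuple[int, str]:
    """Return the numeric (0, 1, 2) and dichotomous (N/Y) codings of a variant."""
    if n_copies not in (0, 1, 2):
        raise ValueError("allele dosage must be 0, 1 or 2")
    return n_copies, ("Y" if n_copies > 0 else "N")


def chi2_2x2(case_y: int, case_n: int, ctrl_y: int, ctrl_n: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    observed = [[case_y, case_n], [ctrl_y, ctrl_n]]
    n = case_y + case_n + ctrl_y + ctrl_n
    row_totals = [case_y + case_n, ctrl_y + ctrl_n]
    col_totals = [case_y + ctrl_y, case_n + ctrl_n]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical allele dosages (0/1/2), not study data
case_dosages = [0, 1, 2, 1, 0, 1, 1, 2, 0, 1]
ctrl_dosages = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
case_flags = [encode_dosage(d)[1] for d in case_dosages]
ctrl_flags = [encode_dosage(d)[1] for d in ctrl_dosages]
stat, p = chi2_2x2(case_flags.count("Y"), case_flags.count("N"),
                   ctrl_flags.count("Y"), ctrl_flags.count("N"))
print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```

Keeping both codings lets the numeric column capture a dose effect of homozygosity, while the Y/N column tests simple carriage.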
Inclusion and exclusion criteria for logistic regression models: in each subgroup, parameters with p<0.05 were eligible to enter the initial regression models. For each variant, whichever coding gave the smaller p-value was entered into the initial regression model: numeric parameters (0, 1, 2 for HLA; GCNs or frequencies for C4) or dichotomous parameters (Y/N). Two-digit HLA variants were analyzed both numerically (0, 1, 2) and categorically (Y/N). In some cases, four-digit variants within the same two-digit group showed opposite effects, each with p<0.05; both were then entered into the initial regression analyses.
After the initial regression analysis for each subgroup, the parameter with the largest p-value (p>0.05) was removed, one at a time, until every parameter remaining in the model had p<0.05.
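The stepwise exclusion described above is a form of backward elimination. A minimal Python sketch of the loop follows; `fit_pvalues` is a hypothetical callable standing in for whatever software refits the logistic regression (JMP16 in this study) and returns each term's p-value, and `toy_fit` with its variable names and p-values is purely illustrative. Refitting after every removal matters because the p-values of correlated terms can change once a neighbouring term is dropped, which the toy example mimics.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    candidates: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the term with the largest p-value, refit, and repeat until all p < alpha."""
    terms = list(candidates)
    while terms:
        pvals = fit_pvalues(terms)  # refit the model on the current set of terms
        worst = max(terms, key=lambda t: pvals[t])
        if pvals[worst] < alpha:
            break  # every remaining term is significant
        terms.remove(worst)
    return terms


def toy_fit(terms: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a logistic regression fit (illustrative p-values only)."""
    base = {"var_a": 0.001, "var_b": 0.03, "var_c": 0.20, "var_d": 0.08}
    pvals = {t: base[t] for t in terms}
    # Mimic collinearity: var_b looks non-significant while correlated var_d is in the model
    if "var_b" in terms and "var_d" in terms:
        pvals["var_b"] = 0.06
    return pvals


print(backward_eliminate(["var_a", "var_b", "var_c", "var_d"], toy_fit))
```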
Protective factors (with reduced frequencies in disease): long C4 genes among total C4 (C4L/C4T, p=0.0076), GCNs of C4L (p=0.012) and C4A (p=0.021), and DRB1*13 (p=0.0499). Among myositis patients, complement C4 protein also played a role in addition to the genetic variants described above, as its levels could be modulated by immune-complex-mediated consumption: some myositis-related autoantibodies could form immune complexes with host antigens and activate complement.

JDM
Some HLA variants had complex roles: an allele could be a genetic risk factor for IIM in case-control studies yet a protective factor in intragroup analyses of the presence or absence of myositis autoantibodies, or vice versa. For example, DR2 was a protective factor for genetic risk but a risk factor associated with myositis-associated autoantibodies (MAA). Similarly, DQB1*0201 was a strong genetic risk factor for PM but a moderate protective factor for myositis-specific autoantibodies (MSA).