Introduction

Systemic lupus erythematosus (SLE) is a complex autoimmune disease of unknown aetiology and is primarily characterised by the production of autoantibodies directed against cell surface and nuclear components. Autoantibodies contribute to end-organ damage by a variety of mechanisms, and formation of immune complexes can result in glomerulonephritis, arthritis, rashes, serositis, and vasculitis.

SLE is estimated to affect about one in 2000 people in some populations; its clinical presentation is diverse and the disease can sometimes be fatal.1 There is a strong gender bias, with a female:male ratio of about 9:1 between the ages of 15 and 50 years.1, 2 Ethnicity also influences disease risk: African-Americans, African-Caribbeans, and Hispanics are three times as likely to develop the disease as European-Caucasians.1, 3

The origin of SLE involves contributions both from the environment and from the genetic composition of the individual. Considerable evidence supports a genetic basis for susceptibility to SLE. Concordance rates in monozygotic twins range between 25 and 69%, while the rate is only 1–2% in dizygotic twins. Familial recurrence risks have been ascertained in SLE, and the ratio of the recurrence risk in siblings of probands to the risk in the population as a whole (λs) is 20 for SLE.4
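
As a rough illustration of what a λs of 20 implies, the definition can be written out and combined with the prevalence quoted above. The symbols K_s (sibling recurrence risk) and K (population prevalence) are introduced here for illustration only, and the calculation assumes that the prevalence of about one in 2000 applies to the same population in which λs was estimated, which may not hold exactly.

```latex
\lambda_s = \frac{K_s}{K}
\quad\Longrightarrow\quad
K_s = \lambda_s \, K \approx 20 \times \frac{1}{2000} = 0.01
```

Under these assumptions, a sibling of a proband would carry an absolute risk of roughly 1%.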

Given our current understanding of the pathophysiological mechanisms at play in SLE, we were interested in examining a potential role for the CD40 receptor (CD40) and CD40 ligand (CD40L) in disease susceptibility. This pair of interacting genes was selected based on genetic and functional data suggesting these as strong candidate genes.

The CD40 gene (also known as TNFRSF5, p50, and Bp50) lies in the region 20q11–13, which has been linked with SLE in three independent investigations in European-Caucasians, Mexican-Americans, and African-Americans.5, 6, 7 CD40L (also known as TNFSF5, CD154, TRAP, and gp39) is located on Xq26.

The X chromosome was not examined in most of the SLE genome-wide scans. Only one study investigated the X chromosome and, although no linkage was found, this single study does not exclude a role for an X-linked gene effect in SLE.8 The gender bias in SLE prevalence and the increased frequency of disease in patients with Klinefelter's syndrome are consistent with some genetic effects arising from the X chromosome.9

The CD40–CD40L pair has been well defined and regulates multiple phases of the humoral and cellular immune response. CD40 is expressed constitutively by B-lymphocytes, dendritic cells, endothelial cells and macrophages, whereas CD40L is upregulated on CD4+ T cells, platelets, mast cells, and basophils upon activation. Engagement of CD40 with CD40L induces the formation of memory B-lymphocytes, promotes immunoglobulin isotype switching, and is involved in thrombotic events.10, 11 Several groups have reported hyperexpression of CD40L by T cells and elevated soluble CD40L concentrations in human SLE.12, 13, 14

A rare inherited immune deficiency disorder, X-linked hyper IgM syndrome, is caused by mutations in CD40L. An autosomal recessive form of hyper IgM is caused by mutations in CD40.15 The hyper IgM disorders are characterised by defective IgG antibody formation, lack of immunoglobulin isotype switching, and impaired B-cell differentiation. CD40L-deficient and CD40-deficient mice generated by gene targeting show defective class-switching, but do not exhibit spontaneous hyper IgM.16

The critical role of CD40–CD40L in B-cell activation is such that CD40 has been considered as a therapeutic target in SLE. Administration of anti-CD40 ligand antibody has been found to limit lupus nephritis.17, 18 Furthermore, clinical trials of anti-CD40L antibody in patients with SLE have shown promising results and indicate that the drug has immunomodulatory actions.19, 20

We have defined for the first time a high-resolution haplotype structure for CD40 and CD40L and examined association of these haplotypes with SLE in European-Caucasians from the UK.

Materials and methods

Family collection

A large collection of SLE nuclear families, predominantly with one affected offspring per family, has been assembled from the UK. Samples from both parents were available for 65% of the cases, and siblings were also collected where available. For single-parent families, samples were always taken from siblings. All participants gave signed informed consent prior to blood and data collection, and study protocols were approved by the London multicentre research ethics committee (MREC). The clinical manifestations of SLE are variable, and diagnostic criteria have been established by the American College of Rheumatology (ACR).21, 22 Patients were classified as having renal lupus using the ACR criteria. Further clinical information was obtained from individuals by interview and completion of a health questionnaire.

The demographic details of the families studied are summarised in Table 2. The 623 families were randomly allocated into two cohorts. Cohort 1 was used as an initial screen, and cohort 2 was used for rare (low-frequency) single-nucleotide polymorphisms (SNPs) and to follow up SNPs showing hints of association in the initial screen. CD40L SNPs were genotyped on both cohorts because the gene is X-linked and there were fewer informative transmissions, which come from only one parent.

Table 2 Pedigree breakdown and demographic features of the two SLE cohorts examined in this study

DNA samples were prepared from 40 ml blood by phenol–chloroform extraction.23 DNA concentration was obtained by pico-green quantification. DNA samples were stored at 4°C.

SNP selection

In silico SNP hunting was carried out using published genomic and cDNA sequence data. The largest public databases of SNPs, The SNP Consortium (http://snp.cshl.org/), GeneSNPs (http://www.genome.utah.edu/genesnps/), SNPper (http://bio.chip.org:8080/bio), dbSNP (http://ncbi.nlm.nih.gov/SNP) and HGBASE (http://hgbase.interactiva.de/), were explored for SNPs. These resources are constantly changing and were scrutinised on multiple occasions. The majority of the markers in these databases occur outside the coding regions of genes. Coding SNPs were selected first; noncoding SNPs were then chosen on the basis of their spacing throughout the genes, validation status, and submitter information. SNPs that fell in low-complexity or repeat regions, identified by submitting the surrounding sequence to the program REPEATMASKER (http://ftp.genome.washington.edu/RM/Repeatmasker), were dismissed at this stage. Our study focused on common SNPs (minor allele frequency (MAF) >5%).

Genotyping

The SNPs were genotyped by matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry (MS), using the Sequenom MassARRAY™ (Sequenom Inc., San Diego, CA, USA) methodology. This system uses samples in chip-based, high-density arrays, whereby hundreds of samples can be run in parallel. It was also used to evaluate more than one SNP per sample (multiplexing). The method has been previously described in detail.24 In brief, PCR primer pairs were designed to amplify the region around each SNP, along with extension primers, using the Sequenom SpectroDESIGNER™ software. The PCR primers (Metabion, Hamburg, Germany) were used to amplify 5 ng of genomic DNA in a multiplex reaction. Unincorporated nucleotides were removed from the PCR products using shrimp alkaline phosphatase treatment. The homogeneous mass extend reaction was then performed, adding one or two bases to the extension primer, to include the alternate variant plus one base beyond the SNP. The extension reaction was desalted by adding a suspension of cation exchange beads, and the resulting supernatant was spotted onto a 384 SpectroCHIP™. The extension products on the chip were then analysed on the Bruker Autoflex TOF-MS, and the allele present at each SNP was determined from the mass. The resulting spectra were analysed using the Sequenom SpectroTYPER RT™ software.

Statistical analysis

Hardy–Weinberg equilibrium (HWE) testing was carried out using the exact test courtesy of G Abecasis and J Wigginton (University of Michigan Center for Statistical Genetics). SNPs were taken forward for further analysis if their observed heterozygosity did not deviate from that expected under HWE. Pedigree checking was performed using PedCheck 1.1.25 All genotyping data were run through this program to check for Mendelian inconsistencies in the pedigree data. Markers with >5 errors were removed from the analysis.
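
The idea behind an exact HWE test can be sketched as follows. This is a minimal illustration, not the program used in the study: it assumes a biallelic autosomal SNP, takes genotype counts as input, and enumerates the probability of every possible heterozygote count given the observed allele counts. The function name and argument names are hypothetical.

```python
def hwe_exact_pvalue(n_het, n_hom1, n_hom2):
    """Two-sided exact HWE p-value for one biallelic autosomal SNP.

    Arguments are counts of heterozygotes and of the two homozygote classes.
    """
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        raise ValueError("no genotypes supplied")
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele

    # Unnormalised probability of each possible heterozygote count,
    # conditional on the observed allele counts.
    probs = [0.0] * (n_rare + 1)

    # Start near the expected heterozygote count; its parity must match n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if mid % 2 != n_rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurrence towards fewer heterozygotes.
    hets, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets >= 2:
        probs[hets - 2] = (probs[hets] * hets * (hets - 1)
                           / (4.0 * (hom_r + 1) * (hom_c + 1)))
        hets, hom_r, hom_c = hets - 2, hom_r + 1, hom_c + 1

    # Recurrence towards more heterozygotes.
    hets, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets <= n_rare - 2:
        probs[hets + 2] = (probs[hets] * 4.0 * hom_r * hom_c
                           / ((hets + 2) * (hets + 1)))
        hets, hom_r, hom_c = hets + 2, hom_r - 1, hom_c - 1

    # p-value: total probability of outcomes no more likely than the observed one.
    total = sum(probs)
    p_obs = probs[n_het]
    p_value = sum(x for x in probs if x <= p_obs * (1 + 1e-9)) / total
    return min(1.0, p_value)
```

A marker would then be flagged for exclusion when the returned p-value falls below a chosen threshold; the threshold itself is not stated in the text and would have to be set by the analyst.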

Two measures of linkage disequilibrium (LD), the squared correlation coefficient (r2) and Lewontin's standardised disequilibrium coefficient (D′), were computed between pairs of SNPs from founder chromosomes using Haploview 2.05 software (http://www.broad.mit.edu/personal/jcbarret/haploview/index.php). The 95% confidence interval boundaries of D′ for pairs of SNPs were used to estimate recombination.26 Multilocus D′ between haplotype blocks was calculated by computing the 2 × 2 D′ score of each allele at the first locus with each allele at the second locus and then taking a weighted average of these values.27
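
The two pairwise LD measures can be illustrated with a short sketch. It assumes that the four haplotype frequencies for a pair of biallelic SNPs are already known (for example, from phased founder chromosomes); the function name and argument names are hypothetical and this is not Haploview's code. The absolute value of D′ is returned.

```python
def pairwise_ld(f11, f10, f01, f00):
    """Return (|D'|, r2) for two biallelic SNPs.

    f11, f10, f01, f00 are haplotype frequencies, where the first digit is
    the allele at SNP A and the second digit is the allele at SNP B.
    """
    if abs(f11 + f10 + f01 + f00 - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = f11 + f10  # frequency of allele 1 at SNP A
    q1 = f11 + f01  # frequency of allele 1 at SNP B
    if min(p1, 1 - p1, q1, 1 - q1) <= 0:
        raise ValueError("LD is undefined for a monomorphic marker")

    d = f11 - p1 * q1  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))

    d_prime = abs(d) / d_max
    r2 = d * d / (p1 * (1 - p1) * q1 * (1 - q1))
    return d_prime, r2
```

For instance, haplotype frequencies of 0.6, 0.0, 0.2 and 0.2 give D′ of 1 but r2 of only 0.375, which illustrates why a pair of SNPs can show complete LD by D′ while being imperfectly correlated when their allele frequencies differ.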

The transmission disequilibrium test (TDT) method evaluates whether the frequency of transmission of alleles from heterozygous parents to their affected children deviates from 50%, the expected Mendelian frequency when there is no linkage. TDT analysis on the X chromosome was performed using ASPEX v.2.2.28 ASPEX enables analysis of the X chromosome by using all affected siblings within a family and separates maternal and paternal transmissions. ASPEX calculates the probability of association occurring independently of linkage within families by permuting parental alleles while fixing the identity by descent (IBD) status of siblings. GENEHUNTER (version 3.0) was used for TDT analysis of nuclear simplex families.29 TRANSMIT (version 2.5.4) was used for TDT analysis of single-parent families.30 TRANSMIT can deal with transmission of multilocus haplotypes, even if phase is unknown and parental genotypes are missing. Many of the SLE families have only one parent available, and this program enabled us to obtain more information from our family collection. Data from unaffected siblings may be used to narrow down the range of possible parental genotypes that need to be considered. The pedigree disequilibrium test (PDT) uses data from related nuclear families and discordant sibships from extended pedigrees for TDT analysis.31 The most discordant siblings were selected based on being female and having negative antinuclear antibody (ANA) scores. One sibling per family was chosen randomly if more than one discordant sibling was available. The PDTsum version of the statistic was used to compare allele frequencies between affected individuals and their unaffected discordant siblings within families.
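
The basic TDT described at the start of this section can be sketched for complete parent-child trios. This is a minimal illustration of the classical single-marker autosomal test only; it is not how GENEHUNTER, ASPEX, TRANSMIT or PDT are implemented, and it does not handle missing parents, X-linked markers or haplotypes. Genotypes are assumed to be two-character strings such as "AG", and all function names are hypothetical.

```python
import math


def tdt_counts(trios, allele):
    """Count transmissions (b) and non-transmissions (c) of `allele`
    from heterozygous parents to affected children.

    trios: iterable of (father, mother, child) genotype strings, e.g. "AG".
    Assumes a biallelic marker and Mendelian-consistent trios.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g[0] != g[1])
        n_hom = sum(1 for g in parents if g[0] == g[1] == allele)
        copies_in_child = sum(1 for a in child if a == allele)
        # Copies of `allele` that must have come from heterozygous parents.
        transmitted = copies_in_child - n_hom
        if not 0 <= transmitted <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {father, mother, child}")
        b += transmitted
        c += n_het - transmitted
    return b, c


def tdt_statistic(b, c):
    """McNemar-type TDT chi-square (1 df) and its p-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Only heterozygous parents contribute to b and c, which is why markers with a low minor allele frequency, or X-linked markers where only mothers are informative, yield fewer informative transmissions.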

Haplotype analysis was conducted using Haploview 2.05 software. Haplotypes with <1% frequency were not included in the analysis. Haplotypes were estimated using an accelerated expectation-maximisation (EM)-based algorithm that can deal with a large number of linked loci that have moderate levels of LD. The EM algorithm outputs maximum-likelihood estimates (MLEs) of haplotype frequencies, which provide highly accurate population frequency estimates for phased haplotypes.32
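
The principle of EM-based haplotype frequency estimation can be shown for the simplest case of two biallelic SNPs in unrelated individuals, where only double heterozygotes have ambiguous phase. This is a plain EM sketch under those assumptions, not the accelerated algorithm used by Haploview, and it ignores family information and missing data. Genotypes are coded as the number of copies (0, 1 or 2) of allele 1 at each SNP; the function name is hypothetical.

```python
def em_two_snp_haplotypes(genotypes, max_iter=200, tol=1e-12):
    """Estimate frequencies of haplotypes (0,0), (0,1), (1,0), (1,1).

    genotypes: list of (g1, g2), each the count of allele 1 at SNP 1 and SNP 2.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    fixed = {h: 0.0 for h in haps}  # haplotypes whose phase is unambiguous
    n_double_het = 0

    def alleles(g):
        # Allele pair carried at one SNP for genotype count g.
        return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]

    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: 11/00 or 10/01
            continue
        a, b = alleles(g1), alleles(g2)
        fixed[(a[0], b[0])] += 1
        fixed[(a[1], b[1])] += 1

    n_chrom = 2.0 * len(genotypes)
    if n_chrom == 0:
        raise ValueError("no genotypes supplied")
    freq = {h: 0.25 for h in haps}  # uninformative starting point

    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 11/00.
        cis = freq[(1, 1)] * freq[(0, 0)]
        trans = freq[(1, 0)] * freq[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        counts = dict(fixed)
        counts[(1, 1)] += n_double_het * w
        counts[(0, 0)] += n_double_het * w
        counts[(1, 0)] += n_double_het * (1 - w)
        counts[(0, 1)] += n_double_het * (1 - w)

        # M-step: re-estimate frequencies from expected counts.
        new_freq = {h: counts[h] / n_chrom for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq
```

With more than two loci the number of possible haplotype pairs per individual grows quickly, which is the motivation for accelerated variants of the algorithm such as the one cited in the text.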

Results

Using SNP markers, the CD40 and CD40L genes were investigated for association with SLE. In all, 14 SNPs were selected for CD40 genotyping (as described in Materials and methods). Two assays failed (rs1535044 and rs1883832) and one marker (rs1801293) was monomorphic when genotyped in cohort 1. These three SNPs were removed from analysis. Mendelian error checking found SNP rs1569723 to have six pedigree errors and this marker was excluded from analysis. Three SNPs (rs7273698, rs11086998 and rs1004731) had a rare allele frequency <0.1% in cohort 1 and were further genotyped on cohort 2. The allele frequencies of these rare polymorphisms did not change in the enlarged set of families, and these SNPs were excluded from LD and haplotype analysis. Seven SNPs across CD40 genotyped well (greater than 75% of individuals successfully genotyped) and were in HWE (Table 1a, Figure 1a).

Table 1 CD40 SNP summary on (a) cohort 1 and (b) cohorts 1 and 2
Figure 1

Structure of genes encoding CD40 and CD40L with relative positions of the SNPs studied. (a) CD40 gene with numbered SNPs corresponding to SNPs in Table 1a. (b) CD40L gene with numbered SNPs corresponding to SNPs in Table 1b. Orientation and transcription start is marked with an arrow. Exons are represented by boxes, and introns and intergenic regions by lines. Filled boxes denote translated regions; open boxes untranslated regions. Horizontal lines below gene depiction indicate SNP position within gene. Line breaks are equivalent to 3000 bp.

The CD40 SNPs were genotyped on cohort 1 (Table 2). Allele frequencies for the individual SNPs showed significant differences between the different ethnic groups (Table 1a). For this study, all association analyses were performed on European-Caucasian samples, because the small numbers of other ethnicities in our cohorts (Table 2) provided insufficient power.

CD40 LD analysis showed strong LD between SNP 1 and SNP 5 and strong LD between SNP 6 and SNP 7 (Figure 2a). Values of D′ were close to 1.0 for most of the SNP pairs. There is a 3 kb distance between haplotype block 1 and haplotype block 2. Haplotype block 1 spans the 5′-flanking region to intron 5 of CD40. Haplotype block 2 spans intron 8 to 3′-flanking region of CD40 (Figure 2a). The strength of LD within the blocks is reflected by the restricted haplotype diversity observed. Three haplotypes are observed in block 1, with one high-frequency haplotype (64%). Three haplotypes are seen in block 2 and there are two common haplotypes (38%) created mainly from the common haplotype in block 1 (Figure 2a). The haplotypes in block 1 represent 97% of all haplotypes present and the haplotypes in block 2 captured all of the diversity (100%) seen. A multilocus D′ of 0.71 between the two haplotype blocks shows intermediate LD between the blocks and minimal recombination.

Figure 2

Results of LD between SNP pairs and haplotype blocks. (a) CD40 LD and haplotype diversity in Caucasians for cohort 1. (b) CD40L LD and haplotype diversity in Caucasians for cohorts 1 and 2. LD between SNP pairs was calculated using the LD coefficient D′. Values for D′ (upper number) and r2 (lower number) are presented in each box. Black boxes indicate strong evidence for LD (D′>0.75 with small D′ confidence intervals (CI)), grey boxes indicate intermediate LD (D′>0.75 with large D′ CI), and white boxes indicate inconclusive LD or evidence for recombination (D′<0.75, r2<0.30, with large D′ CI). The horizontal line above each LD diagram represents the chromosome, with the locations of the SNPs indicated. The black shaded box on the chromosome line represents the location of the gene. Haplotype blocks are shown below the LD diagrams. These were created based on the 95% CI cutoff. Marker numbers are shown across the top. Haplotype tagging SNPs are highlighted with a triangular pointer. Haplotype frequencies are shown next to each haplotype. Lines show the most common crossings from one block to the next, with thicker lines showing more common crossings than thinner lines. Shown beneath the crossing lines is multilocus D′, which is a measure of the LD between two blocks.

In total, 11 SNPs were genotyped across CD40L on cohorts 1 and 2. Two assays failed (rs3092945 and rs3092936) and one marker (rs3092922) was monomorphic. These three markers were removed from the study. The remaining eight SNPs spanning CD40L showed no evidence for deviation from HWE (Table 1b and Figure 1b). CD40L allele frequencies of individual SNPs were found to vary significantly between Afro-Caribbean and European-Caucasian populations for most of the SNPs studied (Table 1b). In comparison, the allele frequencies of only two SNPs varied significantly between Indo-Asian and European-Caucasian populations (Table 1b).

CD40L LD analysis and D′ values indicate two areas of strong LD (Figure 2b). There is evidence of disruption in LD between SNP 4 and SNP 5 (r2=0.01), and weak LD between SNP pairs across the two haplotype blocks (r2<0.30). There is a 4 kb distance between haplotype block 1 and haplotype block 2. Haplotype block 1 spans the 5′-flanking region to intron 2 of CD40L and haplotype block 2 spans intron 3 to the 3′-flanking region of CD40L (Figure 2b). There are four haplotypes in block 1, with one high-frequency haplotype (77%), and three haplotypes in block 2, with one high-frequency haplotype (90%). Three of the haplotypes from block 1 cross to the common 90% haplotype in block 2 (Figure 2b). In all, 98% of the haplotypes seen above 1% frequency are captured in these two blocks for our population. Multilocus D′ between the haplotype blocks is 1.0, indicating very strong LD between the two blocks. Pairwise LD analysis of markers across CD40 and CD40L was carried out and did not show any evidence of LD between the genes (data not shown).

Single marker TDT analysis using GENEHUNTER of the seven SNPs across CD40 showed no evidence of association with SLE (Table 3). Furthermore, TRANSMIT TDT and discordant sib-analysis using PDT confirmed the lack of association for the seven SNPs in CD40 with SLE in this cohort (data not shown). Stratified TDT analysis was carried out on CD40 SNPs with patients selected for renal disease or thrombosis as affected status and there was no evidence of association (data not shown). Haplotype TDT analysis using GENEHUNTER and TRANSMIT for CD40 did not reveal any association of haplotypes (shown in Figure 2a) with SLE.

Table 3 Results of CD40 TDT in Cohort 1

Single marker ASPEX TDT, which separates maternal and paternal transmissions to the affected child, was performed for the eight SNPs across CD40L. There was no evidence of association for CD40L SNPs with SLE (Table 4). Stratification for renal disease or thrombosis was not performed because of the loss of power resulting from the low number of heterozygous transmissions from mothers.

Table 4 Results of CD40L ASPEX TDT in Caucasians of cohorts 1 and 2

Discussion

CD40 and CD40L are important tumour necrosis factor (TNF) superfamily members involved in B-cell interactions. Given the role that the CD40–CD40L pair plays in humoral immune responses, these genes are excellent candidate susceptibility genes for SLE. Our study aimed to characterise the haplotype structure of these genes and investigate association of markers with SLE.

The majority of SNPs genotyped in this study were located in introns and untranslated region (UTR) sequences. One coding SNP was successfully genotyped in CD40 (rs11086998) but was of extremely low frequency (<0.1%) in European-Caucasians and could not be used for analysis. Only one synonymous coding SNP in CD40L (rs1126535) was taken through analysis in European-Caucasians.

We did not observe any association with SLE for CD40 or CD40L using family-based tests of association. We also used clinical information on our cohorts for lupus nephritis and thrombosis, and no evidence for association was found. A limited number of association studies have been performed for CD40 and CD40L in autoimmune disease. One case–control association study found a SNP in CD40 to be positively associated with Graves’ disease in Caucasians.33 However, this was not replicated in a substantially larger study.34 More recently, an association between a 3′-UTR microsatellite in CD40L and SLE was reported in 80 patients from the Canary Islands, Spain.35 No single microsatellite allele was associated with disease; rather, the association was with a group of longer alleles. The control frequency of this group of alleles was 0.32. This CD40L microsatellite lies between SNP 7 and SNP 8 and falls within haplotype block 2 (Figure 2b). It is unlikely that these alleles are associated with SLE in a UK population, since we have defined CD40L haplotypes flanking this marker and our study is considerably larger than the one reported.

We have shown the LD and haplotype structure for CD40 and CD40L in European-Caucasians. CD40 markers span the gene from −4983 (from ATG) to +14912 (from ATG). CD40L markers span the genomic area enclosing CD40L from −6825 (from ATG) to +13582 (from ATG). Variation in the CD40 and CD40L genes in European-Caucasians can be explained by the existence of three or four common haplotypes, which account for >97% frequency (Figure 2). It is clear from the pairwise marker LD and haplotypes in CD40 that there are high levels of LD across the gene. This is reflected in the three common haplotypes in the two haplotype blocks in CD40. In the case of CD40L, there is a breakdown in LD between haplotype blocks but still substantial LD between adjacent blocks. All the common haplotypes in CD40 and CD40L can be identified by genotyping of htSNPs as reported in the Results section. The extent of LD across CD40 and CD40L suggests that any putative polymorphism would be in LD with one or more of the SNPs or haplotypes studied.

Haplotype construction across CD40 has been previously described in Han-Chinese.36 This group showed strong LD across CD40 and five haplotypes. Three of the SNPs used by this group were the same as SNPs used in our study (rs1800686, rs752118 and rs3765459). Overall there are similarities in CD40 haplotype structure between Han-Chinese and European-Caucasians. However, the Han-Chinese had two low-frequency haplotypes (their nomenclature: Hap 4 and Hap 5) not present in European-Caucasians. The Han-Chinese SNPs had similar frequencies to the Indo-Asian samples for CD40. There has been one study on haplotypes across CD40L in African populations,37 and this group found similar LD for a section of the CD40L gene (rs975379 and rs715762) in Africans.

Allele frequencies of SNPs differed between the different ethnic groups, as expected based on the demographic histories of different populations. The differences were marked between Africans and European-Caucasians for both CD40 and CD40L. A study on positive selection with malaria resistance in CD40L showed similar differences.37, 38 Two of the SNPs we studied (rs975379 and rs715762) were also studied in the work of Sabeti et al.37, 38 A more recent study that found no association of CD40L with tuberculosis confirmed the allele frequencies we observed for SNP rs1126535 in Afro-Caribbeans.39

If there were SLE susceptibility polymorphisms within CD40 or CD40L, we would have expected to detect an association within this European-Caucasian family study. We cannot exclude the possibility that a rare variant within CD40 or CD40L is associated with SLE. However, we have captured greater than 97% of the haplotype diversity across these genes, including all the common haplotypes. We therefore exclude common polymorphisms in CD40 and CD40L from association with SLE in European-Caucasians. Furthermore, the CD40 and CD40L htSNPs that we have identified will be useful for genetic association studies on autoimmune and vascular diseases in European-Caucasian populations.