Article Text

Extended report
Genes expressed in blood link osteoarthritis with apoptotic pathways
  1. Yolande F M Ramos1,
  2. Steffan D Bos1,2,
  3. Nico Lakenberg1,
  4. Stefan Böhringer3,
  5. Wouter J den Hollander1,
  6. Margreet Kloppenburg4,
  7. P Eline Slagboom1,2,
  8. Ingrid Meulenbelt1,2
  1. 1Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
  2. 2The Netherlands Genomics Initiative, sponsored by the NCHA, Leiden-Rotterdam, The Netherlands
  3. 3Department of Biostatistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
  4. 4Departments of Clinical Epidemiology and Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
  1. Correspondence to Dr Yolande F M Ramos, Department of Molecular Epidemiology, Leiden University Medical Center, LUMC Postzone S-05-P, PO Box 9600, Leiden 2300 RC, The Netherlands; y.f.m.ramos{at}lumc.nl

Abstract

Objective To identify novel gene expression networks in blood of osteoarthritis patients compared to controls.

Methods A comprehensive exploration of gene expression in peripheral blood was performed by microarray analysis for a subset of osteoarthritis patients from the Genetics osteoARthritis and Progression (GARP) study in comparison with sex and age-matched healthy controls. To identify pathways, we performed gene enrichment analyses (database for annotation, visualisation and integrated discovery and search tool for the retrieval of interacting genes). Quantitative PCR analysis in overlapping and in additional osteoarthritis samples was performed for prioritised genes to validate and replicate findings. Classification of cases and controls was explored by applying statistical models.

Results 741 probes representing 694 unique genes were differentially expressed between cases and controls, including 86 genes expressed with at least a 1.5-fold difference. ATF4, GPR18 and H3F3B were among the top genes identified (p<4.5 × 10−8). We found that in the blood of osteoarthritis patients the apoptosis pathway, including the well-known gene CASP3, was significantly enriched among the differentially expressed genes. Our findings were validated in independent samples and when using a small subset of the validated genes, we could accurately distinguish patients from controls (area under the curve 98%).

Conclusions In the current study, we have identified specific gene expression networks, in the easily accessible tissue blood, which associated consistently with osteoarthritis among GARP study cases. Our data further hint at the relevance of apoptosis as an aetiological factor in osteoarthritis onset, thereby qualifying expression profiling of blood as a useful tool to understand the underlying molecular mechanisms of osteoarthritis.

  • Osteoarthritis
  • Epidemiology
  • Outcomes research

Statistics from Altmetric.com

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

Introduction

Osteoarthritis is a prevalent, complex, disabling disease affecting the articular joints. It has a considerable genetic component and the architecture is of a polygenic nature.1 Currently, adequate therapies to reverse or slow down the disease apart from pain relief are very poor,2 and joint replacement surgery at the end stage is commonly applied. Diagnosis of osteoarthritis occurs not before changes in the joint tissues are visible on radiographs. Such changes, however, reflect irreversible damage. Sensitive markers to monitor early changes or changes in active disease are not yet available and little is known about the early molecular processes marking the onset of osteoarthritis. Such insights, however, are important to prevent irreversible damage and are considered crucial for the development of effective disease-modifying treatments.

Initial genetic studies have resulted in a few consistent osteoarthritis susceptibility genes, such as GDF5,3 FRZB,4 SMAD35 and DIO2.6 Together, these genes denominated skeletogenesis as a common pathway that may be involved in osteoarthritis susceptibility thereby contributing to the hypothesis that osteoarthritis may have an early developmental origin that is expressed later in life.7 ,8 More recently, large consortia (such as TREAT∼OA and arcOGEN) started compiling genome-wide association data on osteoarthritis to perform powerful meta-analyses. These efforts have contributed to the discovery of a variety of genome-wide significant signals in osteoarthritis, some of which contained compelling osteoarthritis candidate genes,9 ,10 whereas other loci were found in gene deserts or in chromosomal regions with high linkage disequilibrium containing many genes without obvious candidates. These results point towards a typical problem in the genetic mapping of complex traits and expose the gap between genetic evidence and the underlying molecular mechanism of the disease. In addition, functional effects of genetic variation with a vast influence on many different biological processes but occurring at unknown (trans-acting) regulatory elements remain undetected.

Extensive large-scale exploration of gene expression networks may be more easily established by comparing well selected osteoarthritis cases and controls in a readily accessible tissue, such as blood. Moreover, it could be advocated that such expression profiles in blood reflect a genetic background predisposing to osteoarthritis, which could be undetectable in the affected cartilage tissue itself as a result of ongoing disease. Expression profiles in blood have successfully been used as molecular biomarkers for classification of disease subgroups in other complex diseases.11 ,12

We set out to identify gene expression networks associated with osteoarthritis by comparing single genes as well as pathway-based blood expression profiles of 106 patients of the Genetics osteoARthritis and Progression (GARP) study, consisting of sibling pairs with symptomatic and radiographic osteoarthritis in at least two joints to those of 33 controls. Subsequently, we tested whether these specific osteoarthritis signatures in blood reflected ongoing disease processes in articular cartilage or could discriminate patients from controls. Finally, we questioned whether the distinctive gene expression signature of blood could provide clues in the prioritisation of osteoarthritis susceptibility loci detected by large-scale genome-wide association meta-analysis.

Materials and methods

Study populations

The GARP study consists of 191 sibling pairs (N=382) of white, Dutch ancestry. All participants (age range 40–70 years; mean age 60 years) are clinically and radiographically diagnosed with primary, symptomatic osteoarthritis (as defined by the American College of Rheumatology (ACR)) at multiple joint sites in the hand, or in at least two joints of the following locations: hand, spine (cervical or lumbar), knee, or hip.13 In this study, a subset of 106 (from which 68 are unrelated individuals) participants of the GARP study was selected for gene expression analysis (discovery cohort). Osteoarthritis patients were compared with age and sex-matched healthy controls without symptomatic and clinical osteoarthritis according to ACR criteria. In addition, similar to participants of the GARP study, controls were subjected to radiographic examination at the indicated joints, and were free of signs of osteoarthritis (Kellgren–Lawrence grade <1) at the indicated joints except for five with a Kellgren–Lawrence grade of 2 or less at a single joint (table 1).

Table 1

General information of the participants included in the analysis

Technical validation of the microarray analysis was performed in a random selection of 28 of the 106 previous GARP participants (validation cohort) in comparison to 25 of the controls, and validated genes were replicated in an additional set of 36 unrelated individuals of the GARP study (replication cohort) in comparison to the same 25 controls supplemented with five additional controls (characteristics of the study participants are shown in table 1).

For the comparison of our blood expression profiles with gene expression in cartilage we used the ongoing Research Arthritis and Articular Cartilage (RAAK) study, of which gene expression data are available for the cartilage of 33 donors who underwent joint replacement surgery (unpublished data).

RNA isolation

Blood was collected and peripheral blood mononuclear cells (PBMC) were separated with Histopaque gradient. Cells were lysed in 1 mL of TRIzol reagent (Life Technologies, Bleiswijk, The Netherlands), and stored at −80°C until isolation. After thawing, 200 μL of chloroform was added before centrifugation (15 min at 16 000g). The aqueous upper layer was separated, 1 v/v 70% ethanol/DEPC-treated water was added, and total RNA was isolated using the RNeasy mini kit (Qiagen, Venlo, The Netherlands) following the manufacturer's recommendations. To assess the RNA quality, 36 random samples were analysed on the 2100 Bioanalyzer (Agilent Technologies, Amstelveen, The Netherlands), which had an RNA integrity number of at least 8.3 and an average 28 S/18 S ratio of 2.2 (range 1.8–2.7). The RNA concentration was measured using a Nanodrop spectrophotometer.

Microarray analysis

For first and second-strand reverse transcription 500 ng of total RNA served as a template. To perform complementary DNA synthesis, amplification, biotin labelling and hybridisation onto the microarrays the Ambion TotalPrep-96 RNA amplification kit (Life Technologies, Bleiswijk, The Netherlands) was used according to the manufacturer's protocol. After hybridisation on Illumina Human HT-12_v3_BeadChips (Illumina, Eindhoven, The Netherlands) slides were scanned with the Illumina Beadscanner 500GX.

Raw data from the beadscanner were prepared with GenomeStudio Software (Gene Expression Module v1.0 from Illumina), without normalisation or background substraction and with imputation of missing data. Quality control was performed in the R statistical programming language using the lumi-package,14 and two out of 108 GARP samples analysed on the Illumina BeadChips were excluded from the analysis based on their aberrant positions in the box plot, the density plot and the sample–relation analysis. Subsequently, high quality data of 106 cases and 33 controls were prepared for analysis by applying robust spline normalisation (rsn) and variance-stabilising transformation (vst). Differential expression between GARP participants and healthy controls was analysed using the limma package for linear regression analysis,15 correcting for sex, age, body mass index (BMI) and chip (to adjust for possible batch effects). The ‘BH’ (Benjamini and Hochberg) option was used to correct for multiple testing. Probes with detection p values higher than 0.05 in more than 50% of the samples were not included in subsequent analysis.

Pathway analysis

Gene enrichment was studied using either all significant differentially expressed genes (N=694 of which 619 were covered by the functional annotation algorithms in the database for annotation, visualisation and integrated discovery (DAVID)) or the 86 genes (of which 76 were covered) showing an expression difference of 1.5-fold or greater between osteoarthritis patients and healthy controls. Pathway analysis was performed with the DAVID tool,16 ,17 selecting for biological processes identified in the gene ontology database (GOTERM_BP_FAT in the options menu implemented in DAVID) and for the microarray background (HumanHT-12_V3_0_R2_11283641_A). Genes with a p value of 0.05 or less after correction for multiple testing according to Bonferroni were considered significant.

Analysis of protein interaction networks

To investigate protein interactions among the genes identified in our microarray, we used the search tool for the retrieval of interacting genes/proteins (STRING) 9.018 available online (data are thus analysed from source ‘previous knowledge’, which means it is derived from text-mining and therefore not all data are determined experimentally). With the option ‘enrichment’ we analysed for enrichment of protein–protein interactions, and also ‘GO biological processes’ were analysed with STRING.

Quantitative RT–PCR analysis

Quantitative reverse transcription PCR (RT–qPCR) was used to validate and replicate the results of the microarray analysis. Expression levels of 16 genes selected based on p value and fold difference, and 14 genes involved in apoptosis, were determined in the validation cohort (NGARP=28) and the replication cohort (NGARP=36) with gene-specific and pre-validated Taqman assays (Life Technologies, Bleiswijk, The Netherlands; Taqman assays are shown in supplementary table S1, available online only). Synthesis of cDNA using 1 μg RNA was performed with the First Strand cDNA synthesis kit (Roche Applied Science, Almere, The Netherlands) following the manufacturer's protocol. The cDNA was amplified using the DNA Engine Tetrad two Peltier Thermal Cycler (Bio-Rad, Veenendaal, The Netherlands) under standard PCR conditions and qPCR was performed in triplicate with the Taqman method using BiomarkTM 96.96 Dynamic Arrays (Fluidigm). Relative gene expression was calculated using the average of the absolute threshold cycle values (Ct values) in the 2−ΔΔCt method,19 and levels were normalised using the mean expression of two housekeeping genes (YKT6 and ACTB). Commercially available human total reference RNA (Clontech Laboratories, Mountain View, California, USA) served as reference RNA.

Statistical analysis of RT–qPCR data

Gene expression data were analysed for normal distribution and, if necessary, log transformed before analysis in the R statistical programming language using a generalised linear mixed model. p Values of  0.05 or less were considered statistically significant. For technical validation, Spearman correlation between expression levels of the RT–qPCR and expression levels as determined using the microarrays were calculated with SPSS.

Classification models

Classification models were constructed using multiple, penalised logistic regression.20 Models were evaluated by a receiver operating characteristic (ROC) analysis using the Z-scores of the expression for all genes and an area under the curve (AUC) criterion. Optimism in this procedure was controlled by applying leave-one-out cross-validation.

Analysis strategy

Figure 1 shows a schematic overview of the study strategy. Following microarray analysis, data were statistically analysed. Genes significantly different between patients and controls were subjected to pathway analysis with the DAVID tool. A selection of 30 genes (16 selected based on p value and fold difference and 14 selected from the apoptosis pathway) was used for technical validation of the results in the validation cohort, and 27 genes passing this control were replicated in a group of independent participants of the GARP study (the replication cohort). In addition, using a classification model expression of the 27 genes was used for osteoarthritis prediction in the replication cohort.

Figure 1

Flowchart of the gene expression analysis. FD, fold difference; GWAS, genome-wide association study; OA, osteoarthritis.

Gene expression data available from the microarray analysis were used as a reference, to search for osteoarthritis susceptibility genes known from the literature on the one hand, and to look up the significant genes in a dataset of cartilage expression profiles available in our laboratory on the other hand (unpublished data).

Results

Gene expression profiles of PBMC were generated from a subset of 106 participants of the GARP study (mean age 61 years and mean BMI 27 kg/m2) and 33 sex and age-matched healthy controls (mean age 58 years and mean BMI 24 kg/m2; see table 1A).

For an overview of the steps taken in the analysis of the data, see figure 1.

Expression profiles of osteoarthritis patients

In the microarray analysis we identified 741 probes comprising 15 non-coding RNA and 679 unique genes (N=694 total) that were expressed significantly different between osteoarthritis patients and healthy controls (see supplementary table S2), available online only). From these, 89 probes (86 genes) differed at least 1.5-fold between patients and healthy controls (table 2; for gene definitions see table 4 or supplementary table S2, available online only) of which 46 were expressed higher in the blood of osteoarthritis patients (52%). The lowest p value was found for ubiquitin conjugating enzyme E2 L 3 (UBE2L3; p=9.26×10−12), followed by activating transcription factor 4 (ATF4; p=4.14×10−8). Gene expression differences range from 2.6-fold higher as detected for G protein-coupled receptor 18 (GPR18; p=4.31×10−8) to 5.3-fold lower in osteoarthritis patients for early growth response 1 (EGR1; p=2.13×10−2).

Table 2

List of genes significantly different expressed (≥1.5-fold difference)

Expression profiles of apoptosis-related genes significantly different in osteoarthritis patients

To gain more insight into the pathways most predominantly involved in the differential expression, gene enrichment analysis was performed for all biological processes using DAVID. When analysing all significant genes including those with relatively small differences (N=694), we identified eight GO terms comprising five different biological processes that were significantly enriched: protein transport and localisation (GO:0015031, GO:0045184, GO:0008104, GO:0046907), macromolecular complex subunit organisation (GO:0034621), cell cycle (GO:0007049), RNA processing (GO:0006396) and apoptosis (GO:0006917; table 3A).

Table 3

Gene enrichment analysis using DAVID

Gene enrichment analysis for the 86 genes differing at least 1.5-fold (table 2), showed significant enrichment specifically for networks involved in apoptosis (table 3B). The majority of the genes involved in cell death (13 out of the 86 genes tested according to the DAVID analysis) showed higher expression in osteoarthritis patients. Among these genes, we identified increased expression (1.6-fold) of the well-known apoptosis-inducing gene Caspase 3 (CASP3; p=8.94×10−4; table 2).

Using STRING, significantly more protein–protein interactions were detected between the 86 genes than expected based on chance (56 interactions total; p=1.05×10−5). Also the apoptosis pathway was found in STRING (p=5.49×10−2), which was visualised by virtue of the multiple interactions between proteins involved in this pathway (figure 2). As shown, six out of the 13 proteins with three or more interaction partners are involved in apoptosis. A cluster of histone proteins (HIST1H3H, HIST1H2AC, and HIST2H2BE) involved in the above-mentioned pathway ‘macromolecular complex subunit organisation’ was also apparent.

Figure 2

Gene networks in search tool for the retrieval of interacting genes (STRING). Screenshot of the protein–protein interactions determined with STRING for the 86 genes differing at least 1.5-fold between patients and controls (disconnected proteins are hidden).

Validation and replication

To validate our results RT–qPCR was carried out. Genes were selected based on p value and fold difference (N=16), or based on a possible role in apoptosis (N=14). Technical validation was performed in the validation cohort (53 samples previously included in the microarray analyses). Here, 27 out of the 30 genes tested showed similar effect sizes and direction as the original data (see supplementary table S3, available online only). Next, we tested these 27 genes in the replication cohort (36 independent individuals of the GARP study compared to 30 controls). In the unrelated GARP patients, 22 out of the 27 genes tested (82%) were significantly different (table 4). Note that, except for the lunatic fringe gene (LFNG), the genes that were not significant in the replication (indicated with a black asterisk) still had fold differences comparable to those as measured in the microarray analysis.

Table 4

Genes replicated by RT–qPCR

Classification capability of differentially expressed genes

Figure 3A shows hierarchical clustering based on the correlation between the differentially expressed genes across the samples. Separation among osteoarthritis patients and controls was quite good (more than 80%), although some patients were clustered among the healthy controls and vice versa.

Figure 3

Clustering and classification of the study population. (A) Clustering of osteoarthritis patients of the Genetics osteoARthritis and Progression study (N=106; indicated in red in dendrogram) and age-matched healthy controls (N=33; indicated in green in dendrogram) based on euclidean distance. Each row represents a probe (N=89), and each column represents the expression of the probes for an individual (range from bright red (expression in osteoarthritis higher than control mean) to bright green (expression in osteoarthritis lower than control mean)). (B) Receiver operating curve (ROC) for the Discovery Cohort, based on sex, age, and body mass index alone (dotted line) or with the inclusion of the 27 validated genes (black line) or four genes (CASP3, SFRS5, RABEP1 and CDKN2D; dashed line) in our classification model. (C) ROC as in (B), for the Replication Cohort. AUC, area under the curve.

To investigate the possibility of using the set of validated genes (depicted in table 4) eventually as a diagnostic tool, a classification model was constructed and tested, first in our discovery cohort, and subsequently in the replication cohort. As shown in figure 3B and C, ROC curves of the baseline model containing sex, age and BMI resulted in a fair discriminative capacity with an AUC of 73% and 72%, respectively. However, when expression levels of the genes were added to the model the AUC increased to 95% and 97%, respectively, indicating an excellent classification of osteoarthritis patients compared to controls. According to our classification model expression of the serine/arginine-rich splicing factor 5 (SFRS5) gene contributed most in this model (data not shown). The possibility was examined to reduce our model to a smaller set of genes with which we could still obtain good classification between patients and controls, and we found a combination of four genes (SFRS5, together with CASP3, cyclin-dependent kinase inhibitor 2D (CDKN2D), and RAB GTPase-binding effector protein 1 (RABEP1)) with which we obtained ROC curves with excellent AUC as high as 98% in the replication cohort (figure 3C, dashed line).

Dataset searches for genes differentially expressed in blood of osteoarthritis patients

Differential gene expression profiles in the blood of osteoarthritis patients compared to controls may either reflect different setpoints of innate systemic processes taking place in osteoarthritis patients and/or it may indicate that blood could reflect pathological processes occuring specifically in joint tissues. To look into these possibilities we set out to assess the cartilage characteristics of the genes we identified as being differentially expressed in blood, and we investigated both the levels of expression and their respective differential expression in the cartilage of another study sample, namely the RAAK study. Within the RAAK study, a cartilage gene expression dataset is available in our laboratory (unpublished data). The median relative gene expression level of these cartilage samples is 7.1 (range 6.6–14.9) as measured on the microarrays. We found that 24 out of the 86 genes expressed with at least 1.5-fold difference in blood (depicted in table 2), also showed nominal significant changes when comparing cartilage from macroscopically unaffected (preserved) with the corresponding osteoarthritis affected sites. The median relative expression level of these genes was higher than the overall median level (8.6, with range 7.2–11.5; data not shown). Additional selection for genes with expression changes in the same direction in cartilage and in blood resulted in 12 genes with fold changes ranging between 1.2 up and 1.3 down in cartilage (see supplementary table S4, available online only). It is of note that among these genes we found SFRS5, the gene that contributed most to our prediction model in the blood of osteoarthritis patients. SFRS5 is highly expressed in cartilage (a relative expression of 11.5 is in the highest quartile).

Discussion

In the current paper, we have established a distinctive gene expression profile of 694 genes in the blood of osteoarthritis patients of the GARP study. Of these, 86 appeared to be expressed with at least a 1.5-fold difference. The most significant gene, which was validated and replicated in unrelated samples, was ATF4 (twofold lower; p=4.14 × 10−8). Moreover, of note is the H3F3B gene that was significantly downregulated in blood (2.17-fold; p=4.31 × 10−8) and in articular cartilage (1.15-fold; p=2.05 × 10−3). Subsequent pathway analyses revealed that genes involved in apoptosis were significantly enriched in our expression profile and included the renowned, osteoarthritis relevant, apoptosis-inducing gene CASP3 (p=8.94×10−4). In addition, protein–protein interactions appeared to be condensed around apoptosis-related genes, and CASP3 for example was found to interact with the protein products of six other genes identified in the microarray analysis. The microarray platform was validated for 27 out of 30 genes tested, providing high confidence in the data. When using these 27 genes in a classification model a highly sensitive distinction between patients and healthy controls (AUC 97%) was reached. Finally, of the genes identified in our analysis, we found that 12 were also differentially expressed in cartilage and with the same effect directions. Expression of these genes in cartilage was higher than average, and especially SFRS5, a RNA splice factor contributing significantly to our classification model, was abundantly expressed in cartilage.

It may not be surprising that we found relatively small overlap in the differentially expressed genes between osteoarthritis cartilage and mononuclear blood cells of osteoarthritis patients because gene expression in PBMC can be the result of several processes. As mentioned by Attur et al,21 blood cells passing through the different tissues of the joint may be activated by the osteoarthritis process resulting in an ‘inflammatory profile’, as found by Attur et al21 in a subset of symptomatic knee osteoarthritis patients, or, as found here, in an ‘apoptotic profile’. In most instances changes are subtle and we do not expect them to lead to a general pro-inflammatory state or to direct apoptosis of the blood cells of the patients. However, this remains to be investigated. Alternatively, differential gene expression in PBMC may reflect genetically determined dysregulation of processes.

Our top gene ATF4 encodes a potent transcription factor that specifically regulates Indian hedgehog (IHH) expression during endochondral ossification.22 As we could not detect significant differential expression of ATF4 in our and other published articular cartilage datasets,23–26 the difference in expression of ATF4 in the blood might be a reflection of distinct innate characteristic of osteoarthritis patients compared to controls. In growth plate cartilage, where IHH ensures growth plate chondrocytes to proliferate, IHH is considered ‘a master regulator in the process of chondrogenesis and osteogenesis’.27 Given the fact that ATF4 is expressed significantly lower in the blood of osteoarthritis patients, we hypothesise that, during endochondral ossification, the IHH levels may not be optimally assured resulting in suboptimal chondrocyte proliferative potential. Furthermore, these results support the hypotheses that late-onset osteoarthritis could have a developmental origin.7

By applying pathway analysis (table 3) among the differentially expressed genes in the blood of osteoarthritis patients compared to controls we showed enrichment of compelling apoptotic genes (eg, CASP3 and FAS-associated death domain (FADD)) but also of genes less well known for their involvement in the regulation of apoptosis (such as GPR1828 and SIAH129). Apart from genetic and experimental evidence, the incidence of apoptosis in osteoarthritis cartilage is also supported by the observation of large numbers of empty lacunae and hypocellularity.30–32 However, it remains to be established whether chondrocyte death is a cause or a consequence of osteoarthritis.33 As the apoptotic pathway was not found to be significantly enriched in the cartilage-specific gene expression dataset of the RAAK study (unpublished data), we hypothesise that on the occurrence of micro-traumas during life, systemic factors present in the blood may contribute to the onset of apoptosis in the joint cells, eventually leading to irreversible damage and the subsequent onset of osteoarthritis.

Among the differentially expressed genes, we also found cytokines such as IL1B (upregulated) and IL8 (downregulated) that were identified previously in a study with peripheral blood cells of knee osteoarthritis patients.21 Both proteins showed a considerable fold change in our analysis (1.8 and 3, respectively), and had a considerable number of interacting partners in the network analysis (six and seven, respectively) suggesting that expression changes are likely to have a substantial impact on cellular processes including apoptosis. Also MAFB and EGR1, identified previously as biomarkers in blood expression profiles of patients with mild knee osteoarthritis,34 were differentially expressed in our dataset. Taken together, the overlap between our analysis and other studies confirms that osteoarthritis-specific gene expression profiles of PBMC exist. Besides the overlap we also find differences most likely due to the fact that patients with generalised osteoarthritis in at least two joints were included, comprising knee osteoarthritis, but also osteoarthritis at other joints (hip, hand, and/or spine). In addition, it was found previously that different subgroups of osteoarthritis patients exist,21 which differ in their gene expression profiles.

We showed that four of the genes identified in our analysis (CASP3, CDKN2D, RABEP1 and SFRS5) were sufficient to distinguish patients from controls accurately in the discovery cohort, but also in the replication cohort. The classification models computed in our analysis combined clinical risk factors with expression levels of genes. In this way, cross-validated AUC reached values as high as 98%. The AUC has the interpretation of a concordance index, meaning that the AUC represents the chance that for any case–control pair a higher disease probability is correctly predicted for the case. Therefore, when confirmed in independent cases 98% is a clinically relevant value. Moreover, expression levels represent risk independent of clinical variables (98% vs 72%). It should be noted, however, that our study has a cross-sectional design. As a result, the ROC curves merely indicate classification (diagnosis) but not prediction, and the result remains to be tested in independent study cohorts. Predictive value needs to be assessed and validated in a longitudinal study design. In addition, it remains to be investigated whether CASP3, CDKN2D, RABEP1 and SRFS5 would be useful as early markers to diagnose osteoarthritis in a population-based setting.

Although the results presented in this paper appear very convincing and are in agreement with previous studies, it should be mentioned that a drawback of our analyses was the fact that the majority of the controls (25 of the 33 used) across our validation and classification model were overlapping with the controls used in our microarray analysis. As a result, part of our data could be based merely on the severe selection of controls (clinically and radiographically confirmed healthy joints at age ∼60 years) marking rather a healthy blood expression profile instead of the presence of osteoarthritis. The latter could be explored further by replication using population-based controls. Also, our discovery cohort included 38 siblings, which will have contributed to the robustness of the results. However, differential expression of 22 out of 27 genes has been confirmed by RT–qPCR in the replication cohort consisting of independent cases (table 4), and results of the analysis including only the probands of the discovery cohort are generally comparable (data not shown). As shown in table 1, most participants had pain due to osteoarthritis (92%) and more than half of our cases were using non-steroidal anti-inflammatory drugs (52%). Adjusting for these phenotypes did not significantly affect our results (data not shown). It is of note that in this study mainly women were included. Therefore, although we have adjusted for gender we cannot exclude the possibility that the results are more applicable to women. It would be interesting for future research to investigate whether sex differences exist for gene expression profiles in the PBMC of osteoarthritis patients. Finally, despite various efforts, we were not able to validate by RT–qPCR the observed highly significant differential expression of UBE2L3 as detected by the microarray expression (table 2).

Taken together, this study in patients with generalised osteoarthritis in a familial background provides a valuable tool for a better understanding of the pathology of this complex disease and for the classification of osteoarthritis. In addition, the gene networks may facilitate the selection of targets for drug development and biomarkers.

Acknowledgments

The authors would like to thank all members of the laboratory for suggestions and helpful discussions. Dr JM Boer and Dr JJ Goeman are acknowledged for initial help with the analysis of the microarrays.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

    Files in this Data Supplement:

Footnotes

  • Handling editor Tore K Kvien

  • Contributors All authors have made substantial contributions to the completion of this study. Study conception and design: YR, SDB, PS and IM. Acquisition of data: all authors. Analysis and interpretation of data: YR, PS and IM. Classification model: SB. Preparation of the manuscript: YR and IM. Critical reviewing of the manuscript: all authors.

  • Funding This study was supported by the Leiden University Medical Center and the Dutch Arthritis Association. Pfizer Inc., Groton, CT, USA supported the inclusion of the GARP study. All the other work was supported by the Netherlands Organisation of Scientific Research (MW 904-61-095, 911-03-016, 917 66344 and 911-03-012) and the Center of Medical System Biology and The Netherlands Consortium for Healthy Aging, both in the framework of the Netherlands Genomics Initiative.

  • Competing interests None.

  • Patient consent Obtained.

  • Ethics approval The study received ethics approval from the Commissie Medische Ethiek (protocol no. P 76/98).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The gene expression profiles of the PBMC have been de posited in NCBI's Gene Expression Omnibus35 and are accessible through GEO Series accession number GSE48556 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE48556).