Over recent years, microarray technologies have generated new perspectives for the high-throughput analysis of biological systems. It is now possible to monitor thousands of genes in a single experiment. This molecular profiling technology, combined with standardised and validated clinical measurements, can allow a more precise characterisation of a patient’s phenotype, and may lead to the design of therapeutic protocols and procedures better tailored to an individual patient’s needs. In this report we provide an overview of expression profiling studies in rheumatoid arthritis (RA). RA is a chronic inflammatory disease in which both genetic and environmental factors are involved. The precise molecular mechanisms underlying RA are not fully understood. A systematic literature search revealed eight array-based expression profiling studies in patients with RA. Findings from these studies were compared with those of linkage and genome-wide association (GWA) studies. Although we observed many differences in study design, analysis and interpretation of results between the different studies, we extracted two sets of genes: (1) those differentially expressed in more than one study, and (2) genes differentially expressed in at least one of the reviewed studies and present in RA linkage or GWA loci. We suggest that both sets of genes include interesting candidate genes for further study in RA.
Large-scale analysis of gene expression patterns using (micro-) arrays is spreading over many fields, including that of rheumatology. Nowadays, array-based approaches allow the analysis of thousands of genes in a single experiment.1 There has been special interest in these technologies to elucidate the genetics of heterogeneous autoimmune diseases, such as rheumatoid arthritis (RA).
Array-based expression analysis is based on the hybridisation of an ordered set of probes attached to a surface with a target consisting of cell/tissue-isolated mRNAs. In general, these are mRNAs isolated under different biological situations, eg, health and disease, or before and after treatment. The hybridisation pattern reflects the relative abundance of each mRNA and leads to the identification of genes up- or downregulated in the test condition compared with the reference.1 By grouping sets of differentially expressed genes according to function, information can be obtained about key pathways related to disease or treatment.2
Expression profiling studies in RA can be classified roughly into two categories: (1) those focused on finding (new) candidate genes for disease aetiology and understanding its pathogenesis, and (2) those focused on identifying expression patterns typical for a state of RA (eg, mild versus severe disease or drug-responsive versus non-responsive patients). In the first category, expression analysis is often the first step in elucidating gene function, and in the second it is aimed at reducing phenotypic heterogeneity or at identifying expression profiles that can serve as diagnostic tools predicting, eg, disease outcome or response to disease-modifying anti-rheumatic drugs (eg, tumour necrosis factor (TNF) blocking agents).
The current study compared the findings of published reports dealing with expression profiling in RA with the aim of identifying genes frequently up- or downregulated. Furthermore, the chromosomal location of these genes was compared with known genetic linkage and association regions for RA. Two sets of genes were derived: (1) those differentially expressed in more than one of the reviewed studies, and (2) genes differentially expressed in at least one of the reviewed studies and located within a genetic linkage or association region for RA. As the large heterogeneity in study design precluded a meta-analysis, we extracted these gene sets by simple comparison of the studies and report a power estimate for each of the reviewed studies. We argue that the selected genes are excellent candidates for further research.
Studies were selected by systematically searching MEDLINE (OVID 1966 to January 2007) using the following search terms: “rheumatoid arthritis”, “expression profiling”, “expression pattern”, “treatment responsiveness”, “anti-TNF responsiveness” and “autoimmune disease”. Studies had to deal with expression profiling in RA or other autoimmune diseases, if these also included RA samples. Animal studies and expression profiling performed using techniques other than microarray technology were excluded. The criteria were checked for every article independently by two reviewers (ET and MC). In case of disagreement, articles were re-examined and discussed until a consensus was achieved. From each selected article, detailed information was extracted on study design, number and characteristics of participants, interventions, outcome measurements and time of follow-up.
Forty-six studies were identified by the literature search; eight met the selection criteria and were included in this study. Two studies3 4 dealt with autoimmune diseases in general. Six studies discussed expression profiles of patients with RA specifically.5–10 Of these, four5–8 focused on expression patterns typical of a specific disease state, and two studies9 10 examined expression patterns predictive of response to anti-TNF treatment. Table 1 summarises information about patient characteristics, tissues analysed and experimental parameters of the studies.
In table 2 we compiled genes found to be regulated in more than one study, irrespective of the above defined categories, but including at least one study comparing expression profiles from RA samples with those of healthy controls.
Studies focused on finding (new) candidate genes for disease and understanding its pathogenesis
Batliwalla and coworkers7 performed gene expression profiling using peripheral blood mononuclear cells (PBMCs) from 29 patients with RA and 21 healthy controls (table 1). Of the 4500 investigated genes, they identified 81 genes with significantly different expression levels between RA and control; 29 genes were downregulated and 52 genes were upregulated in the RA group. Glutaminyl cyclase (glutaminyl-peptide cyclotransferase; QPCT), interleukin 1 receptor antagonist (IL1RA), S100 calcium binding protein A12 (S100A12) and GRB2-associated binding protein 2 (GAB2) were among the top overexpressed genes; whereas CD72 and CD79b were most significantly downregulated. Strikingly, many of the overexpressed genes were monocyte specific. This may have been related to the fact that the patients with RA had an active disease and were included before initiation of treatment.7
Szodoray et al8 used a genome-scale microarray representing 21 329 genes to identify differentially regulated genes in peripheral blood B cells from eight patients with early RA compared with eight healthy controls (table 1). Three hundred and five genes were overexpressed in RA-derived B cells and 231 genes were repressed when compared with controls. Clusters of functionally associated networks of the differentially expressed genes were constructed into which 51 of the 536 genes fitted. Five functional classes were defined: cell activation, proliferation and apoptosis (31 genes); autoimmunity (five genes); cytokines, cytokine receptors and cytokine-mediated processes (eight genes); neuroimmune regulation (two genes); and angiogenesis (five genes). Quantitative polymerase chain reaction confirmed the upregulation of 13 randomly chosen genes in B cells from patients with RA (supplementary table 1).8
Studies on gene expression patterns typical of a specific state of disease
Studies in this category were aimed predominantly at identifying profiles predictive of a disease state, but often also included a case–control comparison.
Van der Pouw Kraan et al5 used a custom microarray containing approximately 24 000 cDNA probes to subclassify patients with RA (n = 15) (table 1). Hierarchical clustering of gene expression in synovial tissue identified two main patient groups, designated RA-I (n = 10) and RA-II (n = 5). Gene expression in the RA-I group was indicative of an adaptive immune response, whereas genes identified in the RA-II group were involved in fibroblast dedifferentiation.5 The RA-I group overexpressed 121 genes compared with RA-II; 39 genes were overexpressed in the RA-II group compared with RA-I (supplementary table 2 lists the top five discriminators in each group).
Olsen et al6 aimed to identify a gene expression pattern specific for early RA, which could be used as a diagnostic marker and lead to early treatment and better prognosis. PBMCs from patients with RA were analysed using microarrays containing 4329 cDNAs (table 1). Eleven patients who had had RA for less than 2 years were compared with eight patients with long-standing RA (⩾10 years). The cluster analysis indeed allowed them to distinguish patients with early RA from those with long-standing disease.6 No clear ranking was provided for genes most significantly over- or underexpressed, but genes that were upregulated more than threefold in early RA included troponin I (TNNI2), troponin T2 (TNNT2), cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), cleavage stimulation factor (CSTF2) and CSF 3 receptor (CSF3R). Genes downregulated in early RA included S100A10 and ribosomal protein SA (RPSA; LAMR1).
In another study, the same group hypothesised that patients with autoimmune disorders (including RA) exhibit highly reproducible PBMC gene expression profiles resulting from chronic inflammation, other disease manifestations, or from family resemblance.3 To test the latter hypothesis, PBMC gene expression profiles of individuals with RA (n = 4) and systemic lupus erythematosus (n = 4) and their unaffected first-degree relatives were compared (see table 1). Genes were classified into three major categories: overexpressed genes (n = 94), underexpressed genes (n = 111) and non-autoimmunity genes (n = 3924). Expression profiles in unaffected first-degree relatives resembled those of individuals with autoimmune diseases. Interestingly, this was also true for many of the autoimmune genes. Supplementary table 3 lists the top five under- and overexpressed autoimmunity genes displaying the highest correlation coefficients in relative pairs. Though the main goal of this study was to identify gene expression signatures across autoimmune disorders and not the identification of RA candidate genes, we added all genes that replicate results from other studies to table 2.
Another study of this group was aimed at identifying the proportion of a gene expression profile that was independent of familial resemblance and determining whether this was a product of disease duration, disease onset or other disease-related factors.4 The study included patients with long-standing systemic lupus erythematosus (n = 19), early RA (n = 9 and 17, respectively), insulin-dependent diabetes mellitus (IDDM; n = 5), multiple sclerosis (MS; n = 4), healthy controls (n = 8) and first-degree unaffected family members of individuals with systemic lupus erythematosus and RA (n = 8) (table 1). One hundred genes with shared expression levels between individuals with autoimmune diseases but not unaffected family members or controls were identified.4
A recent pharmacogenetic study by Lequerré et al9 set out to identify genes predictive of responsiveness to infliximab (Remicade), a TNF blocking agent, in PBMCs. Thirty-three patients with highly active disease refractory to methotrexate treatment were included (see table 1); 16 patients were classified as responders to infliximab, 17 as non-responders. Unsupervised hierarchical clustering of 41 mRNAs differentially expressed in PBMCs prior to treatment perfectly discriminated responders from non-responders. These transcripts included CYP3A4, LAMR1 and KNG1, genes that were also differentially expressed in several other studies and related to disease severity (table 2). Twenty of the 41 transcripts were assessed by quantitative polymerase chain reaction in a second set of 10 responders and 10 non-responders to validate their predictive value. This set of transcripts provided 90% sensitivity and 70% specificity for the classification of responders and non-responders (supplementary table 4).9
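As an illustration of how the reported classification performance relates to patient counts, the following minimal Python sketch computes sensitivity and specificity from a confusion table. The counts used (9 of 10 responders and 7 of 10 non-responders correctly classified) are not stated in the text; they are inferred from the reported 90% and 70% under the assumption that responders are treated as the positive class.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual responders that were classified as responders."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual non-responders classified as non-responders."""
    return true_neg / (true_neg + false_pos)


# Assumed counts for the validation set of 10 responders and 10
# non-responders; inferred from the reported percentages, not taken
# from the original publication.
responders_correct, responders_missed = 9, 1
non_responders_correct, non_responders_missed = 7, 3

print(f"sensitivity = {sensitivity(responders_correct, responders_missed):.0%}")
print(f"specificity = {specificity(non_responders_correct, non_responders_missed):.0%}")
```

With a validation set of this size, each misclassified patient shifts either measure by 10 percentage points, which is worth keeping in mind when interpreting the figures.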
Lindberg and coworkers10 also examined gene expression profiles in patients with RA treated with infliximab (n = 10). They included three good responders to treatment, five moderate responders and two non-responders (table 1). Two hundred and seventy-nine significant differences in gene expression in synovial (inflamed) tissue were observed between the three good responders and the two non-responders.10 Several of the differentially expressed genes were also observed in patient–control comparisons (table 2).
Power estimation of expression profiling studies
Several parameters are needed for power analysis of expression profiling studies, such as sample size and standard deviation (SD). SD was not mentioned in most of the studies. Therefore, this parameter was estimated using a freely accessible RA expression data set (Gene Expression Omnibus, GSE 1911; http://www.ncbi.nlm.nih.gov/geo/). Using this data set, we calculated the SD for the RA group (0.936) and for the control group (0.670) and used these to determine the minimal fold change in gene expression a study was able to detect with a statistical power of 90%. All studies were included (table 1). In the study of Batliwalla et al7 a 1.4-fold change in gene expression was used as the threshold for differential expression of genes. We calculated that 264 RA samples and 264 controls would be needed for 90% power to detect a 1.4-fold change; 29 patients with RA and 21 controls were analysed (table 1). Szodoray and coworkers8 used a threefold change as cut-off for differential expression. We calculated that 12 samples in each group would be needed; eight per group were analysed. Van der Pouw Kraan and coworkers5 used a twofold change cut-off and, therefore, 44 samples in each group would have been needed to reach 90% statistical power; 15 patients in total (10 and 5 per group) were analysed. The study of Olsen et al6 considered a difference of 3 SD as the threshold for differential expression. The total group of patients analysed in this study (n = 19) was sufficient to detect this difference.
For the studies of Maas et al3 and Liu et al4 no information about fold change, effect size or False Discovery Rate was provided. Assuming a False Discovery Rate of 5% for both studies and taking into account the number of samples in the studies, we estimated that Maas and co-workers would have been able to reliably detect a fold change of 4.2 and higher, whereas Liu et al could detect a fold change of ⩾2.0. Lequerré et al9 and Lindberg et al10 also did not report the fold change cut-off used. Using their published effect sizes (1.9 for Lequerré et al, 2.9 for Lindberg et al) we calculated the minimal detectable fold change. Based on the number of samples analysed, both studies appeared sufficiently powered to detect these effects.
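To make the relation between fold-change threshold and required sample size concrete, the following Python sketch applies a standard normal-approximation formula for a two-group comparison of a single gene, n per group = (z(1 − α/2) + z(power))² × (SD₁² + SD₂²) / δ². It is not the procedure used for the estimates reported above and is not expected to reproduce those figures. The per-gene significance level `alpha`, the assumption that the SDs are on a log2 expression scale (so that δ = log2 of the fold change), and the function name `samples_per_group` are all assumptions made for illustration only.

```python
from math import ceil, log2
from statistics import NormalDist

SD_RA = 0.936       # SD of the RA group, as given in the text
SD_CONTROL = 0.670  # SD of the control group, as given in the text


def samples_per_group(fold_change: float,
                      alpha: float = 0.001,
                      power: float = 0.90) -> int:
    """Per-group sample size for a two-sided, two-sample z-test on one gene.

    alpha is an assumed per-gene significance level (not from the article);
    the SDs are assumed to be on a log2 scale, so the effect to detect is
    log2(fold_change).
    """
    delta = log2(fold_change)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = SD_RA ** 2 + SD_CONTROL ** 2
    return ceil((z_alpha + z_power) ** 2 * variance_sum / delta ** 2)


# Thresholds used by the reviewed studies: 1.4-, 2- and 3-fold change.
for fc in (1.4, 2.0, 3.0):
    print(f"{fc}-fold change: {samples_per_group(fc)} samples per group")
```

Because the required sample size scales with 1/δ², a modest threshold such as a 1.4-fold change demands far more samples than a threefold change, which is the qualitative pattern seen in the estimates above. A stricter `alpha`, as would follow from correcting for thousands of genes, increases the required numbers further.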
Linkage and genome-wide association studies
Additional input for the identification of candidate genes for RA disease susceptibility can be obtained by combining results of different experimental approaches, such as expression studies and genetic linkage or association studies.15 Genes that are differentially expressed in patient–control comparisons and that are located under linkage peaks for RA can be viewed as strong candidates for RA. Several whole genome scans for RA have been performed.16–21 Recently, two genome-wide association studies (GWAS) have also been published.22 23 In this study we compared data from linkage meta-analyses24 25 and the GWAS22 23 with the results of the expression profiling studies described above. Several differentially expressed genes were located in RA linkage regions (table 3). As for overlap with the GWAS, tumour necrosis factor, α-induced protein 2 (TNFAIP2), which was differentially expressed in the study of Maas and coworkers,3 was also associated with RA in one of the GWAS.22
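The simple comparison used to derive the two gene sets can be expressed as set operations. The Python sketch below is illustrative only: the gene lists are partial and contain just the genes named in the text of this review (the complete lists are in tables 2 and 3 and in the original publications), and the variable names are hypothetical.

```python
from collections import Counter

# Partial gene lists per study, restricted to genes named in this review.
differentially_expressed = {
    "Maas (ref 3)": {"KNG1", "CSF3R", "TNFAIP2"},
    "Liu (ref 4)": {"KNG1", "CSF3R"},
    "Olsen (ref 6)": {"CSF3R", "CYP3A4", "LAMR1", "TNNI2", "TNNT2",
                      "CSTF2", "S100A10"},
    "Batliwalla (ref 7)": {"QPCT", "IL1RA", "S100A12", "GAB2",
                           "CD72", "CD79B"},
    "Lequerre (ref 9)": {"KNG1", "CYP3A4", "LAMR1"},
}

# Genes stated in the text to lie in a known RA linkage region (ref 25)
# or to be associated with RA in a GWAS (ref 22).
in_linkage_region = {"KNG1", "CSF3R"}
gwas_associated = {"TNFAIP2"}

# Count in how many studies each gene was reported.
counts = Counter(gene
                 for genes in differentially_expressed.values()
                 for gene in genes)

# Set 1: genes differentially expressed in more than one study.
recurrent = sorted(gene for gene, n in counts.items() if n > 1)

# Set 2: genes reported in at least one study and located in a linkage
# or association region.
positional = sorted(set(counts) & (in_linkage_region | gwas_associated))

print("Set 1:", recurrent)    # ['CSF3R', 'CYP3A4', 'KNG1', 'LAMR1']
print("Set 2:", positional)   # ['CSF3R', 'KNG1', 'TNFAIP2']
```

In practice, gene symbols would first need to be harmonised across platforms (for example, LAMR1 and RPSA denote the same gene), since the reviewed studies used different arrays and probe sets.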
DISCUSSION AND CONCLUSIONS
The advent of expression profiling in RA research has served several important purposes: (1) long assumed concepts of RA have gained additional support at the molecular level (eg, the study of van der Pouw Kraan and co-workers confirmed the heterogeneous nature of RA and gave insight into the distinct pathogenic mechanisms contributing to the disease5), and (2) clinical biomarkers to aid disease diagnosis, prognosis and prediction of treatment outcome can be extracted from the genes differentially expressed between patients with RA and controls.26 27 The prognostic potential of gene expression profiles has been elegantly shown in cancer studies28–32 and is already used in the clinic. Studies dealing with various types of cancer show how gene expression profiling can help to predict treatment outcome in individual patients.33–35 For RA, two recent papers suggest the feasibility of predicting treatment response by pharmacogenomics, potentially leading to more individualised treatment strategies.9 10 In particular, Lequerré and co-workers,9 by measuring transcript levels at baseline in a readily accessible tissue (blood) suitable for implementation into clinical practice, showed that a small subset of discriminative transcripts can provide a tool to predict infliximab efficacy in RA. Several of the genes identified in the treatment response studies were also found differentially expressed in studies related to disease severity, strengthening the view that the same genes and genetic mechanisms may underlie disease severity and response to anti-TNF treatment.36
In this study we performed a systematic literature search to identify array-based expression profiling studies in RA. Our hypothesis was that genes identified in several of the studies are better candidates for further research into RA susceptibility and progression than those identified only once. The combination of data from several sources may also help in target identification;15 therefore, we also included information from RA linkage meta-analyses and the first two GWAS. We describe the similarities and differences of the reviewed studies based on the data presented in the studies, and extract several genes that seem strong candidates for further study (tables 2 and 3). The overlap between the GWAS and the expression profiling studies was somewhat disappointing, but might be related to the low power of the studies performed so far, which is likely to improve. A better way to select interesting genes from all studies would be a meta-analysis. However, the large heterogeneity between the studies makes this impossible. The gene expression data were obtained from studies with different designs (case–control, RA only) using very diverse platforms and probe sets and different sources of patient material or partly overlapping patient samples (table 1).
The differences in study design also hamper comparisons of the statistical power between studies. However, using a simulation strategy we were able to estimate the statistical power of individual studies. Many of them appeared underpowered. Nevertheless, a number of good candidates for further study were defined (table 3). Based on the power analysis the strongest candidate genes would be KNG1 and CSF3R. Both genes were found differentially expressed in three studies with sufficient power (KNG1,3 4 9 CSF3R3 4 6) and are located in a known RA linkage region.25
Based on the above, it seems very important to reduce heterogeneity between studies and to increase power in future studies. One way to reduce heterogeneity is to make use of standardised protocols for the description of microarray experiments. MIAME is such a protocol and describes the Minimum Information About a Microarray Experiment that is needed to enable unambiguous interpretation of the study results and reproduction of the experiment (http://www.mged.org/Workgroups/MIAME/miame.html). Of the described studies, only Lequerré et al9 stated that their clinical and experimental data comply with the MIAME recommendations. Besides the use of standardised protocols, when studying complex diseases such as RA, the likelihood of identifying important genetic disease determinants can be increased if patients are well characterised and phenotypes are very narrowly defined.37
In our study we combined expression profiling, linkage and genome-wide association data to predict candidate genes for further research. One might also consider taking additional evidence into account, such as proteomics studies that identify proteins differentially expressed between patients and controls or between responders and non-responders to treatment.38 39
To permit the integration of data from different sources, such as expression profiling (in humans and model organisms), linkage, association and proteomics studies, software tools are currently being developed40 and some are already in use, such as the programs “Prioritizer” and “Endeavour”.41 42 These perform searches of existing literature as well as of databases containing, eg, gene expression profiles to identify (new) candidate genes for various diseases.
In conclusion, expression profiling opens up a new era in diagnosis, prognosis and treatment of RA and helps to elucidate many of the pathophysiological processes involved in this disease. Combining information from different studies and different sources can help to find the right genes to study in the maze of different reports. We have identified genes reproducibly regulated in expression profiling studies of RA and/or present in RA linkage and association regions, and suggest that these are excellent candidates for further study.
We thank Sita Vermeulen, Alejandro Arias Vasquez and Christian Gilissen for their help with the statistical analysis. This work was supported by a personal grant to M. Coenen from the Netherlands Organisation for Scientific Research (grant 916.76.020).
Competing interests: None.