

Extended report
A prediction rule for the development of arthritis in seropositive arthralgia patients
  1. Lotte Arwen van de Stadt1,
  2. Birgit I Witte2,
  3. Wouter H Bos1,
  4. Dirkjan van Schaardenburg1,3
  1. 1Department of Rheumatology, Jan van Breemen Research Institute|Reade, Amsterdam, The Netherlands
  2. 2Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
  3. 3VU University Medical Center, Department of Rheumatology, Amsterdam, The Netherlands
  1. Correspondence to Lotte Arwen van de Stadt, Jan van Breemen Research Institute|Reade, PO Box 58271, 1040 HG Amsterdam 1056AB, The Netherlands; l.vd.stadt{at}


Objective To predict the development of arthritis in anticyclic citrullinated peptide antibodies and/or IgM rheumatoid factor positive (seropositive) arthralgia patients.

Methods A prediction rule was developed using a prospective cohort of 374 seropositive arthralgia patients, followed for the development of arthritis. The model was created with backward stepwise Cox regression with 18 variables.

Results 131 patients (35%) developed arthritis after a median of 12 months. The prediction model consisted of nine variables: Rheumatoid Arthritis in a first degree family member, alcohol non-use, duration of symptoms <12 months, presence of intermittent symptoms, arthralgia in upper and lower extremities, visual analogue scale pain ≥50, presence of morning stiffness ≥1 h, history of swollen joints as reported by the patient and antibody status. A simplified prediction rule was made ranging from 0 to 13 points. The area under the curve value (95% CI) of this prediction rule was 0.82 (0.75–0.89) after 5 years. Harrell's C (95% CI) was 0.78 (0.73–0.84). Patients could be categorised in three risk groups: low (0–4 points), intermediate (5–6 points) and high risk (7–13 points). With the low risk group as a reference, the intermediate risk group had a hazard ratio (HR; 95% CI) of 4.52 (2.42–8.77) and the high risk group had a HR of 14.86 (8.40–28.32).

Conclusions In patients presenting with seropositive arthralgia, the risk of developing arthritis can be predicted. The prediction rule that was made in this patient group can help (1) to inform patients and (2) to select high-risk patients for intervention studies before clinical arthritis occurs.

  • Rheumatoid Arthritis
  • Autoantibodies
  • Early Rheumatoid Arthritis
  • Epidemiology



Rheumatoid Arthritis (RA) is an autoimmune disease that can have detrimental consequences if left untreated. Apart from joint destruction with subsequent disability, RA patients also suffer from various comorbidities, resulting in increased mortality.1,2 Joint destruction may already occur early in the disease course.3 Early and aggressive treatment can be highly effective in controlling inflammatory activity and development of erosions.4–6 Some studies even suggest that treatment within this early ‘window of opportunity’ might alter the natural history of RA.7,8 Early recognition of RA is thus highly important and several models have been created that predict the development of (erosive) RA in patients with undifferentiated arthritis.9,10 Recently, the classification criteria for RA itself have been renewed to enable earlier classification.11 However, these models and the new criteria concern patients that already have arthritis. To date no model exists to predict the development of RA before arthritis is clinically apparent.

Patients presenting with arthralgia and a positive test for anticyclic citrullinated peptide antibodies (aCCP) and/or IgM rheumatoid factor (IgM-RF) (seropositive) are at risk for developing RA.12 However, since not all of these patients develop arthritis12 the question remains which of these patients will do so and within which time frame. We prospectively followed a cohort of seropositive arthralgia patients and made an easily applicable prediction rule consisting of nine clinical variables for the development of arthritis in these patients.

Patients and methods

Study population

Between August 2004 and June 2011, patients with a positive aCCP and/or IgM-RF status and (a history of) arthralgia, but not arthritis were recruited at rheumatology outpatient clinics in the Amsterdam area of the Netherlands.12 Absence of arthritis was confirmed by physical examination of 44 joints by a trained medical doctor (WB or LAS) and a senior rheumatologist (DS).13 Patients with arthritis as revealed by chart review or baseline physical examination, a negative aCCP and IgM-RF status on second analysis, age >70 years, previous treatment with a disease modifying antirheumatic drug or recent glucocorticoid treatment (<3 months) were excluded. Patients without follow-up due to loss or recent inclusion (follow-up less than 6 months) were excluded from the present analysis. Figure 1 is a flow chart of inclusion. In total, 374 seropositive arthralgia patients were analysed. Of these patients, 83 were also included in a randomised placebo-controlled trial studying the effects of two intramuscular dexamethasone injections on arthritis development. Since dexamethasone did not delay or prevent arthritis these patients were considered suitable for the present analysis.14

Figure 1

Flow chart of inclusion.

At baseline, medical history, details of joint complaints and the number of tender joints at physical examination were recorded.15 Patients were seen semiannually the first year and annually thereafter. Extra visits were planned if arthritis developed. Development of arthritis in any of 44 joints was independently confirmed by two investigators (WB or LAS and DS).13 Median follow-up was 32 months (IQR: 13–48 months).

This study was approved by the local ethics committee and all participants gave informed consent.

Laboratory investigations

aCCP and IgM-RF levels were determined at baseline by second-generation aCCP ELISA (Axis Shield, Dundee, UK) and in-house ELISA, respectively, as described previously.16 The cut-off level for aCCP positivity was set at five arbitrary units/ml (AU/ml), according to the manufacturer's instructions. The cut-off level for IgM-RF positivity was set at 30 IU/ml, determined on the basis of the analysis of receiver operating characteristic (ROC) curves.16 High-sensitivity C reactive protein (hsCRP) was measured using a highly sensitive latex-enhanced assay on a Hitachi 911 analyser (Roche Diagnostics) and the cut-off level for CRP positivity was set at 10 mg/l, according to the manufacturer's instructions. Human leukocyte antigen (HLA) genotyping: HLA-DQ typing was performed as described previously.17 HLA-DRB1 shared epitope (SE) carrier status (one or two copies of the HLA-DRB1*0101, *0102, *0401, *0404, *0405, *0408, *0410 or *1001 alleles) was inferred from HLA-DQA1, HLA-DQB1 haplotypes using strong linkage disequilibrium with HLA-DRB1 alleles in Caucasians.18
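As an illustration of how the two antibody cut-offs above define the serological subgroups, the following Python sketch classifies a patient from baseline titres. The function name, the labels and the choice of "greater than or equal to" at the cut-off are assumptions made for this example; the text only gives the cut-off values (5 AU/ml for aCCP, 30 IU/ml for IgM-RF).

```python
# Cut-off values as stated in the text.
ACCP_CUTOFF_AU_PER_ML = 5.0
IGM_RF_CUTOFF_IU_PER_ML = 30.0


def serostatus(accp_au_per_ml, igm_rf_iu_per_ml):
    """Return a label for the antibody status of one patient.

    Assumption: a titre equal to the cut-off counts as positive.
    """
    accp_positive = accp_au_per_ml >= ACCP_CUTOFF_AU_PER_ML
    rf_positive = igm_rf_iu_per_ml >= IGM_RF_CUTOFF_IU_PER_ML
    if accp_positive and rf_positive:
        return "aCCP+/IgM-RF+"
    if accp_positive:
        return "aCCP+/IgM-RF-"
    if rf_positive:
        return "aCCP-/IgM-RF+"
    return "seronegative"


if __name__ == "__main__":
    # Hypothetical titres, for illustration only.
    print(serostatus(12.0, 8.0))
```

Only patients with one of the first three labels would meet the seropositivity requirement of the cohort.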

Statistical analysis

Data evaluation and statistical analysis were performed with SPSS V.20.0 software and the R software environment for statistical computing (R-Development Core Team). Prediction variables were selected based on clinical applicability, biological plausibility, previous research and expert opinion. If variables were strongly correlated, one was selected for use in the prediction model, which resulted in 18 prediction variables (table 1). No variable had missing data exceeding 5%. Missing data were imputed with multiple imputation, creating five imputation sets. Baseline variables were assessed for their ability to predict the development of arthritis by univariate and multivariate Cox proportional hazard analysis performed on the five imputation sets. Results of the pooled analysis are reported. Cases were treated as censored after the date of their last visit. Continuous variables with a non-linear association with the outcome variable were categorised using clinically applicable cut offs or percentiles. Categories were pooled if corresponding regression coefficients were similar. IgM-RF and aCCP status and titres were combined into one variable with four categories based on previous research (table 1).12 In the multivariate analysis all variables were entered in the model and subsequently selected with a backward stepwise procedure (p removal 0.1). Nagelkerke's R2 was used to calculate the proportion of explained variation by this model.
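The text does not state how results from the five imputation sets were pooled. A common choice is Rubin's rules, sketched below in Python as an assumption rather than as the authors' documented procedure. The function name and the example numbers are hypothetical; the inputs are one regression coefficient and its standard error from each imputed dataset.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across imputed datasets with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)


if __name__ == "__main__":
    # Hypothetical coefficients from five imputation sets.
    print(pool_rubin([0.90, 0.95, 1.00, 1.05, 1.10], [0.20, 0.21, 0.20, 0.19, 0.20]))
```

When the imputed datasets agree closely, the between-imputation term is small and the pooled standard error is close to the average single-dataset standard error.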

Table 1

Univariate analysis for patients that did develop arthritis and patients that did not

A simplified prediction rule was obtained by rounding the regression coefficients to half points, which were then multiplied by 2 for easier clinical applicability. The diagnostic performance of the prediction model and rule were evaluated using the area under the curve (AUC) of ROC curves and Harrell's C in the original dataset. ROC analysis was performed for arthritis at 1, 3 and 5 years. Patients with less than 1, 3 or 5 years of follow-up, respectively, were treated as missing in the ROC analyses. Harrell's C is a performance measure that also takes into account time until development of arthritis, estimating the probability of concordance between the predicted and observed responses.19 Values around 0.5 indicate no predictive discrimination, values close to 1, high predictive discrimination.
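To make the definition of Harrell's C concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not the R routine used in the study: pairs are counted as comparable only when the patient with the shorter time actually developed arthritis, tied risk scores count as half concordant, and tied times are ignored. All names and the example data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Concordance between risk scores and observed arthritis-free times.

    times  : follow-up time per patient (for example, months)
    events : True if arthritis developed at that time, False if censored
    scores : risk score per patient (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[j] > times[i]:  # patient j was still arthritis-free when i had the event
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical data: higher scores develop arthritis earlier.
    print(harrell_c([6, 12, 24, 48], [True, True, True, False], [9, 7, 5, 2]))
```

A score that orders every comparable pair correctly gives 1, a score with no information gives about 0.5.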

For internal validation, cross validation was performed, in which 300 randomly drawn cases from each imputation set were used as the training sets to create the prediction model with Cox regression, using the pooled analysis over the five datasets. The remaining 74 cases were subsequently used as the validation set to calculate the AUC of the ROC curves and Harrell's C as described above. This procedure was repeated 1000 times. The average scores from these 1000 analyses for the resulting regression coefficients, AUC and Harrell's C are presented.
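The repeated random split described above (300 cases to fit the model, the remaining 74 to validate it, 1000 repetitions) can be sketched as a small Python generator. Only the splitting step is shown; fitting the Cox model and computing the performance measures are left out. The function name and the seed are assumptions made for the example.

```python
import random


def repeated_splits(n_cases=374, n_fit=300, n_repeats=1000, seed=0):
    """Yield (fit_indices, validation_indices) for each repetition."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    indices = list(range(n_cases))
    for _ in range(n_repeats):
        shuffled = rng.sample(indices, n_cases)  # random permutation without replacement
        yield shuffled[:n_fit], shuffled[n_fit:]


if __name__ == "__main__":
    for fit_idx, val_idx in repeated_splits(n_repeats=2):
        print(len(fit_idx), len(val_idx))
```

Averaging a performance measure over the validation parts of all repetitions gives the kind of cross-validated estimate reported in the tables.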


Arthritis development

In total 374 patients, 76% female, mean±SD age 49±11, were included in the present study. Patients had arthralgia for a median (IQR) of 12 (8–46) months with a median (IQR) number of reported painful joints of 4 (1–8). The median (IQR) tender joint count of 53 joints at physical examination was 0 (0–3). One hundred and twenty patients (32%) were IgM-RF positive and aCCP negative, 143 patients (38%) were IgM-RF negative and aCCP positive and 111 patients (30%) were IgM-RF and aCCP positive. One hundred and ninety-four patients (52%) were SE positive.

One hundred and thirty-one patients (35%) developed arthritis after a median (IQR) follow-up of 12 (6–23) months. One hundred and twenty-one (92%) of these patients could be classified as having RA according to the 2010 American College of Rheumatology (ACR)/The European League Against Rheumatism (EULAR) criteria, while 56 (45%) could be classified as having RA according to the 1987 criteria. Patients developing arthritis had a median (IQR) tender joint count of 5 (2–9) and a median (IQR) swollen joints count of 3 (2–5) at the time of diagnosis.

Univariate analysis

To study which baseline characteristics could predict arthritis development, univariate Cox regression was performed first. Compared with patients who did not develop arthritis, patients who developed arthritis were more often non-drinkers of alcohol, used non-steroidal anti-inflammatory drugs more often, more often had symptoms for less than 12 months, intermittent symptoms, symptoms in upper and lower extremities, a visual analogue scale (VAS) pain ≥50, morning stiffness ≥1 h and reported swollen joints, had a higher mean tender joint count, and were more often aCCP and SE positive (table 1).

Multivariate analysis

A prediction model for the development of arthritis was created with multivariate Cox regression using a backward stepwise approach (p removal 0.1). Variables included in the model and their regression coefficients are tabulated in table 2. The variables age, sex, smoking, non-steroidal anti-inflammatory drug use, symmetric symptoms, symptoms in small joints, tender joint count, CRP and SE were excluded. When a lower p removal (0.05) was used, the variables first degree relative, alcohol non-use and location in upper and lower extremities were also excluded (data not shown). Performance measures of this model were slightly lower than those of the full model (table 3) and Nagelkerke's R2 was 0.29.

Table 2

Multivariate analysis of predictors for arthritis development

Table 3

AUC values of ROC curves and Harrell's C

For internal validation, cross validation was used. Regression coefficients did not change markedly after internal validation (table 2; B CV). As an alternative validation method, bootstrapping was also performed, using 1000 bootstrap samples to fit the prediction model. Regression coefficients did not change markedly either in these analyses (data not shown). The fraction of explained variation for the model (Nagelkerke's R2) was 0.31.

To see whether the model changed if only aCCP positive patients were analysed, the model was also fitted to this patient group alone. This resulted in the same variables in the model with only slightly changed regression coefficients (data not shown).

Prediction of arthritis

For each patient a risk score was calculated by multiplying each variable with the regression coefficient of the prediction model. To create a more easily applicable prediction rule, regression coefficients were rounded to half points and multiplied by 2 (table 2 and figure 3). Multiplying each variable by the corresponding risk points resulted in a prediction rule score for each patient. Out of a potential score of 0 to 13 points, the scores in this cohort ranged from 0 to 12 points. The ROC curves for the prediction model and the rule for development of arthritis within 1 year (panel A), 3 years (panel B) and 5 years (panel C) are depicted in figure 2. The curves overlapped each other in every time interval, indicating that simplifying the model into the rule did not induce loss of discriminative ability. The AUC values and Harrell's C are tabulated in table 3.
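The conversion from regression coefficients to rule points described above (round to the nearest half point, then multiply by 2) can be sketched in Python. The coefficients below are hypothetical placeholders, not the values of table 2, and the sketch treats every predictor as a simple present/absent indicator, whereas antibody status in the actual rule has four categories. Rounding exact ties upwards is also an assumption.

```python
from math import floor


def points_from_coefficient(beta):
    """Round a coefficient to the nearest half point and multiply by 2.

    This equals rounding 2 * beta to the nearest integer (ties rounded up).
    """
    return floor(2 * beta + 0.5)


def rule_score(present_variables, coefficients):
    """Sum the points of all predictors that are present in one patient."""
    return sum(points_from_coefficient(coefficients[name]) for name in present_variables)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    example_coefficients = {
        "morning_stiffness_ge_1h": 0.48,
        "vas_pain_ge_50": 0.95,
        "intermittent_symptoms": 0.55,
    }
    print(rule_score(["morning_stiffness_ge_1h", "vas_pain_ge_50"], example_coefficients))
```

Working with whole points avoids half-point arithmetic at the bedside while preserving the relative weights of the predictors up to rounding.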

Figure 2

Receiver operating characteristic (ROC) and survival curves. (A), (B), (C): The diagnostic performance of the prediction model (black line) and rule (grey line) were evaluated using the area under the curve of ROC curves. ROC analysis was performed for arthritis at 1 year (panel A), at 3 years (panel B) and at 5 years (panel C). Patients with less than 1, 3 or 5 years of follow-up, respectively, were treated as missing in the ROC analyses. (D), (E): The prediction rule resulted in a score for each patient. The arthritis-free survival curves per risk score are shown in panel D. Patients could be categorised into three risk groups (panel E): 0–4 points: low (dotted line), 5–6 points: intermediate (dashed line) or 7–12 points: high risk (continuous line). Numbers at inclusion for each category are given on the right of figure 2D,E. Numbers in follow-up, numbers at risk and the percentage of arthritis development (95% CI) per time point are tabulated below figure 2E. Access the article online to view this figure in colour.
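The handling of follow-up in the ROC analyses can be illustrated with a short Python sketch. It assumes that patients who developed arthritis within the time horizon count as cases, that patients followed for at least the horizon without arthritis by then count as controls, and that arthritis-free patients with shorter follow-up are dropped. The AUC is computed as the proportion of case-control pairs in which the case has the higher score (ties count half). Names and data are hypothetical.

```python
def auc_at_horizon(scores, event_months, follow_up_months, horizon):
    """AUC for arthritis within `horizon` months.

    event_months[i] is the month arthritis developed, or None if it did not.
    """
    cases, controls = [], []
    for score, event, follow_up in zip(scores, event_months, follow_up_months):
        if event is not None and event <= horizon:
            cases.append(score)
        elif follow_up >= horizon:
            controls.append(score)
        # otherwise: arthritis-free with too little follow-up, treated as missing
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical patients; the last one has only 5 months of follow-up and is dropped.
    print(auc_at_horizon([8, 7, 2, 1, 5], [6, 10, None, None, None], [6, 10, 40, 36, 5], 12))
```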

Using Cox regression with the calculated risk score of the prediction rule as independent variable, the HR (95% CI) of the prediction rule per risk point was 1.58 (1.46 to 1.72). In figure 2D, the arthritis-free survival curves are shown for all patients receiving a score from 1 to 11—the one patient receiving 12 points was censored after 6 months and is not shown. As is clear from the survival curves, patients could be categorised into three risk groups: patients receiving 0–4 points having a low risk, patients receiving 5–6 points having intermediate risk and patients receiving 7–12 points having high risk. One hundred and fifty-five patients (41%) had low risk, 102 patients (27%) had intermediate risk and 117 (31%) had high risk. The distribution of the variables with the highest weight (aCCP status and VAS≥50) over the different risk categories is given in online supplementary tables S1 and S2. HR (95% CI) for the intermediate risk and the high risk group (with the low risk group as a reference) were 4.52 (2.42 to 8.77) and 14.86 (8.40 to 28.32), respectively. Figure 2E depicts the arthritis-free survival curves per risk group. After 1 year, 3% of the low-risk group developed arthritis, 16% of the intermediate group developed arthritis, and 43% of the high-risk group developed arthritis. After 3 years, these percentages were 7%, 36% and 74%, respectively. After 5 years these percentages were 12%, 43% and 81%, respectively (figure 2E).
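Because the risk score was entered as a single continuous covariate, the reported HR of 1.58 per point implies, under the log-linear assumption of the Cox model, a relative hazard of 1.58 raised to the power of the difference in points between two patients. The small Python sketch below only illustrates this arithmetic; the function name is hypothetical, and the result is not a substitute for the separately estimated HRs of the risk groups.

```python
HR_PER_POINT = 1.58  # HR per risk point as reported in the text


def relative_hazard(point_difference, hr_per_point=HR_PER_POINT):
    """Implied hazard ratio between two patients who differ by `point_difference` points."""
    return hr_per_point ** point_difference


if __name__ == "__main__":
    for difference in (1, 3, 5):
        print(difference, round(relative_hazard(difference), 2))
```

For example, a difference of 5 points corresponds to an implied HR of about 9.8.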

RA classification criteria

Most patients diagnosed with arthritis (92%), could be classified as having RA according to the 2010 ACR/EULAR criteria (RA2010). With RA2010 as the dependent variable in the Cox regression analysis, regression coefficients changed only slightly and the level of significance was a little lower for all variables (data not shown). This would have resulted in exclusion of the variable ‘location in upper and lower extremities’ because this variable had a p value of 0.11. The explained fraction of variation of the prediction model (including the variable ‘symptoms in upper and lower extremities’) for RA2010 was 0.31 and with RA2010 as the dependent variable the AUC values of the ROC curves and Harrell's C did not change notably (table 3).

Only 56 (45%) of the patients that developed arthritis could be classified as having RA according to the 1987 ACR criteria (RA1987). Cox regression analysis with RA1987 as the dependent variable would have resulted in a much smaller model, because the variables first degree relative, alcohol non-use, location in upper and lower extremities and swollen joints reported by the patient all had p values above 0.1. But since the number of events was lower, this could be due to loss of power. The explained fraction of variance of the full model for RA1987 was 0.19. The AUC and Harrell's C decreased only slightly (table 3).

Figure 3

Risk calculation form. With the risk calculation form, the total risk score can be calculated. Patients can be categorised into the low risk (0–4 points), intermediate risk (5–6 points) or high risk (7–13 points) group according to their total score.
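The categorisation on the risk calculation form can be written as a short Python function. The thresholds are those given in the text (0–4 low, 5–6 intermediate, 7–13 high); the function name and the error handling are assumptions made for the example.

```python
def risk_group(total_score):
    """Map a total prediction rule score (0 to 13 points) to a risk group."""
    if not 0 <= total_score <= 13:
        raise ValueError("score must be between 0 and 13 points")
    if total_score <= 4:
        return "low"
    if total_score <= 6:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for score in (3, 5, 9):
        print(score, risk_group(score))
```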


Seropositive arthralgia patients are at risk of developing arthritis and subsequent RA. However, antibodies alone do not have sufficient predictive power to guide clinical decisions. We have developed a prediction rule for the development of arthritis in these patients, consisting of nine clinical variables that can easily be assessed during the first visit to the rheumatologist: RA in a first degree family member, alcohol non-use, duration of symptoms shorter than 12 months, presence of intermittent symptoms, arthralgia in upper and lower extremities, VAS pain ≥50, presence of morning stiffness ≥1 h, history of swollen joints as reported by the patient and antibody status. The discriminative ability was good with an AUC value of 0.82 and a Harrell's C of 0.79. Internal validation showed no major problems with overfitting. This rule can be readily used in clinical practice, since no complicated or expensive tests are necessary. It can help to inform patients and help to guide treatment decisions, although treatment protocols in these patients should first be tested in clinical trials.

For clinical applicability, a concern might be that in this study we chose the development of arthritis as outcome. However, arthritis can be self-limiting and might then not need disease modifying antirheumatic drug treatment. We therefore also took the 2010 and 1987 RA criteria into account as outcome variables. The prediction rule then still had good discriminative abilities, although the proportion of explained variance of the prediction model for the 1987 RA criteria was low. An alternative outcome measurement would be erosive arthritis. However, in the present cohort, follow-up data on radiographic progression during the arthritis phase are still too limited to provide reliable results. Therefore we did not take erosive arthritis into account as outcome variable, although it will be interesting to explore the discriminative ability of this prediction rule for erosive arthritis in future studies.

Another concern is that, while the prediction rule could discriminate high-risk patients very well from low-risk patients, there was still a group of patients with intermediate risk comprising 27% of the cohort. These patients had a risk of approximately 40% to develop arthritis within 5 years, which is considerable, but also still fairly uncertain. In such a patient group it might be useful to do additional tests such as genetics, RNA profiling, antibody profiling or other test methods that are currently being developed. Also, a more detailed study of the quality of the symptoms experienced by patients in this phase of the disease may be helpful. This has recently been identified as a research priority by the EULAR study group for risk factors for RA.20 Additionally, different imaging techniques to assess the joints such as ultrasound, positron emission tomography and MRI might prove to be useful in these patients.21 ,22

An interesting finding in this patient group was that alcohol consumption seems to protect against development of arthritis, as we and others have previously reported.23–26 Although we would not advise patients to consume large amounts of alcohol, alcohol intake seems beneficial and, as our analysis indicates, should be taken into account in considering the risk of arthritis.

Factors that were surprisingly lacking in the model were presence of an acute phase response, presence of the SE, and smoking. We have previously shown that CRP levels and secretory phospholipase A2 (sPLA2) levels rise prior to development of RA in comparison to healthy individuals, but stay within the normal range.27,28 In the present cohort, we have also observed that the CRP levels were slightly higher than in healthy blood donors but still within normal limits. Therefore, an elevated CRP level was not associated with arthritis development.29 The presence of the SE, on the other hand, was strongly associated with arthritis development in the univariate analysis. However, this association was not present in the multivariate analysis. An explanation for this phenomenon can be that the contribution of the SE to RA development is mediated via the presence and perhaps levels of anti-citrullinated protein antibodies (ACPA).30 The same could be true for smoking, which is highly associated with the formation of ACPA. The method of patient selection used for this study may cause the loss of association with arthritis of such risk factors within this cohort. We chose to develop a prediction rule for seropositive arthralgia patients from a clinical perspective and think the prediction rule will be a useful tool for this patient group. However, there are other groups of individuals at risk for development of arthritis who might be seronegative. Therefore, it is essential that the prediction rule that we developed is validated in other prospectively followed cohorts of patients at risk for the development of RA, such as first degree family members or patients with inflammatory type arthralgia. Such cohorts are currently being set up and followed and it will be exciting to see the results of these studies and the comparison with our cohort.

In conclusion, we developed a prediction rule for the development of arthritis in seropositive arthralgia patients that is discriminative and easily applicable. It is a first step towards better understanding of and care for this patient group.


The study was financially supported by the Dutch Arthritis Association, Grant number 0801034. We would like to gratefully thank all study participants for their efforts. Furthermore, we want to thank Margret de Koning and Roel Heijmans for their technical assistance.


  • Handling editor Tore K Kvien

  • Correction notice This article has been corrected since it was published Online First. The author affiliations have been updated.

  • Contributors All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. LAS had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study conception and design: LAS, WB, DS. Acquisition of data: LAS, WB. Analysis and interpretation of data: LAS, BW, DS.

  • Funding Dutch Arthritis Association.

  • Competing interests None.

  • Ethics approval Ethics committee of the Slotervaart Hospital and Reade, Amsterdam, the Netherlands.

  • Provenance and peer review Not commissioned; externally peer reviewed.
