Objective To apply a data-driven approach to investigate, in patients newly presenting with undifferentiated inflammatory synovitis, key variables that discriminate the subset of patients at sufficiently high risk of persistent or erosive disease for the purpose of developing new criteria for rheumatoid arthritis (RA).
Methods In this first phase of the collaborative effort of the American College of Rheumatology and European League Against Rheumatism to develop new criteria for RA, a pooled analysis of early arthritis cohorts made available by the respective investigators is presented. All the variables associated with the gold standard of treatment with methotrexate during the first year after enrolment were first identified. Principal component analysis was then used to identify among the significant variables those sets that represent similar domains. In a final step, from each domain one representative variable was extracted, all of which were then tested for their independent effects in a multivariate regression model. From the OR in that final model, the relative weight of each variable was estimated.
Results The final domains and variables identified by this process (and their relative weights) were: swelling of a metacarpophalangeal joint (MCP; 1.5), swelling of a proximal interphalangeal joint (PIP; 1.5), swelling of the wrist (1.5), tenderness of the hand (ie, MCP, PIP or wrist (2)), acute phase reaction (ie, C reactive protein or erythrocyte sedimentation rate and weights for moderate or high elevations of either one (1 for moderate, 2 for high elevation)) and serological abnormalities (ie, rheumatoid factors or anti-citrullinated protein antibodies, again with separate weights for moderate or high elevations (2 and 4, respectively)).
Conclusion The results of this first phase were subsequently used in the second phase of the project, which is reported in a separate methodological paper, and for derivation of the final set of criteria.
The classification criteria widely employed for rheumatoid arthritis (RA) are the 1987 American College of Rheumatology (ACR; then ARA) criteria.1 They have been internationally accepted by clinicians, researchers and regulators as providing the benchmark for disease definition. However, like other classification criteria, these criteria have certain limitations. In particular, the ACR criteria for the classification of RA were derived from patients with established and mostly long standing disease, and the selected features were those that best discriminated these patients with RA from a variety of other well-defined rheumatological disorders.2 The criteria are therefore not helpful in achieving the goal of early and effective intervention, which based on evidence accumulated over the past two decades, is needed to prevent joint damage and functional loss.3 Indeed, with modern therapies, the goal is to prevent individuals reaching the chronic erosive disease state that exemplifies the 1987 criteria for RA.4
When developing a new classification for RA, there is no ‘gold standard’ for RA ascertainment, and expert clinical opinion on what constitutes RA is so entwined with the 1987 classification criteria that relying on such judgement would inevitably create circularity: the elements of the original criteria would turn out to be important in the new ones. However, starting disease-modifying antirheumatic drugs (DMARDs) in patients with inflammatory arthritis signals a concern about persistence of disease that does not depend on the diagnostic label of RA. The aim of the process leading to new classification criteria for RA was therefore to define, among subjects with inflammatory arthritis, the group considered to be at sufficiently high risk of developing the persistent and erosive disease that currently exemplifies the entity ‘RA’, and who are therefore started on DMARDs.
While a broad range of DMARDs are used for the treatment of various chronic inflammatory rheumatic conditions, methotrexate (MTX) has become the anchor drug for the treatment of RA over the past decade5 and, indeed, has been recommended as the drug of first choice.6 7 In that sense, the need to start MTX treatment in a patient with arthritis can be used as a reasonable surrogate marker of the disease RA for which, given the mentioned limitations of the ARA 1987 criteria, a gold standard does not exist.
This paper details the methodology of the data-driven approach with the aim of identifying the potential contribution of standard clinical and laboratory data collected at first presentation to predicting later MTX treatment. The identification of these variables was the first step in the process of deriving new criteria for classification of RA. Subsequent steps included a consensus process and a refinement process which are described elsewhere.8 9
The first stage was to identify the most parsimonious combination of variables that, when applied to cohorts of subjects with early arthritis, would identify the subgroup of patients for whom initiation of DMARD therapy was appropriate. For the purposes of this analysis, the gold standard was based on the decision in real life to commence treatment with MTX. Any emerging criteria rules would thus need to be tested for their sensitivity to ensure that the highest practical proportion of patients that should have been ascertained, such as those with persistent and/or erosive disease, were correctly classified as positive.
To allow for differences in physician practice, the data analysis part of the project used data from several early arthritis cohorts which are described below. We prepared the obtained datasets so that the clinical and serological variables that had been collected in these databases could be used for data analysis.
Cohorts and patient selection
Nine cohorts from early arthritis clinics and registers were identified on the basis of prior publications and the ability to access the necessary data items through the respective principal investigators.10–17 The cohorts had been established with different entry criteria (see table S1 in the online supplement for a summary of the early arthritis cohorts that were made available to the committee, including their providers and original inclusion criteria). The analysis of data from each cohort was restricted to those patients who satisfied a number of inclusion and exclusion criteria.
▶. Patients had to have been enrolled in the year 2000 or later to ensure that (first-line) MTX was already in widespread use and to constrain the mix of secular trends in treatment.
▶. Patients had to have at least one swollen peripheral joint, based on the joint count performed in the respective cohort.
▶. The decision to start MTX had to be based on perceived clinical need by the physician rather than being mandated by treatment rules within individual cohorts. The latter would have jeopardised the validity of our approach, in which the gold standard was the need to initiate MTX in clinical practice—that is, at the discretion of the rheumatologist.
We excluded patients with a definitive diagnosis at baseline other than RA. Although some of these ‘definitive’ diagnoses at baseline were later revised during the course of follow-up in the cohorts, the initial labelling of a patient with a tentative diagnosis was deemed sufficient to exclude these individuals from the analysis set. These inclusion criteria were designed to define the population of patients with undifferentiated arthritis, for whom the emerging criteria were the desired target. This target population will also be defined later for the final set of criteria to lay out to whom they can be applied.9
All patients with a documented duration of symptoms of more than 3 years at the time of enrolment into the respective cohorts were also excluded (n=31, 1% of all cases). This was done as a compromise between maximal external validity (a small proportion of patients do present after a long interval following first joint involvement) and avoiding left censorship bias—that is, missing those who did not turn out to have RA and were therefore not followed any longer in a similar way.
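The selection rules above can be summarised as a simple filter. The following is a minimal sketch only: the record layout and all field names are hypothetical, since the cohorts' actual data schemas are not described here, and the rule on mandated MTX is represented as a per-patient flag for convenience.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical; they are not taken from the cohort databases.
    enrolment_year: int
    swollen_joint_count: int
    mtx_mandated_by_protocol: bool
    baseline_diagnosis: Optional[str]         # None = no definitive diagnosis (undifferentiated)
    symptom_duration_years: Optional[float]   # None = duration not documented


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion rules described in the text."""
    if p.enrolment_year < 2000:
        return False  # enrolment in 2000 or later
    if p.swollen_joint_count < 1:
        return False  # at least one swollen peripheral joint
    if p.mtx_mandated_by_protocol:
        return False  # MTX start must be at the physician's discretion
    if p.baseline_diagnosis is not None and p.baseline_diagnosis != "RA":
        return False  # definitive baseline diagnosis other than RA
    if p.symptom_duration_years is not None and p.symptom_duration_years > 3:
        return False  # documented symptom duration of more than 3 years
    return True
```

Note that, in this sketch, a patient with undocumented symptom duration is kept, because the exclusion in the text applies only to a documented duration of more than 3 years.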
Data extraction and transformation
Only six of the nine datasets, which are marked as ‘analysis datasets’ in table 1, have been used in this phase of the project (table 1). The remaining three cohorts (‘validation datasets’) were saved for later validation purposes. The following baseline data items were extracted from the databases based on consensus, even though items were not available for all cohorts or in all individuals within a cohort (table 1): age, gender, number and distribution of swollen and tender joints (in prespecified peripheral joints), Health Assessment Questionnaire (HAQ) scores, morning stiffness (presence and duration), rheumatoid factor (RF) and anti-cyclic citrullinated protein (anti-CCP) levels, erythrocyte sedimentation rate (ESR) and C reactive protein (CRP). A total of 3315 patients were included, with large variations in the number by cohort (ranging from 76 to 596). There was also considerable heterogeneity of patient characteristics at baseline (table 1), which is related to differences in the inclusion/exclusion criteria in the individual cohorts (see table S1 in online supplement).
Of note, duration of symptoms and radiographic signs of RA were excluded from this list of potential independent predictor variables as they would limit the usefulness of the final criteria in early disease. The measure of physician global was excluded because of its high circularity with regard to the gold standard definition of decision to start MTX treatment. The patient's global assessment of disease activity was also excluded as it might be difficult for a patient with new-onset symptoms to evaluate the spectrum of potential ‘disease’ activity.
The gold standard (dependent variable) for the regression models was defined as a treatment attempt with MTX within the first year after enrolment. Two time points were available for evaluation of treatment: 6 and 12 months. Patients receiving a DMARD other than MTX, or no DMARD, at 12 months but who had a treatment attempt with MTX documented at the 6-month time point were also considered positive for the gold standard (details of the algorithm used are given in table S2 in the online supplement).
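As an illustration, the gold standard rule can be written as a small function. This sketch covers only what is stated in the text; the full algorithm is in table S2, which is not reproduced here. The argument names are hypothetical, and treating a missing record as "no documented attempt" is an assumption.

```python
from typing import Optional


def gold_standard_positive(mtx_attempt_6m: Optional[bool],
                           mtx_attempt_12m: Optional[bool]) -> bool:
    """Return True if an MTX treatment attempt is documented at 6 or 12 months.

    Assumption: None (missing record) counts as no documented attempt.
    """
    return bool(mtx_attempt_6m) or bool(mtx_attempt_12m)
```

A patient who had switched to another DMARD by 12 months but had an MTX attempt documented at 6 months is therefore still positive: `gold_standard_positive(True, False)` evaluates to `True`.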
All the independent variables were divided into ordinal categories for the univariate regression analysis. As a minimum, we considered data from the joint regions that comprise the 28-joint count (ie, the proximal interphalangeal (PIP) and metacarpophalangeal (MCP) joints, wrists, elbows, shoulders and knees). Also, where available, involvement of the feet (ie, ankles and metatarsophalangeal (MTP) joints) was evaluated. Data were analysed from these sites individually and in relation to symmetrical involvement. A 28-joint count was also calculated. An ‘any large joint involvement’ variable was also derived based on involvement of at least one elbow, shoulder or knee. Except for gender and cohort, the other variables were divided a priori into three categories as follows (reference category marked with an asterisk):
▶. Gender: male*/female
▶. Cohort: nominal variable indicating cohort origin for each case
▶. Age: ≤44*/45–64/≥65 years
▶. ESR (1st*, 2nd, 3rd tertile)
▶. CRP (normal by local standards*; low and high defined as below or above the median of non-normal values)
▶. Swollen joint counts (out of 28) (1*, 2–5, 6–28)
▶. Tender joint counts (out of 28) (1*, 2–5, 6–28)
▶. Joint swelling by region: separately for PIP, MCP, wrist, MTP, ankles and large joints (no involvement*/unilateral involvement/symmetrical involvement); large joints included the elbows, shoulders and knees
▶. Joint tenderness by region: separately for PIP, MCP, wrist, MTP, ankles and large joints (no involvement*/unilateral involvement/symmetrical involvement)
▶. HAQ scores (1st*, 2nd, 3rd tertile)
▶. Morning stiffness (<1 h*/≥1 h)
▶. RF (normal by local standards*; low and high defined as below or above the median of non-normal values)
▶. Anti-citrullinated protein/peptide antibodies (ACPA) tested as anti-cyclic citrullinated peptide (anti-CCP) antibodies (normal by local standards*; low and high defined as below or above the median of non-normal values)
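The three-level coding used for CRP, RF and anti-CCP, and the age bands, can be sketched as follows. The function names and the single `upper_limit_normal` argument are hypothetical stand-ins for "normal by local standards", and assigning a value exactly at the median to the "low" category is an assumption, as the text does not specify tie handling.

```python
from statistics import median
from typing import List, Sequence


def lab_categories(values: Sequence[float], upper_limit_normal: float) -> List[int]:
    """Code each value as 0 = normal (reference), 1 = low, 2 = high.

    'Low' and 'high' are split at the median of the non-normal values.
    Assumption: a value equal to that median is coded as low.
    """
    abnormal = [v for v in values if v > upper_limit_normal]
    if not abnormal:
        return [0] * len(values)
    cut = median(abnormal)
    return [0 if v <= upper_limit_normal else (1 if v <= cut else 2)
            for v in values]


def age_category(age_years: float) -> int:
    """Code age as 0 = <=44 (reference), 1 = 45-64, 2 = >=65 years."""
    if age_years < 45:
        return 0
    if age_years < 65:
        return 1
    return 2
```

Because the median is taken within the pooled non-normal values passed in, applying the function per cohort or to the pooled data would give different cut-offs; the text does not state which was used.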
Six cohorts were selected for the pooled data analysis while three cohorts, which needed to represent a good geographical mix, were saved for later validation9 (table 1). A three-stage analytical process was undertaken and sensitivity analyses were performed. First, univariate analysis of the individual independent variables on the gold standard of MTX treatment was undertaken to develop an initial list of possibly important variables (table 1). Given the pooling of heterogeneous datasets from different populations, all the analyses were adjusted for the nominal variable ‘cohort’ identifying the data origin. The term ‘univariate’ is therefore not exact in this context.
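To give a feel for what a cohort-adjusted OR means, the sketch below pools one 2×2 table per cohort with the Mantel-Haenszel estimator. This is a deliberately simpler, swapped-in technique and not the regression adjustment described above; the function name and table layout are hypothetical.

```python
from typing import Iterable, Tuple

# One 2x2 table per cohort (hypothetical layout):
# (feature present & MTX, feature present & no MTX,
#  feature absent & MTX,  feature absent & no MTX)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """Pooled odds ratio across strata (cohorts), Mantel-Haenszel estimator."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty cohorts
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these tables")
    return numerator / denominator
```

With a single cohort the estimator reduces to the ordinary cross-product ratio, (a·d)/(b·c).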
Second, principal component analysis was used to extract factors (‘themes’) to which these numerous variables could be summarised. We included any significant variables from the univariate analysis and considered all factors with an Eigenvalue ≥1; loadings of variables were determined using a Varimax rotation with Kaiser normalisation. Scree plots were used to suggest an appropriate minimal model.
Third, to identify the relative importance of these factors, we fitted a multiple logistic regression model, adjusting for age, gender and cohort, and performed a classification tree analysis using the AnswerTree Software (SPSS Inc, Chicago, Illinois, USA).
Finally, a number of sensitivity analyses were undertaken using modifications of the gold standard, such as treatment initiation at 6 months, as well as several stratified analyses using data splits for ≤2004/>2004 to provide insights into the possibility of secular trends; for anti-CCP positivity; and for the fact that in some cohorts anti-CCP results were available to the physician at the time of decision making and in others they were not (eg, measured post hoc).
Identifying relevant variables
The results of the univariate analysis are given in figure 1 which shows the individual ORs including their 95% CI (the detailed data for figure 1 and the statistical results of using the continuous variables (ie, without prior categorisation) are shown in tables S3 and S4 in the online supplement). The swollen and tender joint counts were significant in the respective univariate models. All the individual joint regions analysed were important except for the ‘any large joints’ variable (swelling and tenderness) and MTP joints (tenderness), while ankle involvement even seemed to be a protective factor for MTX treatment. Among the regions that showed a significant association with MTX treatment, significant additional effects by symmetry of involvement were suggested only for MCP swelling and wrist swelling. ESR, CRP, RF and anti-CCP were all significantly associated with MTX treatment and more or less all showed a dose-response across the respective categories. The same was true for HAQ scores. Morning stiffness was not significant using the traditional 1 h cut-off, which was available in most of the cohorts.
For the principal component analysis, we included the significant variables from the univariate analysis. Based on those results, joint site involvement was restricted to swelling of PIP, MCP, wrist or MTP and tenderness of PIP, MCP or wrist. The other variables entered were categories of swollen joint count, tender joint count, ESR, CRP, HAQ, RF and anti-CCP. Based on the scree plot, four factors were selected, but we included two additional factors with an Eigenvalue just below the traditional cut-off point of 1 (0.99 and 0.94, see figure S1 in online supplement) to ensure no relevant domain would be missed.
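The effect of relaxing the Eigenvalue cut-off can be illustrated with a short sketch. Only the values 0.99 and 0.94 come from the text; all other eigenvalues below are hypothetical placeholders, and the function name is an assumption.

```python
from typing import List, Sequence


def retained_factors(eigenvalues: Sequence[float], cutoff: float = 1.0) -> List[int]:
    """Return the indices of factors whose eigenvalue is at or above the cut-off."""
    return [i for i, ev in enumerate(eigenvalues) if ev >= cutoff]


# Placeholder eigenvalues for illustration; only 0.99 and 0.94 are reported in the text.
eigenvalues = [3.0, 1.8, 1.4, 1.1, 0.99, 0.94, 0.70]

strict = retained_factors(eigenvalues)         # traditional cut-off of 1
relaxed = retained_factors(eigenvalues, 0.94)  # cut-off lowered to keep both near-threshold factors
```

Under these placeholder values the traditional cut-off keeps four factors and the relaxed cut-off keeps six, matching the number of factors carried forward.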
The loadings of the individual variables on these factors are shown in table 2. Based on these loadings, the six factors were thematically named as: MCP involvement, wrist involvement, tenderness of the hand, acute phase response, PIP involvement and serology. Tenderness of the hand was defined as the presence or absence of tenderness in any of the three relevant regions (MCP, PIP and wrist joints). For the three variables on MCP, PIP and wrist ‘involvement’, we then used the presence or absence of swelling as the defining feature. The acute phase variable was defined based on whichever category of ESR and CRP was higher (both using the three levels of normal/moderately elevated/highly elevated); the serology variable was analogously defined based on the highest category of RF and anti-CCP.
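The derived variables described above can be sketched as follows, assuming the three-level coding 0 = normal, 1 = moderately elevated, 2 = highly elevated. The function names are hypothetical, and ignoring a missing test result (None) is an assumption not stated in the text.

```python
from typing import Optional


def hand_tenderness(mcp_tender: bool, pip_tender: bool, wrist_tender: bool) -> bool:
    """Tenderness in any of the three hand regions (MCP, PIP, wrist)."""
    return mcp_tender or pip_tender or wrist_tender


def highest_category(first: Optional[int], second: Optional[int]) -> Optional[int]:
    """Return the higher of two category codes (0, 1 or 2).

    Assumption: a missing result (None) is ignored; None is returned only if both are missing.
    """
    available = [c for c in (first, second) if c is not None]
    return max(available) if available else None


def acute_phase_category(esr_cat: Optional[int], crp_cat: Optional[int]) -> Optional[int]:
    """Acute phase variable: whichever of ESR and CRP is in the higher category."""
    return highest_category(esr_cat, crp_cat)


def serology_category(rf_cat: Optional[int], anti_ccp_cat: Optional[int]) -> Optional[int]:
    """Serology variable: whichever of RF and anti-CCP is in the higher category."""
    return highest_category(rf_cat, anti_ccp_cat)
```

For example, a patient with a moderately elevated ESR and a highly elevated CRP is coded as highly elevated on the acute phase variable.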
Determining the diagnostic weight of the variables
Multivariate logistic regression model
The six variables extracted from the factor analysis were then entered as categorical predictors in a multivariate logistic regression model to determine their independent importance. The results of this model are shown in table 3. Predictive capacity was strongest for the serology variable and was more or less the same for the other five variables. Based on the OR, we extracted factor weights which allow a numerical appreciation of the relative importance of the variables in the context of predicting MTX treatment (table 3).
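The relative weights reported in the Results of the abstract can be combined as in the sketch below. Adding the weights into a single number is an assumption made here purely for illustration; this phase of the project did not define a score or a cut-off, and all names in the code are hypothetical.

```python
# Relative weights as reported in the abstract.
BINARY_WEIGHTS = {
    "mcp_swelling": 1.5,
    "pip_swelling": 1.5,
    "wrist_swelling": 1.5,
    "hand_tenderness": 2.0,
}
ACUTE_PHASE_WEIGHTS = {0: 0.0, 1: 1.0, 2: 2.0}  # normal / moderate / high
SEROLOGY_WEIGHTS = {0: 0.0, 1: 2.0, 2: 4.0}     # normal / moderate / high


def weighted_sum(mcp_swelling: bool, pip_swelling: bool, wrist_swelling: bool,
                 hand_tenderness: bool, acute_phase_cat: int, serology_cat: int) -> float:
    """Sum the relative weights of the features present (illustrative only)."""
    present = {
        "mcp_swelling": mcp_swelling,
        "pip_swelling": pip_swelling,
        "wrist_swelling": wrist_swelling,
        "hand_tenderness": hand_tenderness,
    }
    total = sum(BINARY_WEIGHTS[name] for name, flag in present.items() if flag)
    return total + ACUTE_PHASE_WEIGHTS[acute_phase_cat] + SEROLOGY_WEIGHTS[serology_cat]
```

The sketch makes the relative sizes concrete: highly elevated serology alone contributes 4, more than any two swollen joint regions together (3).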
Classification tree analysis
In a subsequent classification tree analysis we investigated the interrelationship of the individual variables in classifying patients according to their MTX treatment status. The tree based on the six variables from the factor analysis is shown in figure S2 in the online supplement. The most powerful split was between seropositive (either RF or anti-CCP positive) and seronegative (RF and anti-CCP negative) patients. Among the seropositive patients, tenderness of the hand region was the next split, even before joint swelling or the acute phase response came into play. Among the seronegative patients, the subsequent splits were by wrist swelling, PIP swelling and then MCP swelling (depending on the presence or absence of wrist swelling). At the subsequent step, the acute phase response was the split at all branches.
The tree shown is very detailed and not pruned back to the smallest set of most reliable predictors, because the purpose of the analysis was not to define the final criteria but rather to understand the sequential importance of the various variables.
To supplement the traditional interpretation of a classification tree, we provide two additional perspectives based on the tree analysis. First, we followed the tree into its periphery to investigate what determined MTX treatment. As shown in figure S3 in the online supplement, 56% of those patients who were finally treated with MTX were seropositive. Of the remaining 44% of patients later treated with MTX, who were seronegative, 23% had swelling of the wrist and, of those without wrist swelling, an additional 9% had swelling of the MCP joints. Despite the inclusion of acute phase response and swelling of the PIP joints, there remained 5% of seronegative patients without elevated acute phase response or any swelling of the wrist, MCP or PIP joints who had been treated with MTX. Hand tenderness was only important among the seropositive patients and is therefore not relevant in this interpretation of the data.
Finally, we looked at the most powerful combinations of these variables—that is, those which identified the largest proportion of treated patients. It can be seen from table 4 that, among seropositive patients with hand tenderness and highly elevated measures of the acute phase response, 73.7% received MTX treatment at 12 months. If the acute phase measures were normal or only moderately elevated, still 63% of these patients later received MTX.
An additional sensitivity analysis employing the 6-month MTX treatment status as the gold standard did not identify additional variables, nor did stratification for secular trends (≤2004/>2004), for anti-CCP positivity or awareness of anti-CCP status at the time of decision making.
This report is on the first phase of a large collaborative effort of the ACR and European League Against Rheumatism (EULAR) to develop new classification criteria for RA. The approach in this phase was purely data-driven and identified several important factors that appear to suggest to physicians the presence of an inflammatory arthritis requiring DMARD treatment, consistent with our current concept of RA—namely, swelling and tenderness of the small joints of the hands, serology and acute phase reactants. Although these might not be considered to be very surprising, several ‘traditional’ markers such as symmetry and morning stiffness did not turn out to be important.
Several points of potential criticism have to be addressed. First, is MTX initiation the best surrogate for a diagnosis of RA? It is not. However, its definition as the gold standard in the present analyses was based on a compromise between the danger of circularity (eg, when the ARA 1987 criteria or a physician ‘diagnosis’ were used as gold standard) and accuracy. The latter refers to the fact that no treatment indication is absolutely specific to a single disease entity, and MTX might therefore also have been introduced for other arthritides such as psoriatic arthritis and, less likely, other diseases. Thus, the compromise of using MTX initiation was to gain the advantage of avoiding circularity, which is the greatest hurdle in redefining currently very rigidly defined diseases.
Second, the databases used were of different origins and were highly variable, not only in their size (contribution to the pooled set) but also in terms of symptom duration, seropositivity and disease activity of the included patients. Pooling of such heterogeneous data is usually considered very difficult and a potential threat to validity. In our analysis we wanted good generalisability of the final results, for which we considered the heterogeneity of the data very helpful. However, to reduce some of the noise in the pooled dataset, we accounted for each respective setting by adjusting for differences in cohorts using the nominal ‘cohort’ variable when we looked at each single variable.
Third, several important features of established RA were excluded from consideration in this exercise. This was intentional as the major motivation of the project was to make the definition of RA more sensitive to early disease, so typical features of classical established RA such as erosions and nodules had to be excluded. For the same reason we excluded approximately 1% of patients who already had longstanding disease at baseline (defined as >3 years). It is without doubt that patients presenting with nodules and destructive joint disease will be clinically diagnosed with RA, but the inclusion of these features in classification sets would give them overwhelmingly strong weights and would result in an important number of patients with early RA who do not have erosions being missed. While defining RA by the presence of erosions has been the current standard way to think about and approach this disease, the new classification criteria are intended to not have the disease defined by the outcome that we want to prevent with effective and timely therapy.
The utility of the phase I results is manifold. All classification criteria need to be built on evidence, either from the literature or, as in our current approach, from extensive data analysis. The results from phase I were complementary to and informative for phase II of the work towards the development of the new ACR/EULAR RA classification criteria. In phase II, international expert rheumatologists provided their expertise in the rating of patient profiles. In addition to the scientific evidence that is needed, the second phase was necessary to ensure a good representation and coverage of expertise of the physicians actually diagnosing and treating patients in clinical practice, and to ensure that factors not included in the cohorts used in phase I analyses were also considered. Together with phase II, this work informed the final phase of criteria development outlined in the companion paper,8 in which the final criteria set, including the simplified scoring algorithm and the cut-off point to be used to define ‘definite RA’, was determined and preliminary validation performed.
The authors thank Joan Bathon, Dinesh Khanna, Larry Moreland, Jim O'Dell, Ted Pincus and Fred Wolfe for their highly valuable contributions during the data analysis process of the project. They are also grateful to Celina Alves, Carly Cheng, Tracey Farragher, Elisabeth Hensor, Jolanda Luime, Klaus Machold, Maria Dahl Mjaavatten, Valerie Nell, Nathalie Rincheval, Marleen van de Sande and Annette van der Helm-van Mil who were involved in development, data management or maintenance of their respective dataset; and Amy Miller and Regina Parker from the ACR and Heinz Marchesi and Anja Schönbächler from EULAR for their administrative support of the project.
Funding ACR, EULAR.
Competing interests None. Francis Berenbaum was the handling editor for this manuscript.
Provenance and peer review Not commissioned; externally peer reviewed.