Objectives Rheumatoid factor (RF) and anti-cyclic citrullinated protein/peptide antibodies (ACPA) are integrated in the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) classification criteria for rheumatoid arthritis (RA). The objectives of this study were to evaluate the technical and diagnostic performance of different RF and ACPA assays and to evaluate whether differences in performance impact RA classification.
Methods Samples from 594 consecutive patients who for the first time consulted a rheumatologist (44 of whom were diagnosed with RA) and 26 extra newly diagnosed patients with RA were analysed with six different RF assays (Menarini, Thermo Fisher, Inova, Roche, Abbott, Euroimmun) and seven different ACPA assays (Menarini, Thermo Fisher, Inova, Roche, Abbott, Euro Diagnostica, Euroimmun).
Results We found differences in analytical performance between assays. There was poor numerical agreement between the different RF and ACPA assays. For all assays, the likelihood ratio for RA increased with increasing antibody levels. The areas under the curve of receiver operating characteristic analysis of the RF (range 0.676–0.709) and ACPA assays (range 0.672–0.769) only differed between some ACPA assays. Nevertheless, using the cut-off proposed by the manufacturer, there was a large variation in sensitivity and specificity between assays (mainly for RF). Consequently, depending on the assay used, a subgroup of patients (13% for RF, 1% for ACPA and 9% for RF/ACPA) might or might not be classified as RA according to the 2010 ACR/EULAR criteria.
Conclusion Due to poor harmonisation of RF and ACPA assays and of test result interpretation, RA classification according to 2010 ACR/EULAR criteria may vary when different assays are used.
- rheumatoid arthritis
- rheumatoid factor
Rheumatoid arthritis (RA) is the most common chronic inflammatory joint disease, affecting 0.5%–1% of the population in the industrialised world.1 If left untreated, or undertreated, RA is associated with progressive and irreversible joint destruction leading to disability, reduction of quality of life and increased mortality.2 The early start of aggressive therapy aiming to halt progression of disease is currently emphasised as an important strategic principle in view of the ‘window of opportunity’ theory.3 4 However, in patients with early disease, diagnosis of RA is difficult.5
The American College of Rheumatology (ACR) criteria are widely used as the ‘gold standard’ for classification of RA. Due to the lack of sensitivity of the 1987 ACR criteria,6 the ACR and the European League Against Rheumatism (EULAR) proposed new classification criteria in 2010.7 Application of the 2010 classification criteria provides a score of 0–10, with a score ≥6 being indicative of definite RA. The presence of rheumatoid factor (RF) or anti-cyclic citrullinated protein/peptide antibodies (ACPA) contributes two points if detectable and three points if present at levels >3 times the upper limit of normal.7
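The serological part of the scoring rule described above can be sketched in a few lines. This is a minimal illustration, not an implementation of the full criteria: the function names are hypothetical, the manufacturer's cut-off is assumed to stand in for the upper limit of normal, a result exactly at the cut-off is assumed to be negative, and the higher of the two markers is assumed to determine the points.

```python
def serology_points(rf, rf_cutoff, acpa, acpa_cutoff):
    """Serology points (0, 2 or 3) for one patient.

    Assumption: the cut-off is used as the upper limit of normal (ULN),
    and a value exactly at the cut-off counts as negative.
    """
    def level(value, cutoff):
        if value > 3 * cutoff:   # high positive: more than 3 times the ULN
            return 3
        if value > cutoff:       # low positive: above the ULN
            return 2
        return 0                 # negative

    # Assumption: the higher-scoring marker (RF or ACPA) sets the points.
    return max(level(rf, rf_cutoff), level(acpa, acpa_cutoff))


def is_definite_ra(non_serology_points, serology):
    """A total score of 6 or more (out of 10) indicates definite RA."""
    return non_serology_points + serology >= 6
```

Because the points depend on the ratio of the result to the cut-off, the same serum can score 0, 2 or 3 with different assays when either the numerical result or the cut-off differs.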
RF is an antibody against the Fc portion of IgG. Despite its relatively low specificity (±80%), RF has historically been used as a serological marker for RA.8 ACPA are antibodies to citrullinated peptides. ACPA are associated with a bad outcome and are more specific (±95%) for RA than RF.9 10 Overall, sensitivities of RF and ACPA for RA are comparable (±60%).2 11 In the 2010 ACR/EULAR criteria, RF and ACPA are regarded as equivalent.7
Over the past years, different assays for the detection of ACPA and RF have been introduced. Initially, ACPA were detected by ELISA using citrullinated recombinant rat filaggrin.12 Subsequently, sensitivity of ACPA tests was enhanced without compromising specificity by using synthetic cyclic citrullinated peptides (CCP2) (second-generation ACPA).13 More recently, a third-generation ACPA (CCP3) test has been designed.14 15
Most (but not all) RF assays are calibrated against an internationally recognised standard, namely the WHO International Standard W1066 or the British Standard of human RA serum 64/002 (National Institute for Biological Standards and Control (NIBSC)). Both standards are the same material.16–18 ACPA assays, on the other hand, are not harmonised. There is a large variability between the different ACPA assays and numerical test results are not interchangeable.19 Only recently, the Centers for Disease Control and Prevention (CDC) provided a reference human ACPA for in vitro immunodiagnostic use in solid phase enzyme immunoassays.20
The objectives of this study were to evaluate the technical and diagnostic performance of different ACPA and RF assays and to study whether differences in test performance could impact the 2010 ACR/EULAR classification of RA.
Materials and methods
Patients and samples
Between January 2014 and June 2015, all unique patients (n=594) who for the first time underwent laboratory testing for a rheumatologic disease, requested by a rheumatologist of the Onze-Lieve-Vrouw Hospital in Aalst, Belgium (a secondary care hospital), were included. Patients for whom there was not enough serum to perform additional testing were excluded. All serum samples were stored at −20°C before analysis.
After review of the electronic medical records, the diagnosis was registered and reviewed by the consulting rheumatologist. Patients were categorised into three groups: RA, rheumatologic disease control group (RDCG) and disease control group (DCG). A patient with synovitis was considered to have RA (n=44) when the treating rheumatologist initiated methotrexate treatment (if no contraindication existed) and no alternative diagnosis could better explain the symptoms. These RA criteria were based on the criteria used for deriving the 2010 ACR/EULAR criteria.7 The RA diagnosis was reviewed after 1 year of follow-up.
To enlarge the group of patients with RA, 26 additional patients with RA (recruited between June 2012 and April 2016) were included. These patients with RA were coded as described above.
For all patients coded as RA it was checked whether they fulfilled the 1987 ACR6 and the 2010 ACR/EULAR criteria7: 53 patients fulfilled both criteria, 14 fulfilled only the 1987 criteria and 3 patients fulfilled only the 2010 ACR/EULAR criteria (online supplementary data S10).
Six commercial RF and seven commercial ACPA assays were included in this study.
For RF, the Quantia RF on the Abbott ARCHITECT c System (Abbott, Germany), QUANTA Lite RF IgM ELISA on a QUANTA-Lyser 2 (Inova Diagnostics, USA), RF EliA IgM on Phadia 250 (Thermo Fisher Scientific, Sweden), RF-II on a Cobas c502 analyser (Roche Diagnostics, Germany), Diagam RF on a ZENIT analyser (Menarini Diagnostics, Italy) and the RF IgM ELISA from Euroimmun (Euroimmun, Germany) were evaluated.
For ACPA, the ARCHITECT Anti-CCP assay on the ARCHITECT i System (Abbott), Immunoscan CCPlus (Euro Diagnostica, Sweden) on a QUANTA-Lyser 2 (Inova Diagnostics), QUANTA Flash CCP3 on the BIO-FLASH instrument (Inova Diagnostics), CCP EliA IgG on Phadia 250 (Thermo Fisher Scientific), Anti-CCP on a Cobas e601 analyser (Roche Diagnostics), ZENIT RA CCP on a ZENIT analyser (Menarini Diagnostics) and the Anti-CCP ELISA (IgG) from Euroimmun (Euroimmun) were included. With the exception of QUANTA Flash CCP3, all included ACPA assays were CCP2 tests.
Imprecision was determined using the manufacturer’s internal quality control (iQC) materials, a patient serum sample with a low and a patient sample with a high RF and/or ACPA concentration. All iQC samples were measured before and after every run during 10 runs.21
Linearity was assessed by diluting serum samples containing RF or ACPA with increasing amounts of a serum sample with very low levels of RF or ACPA.22
The limit of quantification (LOQ) was verified by analysing 10 times a serum sample with an RF/ACPA concentration around the LOQ provided by the manufacturer.23
The WHO W1066 international reference serum for RF (target value of 25 IU/mL) and the CDC ACPA standard (target value of 100 IU/mL) were analysed for, respectively, RF and ACPA with all assays. The standard material was reconstituted according to the guidelines, aliquoted and measured three times in different runs.24
To determine the amount of carry-over, a sample with a high concentration (H) of RF/ACPA and one with a low concentration (L) were measured repeatedly in the sequence HHLLL.25
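The text does not state how carry-over was calculated from the HHLLL sequence. One commonly used formula compares the first low measurement (taken directly after the high sample) with the last low measurement; the sketch below assumes that formula, and the function name and example values are hypothetical.

```python
def carry_over_percent(high_results, low_results):
    """Carry-over (%) from an HHLLL run.

    Assumed formula (not taken from the article):
        100 * (L1 - L3) / (H2 - L3)
    where L1 is the first low result after the high sample,
    L3 the last low result and H2 the last high result.
    """
    h_last = high_results[-1]
    l_first, l_last = low_results[0], low_results[-1]
    return 100.0 * (l_first - l_last) / (h_last - l_last)


# Hypothetical values in U/mL: the first low result is slightly raised.
example = carry_over_percent([100.0, 100.0], [10.5, 10.0, 10.0])
```

With these hypothetical values the carry-over is below 1%, which would be reported as not significant under the threshold used in the Results.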
For analytical method comparison, Bland-Altman plots (mean difference in U/mL), least squares regression analysis and Spearman’s rank correlation coefficients (r) (and 95% CIs) were calculated for all assays.26
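As an illustration of two of these method-comparison statistics, the sketch below computes the Bland-Altman bias with 95% limits of agreement and a Spearman rank correlation coefficient for paired results from two assays. It uses only the Python standard library and is not the MEDCALC procedure used in the study; the function names are hypothetical, and the limits of agreement are assumed to be the mean difference ± 1.96 SD.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Mean difference (bias) and 95% limits of agreement for paired results."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's r: the Pearson correlation of the rank-transformed data."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5
```

Note that two assays can rank samples identically (Spearman's r of 1) and still show a large Bland-Altman bias, which is why rank correlation alone does not establish numerical agreement.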
Diagnostic performance was evaluated by sensitivity, specificity, likelihood ratio (LR) and receiver operating characteristic curve analysis.
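For reference, sensitivity, specificity and the two likelihood ratios follow directly from a 2×2 table of test result against diagnosis. The sketch below uses the standard definitions; the function names are hypothetical and the code is not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity from 2x2 counts.

    tp/fn: patients with RA testing positive/negative
    tn/fp: controls testing negative/positive
    """
    return tp / (tp + fn), tn / (tn + fp)


def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if specificity < 1:
        lr_pos = sensitivity / (1 - specificity)
    else:
        lr_pos = float("inf")  # no false positives among controls
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg
```

For example, an assay with a sensitivity of 60.0% and a specificity of 71.6% has a positive LR of about 2.1 and a negative LR of about 0.56.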
Statistics were performed using MEDCALC (V.17.1, Ostend, Belgium).
Patients and samples
We included 594 unique consecutive patients for whom the rheumatologist considered the possibility of RA. Forty-four (7.4%) had RA, 247 (41.6%) were coded as RDCG and 225 (37.9%) as DCG. For 78 (13.1%) patients, the rheumatologic diagnosis remained undifferentiated. In addition, we included 26 extra newly diagnosed patients with RA. An overview of the demographic features of patients with RA and the controls is listed in online supplementary data table S1. Patients with RA were significantly older than the controls.
Total CVs (online supplementary data table S2), except for Menarini RF, were within the manufacturer’s specifications. The highest imprecision was found for Inova RF ELISA and Euro Diagnostica ACPA ELISA.
No significant carry-over was detected (<1% for all methods).
For all assays, the Cusum test for linearity did not reveal significant deviation from linearity.
Limit of quantification
The LOQ was verified for every assay included (online supplementary data table S3). Not every manufacturer had a predefined criterion for LOQ available.
Quantification of WHO W1066 RF and CDC ACPA standards
With the exception of Euroimmun RF ELISA, all evaluated RF assays reported in their package insert traceability to an international RF standard (Menarini, Abbott, Thermo Fisher: W1066; Roche, Inova: British 64/002). W1066 was tested three times for RF with all assays. We found a good quantitative agreement between RF IgM assays from Menarini, Abbott, Thermo Fisher and Roche. The Inova RF IgM ELISA gave higher values (figure 1A). The ratio of the mean W1066 standard value over the manufacturer specific cut-off value varied from 0.6 to 9.3 (figure 1B). The W1066 standard was scored negative by one assay, borderline positive by one assay and strongly positive by three assays. For one assay the median W1066 value corresponded to the cut-off.
The CDC ACPA reference material was tested with all ACPA assays. We found large differences between numerical results obtained, with results from Euro Diagnostica and Roche being much higher than results from the other manufacturers (figure 1C). All assays scored the CDC ACPA reference material as ‘strongly positive’ according to the 2010 ACR/EULAR criteria, but ratios of median standard values over manufacturer’s specific cut-off values varied from 11.2 to 22.3 (figure 1D).
For all methods, the median RF and ACPA titres were significantly higher in samples from patients with RA than in samples from controls (online supplementary data table S4), but the range and numerical values varied substantially among methods. Figure 2A,B and online supplementary data table S5 summarise the results of method comparison studies for RF and ACPA. Spearman’s rank r varied between 0.400 and 0.783 and between 0.336 and 0.702 for RF and ACPA assays, respectively. Bland-Altman analysis and regression analysis revealed poor numerical agreement between assays, both for RF and for ACPA, with large deviations from the target values of 1.00 for slopes and 0.00 for intercepts. For RF, the best agreement was observed for results obtained with the Roche and Abbott methods. For ACPA, the best agreement was between Euroimmun and Abbott.
Using the manufacturer’s cut-off, RF positivity was found in 35.7%–60.0% of patients with RA and in 0.4%–27.5% of controls. The highest sensitivity (60.0%) and lowest specificity (71.6%) were found for Euroimmun RF. ACPA positivity was comparable between the assays and varied between 34.3% and 41.4% in patients with RA and between 0.4% and 3.2% in controls. RF/ACPA prevalence and antibody levels were higher in patients with RA aged <70 years than in patients with RA aged ≥70 years. This difference was statistically significant for all included ACPA assays but not for the RF assays (online supplementary data table S6). For 64 of the 70 patients with RA included, the date of onset of symptoms could be retrieved. Prevalence of RF and ACPA tended to be lower in patients with early RA (n=37/64 with symptom onset <3 months before evaluation) than in patients with established RA, but this was not statistically significant (online supplementary data table S7). Prevalence of RF and ACPA was significantly higher in patients with erosive RA than in patients without erosive disease (P<0.05 for all methods, except for Euroimmun RF (P=0.4362)).
The AUCs were 0.709, 0.687, 0.676, 0.709, 0.690 and 0.708 for, respectively, Menarini, Thermo Fisher, Inova, Roche, Abbott and Euroimmun RF assays, and 0.698, 0.769, 0.685, 0.672, 0.693, 0.734 and 0.709 for, respectively, Menarini, Thermo Fisher, Inova, Roche, Abbott, Euro Diagnostica and Euroimmun ACPA assays (online supplementary data table S8). For the RF assays, the AUCs were not statistically significantly different. For ACPA, the AUCs were significantly different between Roche and Euro Diagnostica (P=0.0115) and between Abbott, Inova, Euroimmun, Roche, Menarini on the one hand and Thermo Fisher on the other hand (respectively P=0.0008, P=0.0057, P=0.0287, P=0.0011 and P=0.0037).
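As a reminder of what these numbers measure, the area under the ROC curve equals the probability that a randomly chosen patient with RA has a higher test result than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC in that pairwise form; it is an illustration with a hypothetical function name, not the MEDCALC routine used in the study, and it does not reproduce the significance tests for comparing AUCs.

```python
def auc(patient_values, control_values):
    """AUC as the fraction of patient-control pairs ranked correctly.

    A pair counts 1 if the patient value is higher, 0.5 if the two
    values are tied and 0 otherwise (Mann-Whitney formulation).
    """
    score = 0.0
    for p in patient_values:
        for c in control_values:
            if p > c:
                score += 1.0
            elif p == c:
                score += 0.5
    return score / (len(patient_values) * len(control_values))
```

Because the AUC depends only on how results are ranked, it is unaffected by the scale of an assay or by the manufacturer's cut-off. This is consistent with assays showing similar AUCs while differing in sensitivity and specificity at their recommended cut-offs.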
At a cut-off that corresponded to a specificity of 95% for the RF assays, the sensitivity ranged from 37.1% to 44.3%, depending on the assay. The specificity of Abbott RF IgM assay at LOQ of 20 U/mL was already >97.5%. The specificity of the Euroimmun RF IgM ELISA never exceeded 95.5%. At a cut-off that corresponded to a specificity of 98.5% for the ACPA assays, sensitivity ranged from 32.9% to 40.0%.
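A cut-off that corresponds to a predefined specificity can be derived from the distribution of results in controls. The sketch below is one simple way to do this, assuming a result is positive when it is strictly above the cut-off; the function names and the data in the usage example are hypothetical.

```python
import math


def cutoff_for_specificity(control_values, target_specificity):
    """Smallest observed control value that gives at least the target specificity.

    Assumption: a result is positive when it is strictly above the cut-off,
    so specificity is the fraction of controls at or below the cut-off.
    """
    ordered = sorted(control_values)
    n = len(ordered)
    # Number of controls that must test negative; the small epsilon
    # guards against floating-point noise in target_specificity * n.
    k = max(1, math.ceil(target_specificity * n - 1e-9))
    return ordered[k - 1]


def sensitivity_at(cutoff, patient_values):
    """Fraction of patients with a result strictly above the cut-off."""
    return sum(v > cutoff for v in patient_values) / len(patient_values)


# Hypothetical data: 20 controls with results 1..20 and four patients.
controls = list(range(1, 21))
cutoff = cutoff_for_specificity(controls, 0.95)
sens = sensitivity_at(cutoff, [5, 19, 25, 40])
```

Applying the same target specificity to every assay places the cut-offs at comparable points of each assay's control distribution, so the remaining differences in sensitivity reflect the assays rather than the manufacturers' choice of cut-off.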
Supplementary file 11
Table 3 shows the LRs for RF and ACPA and a combination thereof according to the serological 2010 ACR/EULAR criteria. The LRs for RA were higher for strong positive results (>3 times cut-off) than for weak positive results (one to three times cut-off value). Strong positive RF or ACPA results had a high LR for RA (>10), except for two RF assays (from Inova and Euroimmun). The highest LRs were consistently found for double positivity for RF and ACPA. The differences in LR between assays were less pronounced when a cut-off that is based on a predefined specificity was applied.
As our study revealed differences in test results between different companies, we evaluated whether such differences could impact disease classification. All patients with RA were classified according to the 2010 ACR/EULAR criteria using RF and ACPA results obtained with assays from different manufacturers (table 4 and online supplementary data table S9). In 32 (of 70) patients with RA (46%), a 2010 ACR/EULAR criteria score ≥6 was obtained based solely on clinical data (ie, without lab data). Inflammation did not contribute to disease classification. Thus, in 38 patients (54%), RF and/or ACPA contributed to disease classification. When only RF was considered, then 49 or 58 patients fulfilled the criteria when, respectively, the least or most sensitive RF assay was considered. When only ACPA was considered, then 49 or 50 patients fulfilled the criteria when, respectively, the least or most sensitive assay was considered. When RF and ACPA were considered (by combining assays from the same manufacturer), then 53 or 59 patients fulfilled the criteria depending on the manufacturer. Thus, classification of patients using the 2010 ACR/EULAR criteria depended on the assays used.
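The proportion of patients whose classification depends on the assay follows directly from the counts above. The short calculation below uses only numbers reported in the text; the variable and function names are hypothetical.

```python
N_RA = 70            # patients with RA
CLINICAL_ONLY = 32   # score >= 6 without laboratory data

# Patients fulfilling the 2010 criteria with the (least, most) sensitive assay.
FULFILLED = {
    "RF": (49, 58),
    "ACPA": (49, 50),
    "RF/ACPA": (53, 59),
}


def assay_dependent_percent(least, most, total=N_RA):
    """Percentage of patients classified as RA by one assay but not another."""
    return 100.0 * (most - least) / total


for marker, (least, most) in FULFILLED.items():
    print(f"{marker}: {assay_dependent_percent(least, most):.0f}% assay dependent")

print(f"clinical data alone: {100.0 * CLINICAL_ONLY / N_RA:.0f}%")
```

This gives 13% for RF, 1% for ACPA and 9% for the RF/ACPA combination, with 46% of patients classified on clinical data alone.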
In this study, we compared the analytical and diagnostic performance of six RF and seven ACPA assays in a secondary care hospital.
We found differences in analytical performance between assays. For example, some assays had a higher imprecision and poorer linearity than other assays. Several manufacturers did not specify the LOQ.
We quantified the WHO W1066 RF standard with all RF assays. Of note, NIBSC 64/002 is the same material as WHO W1066.16–18 RF assays from Thermo Fisher, Menarini, Abbott and Roche are traceable to either W1066 or NIBSC 64/002 and gave comparable results close to the target value of 25 IU/mL (confirming good analytical accuracy). Despite traceability to NIBSC 64/002, the Inova RF assay gave higher results and showed poor agreement with the other RF IgM assays (indicating poor analytical accuracy). Euroimmun did not mention traceability and results obtained with this assay differed from results obtained with the other assays.
Differences in numerical values between results obtained with different assays were further stressed by Bland-Altman analysis and regression analysis. Even assays calibrated against the same reference material do not give comparable results. These data illustrate a lack of harmonisation in RF testing with quantitative differences between assays. Test results cannot be used interchangeably. Although a reference serum for RF has been available since 1968,18 standardisation of RF determination across companies has not yet been achieved.
Even though RF assays from Thermo Fisher, Roche, Menarini and Abbott are traceable to the same reference material and give comparable results for W1066 (around 20 IU/mL), they apply totally different cut-off values, respectively, below (5 and 14 IU/mL), on (20 IU/mL) or above (30 IU/mL) the value for W1066. Consequently, sensitivity, specificity and LRs differed among assays. For example, Abbott RF had a particularly high specificity, but a low sensitivity. This again highlights a lack of harmonisation of test result interpretation. Since the 2010 ACR/EULAR criteria take RF and ACPA positivity into account, cut-off values should be aligned among companies, for example, by defining cut-offs based on a predefined specificity in disease controls (eg, 95%).
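The effect of the cut-off alone can be illustrated by interpreting one and the same numerical result against the four cut-offs quoted above. In the sketch below, the measured value is fixed at the approximate 20 IU/mL mentioned in the text, the pairing of cut-offs with manufacturers follows the order of the 'respectively' in that sentence (an interpretation), and the labels use the >1 and >3 times cut-off boundaries of the 2010 criteria.

```python
W1066_MEASURED = 20.0  # IU/mL, approximate value quoted in the text

# Cut-offs in IU/mL; the manufacturer pairing is inferred from the
# word order in the text and should be read as an assumption.
CUTOFFS = {
    "Thermo Fisher": 5.0,
    "Roche": 14.0,
    "Menarini": 20.0,
    "Abbott": 30.0,
}


def interpret(value, cutoff):
    """Ratio of result to cut-off and the resulting serological category."""
    ratio = value / cutoff
    if ratio > 3:
        label = "strong positive"
    elif ratio > 1:
        label = "weak positive"
    elif ratio == 1:
        label = "at cut-off"
    else:
        label = "negative"
    return ratio, label


for assay, cutoff in CUTOFFS.items():
    ratio, label = interpret(W1066_MEASURED, cutoff)
    print(f"{assay}: ratio {ratio:.2f}, {label}")
```

With an identical numerical result, the four cut-offs yield four different interpretations, from strong positive to negative.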
None of the ACPA assays tested were traceable to an international standard. This lack of standardisation between the different assays obviously led to substantial dispersion in numerical test results when the CDC reference serum was tested. Such differences have previously been reported.19 Euro Diagnostica and Roche ACPA assays gave much higher numerical values than the other assays, which is related to the fact that Roche calibrated its assay against Euro Diagnostica. Alignment of cut-off values across companies could be improved.
As our study revealed differences in clinical performance between different RF and ACPA assays, we evaluated whether such differences have an impact on disease classification based on the 2010 ACR/EULAR criteria. Indeed, we found that for some patients disease classification depended on the RF and/or ACPA assay used. This further illustrates the need to align clinical interpretation of test results between companies. Correct classification and diagnosis are important to initiate adequate treatment, to exclude self-limiting arthritis and to avoid inappropriate treatment.5 27
The 2010 RA classification criteria give a score of 2 for a low-positive RF or ACPA and of 3 for a high-positive RF or ACPA. Our study revealed that the LR for RA of a high-positive RF or ACPA test result (varying between 3.3 and 57, depending on the assay) was clearly higher than the LR of a low-positive RF or ACPA result (varying between 0.7 and 3.7). As previously pointed out,28 future improvements of the RA classification criteria should consider giving a higher relative weight to a high-positive RF or ACPA result compared with a low-positive RF or ACPA result.
In our study, the LR for a negative test result was high (ranging from 0.5 to 0.6), indicating that a negative test result for both RF and ACPA does not exclude RA. By contrast, a strong positive RF or ACPA test had an LR >10 for most (but not all) assays. Such a result has a significant effect on post-test probability.29
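The link between an LR and post-test probability is Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The sketch below applies it with the RA prevalence in the consecutive cohort (44 of 594 patients) as the pre-test probability. Using the cohort prevalence in this way is an illustrative assumption, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability and an LR into a post-test probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRE_TEST = 44 / 594  # about 7.4%, prevalence of RA in the consecutive cohort

after_strong_positive = post_test_probability(PRE_TEST, 10)   # LR of 10
after_negative = post_test_probability(PRE_TEST, 0.5)         # LR of 0.5
```

Under these assumptions, an LR of 10 raises the probability of RA from about 7% to about 44%, whereas an LR of 0.5 only lowers it to about 4%, which is why a negative result does not exclude RA.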
The prevalence of RF (36%–57%) and ACPA (32%–41%) in the RA population was lower than 60%, which is widely considered the sensitivity of RF and ACPA for RA.2 We hypothesise that the low seropositivity in our study is related to the inclusion of older patients and of patients with early arthritis. First, 56% of the patients with RA included were >70 years. Elderly patients with RA are typically seronegative,30 31 have a milder disease course and are referred to a secondary care hospital setting. If patients >65 years old were excluded, seropositivity increased to 45%–50% for RF and 45%–52% for ACPA, which is comparable to previous reports (48.5% for RF and 49% for ACPA).32 Second, 57.8% of the patients with RA included were patients with early RA (less than 3 months’ symptom duration before RA diagnosis). Although not statistically significant, there was a trend to lower RF/ACPA positivity in patients with early RA compared with the patients with established RA, as previously reported.11 32 33 A higher prevalence of ACPA was found in patients with RA with erosive disease, confirming the prognostic value of ACPA.34 35 We could not confirm the better diagnostic performance of CCP3 in early arthritis.15
All included patients with RA fulfilled either the 1987 or the 2010 ACR/EULAR criteria, with a higher proportion of patients fulfilling the 1987 criteria (95.7%) than the 2010 criteria (80.0%). This was an unexpected finding as the 2010 ACR/EULAR criteria were intended to increase diagnostic sensitivity.7 27 This could be explained by the specific characteristics of the study population, which included many patients with oligoarthritis, given that a seronegative patient can only fulfil the 2010 ACR/EULAR criteria when >10 joints are involved.7
In conclusion, we illustrated differences in technical and diagnostic performance between RF and ACPA assays from different manufacturers. There is a lack of harmonisation of RF and ACPA assays in terms of numerical values and diagnostic performance (sensitivity, specificity, LR). The differences in diagnostic performance can have an impact on 2010 ACR/EULAR criteria classification, which must be confirmed on a larger RA population.
We thank Abbott, Euro Diagnostica, Euroimmun, Inova, Menarini, Thermo Fisher and Roche for the donation of the assays. We are very thankful to the laboratory technicians for their most appreciated efforts.
Handling editor Josef S Smolen
Competing interests XB has received lecture fees from Thermo Fisher, Inova and Menarini and has been a consultant for Inova.
Ethics approval OLV Hospital Ethics Committee (Belgian registration number of ethical approval B126601627018).
Provenance and peer review Not commissioned; externally peer reviewed.