Background The lack of accurate early diagnostic tests for rheumatoid arthritis (RA) remains the limiting factor for early treatment aimed at achieving remission. We hypothesised that molecular pathways implicated in RA pathology in the joint can inform a biomarker discovery programme based on flow cytometry phenotyping of blood cells. Based on published work, we selected 74 subsets/phenotypes and used high-dimensional data analysis to establish their potential value in discriminating, among patients attending an early arthritis clinic with <6 months of symptoms, those who developed RA from those with other rheumatic diseases or non-persistent inflammation.
Methods 46 patients were enrolled. 6–8 colour flow cytometry was performed using standard protocols. We recorded the percentage of positive cells when 2 populations could be distinguished, and the level of expression (MFI) for single populations. A random forest was used to rank the 74 subsets/phenotypes as predictors according to importance. A classification tree was then fitted using the most important predictors, and finally the variables identified by the classification tree were entered into a logistic regression.
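The three-stage analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn and NumPy are available, uses purely synthetic data of the same shape (46 patients, 74 phenotypes), and the function name `three_stage_screen`, the `n_top` cut-off and the tree size are hypothetical choices. Note also that scikit-learn's `LogisticRegression` is penalised by default and does not report P values, so a statistics package would be needed to reproduce that part of the analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def three_stage_screen(X, y, n_top=10, seed=0):
    """Rank predictors, fit a small tree on the best ones, then refit a logistic model."""
    # Stage 1: a random forest ranks all predictors by importance.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    ranked = np.argsort(forest.feature_importances_)[::-1]
    top = ranked[:n_top]  # n_top is an assumed cut-off, not taken from the abstract

    # Stage 2: a small classification tree restricted to the top-ranked predictors.
    # max_leaf_nodes=4 allows at most 3 splits (an assumption about the tree size).
    tree = DecisionTreeClassifier(max_leaf_nodes=4, random_state=seed).fit(X[:, top], y)
    # tree_.feature holds the split column for internal nodes and a negative value for leaves.
    used = sorted({int(top[f]) for f in tree.tree_.feature if f >= 0})

    # Stage 3: logistic regression on the variables the tree actually split on.
    logit = LogisticRegression(max_iter=1000).fit(X[:, used], y)
    return ranked, used, logit


# Synthetic stand-in data: 46 "patients" x 74 "phenotypes", 23 per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(46, 74))
y = np.repeat([0, 1], 23)
X[y == 1, 0] += 2.0  # make one synthetic column informative

ranked, used, logit = three_stage_screen(X, y)
print("columns used by the tree:", used)
```

With only 46 samples, any such pipeline would be prone to overfitting, so in practice the selection steps would need to be validated on independent data.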
Results 23 of the 46 patients had RA, 11 had non-persistent inflammation and 12 had other rheumatic diseases (AS, SPA, gout, OA and reactive arthritis). 47 individual phenotypes were analysed, along with 36 additional ones defined by a combination of 2 markers. No difference in lineage representation was found; significant differences were observed for 11/47 subsets (P < 0.025), with 3 more of borderline significance (P < 0.075). The most significant differences were seen in CD4+ T-cells and NKT-cells, with lesser ones in B-cells, NK-cells and monocytes.
Due to the low number of patients, multivariable analysis was limited. Unsupervised cluster analysis separated 2 groups (RA/non-RA) quite efficiently, with 5 RA and 6 non-RA patients misclassified. A 3-node classification tree using 47 phenotypes classified 20/23 RA patients correctly. The 3 phenotypes showing the best discriminatory power were IL-6R+ NKT-cells, naïve CD4+ T-cells and CCR6+ monocytes. A logistic regression confirmed that these 3 phenotypes were highly significant (P < 0.005). A random forest combining all phenotypes suggested, based on accuracy, that 8 CD4+ T-cell phenotypes, 1 CD8+ T-cell phenotype, 2 B-cell phenotypes, 1 monocyte phenotype and 1 NKT-cell phenotype have potential value for discriminating RA from non-RA patients.
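The classification counts reported above can be converted into rates with simple arithmetic. The sketch below is illustrative only: it treats RA as the positive class, takes the non-RA group to be the 11 + 12 = 23 remaining patients, and assumes that "misclassified" in the cluster analysis means falling into the cluster dominated by the other group. The function name `rates` is hypothetical.

```python
from fractions import Fraction

N_RA = 23           # patients with RA
N_NON_RA = 11 + 12  # non-persistent inflammation + other rheumatic diseases


def rates(ra_wrong, non_ra_wrong):
    """Return (sensitivity, specificity, accuracy) with RA as the positive class."""
    sensitivity = Fraction(N_RA - ra_wrong, N_RA)
    specificity = Fraction(N_NON_RA - non_ra_wrong, N_NON_RA)
    accuracy = Fraction(N_RA + N_NON_RA - ra_wrong - non_ra_wrong, N_RA + N_NON_RA)
    return sensitivity, specificity, accuracy


# Unsupervised clustering: 5 RA and 6 non-RA patients misclassified.
cluster_sens, cluster_spec, cluster_acc = rates(5, 6)

# Classification tree: 20/23 RA patients correct; the non-RA count is not reported,
# so only the sensitivity can be derived.
tree_sens = Fraction(20, N_RA)

print(f"cluster sensitivity {float(cluster_sens):.3f}")
print(f"cluster specificity {float(cluster_spec):.3f}")
print(f"cluster accuracy    {float(cluster_acc):.3f}")
print(f"tree sensitivity    {float(tree_sens):.3f}")
```

Under these assumptions the clustering corresponds to 18/23 RA and 17/23 non-RA patients grouped correctly (35/46 overall), against 20/23 RA patients for the tree.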
Conclusion Despite the small sample size, this analysis demonstrated the value of hypothesis-driven marker analysis, with 12 of the 47 phenotypes predicted to have potential showing significant differences. Triangulating different analysis approaches improved the robustness of the findings, giving confidence in the identification of relevant biomarkers.