Background A major challenge for rheumatoid arthritis (RA) research is the detection of subjects at risk. The tradeoff between complexity and interpretability is a key issue in the application of predictive models. The advanced recursive partitioning (tree-based) approach (ARPA) is widely used in predictive analyses because it accounts for non-linear effects, offers fast solutions for hidden complex substructure and provides unbiased, statistically significant analyses of high-dimensional, seemingly unrelated data.
Objectives To evaluate a prediction model for RA using ARPA.
Methods Classification and regression trees (CART), TreeNet and random forest models implemented in the Salford Predictive Modeler® (SPM7) were used to evaluate the predictive power of 21 genetic markers, both HLA and non-HLA, together with clinical data in 368 patients with RA (ACR87) and 353 matched controls. Different sets of hyperparameters, including RA risk priors, cost matrices, number of cross-validations and splitting rules, among others, were evaluated in search of an optimal model.
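As a rough illustration of what a splitting rule does in a CART-style tree, the sketch below scores candidate binary splits by weighted Gini impurity and picks the purest one. It is not the SPM7 implementation; the function names, the marker names (`marker_a`, `marker_b`) and the eight-subject data set are hypothetical and synthetic, chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = control, 1 = case)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_binary_split(rows, labels, feature_names):
    """Return (feature_name, weighted_gini) for the binary feature whose
    split yields the lowest weighted Gini impurity, or None if no feature
    separates the subjects into two non-empty groups."""
    n = len(labels)
    best = None
    for j, name in enumerate(feature_names):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]   # non-carriers
        right = [y for x, y in zip(rows, labels) if x[j] == 1]  # carriers
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if best is None or score < best[1]:
            best = (name, score)
    return best


# Synthetic toy data (assumption, not study data): carrier status for two
# hypothetical markers, and case/control labels.
feature_names = ["marker_a", "marker_b"]
rows = [(1, 0), (1, 1), (1, 0), (0, 1), (0, 0), (0, 1), (1, 1), (0, 0)]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

print(best_binary_split(rows, labels, feature_names))
```

In this toy set `marker_a` separates cases from controls perfectly, so it is selected as the first split. A full tree repeats the search recursively within each child node, and priors or misclassification costs would reweight the class counts before the impurity is computed.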
Results The TreeNet and random forest models showed good predictive performance, with low misclassification rates suitable for risk prediction (AUC=0.90 and 0.88, respectively), whereas the CART model showed lower but still acceptable predictive power (AUC=0.71). The CART model allowed direct interpretation of genetic interactions. Interesting interactions were observed, involving HLA-DRB1*04, HLA-DQB1, IL2/IL21 (rs6822844) and gender.
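For readers less familiar with the AUC values reported here, the following minimal sketch computes AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The function name `auc` and the example scores are illustrative assumptions and are unrelated to the study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: predicted risk per subject; labels: 1 = case, 0 = control.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # case ranked above control
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Illustrative scores only: one of the four case-control pairs is misordered.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.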
Conclusions Our findings offer a predictive framework for RA based on interactions of genetic variants and epidemiological data. Improvements in clinical risk prediction will allow the identification of novel targets for translation into personalized health care.
Disclosure of Interest None declared