Abstract
Background Machine learning methods have been used to develop predictive models of high quality and precision [1]. Among them, Random Survival Forests (RSF) have been proposed as an alternative to traditional survival models [2], as they can overcome most of the limitations of traditional survival techniques such as Cox proportional hazards models.
Objectives Our objective was to develop and internally validate a predictive model for rheumatoid arthritis (RA) mortality using Random Survival Forests (RSF).
Methods We conducted a retrospective longitudinal study of 1,461 patients diagnosed with RA between January 1994 and August 2011 and followed at the outpatient clinic of the Rheumatology Department of the Hospital Clínico San Carlos (Madrid, Spain) until death or September 2013. Demographic and clinical variables collected during the first two years after diagnosis were used. RSF models were built with 1,000 trees each. Each model was run 100 times to estimate the mean and standard deviation (SD) of the prediction error and of the integrated Brier score (IBS). Missing values were imputed using the function implemented in the randomForestSRC package [3]. The predictive capacity of each variable was assessed using “variable importance” (VIMP). Two models were constructed, using either the log-rank (MLG) or the log-rank score (MLGS) splitting rule, and the one with the lowest prediction error was selected. Variables with negative VIMP were then excluded and a final model was developed.
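As an illustration of the error measure, randomForestSRC reports the prediction error of a survival forest as 1 minus Harrell's concordance index. The following is a minimal Python sketch of that quantity and of the mean (SD) summary over repeated runs. It is not the authors' code: the function names, the toy data, and the choice to skip pairs with tied follow-up times are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean, stdev


def concordance_error(times, events, risk):
    """Return 1 - Harrell's C for right-censored data (simplified sketch).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risk   : predicted risk score (higher = expected to die sooner)
    Assumption: pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied times
        # a is the patient with the shorter follow-up
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: true order unknown
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # higher risk died first: concordant
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return 1.0 - concordant / usable


def summarize(errors):
    """Mean and sample SD of the error across repeated model runs."""
    return mean(errors), stdev(errors)


# Hypothetical toy data, not taken from the study.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.3, 0.2, 0.7, 0.1]
toy_error = concordance_error(toy_times, toy_events, toy_risk)

# Hypothetical errors from three repeated runs.
run_mean, run_sd = summarize([0.20, 0.22, 0.24])
```

An error of 0 corresponds to perfect ranking of patients by risk and 0.5 to random ranking, so lower values indicate better discrimination.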
Results A total of 148 patients died (10.1%). MLG showed the lowest prediction error. All variables exhibited a positive VIMP. The final model showed a mean (SD) prediction error of 0.187 (0.002) and a mean (SD) IBS of 0.150 (0.003). The most important predictor variables were age at diagnosis, median erythrocyte sedimentation rate, and number of hospital admissions in the first 2 years after RA diagnosis.
Conclusions We developed an accurate and precise model for RA mortality using RSF. Age and disease activity had the greatest influence on mortality.
References
1. Churpek MM, Yuen TC, Winslow C, Meltzer DO, Kattan MW, Edelson DP. Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. Crit Care Med. 2016;44:368–74.
2. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2:841–860.
3. Ishwaran H, Kogalur U. Random Forests for Survival, Regression and Classification (RF-SRC) [Internet]. 2016 [cited 15 Dec 2016]. Available: https://cran.r-project.org/package=randomForestSRC.
Disclosure of Interest None declared