Building predictive models in medicine requires balancing simplicity against effectiveness. The predictors included in a model determine its quality and practical relevance, but selecting them is not always easy. The aim of this study is to compare different predictor selection methods for building medical prognostic models.
Methods. We compared simple methods, such as correlation, predictor filtering based on basic statistics, and Hosmer-Lemeshow univariate analysis, with more complex methods often used in machine learning, such as recursive feature elimination, LASSO regression, and classification trees. The predictive models were built using multiple binary logistic regression. Statistical analysis was carried out in R (version 3.4.2).
Results. The most accurate predictive models (lowest AIC value) were built from predictors selected by the LASSO, random forest, and stepwise regression methods. The Hosmer-Lemeshow method and the basic methods of statistical analysis were the least effective.
Conclusion. Predictor selection methods often substantially reduce the number of predictors by filtering out non-informative ones, which improves the quality of the predictive model.
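
The AIC criterion used to rank the models can be illustrated with a short sketch. The study itself was carried out in R; the Python below is only an illustration and is not the authors' code. It assumes each candidate logistic model has already been fitted and supplies its predicted probabilities and its number of coefficients. The function names and the example values are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y given predicted
    probabilities p (each probability must lie strictly between 0 and 1)."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def aic(y, p, n_params):
    """AIC = 2k - 2 ln L, where k counts the fitted coefficients
    (including the intercept). Lower values are better."""
    return 2 * n_params - 2 * log_likelihood(y, p)


def best_by_aic(y, candidates):
    """Return the name of the candidate model with the lowest AIC.

    candidates maps a model name to (predicted probabilities, n_params).
    """
    return min(candidates, key=lambda name: aic(y, *candidates[name]))


# Made-up illustration values, not data from the study: both models fit
# equally well, so the one with fewer coefficients has the lower AIC.
y = [1, 0, 1, 1, 0]
candidates = {
    "lasso_subset": ([0.9, 0.2, 0.8, 0.7, 0.1], 3),
    "all_predictors": ([0.9, 0.2, 0.8, 0.7, 0.1], 8),
}
print(best_by_aic(y, candidates))  # prints lasso_subset
```

Because AIC adds a penalty of 2 for every coefficient, a selection method that drops non-informative predictors without hurting the fit yields a lower AIC than the full model.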