Background: In line with the National Strategy for the Development of Artificial Intelligence, large-scale digitalization of healthcare is under way in the Russian Federation. It gives rise to a large number of practical and scientific tasks, which in turn require convenient tools to solve them. One such tool is the ROC analysis tool, which was developed and successfully applied within the project «Experiment on the use of innovative technologies in the field of computer vision for the analysis of medical images and further application in the healthcare system of the city of Moscow». However, there is an urgent need for a module that compares ROC curves, so that a wider range of problems in analyzing the performance of technologies based on artificial intelligence can be solved.
Aim: to implement a module of the ROC analysis tool that compares areas under the curve using statistical methods with calculation of the p-value, and to test it on real data.
Materials and methods: the tool is implemented in Python 3.9. The 95% confidence intervals for the ROC curves were calculated using bootstrapping and the DeLong method. The areas under the ROC curves were compared using a permutation test.
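The bootstrap interval mentioned above can be illustrated with a minimal sketch. It assumes that the interval refers to the area under the curve (AUC) and that a percentile bootstrap over cases is used; the source does not specify either. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the toy data are hypothetical and are not taken from the tool.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 2000,   # assumed number of resamples
    alpha: float = 0.05,  # 95% interval
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 10 negative and 10 positive cases.
labels = [0] * 10 + [1] * 10
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70,
          0.25, 0.45, 0.50, 0.65, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99]
low, high = bootstrap_auc_ci(labels, scores, n_boot=1000, seed=42)
print(f"AUC = {auc(labels, scores):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Fixing `seed` makes the interval reproducible; a different seed will generally give slightly different bounds.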
The tool was tested on the outputs of 6 algorithms on 2 data sets. The areas under the ROC curves were compared pairwise, and the results were compared with those obtained on the same data by the DeLong method (roc.test function, R 3.6.1).
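A pairwise comparison of this kind can be sketched as follows. The source does not describe the permutation scheme, so the sketch makes several assumptions: both algorithms are scored on the same cases, their scores are on a comparable scale (for example, probabilities), and under the null hypothesis the two scores of each case are exchangeable, so they are swapped at random within each case. The function name `permutation_test_auc`, the number of permutations, and the toy data are hypothetical.

```python
import random
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_test_auc(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_perm: int = 2000,  # assumed number of permutations
    seed: int = 0,
) -> float:
    """Two-sided paired permutation test for the difference of two AUCs."""
    if not (len(labels) == len(scores_a) == len(scores_b)):
        raise ValueError("inputs must have the same length")
    rng = random.Random(seed)
    observed = abs(auc(labels, scores_a) - auc(labels, scores_b))
    extreme = 0
    for _ in range(n_perm):
        perm_a: List[float] = []
        perm_b: List[float] = []
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:  # swap the two scores of this case
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = abs(auc(labels, perm_a) - auc(labels, perm_b))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    # The +1 correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Toy data (hypothetical): algorithm A separates the classes, B inverts them.
labels = [0] * 10 + [1] * 10
scores_a = [i / 100 for i in range(10)] + [0.9 + i / 100 for i in range(10)]
scores_b = [1.0 - s for s in scores_a]
p_different = permutation_test_auc(labels, scores_a, scores_b, n_perm=1000, seed=1)
p_identical = permutation_test_auc(labels, scores_a, scores_a, n_perm=1000, seed=1)
print(f"p (A vs B) = {p_different:.4f}, p (A vs A) = {p_identical:.4f}")
```

The length check reflects the same-length requirement of a paired design; unpaired data would need a different scheme, such as permuting algorithm labels across pooled cases.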
Results: in most cases, the p-values obtained with the permutation test were comparable to the roc.test results; however, in 4 out of 30 cases the p-values differed significantly, which changed the interpretation of the test.
Discussion: in our opinion, the differences between the results of the two methods stem from the properties of the methods themselves: the DeLong method is more conservative. In addition, because the permutation test relies on pseudorandomization, its results can vary between runs, which introduces uncertainty. Finally, the developed tool compares only data of the same length, which limits its use; it can be developed further to handle data of different lengths.
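The run-to-run variability mentioned above can be quantified with the binomial standard error of a Monte Carlo p-value estimate, sqrt(p(1 - p) / B), where B is the number of random permutations. The sketch below is only an illustration: the number of permutations used by the tool is not stated in the source, so B = 1000 is an assumption, and the function name is hypothetical.

```python
import math


def permutation_p_standard_error(p: float, n_perm: int) -> float:
    """Binomial standard error of a p-value estimated from n_perm random permutations."""
    if not 0.0 <= p <= 1.0 or n_perm <= 0:
        raise ValueError("p must be in [0, 1] and n_perm must be positive")
    return math.sqrt(p * (1.0 - p) / n_perm)


# Assumed setting: true p-value near the 0.05 threshold, 1000 permutations.
se = permutation_p_standard_error(0.05, 1000)
print(f"standard error = {se:.4f}")
```

With these assumed values the standard error is about 0.007, so a p-value close to 0.05 can fall on either side of the threshold from one run to the next. Increasing the number of permutations or fixing the random seed reduces or removes this effect.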
Conclusion: a module for comparing ROC curves using statistical tests with calculation of the p-value was successfully implemented and tested.