Application of XGBoost and Catboost Algorithms for Elderly Hypertension Classification on IFLS 5 Data

Authors

  • Ekklesiafilifi Loyalita Crossesa, Universitas Negeri Surabaya
  • A’yunin Sofro, Universitas Negeri Surabaya

DOI:

https://doi.org/10.59632/leibniz.v6i01.734

Keywords:

CatBoost, Classification, Elderly Hypertension, IFLS 5, XGBoost

Abstract

Classifying hypertension in the elderly is challenging because health survey datasets contain noisy categorical features. This study applies the XGBoost and CatBoost algorithms to classify hypertension among elderly respondents ( years) in the IFLS 5 data. Rather than focusing on accuracy alone, as standard approaches do, the evaluation emphasizes recall in order to reduce false negatives, which is crucial for patient safety in medical screening. After hyperparameter tuning with GridSearchCV and 5-fold cross-validation on 2,774 participants, the models revealed a clear algorithmic trade-off. CatBoost showed more stable generalization and achieved the higher accuracy (66.49%), while XGBoost achieved clearly higher sensitivity (recall of 80.18%), as its regularization helped it detect the minority class. Feature importance, evaluated with the information gain and PredictionValuesChange metrics, indicated that biological indicators, particularly diabetes and BMI, were stronger predictors than demographic variables. In summary, CatBoost is the more reliable classifier overall, but XGBoost is better suited to clinical decision support systems where sensitivity is the priority.
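The abstract's reason for favoring recall over accuracy can be made concrete with a small worked example. The sketch below uses plain Python and hypothetical toy labels (not IFLS 5 data, and not the study's XGBoost or CatBoost models) to show how predictions that rarely flag hypertension can score higher accuracy than predictions that flag more people, while missing far more true cases. The names `Confusion`, `confusion`, `accuracy_oriented`, and `recall_oriented` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (1 = hypertensive, 0 = not)."""
    tp: int  # hypertensive and flagged by the model
    fp: int  # not hypertensive but flagged (false alarm)
    tn: int  # not hypertensive and not flagged
    fn: int  # hypertensive but missed: the error a screening tool must avoid

    @property
    def accuracy(self) -> float:
        # Share of all people classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def recall(self) -> float:
        # Sensitivity: share of truly hypertensive people the model catches.
        return self.tp / (self.tp + self.fn)


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count TP/FP/TN/FN for 0/1 labels; rejects any other label value."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return Confusion(tp, fp, tn, fn)


# Hypothetical toy labels for 10 people, 4 of whom are hypertensive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
accuracy_oriented = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # rarely predicts hypertension
recall_oriented = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]  # flags more people

for name, y_pred in [("accuracy-oriented", accuracy_oriented),
                     ("recall-oriented", recall_oriented)]:
    cm = confusion(y_true, y_pred)
    print(f"{name}: accuracy={cm.accuracy:.2f} recall={cm.recall:.2f} missed={cm.fn}")
```

Running it gives accuracy 0.70 and recall 0.25 for the first set of predictions, and accuracy 0.60 and recall 0.75 for the second. The more sensitive predictions lose accuracy but miss only one of the four hypertensive people instead of three, which mirrors the kind of trade-off the study reports between CatBoost and XGBoost. If the tuning step itself should favor sensitivity, scikit-learn's `GridSearchCV` accepts `scoring="recall"`; the abstract does not state which scoring criterion the authors used.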

Published

2026-01-18

How to Cite

Application of XGBoost and Catboost Algorithms for Elderly Hypertension Classification on IFLS 5 Data. (2026). Leibniz: Jurnal Matematika, 6(01), 01-14. https://doi.org/10.59632/leibniz.v6i01.734