Diabetes Prediction using Machine Learning Algorithms with Performance metrics and Holdout method on Egyptian dataset
DOI: https://doi.org/10.21015/vtse.v14i2.2375
Abstract
Diabetes is a non-communicable disease that affects people of all ages worldwide, so early detection using machine learning techniques is crucial. This study aims to predict diabetes on an Egyptian dataset using multiple machine learning algorithms, evaluated with performance metrics under holdout validation. The dataset was divided into four age groups: paediatric, early adulthood, middle age, and geriatric. Ten algorithms were applied and validated using 80:20, 70:30, and 60:40 split ratios, with accuracy, precision, and recall as evaluation metrics. Random Forest, Extra Trees, and Support Vector Machine performed best in the paediatric group, while Gradient Boosting, Random Forest, and Support Vector Machine performed best in the early adulthood, middle age, and geriatric groups. In contrast, Decision Tree, K-Nearest Neighbors, and AdaBoost consistently performed worse. Further analysis shows that classification performance varies significantly across age groups: the middle age and geriatric groups achieve the highest accuracy (above 0.99), followed by the paediatric group (0.98–0.99), while early adulthood shows comparatively lower performance, which is attributed to increased class overlap. Confusion matrices show strong diagonal dominance in the higher-performing groups, reflecting better class separability, and performance heatmaps confirm that the top models maintain a balanced trade-off between accuracy, precision, and recall with minimal variation across data splits. Feature importance analysis shows that higher-performing models rely on a small number of dominant predictors, particularly in the middle age and geriatric groups, while the more evenly distributed feature contributions in early adulthood reduce predictive effectiveness.
Therefore, the findings demonstrate that ensemble methods provide robust and consistent performance, and that age-based dataset segmentation enhances classification accuracy and model stability.
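The holdout procedure and the three metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic single-feature data, the threshold rule standing in for a classifier, and every function name below are assumptions made for illustration, and the three test fractions are one reading of the 80:20, 70:30, and 60:40 ratios as train:test splits.

```python
import random


def holdout_split(rows, test_fraction, seed=0):
    """Shuffle the rows once and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def make_synthetic_rows(n=200, seed=42):
    """Hypothetical data: (feature value, label) pairs, not the Egyptian dataset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        value = rng.gauss(150 if label else 100, 20)
        rows.append((value, label))
    return rows


def fit_threshold(train):
    """Stand-in classifier: the midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate_splits(rows, test_fractions=(0.2, 0.3, 0.4)):
    """Fit on the training part and score on the held-out part for each ratio."""
    results = {}
    for fraction in test_fractions:
        train, test = holdout_split(rows, fraction)
        threshold = fit_threshold(train)
        y_true = [y for _, y in test]
        y_pred = [1 if x > threshold else 0 for x, _ in test]
        results[fraction] = classification_metrics(y_true, y_pred)
    return results


if __name__ == "__main__":
    for fraction, metrics in evaluate_splits(make_synthetic_rows()).items():
        print(f"test fraction {fraction}: {metrics}")
```

In a per-age-group analysis like the one described, the same loop would be run once for each group's subset of the data, with the threshold rule replaced by each of the ten algorithms.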
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License (CC BY).