Enhanced Diabetic Prediction Using Fuzzy C-Means Preprocessing and Random Forest Ensemble Learning
DOI: https://doi.org/10.21015/vtse.v11i4.1657
Abstract
Diabetes claims thousands of lives each year, and many people remain unaware of their condition until it reaches a critical stage. This study presents a data mining approach for the early detection and prediction of diabetes using the Pima Indian Diabetes dataset. Although fuzzy C-means adapts to many data types, the outcome of the clustering depends on the initial placement of the cluster centers. The precision of the clustering also matters: it can supply the random forest with either extensive, well-grouped data or limited data that constrains its efficacy. Our principal objective was to improve the accuracy of fuzzy C-means clustering and the random forest. To improve the model's performance, we combined principal component analysis (PCA), fuzzy C-means, and the random forest approach. Various algorithm combinations were evaluated, and the results show that our model surpasses the original outcomes on the Pima Indian Diabetes dataset in terms of accuracy. The diabetic prediction model achieved an accuracy of 97.40% using PCA, logistic regression, and K-means. Using PCA with fuzzy C-means and random forests, we achieved a higher accuracy of 98.96%. The empirical evidence indicates that applying PCA improved the accuracy of both the fuzzy C-means clustering approach and the random forest classifier, which differs from previous findings.
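The dependence of fuzzy C-means on its starting centers can be made concrete with a small sketch. The code below is not the authors' implementation: it is a minimal, standard-library-only version of the usual fuzzy C-means update rules, and the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping tolerance, and the toy points are all assumptions chosen for illustration. The PCA and random forest stages of the pipeline are omitted.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Illustrative fuzzy C-means; returns (centers, memberships).

    points       -- list of equal-length tuples of floats
    init_centers -- starting centers; the final result depends on this choice
    m            -- fuzzifier (> 1); 2.0 is an assumed, commonly used value
    """
    centers = [list(c) for c in init_centers]
    n_clusters = len(centers)
    n_dims = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for p in points:
            # Clamp distances to avoid division by zero when a point sits on a center.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
            )
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(n_dims)]
            )
        # Stop once no center moves by more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical toy data: two well-separated groups of 2-D points.
toy = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centers, memberships = fuzzy_c_means(toy, init_centers=[toy[0], toy[3]])
# Hard labels: the cluster with the highest membership for each point.
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` sums to 1, so a point can belong partly to several clusters. Passing different `init_centers` is a simple way to check how sensitive the final grouping is to initialization on a given dataset.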
License
This work is licensed under a Creative Commons Attribution License CC BY