Abstract
Discovering knowledge from medical databases with machine learning is both beneficial and challenging for diagnosis. Diabetes, if left undiagnosed, can affect many other organs of the human body (e.g., the kidneys and liver), and the disease is common across all age groups, from the young to adults. A large body of research has already addressed diabetes prediction using traditional machine learning algorithms such as artificial neural networks, Naïve Bayes, and decision trees. However, improving the accuracy with which diabetes is identified, with a certain degree of confidence, remains a challenging task. Ensemble learning is one such family of machine learning classification techniques, and its application to diabetes prediction represents a research gap. This work presents classification algorithms for the prediction of diabetes based on two conventional machine learning classifiers (Naïve Bayes and decision tree) and four ensemble classifiers (Random Forest (RF), Bagging, AdaBoost, and Gradient Boosting). The performance of these algorithms is measured in terms of accuracy score. The dataset used for training and testing is the Pima Indian Diabetes dataset. On the basis of the comparative evaluation, the most important feature for identifying diabetic patients is extracted. This research underscores the significance of ensemble learning in diabetes prediction by comparing its effectiveness with that of traditional classifiers. The study also identifies the key features for diabetes identification. These findings contribute insights that can support further machine learning applications in healthcare diagnostics.
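The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the hyperparameters, the 70/30 split, the choice of Gaussian Naïve Bayes, and the use of Random Forest importances to rank features are all assumptions. To keep the sketch self-contained it runs on synthetic stand-in data with the same shape as the Pima dataset (768 rows, 8 features), so the printed accuracies and the feature ranking carry no clinical meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Commonly used column names of the Pima Indians Diabetes dataset
# (label column excluded). Here they only label synthetic columns.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# ASSUMPTION: synthetic stand-in data; replace with the real dataset.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Two conventional classifiers and four ensemble classifiers.
# ASSUMPTION: all hyperparameters below are illustrative choices.
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} accuracy = {acc:.3f}")

# Rank features by the fitted Random Forest's impurity-based importances.
importances = models["Random Forest"].feature_importances_
ranked = sorted(zip(FEATURES, importances), key=lambda kv: kv[1], reverse=True)
top_feature = ranked[0][0]
print("Most important feature (Random Forest):", top_feature)
```

To run this on the actual data, one would load the Pima CSV into `X` and `y` in place of the `make_classification` call. A single train/test split gives a noisy accuracy estimate on a dataset of this size, so cross-validation may be preferable for a real comparison.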






Acknowledgements
There is no funding involved in this study.
Ethics declarations
Conflict of Interest
Both authors declare that they have no conflict of interest.
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bhattacharya, M., Datta, D. Intelligent Models for Diabetic Prediction Using Conventional Machine Learning Techniques and Ensemble Learning Algorithms. SN COMPUT. SCI. 6, 29 (2025). https://doi.org/10.1007/s42979-024-03479-9
DOI: https://doi.org/10.1007/s42979-024-03479-9
