Early Diabetes Detection Using Machine Learning Models: A Case Study from Indonesian Clinical Data
DOI: https://doi.org/10.37034/medinftech.v4i1.80
Keywords: Artificial Neural Network, Diabetes Mellitus, Early Detection, Indonesian Clinical Data, Machine Learning
Abstract
Diabetes is a major health problem that can significantly reduce life expectancy and increase the risk of serious complications such as kidney failure, stroke, and cardiovascular disease. Therefore, early detection is essential to prevent the progression of the disease. This study proposes a machine learning-based approach for early diabetes detection using a private dataset obtained from RSUP Persahabatan General Hospital in Jakarta, Indonesia. The dataset consists of 501 patient records with clinical and laboratory features extracted from the hospital’s electronic medical record system. Several machine learning algorithms were implemented and compared, including Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, Naïve Bayes, Extreme Gradient Boosting, Ensemble methods, and Artificial Neural Networks. Feature selection was performed using ANOVA, and hyperparameter optimization was applied using GridSearchCV to improve model performance. The experimental results show that the Artificial Neural Network model achieved the best performance with an accuracy of 0.86 (86%). Statistical analysis using logistic regression identified systolic blood pressure, diastolic blood pressure, age, HDL cholesterol, and leukocyte levels as the most significant risk factors associated with diabetes. These findings demonstrate the potential of machine learning techniques to support early diabetes detection using clinical data from Indonesian healthcare settings.
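The ANOVA-based feature selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual one-way ANOVA F-statistic (the score that scikit-learn's f_classif computes), which may differ in detail from the authors' procedure. The feature names and values below are hypothetical toy data, not records from the study's dataset, and the helper names anova_f and rank_features are introduced only for illustration.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels.

    Assumes at least two classes and non-zero within-class variance.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n, k = len(values), len(groups)
    grand_mean = mean(values)

    # Variation of the class means around the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of individual values around their own class mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(table, labels):
    """Return (feature, F) pairs sorted from most to least discriminative."""
    scores = {name: anova_f(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy values (label 1 = diabetic, 0 = non-diabetic)
toy = {
    "systolic_bp": [118, 122, 125, 120, 140, 150, 145, 155],
    "leukocytes": [6.1, 7.4, 6.8, 7.0, 7.2, 6.5, 7.1, 6.9],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]

ranking = rank_features(toy, labels)
for name, f_value in ranking:
    print(f"{name}: F = {f_value:.2f}")
```

A feature whose class means differ strongly relative to its within-class spread receives a large F value and is kept. In this toy data the systolic blood pressure column separates the two classes far better than the leukocyte column. A ranking of this kind says nothing about the statistical significance reported in the study.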







