DIABETES PREDICTION BASED ON MEDICAL RECORDS (PIMA INDIANS DIABETES DATASET) USING K-NN
DOI: https://doi.org/10.54314/jssr.v8i3.2981
Abstract: Advances in predictive technologies, especially artificial intelligence (AI) and machine learning, have opened up major opportunities in the health sector, including the early detection of chronic diseases such as diabetes. This study implements the K-Nearest Neighbors (KNN) algorithm to predict the likelihood that a person has diabetes, based on medical record data from the Pima Indians Diabetes Dataset. The dataset consists of 768 samples with eight key health features. The analysis covers data cleaning, exploration of the data distribution, and data preparation for modelling. Distances between data points are calculated using the Euclidean formula, and the features are normalized so that each contributes equally to the distance. The data were then split into training and test sets at a ratio of 80:20. The analysis revealed an imbalanced class distribution, with more non-diabetic patients than diabetic ones, and the 21-30 age group dominates the dataset. The KNN implementation in this study indicates that the method is effective for medical classification based on numerical data. This research demonstrates the potential of KNN as a practical, easy-to-implement tool to support early diagnosis in data-driven health systems.
Keywords: K-Nearest Neighbors, diabetes prediction, machine learning, medical data, classification.
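The pipeline summarized in the abstract (an 80:20 split, feature normalization, and Euclidean-distance KNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of min-max scaling, the value k=5, and the synthetic two-feature data are all assumptions made for illustration, and the data below is not the Pima Indians Diabetes Dataset.

```python
import math
import random
from collections import Counter


def train_test_split(features, labels, test_ratio=0.2, seed=0):
    """Shuffle sample indices and hold out `test_ratio` of them for testing."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_ratio)
    train, test = idx[:cut], idx[cut:]
    return ([features[i] for i in train], [labels[i] for i in train],
            [features[i] for i in test], [labels[i] for i in test])


def fit_min_max(rows):
    """Return per-feature (lows, highs) computed from the given rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Rescale each feature with the supplied bounds so features share one scale."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic stand-in data (NOT the Pima dataset): two hypothetical features,
# loosely "glucose" and "BMI", with label 1 assigned when glucose is high.
rng = random.Random(42)
raw = [[rng.uniform(70, 200), rng.uniform(18, 45)] for _ in range(100)]
labels = [1 if glucose > 140 else 0 for glucose, _ in raw]

x_train, y_train, x_test, y_test = train_test_split(raw, labels, test_ratio=0.2)

# Scaling bounds come from the training rows only, then are reused on the test rows.
lows, highs = fit_min_max(x_train)
x_train = apply_min_max(x_train, lows, highs)
x_test = apply_min_max(x_test, lows, highs)

predictions = [knn_predict(x_train, y_train, q, k=5) for q in x_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the abstract reports an imbalanced class distribution, accuracy alone may be misleading on the real data; per-class measures such as precision and recall would likely be worth reporting alongside it.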




