Classification of Obesity Using The Naïve Bayes Method and K-Nearest Neighbor

Ilham Asy Ari, Muhammad Amin, Andi Saputra, Irianto Irianto, Nuriadi Manurung

Abstract


Obesity is a major health problem that significantly reduces quality of life and can trigger various chronic diseases. Early detection of obesity levels is crucial for public health management, but traditional measures such as the Body Mass Index (BMI) have known limitations. This study proposes a data mining approach that uses feature engineering to improve the accuracy of obesity-level classification. Its purpose is to classify obesity levels and to compare the performance of the Naïve Bayes and K-Nearest Neighbor (KNN) methods. The method comprises preprocessing; feature extraction with Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA); feature selection with Correlation-based Feature Selection (CFS) and Chi-Square; classification with Naïve Bayes and KNN; and evaluation with accuracy and a confusion matrix, applied to a Kaggle dataset of 2,111 records. On the original data without LDA, KNN achieves higher accuracy (88.41%) than Naïve Bayes (63.82%). After LDA is applied, however, the accuracy of Naïve Bayes rises sharply to 93.61%, surpassing KNN's 92.19%. The study concludes that KNN is more effective on raw data, while Naïve Bayes performs best when combined with LDA-based dimensionality reduction.
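The classification and evaluation steps summarized above can be illustrated with a minimal, self-contained sketch. It is not the authors' code. It assumes numeric features and implements Gaussian Naive Bayes (one common Naïve Bayes variant; the abstract does not state which variant was used) and Euclidean-distance KNN from scratch, then scores both with accuracy and a confusion matrix. The two features (height in metres, weight in kilograms), the three class labels, and every sample value are hypothetical toy data, not records from the Kaggle dataset. The PCA/LDA and CFS/Chi-Square steps are omitted; in practice a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would typically transform the features before classification.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not from the study): (height in m, weight in kg) -> class.
X_TRAIN = [
    (1.70, 60), (1.80, 70), (1.65, 55),    # normal
    (1.70, 80), (1.75, 85), (1.60, 70),    # overweight
    (1.70, 100), (1.65, 95), (1.80, 115),  # obese
]
Y_TRAIN = ["normal"] * 3 + ["overweight"] * 3 + ["obese"] * 3
X_TEST = [(1.75, 62), (1.72, 83), (1.68, 105)]
Y_TEST = ["normal", "overweight", "obese"]
LABELS = ["normal", "overweight", "obese"]


def gaussian_nb_fit(X, y):
    """Estimate each class's prior and per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A tiny epsilon keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def gaussian_nb_predict(model, row):
    """Return the class with the highest log prior + log Gaussian likelihood."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def knn_predict(X, y, row, k=3):
    """Majority vote among the k training rows nearest in Euclidean distance."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(row, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


nb_model = gaussian_nb_fit(X_TRAIN, Y_TRAIN)
nb_pred = [gaussian_nb_predict(nb_model, row) for row in X_TEST]
knn_pred = [knn_predict(X_TRAIN, Y_TRAIN, row, k=3) for row in X_TEST]

for name, pred in [("Naive Bayes", nb_pred), ("KNN", knn_pred)]:
    print(name, accuracy(Y_TEST, pred), confusion_matrix(Y_TEST, pred, LABELS))
```

Because this KNN uses raw Euclidean distance, the weight feature (tens of kilograms) dominates height (fractions of a metre). Scaling the features or projecting them, for example with LDA, changes which neighbours count as nearest, whereas Gaussian Naive Bayes models each feature's spread separately per class.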


Keywords: naïve bayes classification; data mining; k-nearest neighbor; obesity; LDA; PCA.



DOI: https://doi.org/10.54314/teknisi.v6i1.5529