COMPARISON OF THE K-NEAREST NEIGHBOR AND NAIVE BAYES ALGORITHMS FOR CLASSIFYING THE EXPORT FEASIBILITY OF ARABICA COFFEE WITH CORRELATION-BASED FEATURE SELECTION

Authors

  • Diva Agustin Purba, Universitas Asahan
  • Nazma Aulia, Universitas Asahan
  • Pujawati, Universitas Asahan
  • Dicky Apdillah, Universitas Asahan
  • Bambang Irwansyah, Universitas Asahan
  • Harmayani, Universitas Asahan

DOI:

https://doi.org/10.54314/jssr.v9i3.6589

Keywords:

Arabica Coffee, Export Classification, K-Nearest Neighbor, Naive Bayes, Correlation-based Feature Selection

Abstract

Arabica coffee is a high-value export commodity for the Indonesian economy. To remain competitive globally, coffee beans must meet export quality standards based on the cupping score of the Coffee Quality Institute (CQI). This study compares the performance of the K-Nearest Neighbor (KNN) and Gaussian Naive Bayes algorithms in classifying the export feasibility of Arabica coffee beans. In sensory quality testing, a large number of attributes (10 parameters) can make accuracy unstable and increase computational load. Correlation-based Feature Selection (CFS) was therefore applied to remove redundant features, reducing the initial 10 features to 4 (Flavor, Acidity, Cupper Points, and Aroma). The dataset was obtained from the Kaggle Coffee Quality Database and underwent Excel format adjustments before being loaded into the system. Modeling used 1,303 records, with 30% held out as test data. Of the two algorithms, Naive Bayes performed best, reaching an accuracy of 97.95%. CFS reduced the number of features by 60% without significantly decreasing classification accuracy.
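
The workflow summarized in the abstract (feature selection, a 30% test split, then KNN versus Gaussian Naive Bayes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for the 10 sensory attributes, the CFS step uses absolute Pearson correlation with a greedy forward search as a simplification of Hall's method, and the choice of k = 5, feature standardization for KNN, and the random seeds are all hypothetical. The accuracies it prints do not reproduce the reported figures.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k(k-1)*r_ff).

    r_cf is the mean |correlation| between features and the class,
    r_ff the mean |correlation| between pairs of features.
    """
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(X, y):
    """Greedy forward search: add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:  # stop when no candidate improves the merit
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


# Synthetic stand-in data (assumption): 1,303 records, 10 correlated attributes.
rng = np.random.default_rng(0)
n_samples, n_features = 1303, 10
quality = rng.normal(size=n_samples)
noise_scale = np.linspace(0.3, 2.0, n_features)
X = quality[:, None] + rng.normal(size=(n_samples, n_features)) * noise_scale
y = (X.sum(axis=1) > 0).astype(int)  # 1 = "export-feasible" (hypothetical rule)

# Hold out 30% for testing; select features on the training part only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)
selected, merit = cfs_forward_select(X_train, y_train)
print(f"selected feature indices: {selected} (merit {merit:.3f})")

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "GaussianNB": GaussianNB(),
}
results = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test[:, selected]))
    print(f"{name}: test accuracy {results[name]:.4f}")
```

Running the selection on the training split only keeps the held-out 30% untouched by any step that looks at the class label, so the two test accuracies are comparable estimates.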





Published

2026-06-23

Section

Articles
