ANALYSIS OF IMBALANCED DATA HANDLING AND ITS EFFECT ON MULTICLASS SENTIMENT CLASSIFICATION PERFORMANCE FOR TOKOPEDIA MARKETPLACE REVIEWS

Authors

  • Nauval Alfarizi, Universitas Panca Budi
  • Satria Sinurat, Universitas Panca Budi
  • Adi Putra, Universitas Panca Budi
  • Muhammad Amin, Universitas Panca Budi
  • Prima Lydia, Universitas Panca Budi

DOI:

https://doi.org/10.54314/jssr.v9i1.5804

Abstract

The growth of digital marketplaces has produced a rising volume of user reviews that can be used to understand how consumers perceive products and services. Sentiment analysis of marketplace reviews, however, faces a major challenge: class imbalance, with positive sentiment often dominating overwhelmingly. This study analyzes how various imbalanced-data handling techniques affect the performance of machine-learning-based multiclass sentiment classification on Tokopedia marketplace reviews. The dataset consists of 56,981 reviews in three sentiment classes, more than 97% of which are positive. Feature extraction with TF-IDF produced 17,765 features. Imbalance handling was tested in four scenarios (class weighting, Random Oversampling, SMOTE, and ADASYN) with the Naive Bayes, Logistic Regression, and Random Forest algorithms. The experimental results show that Random Forest with SMOTE achieves the highest accuracy, 0.9749, but struggles to recognize the minority classes, with a recall of 0.3786. In contrast, Logistic Regression with Random Oversampling gives the most balanced performance, with the highest macro F1-score, 0.4992, and a recall of 0.5866.

Keywords: Analysis, Sentiment, Imbalanced Data, Multi-Class Classification, F1-Score
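TF-IDF turns each review into a weighted term vector, and the feature count is the size of the learned vocabulary (17,765 terms in this study). The sketch below is a minimal standard-library illustration, assuming scikit-learn-style defaults (smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, and L2-normalized rows) and plain whitespace tokenization; the paper does not state its exact TF-IDF settings or preprocessing, and the three English reviews are invented for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalized rows."""
    tokenized = [doc.lower().split() for doc in docs]  # whitespace tokenization (a simplification)
    vocab = sorted({tok for toks in tokenized for tok in toks})  # one feature per distinct term
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts in this document
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against empty documents
        rows.append([v / norm for v in vec])
    return vocab, rows


reviews = [  # invented example reviews
    "good product fast delivery",
    "product arrived broken",
    "slow delivery but good product",
]
vocab, X_tfidf = tfidf(reviews)
print(len(vocab), "features")
```

This prints `8 features`. A term such as "product", which occurs in every review, receives the minimum IDF weight of 1, while a rarer term such as "fast" is weighted up.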
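The four scenarios differ in where the correction happens: class weighting changes how much each training error costs, while Random Oversampling, SMOTE, and ADASYN change the training data itself. The sketch below illustrates the first three on a tiny hypothetical 2-D dataset using only the standard library. The helper names are hypothetical, the SMOTE variant is simplified (uniform interpolation toward one of the k nearest same-class neighbors), and ADASYN, which additionally generates more synthetic points around minority rows that have more neighbors from other classes, is omitted for brevity. Libraries such as scikit-learn (`class_weight="balanced"`) and imbalanced-learn (`RandomOverSampler`, `SMOTE`, `ADASYN`) provide full implementations; the abstract does not say which implementation the authors used.

```python
import math
import random
from collections import Counter


def balanced_class_weights(y):
    """Class weighting: w_c = n_samples / (n_classes * n_c), scikit-learn's 'balanced' heuristic."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_oversample(X, y, rng):
    """Random Oversampling: duplicate random rows of each smaller class up to the majority count."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for c, cnt in counts.items():
        members = [x for x, lab in zip(X, y) if lab == c]
        for _ in range(target - cnt):
            X_out.append(list(rng.choice(members)))
            y_out.append(c)
    return X_out, y_out


def smote_like(X, y, rng, k=5):
    """Simplified SMOTE: new point = x + lam * (neighbor - x) with lam ~ U[0, 1),
    where neighbor is one of x's k nearest rows from the same class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for c, cnt in counts.items():
        members = [x for x, lab in zip(X, y) if lab == c]
        if len(members) < 2:
            continue  # no same-class neighbor to interpolate toward
        for _ in range(target - cnt):
            i = rng.randrange(len(members))
            base = members[i]
            others = [m for j, m in enumerate(members) if j != i]
            neighbors = sorted(others, key=lambda m: math.dist(base, m))[:k]
            other = rng.choice(neighbors)
            lam = rng.random()
            X_out.append([b + lam * (o - b) for b, o in zip(base, other)])
            y_out.append(c)
    return X_out, y_out


# Hypothetical 2-D feature vectors: 6 positive, 2 neutral, 2 negative rows
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05], [0.7, 0.3], [0.75, 0.25],
     [0.5, 0.5], [0.45, 0.55],
     [0.1, 0.9], [0.2, 0.8]]
y = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2

rng = random.Random(0)
weights = balanced_class_weights(y)
X_ros, y_ros = random_oversample(X, y, rng)
X_smote, y_smote = smote_like(X, y, rng)
print({c: round(w, 4) for c, w in weights.items()})
print(Counter(y_ros), Counter(y_smote))
```

With 6 positive, 2 neutral, and 2 negative rows, the balanced weights are about 0.5556 for the positive class and 1.6667 for each minority class, and both resamplers bring every class to 6 rows. In a real experiment, resampling would be applied to the training split only, so that duplicated or synthetic rows never leak into the test set.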
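The gap between Random Forest with SMOTE (accuracy 0.9749, recall 0.3786) and Logistic Regression with Random Oversampling (macro F1-score 0.4992) comes from how the metrics treat the classes: accuracy is dominated by the 97% majority, whereas macro averaging gives every class equal weight. The sketch below makes this concrete with a trivial classifier that always predicts "positive" on a hypothetical 970/20/10 split. The class names and counts are assumptions, and undefined precision (a class that is never predicted) is set to 0, which matches scikit-learn's default behavior apart from the warning it emits.

```python
def per_class_scores(y_true, y_pred, labels):
    """Precision, recall and F1 for each class; undefined ratios are set to 0.0."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def accuracy_and_macro(y_true, y_pred, labels):
    """Accuracy plus macro-averaged recall and F1 (unweighted mean over classes)."""
    scores = per_class_scores(y_true, y_pred, labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_recall = sum(r for _, r, _ in scores.values()) / len(labels)
    macro_f1 = sum(f for _, _, f in scores.values()) / len(labels)
    return accuracy, macro_recall, macro_f1


labels = ["positive", "neutral", "negative"]  # assumed class names
# Hypothetical 97/2/1 split echoing the skew reported for the Tokopedia data
y_true = ["positive"] * 970 + ["neutral"] * 20 + ["negative"] * 10
y_pred = ["positive"] * len(y_true)  # trivial majority-class "classifier"

acc, macro_recall, macro_f1 = accuracy_and_macro(y_true, y_pred, labels)
print(f"accuracy={acc:.4f} macro_recall={macro_recall:.4f} macro_f1={macro_f1:.4f}")
```

It prints `accuracy=0.9700 macro_recall=0.3333 macro_f1=0.3283`: high accuracy even though neither minority class is ever recognized, which is why macro F1-score and recall are the more informative yardsticks on data this skewed.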




Published

2026-02-12

How to Cite

ANALISIS PENANGANAN DATA TIDAK SEIMBANG TERHADAP KINERJA KLASIFIKASI SENTIMEN MULTIKELAS PADA ULASAN MARKETPLACE TOKOPEDIA. (2026). JOURNAL OF SCIENCE AND SOCIAL RESEARCH, 9(1), 474-482. https://doi.org/10.54314/jssr.v9i1.5804
