OPTIMIZING THE VALIDITY OF HDI CLUSTERING THROUGH DISTANCE MEASURE VARIATIONS IN THE K-MEANS++ ALGORITHM
DOI: https://doi.org/10.54314/jssr.v8i4.5420
Abstract: The Human Development Index (HDI; Indeks Pembangunan Manusia, IPM) measures the quality of regional development along three dimensions: health, education, and a decent standard of living. In North Sumatra Province, HDI achievement still differs markedly between districts/cities, which calls for a data-driven analytical approach to map development patterns objectively. This study aims to improve the validity of regional HDI clustering by applying the K-Means++ algorithm with several distance measures. It takes a quantitative approach based on unsupervised learning. The data comprise HDI, Average Length of Schooling (ALS; Rata-rata Lama Sekolah, RLS), and Adjusted Per Capita Expenditure, all sourced from the Central Statistics Agency (Badan Pusat Statistik). The research stages are data preprocessing and standardization, selection of the optimal number of clusters with the Elbow method, application of the K-Means++ algorithm, and evaluation of cluster quality with the Davies–Bouldin Index (DBI) and the Purity Index. Clustering performance is also compared across Euclidean, Manhattan, and Cosine distances. The results show that the optimal number of clusters is three, representing high, medium, and low levels of human development. A DBI of 0.60 and a Purity Index of 0.61 indicate good clustering quality. Euclidean and Manhattan distances performed better than Cosine distance.

Keywords: Human Development Index; K-Means++; Clustering; Distance Measure; Davies–Bouldin Index; Purity Index.
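The pipeline described in the abstract (standardization, K-Means++ with a swappable distance measure, then DBI and purity) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Several points are assumptions: the rows in `ROWS` and the labels in `TRUTH` are made-up toy values rather than BPS data, centroids are always updated as means regardless of the distance used for assignment, the DBI is computed with the same distance as the clustering, and all function names are hypothetical. The Elbow step is omitted.

```python
import math
import random
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector gets distance 1 (treated like an orthogonal vector).
    return 1.0 if norm == 0 else max(0.0, 1.0 - dot / norm)

def standardize(rows):
    """Z-score every column using the population standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def kmeanspp_seeds(points, k, dist, rng):
    """K-Means++ seeding: draw each new centre with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(dist(p, c) for c in centers) ** 2 for p in points]
        if sum(d2) == 0:  # every point coincides with a centre already
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, dist, seed=0, iters=100):
    rng = random.Random(seed)
    centers = [list(c) for c in kmeanspp_seeds(points, k, dist, rng)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an emptied cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def purity(labels, truth):
    """Share of points that belong to the majority class of their cluster."""
    hits = 0
    for j in set(labels):
        classes = [t for l, t in zip(labels, truth) if l == j]
        hits += Counter(classes).most_common(1)[0][1]
    return hits / len(labels)

def davies_bouldin(points, labels, centers, dist):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation_ij ratio."""
    ks = sorted(set(labels))
    if len(ks) < 2:
        return float("inf")  # not defined for a single cluster
    scatter = {}
    for j in ks:
        members = [p for p, l in zip(points, labels) if l == j]
        scatter[j] = sum(dist(p, centers[j]) for p in members) / len(members)
    ratios = []
    for i in ks:
        worst = 0.0
        for j in ks:
            if j != i:
                sep = dist(centers[i], centers[j])
                ratio = (scatter[i] + scatter[j]) / sep if sep > 0 else float("inf")
                worst = max(worst, ratio)
        ratios.append(worst)
    return sum(ratios) / len(ks)

# Hypothetical toy rows: [HDI, average length of schooling, adjusted per capita expenditure].
ROWS = [[80.1, 11.2, 15000], [79.5, 10.9, 14500], [81.0, 11.5, 15500],
        [72.0, 9.0, 11000], [71.5, 8.8, 10800], [72.6, 9.2, 11300],
        [64.0, 6.5, 8000], [63.2, 6.2, 7700], [64.9, 6.8, 8300]]
TRUTH = ["high"] * 3 + ["medium"] * 3 + ["low"] * 3  # hypothetical reference labels

if __name__ == "__main__":
    Z = standardize(ROWS)
    for name, fn in [("euclidean", euclidean), ("manhattan", manhattan), ("cosine", cosine)]:
        labels, centers = kmeans(Z, 3, fn, seed=0)
        print(name, "DBI:", round(davies_bouldin(Z, labels, centers, fn), 3),
              "purity:", round(purity(labels, TRUTH), 3))
```

A lower DBI and a higher purity both indicate better clusters, so the three distance measures can be ranked by running the same data through each. A Manhattan-based variant would more commonly update centres with medians (k-medians); the mean update is kept here only so that the distance function is the single thing that changes between runs.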




