Undersampled K-means approach for handling imbalanced distributed data

Cited by: 0

Authors:

Kumar, N. Santhosh [1]

Rao, K. Nageswara [2]

Govardhan, A. [3,4]

Reddy, K. Sudheer [5]

Mahmood, Ali Mirza [6]

Affiliations:

[1] JNTU, Dept CSE, Hyderabad, Andhra Pradesh, India

[2] PSCMR Coll Engn & Technol, Vijayawada, Andhra Pradesh, India

[3] CSE, Hyderabad, Andhra Pradesh, India

[4] JNTU, SIT, Hyderabad, Andhra Pradesh, India

[5] Infosys, Hyderabad, Andhra Pradesh, India

[6] DMS SVH Coll Engn, Machilipatnam, Andhra Pradesh, India

Source:

PROGRESS IN ARTIFICIAL INTELLIGENCE | 2014 / Vol. 3 / Issue 01

Keywords:

Imbalanced data; K-means clustering algorithms; Undersampling; USKM

DOI:

10.1007/s13748-014-0045-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

K-means is a well-known partitional clustering technique, widely used for its low computational cost. However, the performance of the K-means algorithm tends to be affected by skewed data distributions, i.e., imbalanced data. It often produces clusters of relatively uniform size even when the input data contain clusters of varied sizes, a behavior called the "uniform effect". In this paper, we analyze the causes of this effect and illustrate that it is likely to occur in the K-means clustering process. As the minority class decreases in size, the "uniform effect" becomes more evident. To counter the "uniform effect", we revisit the well-known K-means algorithm and provide a general method to properly cluster imbalanced data. The proposed algorithm includes a novel undersampling technique that intelligently removes noisy and weak instances from the majority class. We conduct experiments on twelve UCI datasets from various application domains, using five algorithms for comparison on eight evaluation metrics. Experimental results show the effectiveness of the proposed clustering algorithm on both balanced and imbalanced data.
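The general idea in the abstract, undersampling the majority class before running K-means, can be illustrated with a rough sketch. This is not the paper's USKM algorithm: the abstract does not say how "noisy and weak" instances are identified, so the filter below assumes a simple stand-in (drop the majority points farthest from the majority centroid). The names `undersample_majority`, `keep_ratio`, and `kmeans` are hypothetical, and the data are synthetic.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def undersample_majority(majority, keep_ratio=0.5):
    """ASSUMPTION: stand-in for the paper's noisy/weak-instance filter.

    Keeps the fraction of majority points closest to the majority
    centroid and discards the rest.
    """
    center = centroid(majority)
    ranked = sorted(majority, key=lambda p: math.dist(p, center))
    keep = max(1, round(len(majority) * keep_ratio))
    return ranked[:keep]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means with random initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: recompute centers; keep the old one if a cluster is empty.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Synthetic imbalanced data: 200 majority points, 20 minority points.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(6, 0.5), data_rng.gauss(6, 0.5)) for _ in range(20)]

reduced = undersample_majority(majority, keep_ratio=0.25)
centers, clusters = kmeans(reduced + minority, k=2)
print(len(majority), "->", len(reduced), "majority points before clustering")
```

Shrinking the majority class narrows the size gap between the classes before clustering, which is the situation in which the abstract says the "uniform effect" is most pronounced. The choice of `keep_ratio` here is arbitrary and would need tuning on real data.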

Citation

Pages: 29-38

Page count: 10
