Hyperplane Division in Fuzzy C-Means: Clustering Big Data

Cited by: 22

Authors:

Shen, Yinghua [1]

Pedrycz, Witold [2,3,4]

Chen, Yuan [5]

Wang, Xianmin [6]

Gacek, Adam [7]

Affiliations:

[1] Chongqing Univ, Sch Econ & Business Adm, Chongqing 400044, Peoples R China

[2] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6R 2G7, Canada

[3] King Abdulaziz Univ, Dept Elect & Comp Engn, Fac Engn, Jeddah 21589, Saudi Arabia

[4] Polish Acad Sci, Syst Res Inst, PL-01447 Warsaw, Poland

[5] Tianjin Univ, Coll Management & Econ, Tianjin 300072, Peoples R China

[6] China Univ Geosci, Inst Geophys & Geomat, Hubei Subsurface Multiscale Imaging Key Lab, Wuhan 430074, Peoples R China

[7] Inst Med Technol & Equipment ITAM, PL-41800 Zabrze, Poland

Source:

IEEE TRANSACTIONS ON FUZZY SYSTEMS | 2020 / Vol. 28 / No. 11

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Clustering algorithms; Big Data; Indexes; Prototypes; Task analysis; Clustering methods; Data structures; Big data clustering; clustering requirements; fuzzy C-means (FCM); hyperplane division; many clusters; MAPREDUCE; FRAMEWORK; ALGORITHM; FCM;

DOI:

10.1109/TFUZZ.2019.2947231

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Big data with a large number of observations (samples) have posed genuine challenges for fuzzy clustering algorithms and fuzzy C-means (FCM), in particular. In this article, we propose an original algorithm referred to as a hyperplane division method to split the entire data set into disjoint subsets. By disjoint subsets, we mean that the data subspaces (parts of the entire data space), each of which is supported or spanned by the data points in the corresponding subset, do not overlap each other. The disjoint subsets turned out to be beneficial to the improvement of the quality of the clusters formed by the clustering algorithms. Moreover, considering that either a large number (say, thousands) or a small number (say, a few) of clusters may be pursued in the clustering task, we propose corresponding strategies (based on the hyperplane division method) to make clustering processes feasible, efficient, and effective. By validating the proposed strategies on both synthetic and publicly available data, we show their superiority (in terms of both efficiency and effectiveness) manifested in a visible way over the method of clustering the entire data and over some representative big data clustering methods.
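The divide-then-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' hyperplane division method, whose details the abstract does not give. It assumes a single hyperplane through the data mean, orthogonal to the first principal direction, and runs a textbook FCM on each side. The function names `split_by_hyperplane`, `fcm`, and `cluster_by_parts`, and all parameter defaults, are hypothetical.

```python
import numpy as np


def split_by_hyperplane(X):
    """Split X into two disjoint subsets with one hyperplane.

    Assumption (not from the paper): the hyperplane passes through the
    data mean and is orthogonal to the first principal direction.
    """
    center = X.mean(axis=0)
    # First right-singular vector = direction of largest variance.
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    normal = vt[0]
    side = (X - center) @ normal >= 0.0
    return X[side], X[~side]


def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means; returns (prototypes, membership matrix)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted-mean prototypes
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def cluster_by_parts(X, c_per_part):
    """Cluster each half separately and stack the resulting prototypes."""
    parts = split_by_hyperplane(X)
    return np.vstack([fcm(P, c_per_part)[0] for P in parts])
```

Calling `cluster_by_parts(X, 2)` on an array `X` of shape `(N, d)` would return four prototypes, two per half-space. Each FCM run only touches the points on its own side of the hyperplane, which is the source of the efficiency gain the abstract refers to; how the paper chooses its hyperplanes and how many subsets it forms are not specified here.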

Pages: 3032-3046

Number of pages: 15

50 items in total

[1] Random projections fuzzy c-means (RPFCM) for big data clustering
Popescu, Mihail
Keller, James
Bezdek, James
Zare, Alina
2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015
[2] A fuzzy clustering model of data and fuzzy c-means
Nascimento, S
Mirkin, B
Moura-Pires, F
NINTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2000), VOLS 1 AND 2, 2000: 302 - 307
[3] Fuzzy c-Means and Cluster Ensemble with Random Projection for Big Data Clustering
Ye, Mao
Liu, Wenfen
Wei, Jianghong
Hu, Xuexian
MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
[4] A Weighted Fuzzy c-Means Clustering Algorithm for Incomplete Big Sensor Data
Li, Peng
Chen, Zhikui
Hu, Yueming
Leng, Yonglin
Li, Qiucen
WIRELESS SENSOR NETWORKS (CWSN 2017), 2018, 812 : 55 - 63
[5] Hybrid Fuzzy C-Means Clustering Algorithm Oriented to Big Data Realms
Perez-Ortega, Joaquin
Silvia Roblero-Aguilar, Sandra
Nely Almanza-Ortega, Nelva
Frausto Solis, Juan
Zavala-Diaz, Crispin
Hernandez, Yasmin
Landero-Najera, Vanesa
AXIOMS, 2022, 11 (08)
[6] Fuzzy c-means clustering of incomplete data
Hathaway, RJ
Bezdek, JC
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2001, 31 (05) : 735 - 744
[7] A weighted fuzzy c-means clustering model for fuzzy data
D'Urso, P
Giordani, P
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2006, 50 (06) : 1496 - 1523
[8] Mapreduce fuzzy c-means ensemble clustering with gentle adaboost for big data analytics
Padmapriya K.M.
Anandhi B.
Vijayakumar M.
International Journal of Business Intelligence and Data Mining, 2021, 19 (02) : 170 - 188
[9] Clustering healthcare big data using advanced and enhanced fuzzy C-means algorithm
Purandhar, N.
Ayyasamy, S.
Saravanakumar, N. M.
INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2021, 34 (01)
[10] A fuzzy C-means algorithm for optimizing data clustering
Hashemi, Seyed Emadedin
Gholian-Jouybari, Fatemeh
Hajiaghaei-Keshteli, Mostafa
EXPERT SYSTEMS WITH APPLICATIONS, 2023, 227
