Hyperplane Division in Fuzzy C-Means: Clustering Big Data

Cited by: 22
Authors
Shen, Yinghua [1]
Pedrycz, Witold [2, 3, 4]
Chen, Yuan [5]
Wang, Xianmin [6]
Gacek, Adam [7]
Affiliations
[1] Chongqing Univ, Sch Econ & Business Adm, Chongqing 400044, Peoples R China
[2] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6R 2G7, Canada
[3] King Abdulaziz Univ, Dept Elect & Comp Engn, Fac Engn, Jeddah 21589, Saudi Arabia
[4] Polish Acad Sci, Syst Res Inst, PL-01447 Warsaw, Poland
[5] Tianjin Univ, Coll Management & Econ, Tianjin 300072, Peoples R China
[6] China Univ Geosci, Inst Geophys & Geomat, Hubei Subsurface Multiscale Imaging Key Lab, Wuhan 430074, Peoples R China
[7] Inst Med Technol & Equipment ITAM, PL-41800 Zabrze, Poland
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Clustering algorithms; Big Data; Indexes; Prototypes; Task analysis; Clustering methods; Data structures; Big data clustering; clustering requirements; fuzzy C-means (FCM); hyperplane division; many clusters; MAPREDUCE; FRAMEWORK; ALGORITHM; FCM;
DOI
10.1109/TFUZZ.2019.2947231
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big data with a large number of observations (samples) pose genuine challenges for fuzzy clustering algorithms, and for fuzzy C-means (FCM) in particular. In this article, we propose an original algorithm, referred to as the hyperplane division method, that splits the entire data set into disjoint subsets. By disjoint subsets, we mean that the data subspaces (parts of the entire data space), each of which is supported or spanned by the data points in the corresponding subset, do not overlap one another. The disjoint subsets turn out to improve the quality of the clusters formed by the clustering algorithms. Moreover, because a clustering task may call for either a large number of clusters (say, thousands) or a small number (say, a few), we propose corresponding strategies, based on the hyperplane division method, that make the clustering process feasible, efficient, and effective. By validating the proposed strategies on both synthetic and publicly available data, we show that they clearly outperform, in both efficiency and effectiveness, clustering the entire data set at once as well as several representative big data clustering methods.
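The abstract does not say how the dividing hyperplane is chosen, so the following is only a minimal sketch of the general idea (split the data with a hyperplane, then run FCM on each disjoint subset), not the authors' algorithm. It assumes a hyperplane through the data mean whose normal is the first principal direction; the function names `hyperplane_split` and `fcm`, the fuzzifier `m = 2`, and the toy data are all illustrative assumptions.

```python
import numpy as np


def hyperplane_split(X):
    """Split X by one hyperplane (assumed: through the mean, normal = first
    principal direction). Returns a boolean mask marking one side."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    w = Vt[0]
    # The sign of the projection decides the side, so the two subsets are disjoint.
    return (X - mu) @ w >= 0


def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means with alternating updates.
    Returns prototypes V (c x d) and the partition matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted-mean prototypes
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


# Toy data (assumption): two Gaussian blobs in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

side = hyperplane_split(X)
left, right = X[side], X[~side]

# Cluster each subset separately and pool the resulting prototypes.
prototypes = np.vstack([fcm(part, c=2)[0] for part in (left, right)])
```

Clustering each subset independently keeps every FCM run small, which is where any efficiency gain in such a scheme would come from; how the subsets' prototypes are combined for a target number of clusters is left open in this sketch.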
Pages: 3032-3046
Page count: 15
Related papers
50 in total
  • [31] A Robust Fuzzy c-Means Clustering Algorithm for Incomplete Data
    Li, Jinhua
    Song, Shiji
    Zhang, Yuli
    Li, Kang
    INTELLIGENT COMPUTING, NETWORKED CONTROL, AND THEIR ENGINEERING APPLICATIONS, PT II, 2017, 762 : 3 - 12
  • [32] Analysis of spectroscopic imaging data by fuzzy C-means clustering
    Mansfield, JR
    Sowa, MG
    Scarth, GB
    Somorjai, RL
    Mantsch, HH
    ANALYTICAL CHEMISTRY, 1997, 69 (16) : 3370 - 3374
  • [34] Fuzzy C-Means and Fuzzy TLBO for Fuzzy Clustering
    Krishna, P. Gopala
    Bhaskari, D. Lalitha
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 1, 2016, 379 : 479 - 486
  • [35] Clustering of COVID-19 data for knowledge discovery using c-means and fuzzy c-means
    Afzal, Asif
    Ansari, Zahid
    Alshahrani, Saad
    Raj, Arun K.
    Kuruniyan, Mohamed Saheer
    Saleel, C. Ahamed
    Nisar, Kottakkaran Sooppy
    RESULTS IN PHYSICS, 2021, 29
  • [36] An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT
    Bu, Fanyu
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 88 : 675 - 682
  • [37] Spectral partitioning and fuzzy C-means based clustering algorithm for big data wireless sensor networks
    Wang, Quyuan
    Guo, Songtao
    Hu, Jianji
    Yang, Yuanyuan
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2018,
  • [38] Spectral partitioning and fuzzy C-means based clustering algorithm for big data wireless sensor networks
    Quyuan Wang
    Songtao Guo
    Jianji Hu
    Yuanyuan Yang
    EURASIP Journal on Wireless Communications and Networking, 2018
  • [39] A Multiple Fuzzy C-Means Ensemble Cluster Forest for Big Data
    Lahmar, Ines
    Zaier, Aida
    Yahia, Mohamed
    Boaullegue, Ridha
    HYBRID INTELLIGENT SYSTEMS, HIS 2021, 2022, 420 : 442 - 451
  • [40] FAST, FUZZY C-MEANS CLUSTERING OF DATA SETS WITH MANY FEATURES
    ALSBERG, BK
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 1995, 16 (04) : 414 - 421