One-pass MapReduce-based clustering method for mixed large scale data

Cited by: 0
Authors
Mohamed Aymen Ben HajKacem
Chiheb-Eddine Ben N’cir
Nadia Essoussi
Affiliations
[1] Université de Tunis, Institut Supérieur de Gestion de Tunis, LARODEC
Keywords
K-prototypes; One-pass MapReduce; Large scale data; Mixed data; Pruning strategy
DOI
Not available
Abstract
Big data is often characterized by huge volume and mixed types of attributes, namely numeric and categorical. K-prototypes has been fitted into the MapReduce framework and has thus become a solution for clustering mixed large-scale data. However, k-prototypes requires computing the distances between every cluster center and every data point. Many of these distance computations are redundant, because data points usually stay in the same cluster after the first few iterations. In addition, k-prototypes is not well suited to running within the MapReduce framework: its iterative nature cannot be modeled efficiently through MapReduce, since at each iteration the whole data set must be read from and written to disk, which results in a high number of input/output (I/O) operations. To deal with these issues, we propose a new one-pass accelerated MapReduce-based k-prototypes clustering method for mixed large-scale data. The proposed method reads and writes data only once, which greatly reduces the I/O operations compared to the existing MapReduce implementation of k-prototypes. Furthermore, the proposed method relies on a pruning strategy that accelerates the clustering process by reducing the redundant distance computations between cluster centers and data points. Experiments performed on simulated and real data sets show that the proposed method is scalable and improves on the efficiency of existing k-prototypes methods.
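To make the cost described in the abstract concrete, here is a minimal sketch of the baseline k-prototypes assignment step. It assumes the standard k-prototypes dissimilarity (squared Euclidean distance on the numeric attributes plus a weight gamma times the number of categorical mismatches). The function names, the data layout, and the default value of gamma are illustrative assumptions and are not taken from the paper. The sketch does not implement the authors' pruning strategy or their MapReduce design.

```python
from typing import List, Sequence, Tuple

# A mixed record: (numeric attributes, categorical attributes).
# This layout is an assumption made for illustration only.
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(x: Record, c: Record, gamma: float = 1.0) -> float:
    """Standard k-prototypes dissimilarity between a point and a center."""
    x_num, x_cat = x
    c_num, c_cat = c
    # Squared Euclidean distance over the numeric attributes.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    # Simple matching: count the categorical attributes that differ.
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical


def assign(points: List[Record], centers: List[Record],
           gamma: float = 1.0) -> List[int]:
    """Assign each point to its nearest center.

    Every call computes len(points) * len(centers) distances, which is
    the per-iteration cost that a pruning strategy tries to reduce.
    """
    labels = []
    for x in points:
        dists = [mixed_distance(x, c, gamma) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels
```

For example, with two centers and two points, `assign` evaluates four distances in a single iteration, and an unpruned iterative run repeats that work at every iteration even when no point changes cluster.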
Pages: 619 - 636
Page count: 17
Related papers
50 in total
  • [1] One-pass MapReduce-based clustering method for mixed large scale data
    Ben HajKacem, Mohamed Aymen
    Ben N'cir, Chiheb-Eddine
    Essoussi, Nadia
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019, 52 (03) : 619 - 636
  • [2] One-pass Multi-view Clustering for Large-scale Data
    Liu, Jiyuan
    Liu, Xinwang
    Yang, Yuexiang
    Liu, Li
    Wang, Siqi
    Liang, Weixuan
    Shi, Jiangyong
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 12324 - 12333
  • [3] A MapReduce-based artificial bee colony for large-scale data clustering
    Banharnsakun, Anan
    [J]. PATTERN RECOGNITION LETTERS, 2017, 93 : 78 - 84
  • [4] MapReduce-based Dragonfly Algorithm for large-scale Data-Clustering
    Tripathi, Ashish Kumar
    Saxena, Pranav
    Gupta, Siddharth
    [J]. 2019 FIFTH INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP 2019), 2019, : 171 - 175
  • [5] A One-Pass Clustering Based Sketch Method for Network Monitoring
    Fu, Yongquan
    An, Lun
    Shen, Siqi
    Chen, Kai
    Barlet-Ros, Pere
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (06) : 2604 - 2613
  • [6] MapReduce-based K-Prototypes Clustering Method for Big Data
    Ben HajKacem, Mohamed Aymen
    Ben N'cir, Chiheb-Eddine
    Essoussi, Nadia
    [J]. PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015, : 1030 - 1036
  • [7] A MapReduce-based parallel K-means clustering for large-scale CIM data verification
    Deng, Chuang
    Liu, Yang
    Xu, Lixiong
    Yang, Jie
    Liu, Junyong
    Li, Siguang
    Li, Maozhen
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (11) : 3096 - 3114
  • [8] One-Pass Clustering Superpixels
    Kesavan, Yogarajah
    Ramanan, Amirthalingam
    [J]. 2014 7TH INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION FOR SUSTAINABILITY (ICIAFS), 2014,
  • [9] MapReduce-Based Crow Search-Adopted Partitional Clustering Algorithms for Handling Large-Scale Data
    Visalakshi, Karthikeyani N.
    Shanthi, S.
    Lakshmi, K.
    [J]. INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [10] Large Scale Text Clustering Method Study Based on MapReduce
    Sun, Zhanquan
    Li, Feng
    Zhao, Yanling
    Song, Lifeng
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2015, 2015, 9377 : 365 - 372