Performance Enhancement of Distributed K-Means Clustering for Big Data Analytics Through In-memory Computation

Cited by: 0
Authors
Ketu, Shwet [1]
Agarwal, Sonali [1]
Affiliations
[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India
Keywords
Big data; Big data analytics; Distributed K-Means; Hadoop MapReduce; Apache Spark; On-disk computation; In-memory computation;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Big data analytics has recently emerged as a prominent research area in information technology, serving various data-driven domains that need to process big data effectively. It faces challenges such as inefficient storage, processing delays, low information-retrieval rates, and complex algorithms, none of which can be handled with traditional methods. New programming frameworks are required to help software developers deal with these challenges. This paper considers two such frameworks: Hadoop MapReduce, which supports on-disk computation, and Apache Spark, which supports in-memory computation. Clustering is one of the important tasks of big data mining, used for information retrieval and knowledge discovery. In this work, we analyze the performance of distributed K-Means clustering under the in-memory and on-disk computational models. To enhance the performance of distributed K-Means clustering, both computational models were adopted and an experimental analysis was performed.
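The map/reduce structure of distributed K-Means that the abstract refers to can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names (`assign`, `combine`, `kmeans_iteration`, `kmeans`) and the list-of-partitions layout are assumptions made for illustration only. Each inner list stands in for the data held by one worker. In an in-memory engine such as Apache Spark those partitions would typically stay cached across iterations, whereas in Hadoop MapReduce each iteration is usually a separate job that reads its input from disk again.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, count of 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Reduce step: add two (coordinate sum, count) pairs for the same cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_iteration(partitions, centroids):
    """One K-Means pass over all partitions (hypothetical stand-ins for workers)."""
    totals = {}
    for partition in partitions:
        for point in partition:
            key, value = assign(point, centroids)
            totals[key] = combine(totals[key], value) if key in totals else value
    # Clusters that received no points keep their previous centroid.
    updated = list(centroids)
    for key, (coord_sum, count) in totals.items():
        updated[key] = tuple(x / count for x in coord_sum)
    return updated


def kmeans(partitions, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(partitions, centroids)
        shift = max(dist(old, new) for old, new in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

Because the same `partitions` object is reused on every pass, the loop in `kmeans` mirrors the in-memory model; an on-disk variant would reload the partitions from storage at the start of each `kmeans_iteration` call.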
Pages: 318-324
Number of pages: 7
Related papers
50 in total
  • [21] A Novel K-Means based Clustering Algorithm for Big Data
    Sinha, Ankita
    Jana, Prasanta K.
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016: 1875-1879
  • [22] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    JOURNAL OF SUPERCOMPUTING, 2014, 70 (03): 1249-1259
  • [23] Improvement of K-Means Algorithm for Accelerated Big Data Clustering
    Wu, Chunqiong
    Yan, Bingwen
    Yu, Rongrui
    Huang, Zhangshu
    Yu, Baoqin
    Yu, Yanliang
    Chen, Na
    Zhou, Xiukao
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGIES AND SYSTEMS APPROACH, 2021, 14 (02): 99-119
  • [24] Improved k-Means Clustering Algorithm for Big Data Based on Distributed Smartphone Neural Engine Processor
    Awad, Fouad H.
    Hamad, Murtadha M.
    ELECTRONICS, 2022, 11 (06)
  • [25] Bridging High Velocity and High Volume Industrial Big Data Through Distributed In-Memory Storage & Analytics
    Williams, Jenny Weisenberg
    Aggour, Kareem S.
    Interrante, John
    McHugh, Justin
    Pool, Eric
    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014: 932-941
  • [26] Enhancement of the K-Means Algorithm for Mixed Data in Big Data Platforms
    Koren, Oded
    Hallin, Carina Antonia
    Perel, Nir
    Bendet, Dror
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868: 1025-1040
  • [27] K-Means Clustering with Distributed Dimensions
    Ding, Hu
    Liu, Yu
    Huang, Lingxiao
    Li, Jian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [28] NEW ALGORITHM FOR CLUSTERING DISTRIBUTED DATA USING K-MEANS
    Khedr, Ahmed M.
    Bhatnagar, Raj K.
    COMPUTING AND INFORMATICS, 2014, 33 (04): 943-964
  • [29] An Enhancement of K-means Clustering Algorithm
    Gu, Jirong
    Zhou, Jieming
    Chen, Xianwei
    2009 INTERNATIONAL CONFERENCE ON BUSINESS INTELLIGENCE AND FINANCIAL ENGINEERING, PROCEEDINGS, 2009: 237-240
  • [30] In-Memory Performance for Big Data
    Graefe, Goetz
    Volos, Haris
    Kimura, Hideaki
    Kuno, Harumi
    Tucek, Joseph
    Lillibridge, Mark
    Veitch, Alistair
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (01): 37-48