Performance Enhancement of Distributed K-Means Clustering for Big Data Analytics Through In-memory Computation

Cited by: 0
Authors
Ketu, Shwet [1]
Agarwal, Sonali [1]
Affiliation
[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India
Keywords
Big data; Big data analytics; Distributed K-Means; Hadoop MapReduce; Apache Spark; On-disk computation; In-memory computation
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Big data analytics has recently emerged as a prominent research area in information technology, serving various data-driven domains that need to process big data effectively. It faces challenges such as inefficient storage, processing delays, low rates of information retrieval, and complex algorithms, which cannot be handled and managed using traditional methods. New programming frameworks are required to help software developers deal with these challenges. In this paper, Hadoop MapReduce and Apache Spark are used for this purpose; they support on-disk and in-memory computation, respectively. Clustering is one of the important tasks of big data mining, used for information retrieval and knowledge discovery. In this work, we analyze the performance of distributed K-Means clustering under the in-memory and on-disk computational models. To enhance the performance of distributed K-Means clustering, both computational models have been adopted and an experimental analysis has been performed.
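As an illustration only, the map and reduce phases of one distributed K-Means iteration can be sketched in plain Python. This is not the authors' implementation: the function names (`assign`, `kmeans_step`, `kmeans`), the toy data, and the use of in-process lists to stand in for worker partitions are all assumptions made for this sketch. Keeping `partitions` in memory across iterations loosely mirrors the in-memory model, whereas an on-disk framework such as Hadoop MapReduce would typically read its input from storage and write its output back on every iteration.

```python
from collections import defaultdict

def assign(point, centroids):
    """Map step: index of the nearest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def kmeans_step(partitions, centroids):
    """One iteration: assign every point, then average the points per cluster."""
    dim = len(centroids[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for part in partitions:  # each partition stands in for one worker (assumption)
        for point in part:
            k = assign(point, centroids)
            counts[k] += 1
            sums[k] = [s + p for s, p in zip(sums[k], point)]
    # Reduce step: new centroid is the mean of its points;
    # a cluster with no points keeps its previous centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]

def kmeans(partitions, centroids, iterations=10):
    """Run a fixed number of iterations over data that stays in memory."""
    for _ in range(iterations):
        centroids = kmeans_step(partitions, centroids)
    return centroids

# Toy data: two partitions, two well-separated groups (hypothetical values).
partitions = [[(1.0, 1.0), (1.0, 2.0)], [(9.0, 9.0), (10.0, 9.0)]]
result = kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)], iterations=5)
print(result)
```

The sketch uses a fixed iteration count rather than a convergence test to stay short; a real job would normally stop once the centroids move less than some tolerance.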
Pages: 318-324
Number of pages: 7