Performance Enhancement of Distributed K-Means Clustering for Big Data Analytics Through In-memory Computation

Authors
Ketu, Shwet [1]
Agarwal, Sonali [1]
Affiliations
[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India
Keywords
Big data; Big data analytics; Distributed K-Means; Hadoop MapReduce; Apache Spark; On-disk computation; In-memory computation
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Big data analytics has recently emerged as a prominent research area in information technology, serving various data-driven domains that need to process big data effectively. It faces several challenges, such as inefficient storage, processing delays, low rates of information retrieval, and complex algorithms, that cannot be handled and managed using traditional methods. New programming frameworks are required to help software developers deal with these challenges. This paper considers two such frameworks, Hadoop MapReduce and Apache Spark, which support on-disk and in-memory computation, respectively. Clustering is one of the important tasks of big data mining and is used for information retrieval and knowledge discovery. In this work, we analyze the performance of distributed K-Means clustering under the in-memory and on-disk computational models. To enhance the performance of distributed K-Means clustering, both computational models have been adopted and an experimental analysis has been performed.
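
As an illustration only, and not the authors' implementation, the following minimal Python sketch shows how one K-Means iteration can be split into a map step (assign each point to its nearest centroid and accumulate per-partition sums) and a reduce step (merge the partial sums into new centroids). All function names here are hypothetical, and the "partitions" are plain in-process lists standing in for distributed data blocks. The sketch keeps the partitions in memory across iterations, which loosely mirrors the in-memory model; an on-disk framework would typically re-read its input from storage on every iteration.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(partition, centroids):
    """Map step: for one partition, accumulate (coordinate sums, count) per cluster."""
    partial = {}
    for point in partition:
        idx = nearest(point, centroids)
        sums, count = partial.get(idx, ([0.0] * len(point), 0))
        partial[idx] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def reduce_phase(partials, centroids):
    """Reduce step: merge partial sums from all partitions and compute new centroids."""
    merged = {}
    for partial in partials:
        for idx, (sums, count) in partial.items():
            msums, mcount = merged.get(idx, ([0.0] * len(sums), 0))
            merged[idx] = ([a + b for a, b in zip(msums, sums)], mcount + count)
    # A cluster that received no points keeps its previous centroid.
    return [
        [s / merged[i][1] for s in merged[i][0]] if i in merged else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(partitions, centroids, iterations=10):
    """Alternate map and reduce; `partitions` is reused in memory on every iteration."""
    for _ in range(iterations):
        partials = [map_phase(part, centroids) for part in partitions]
        centroids = reduce_phase(partials, centroids)
    return centroids
```

Because each map call returns only per-cluster sums and counts, the amount of data passed to the reduce step depends on the number of clusters rather than the number of points, which is the usual reason for structuring distributed K-Means this way.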
Pages: 318-324
Page count: 7