Performance Enhancement of Distributed Clustering for Big Data Analytics

Authors
Mohamed, Omar Hesham [1]
Shehab, Mohamed Elemam [2]
El Fakharany, Essam [3]
Affiliations
[1] Arab Academy for Science, Technology and Maritime Transport, Information Systems Department, Cairo, Egypt
[2] Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt
[3] Arab Academy for Science, Technology and Maritime Transport, College of Computing and Information Technology, Cairo, Egypt
Keywords
Big Data; Apache Spark; Machine learning algorithms; K-Means algorithm; In-memory computation; Big data analytics
DOI
10.1007/978-3-319-74690-6_41
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Big Data analytics has recently emerged as a prominent research area in data science. Apache Spark is an open-source distributed data processing platform that uses a distributed memory abstraction to process large volumes of streaming data efficiently. Improving the performance of analytic computational models for streaming big data is important for meeting the requirements of many real-time data analysis applications, and researchers have focused on improving analytic algorithms to reduce analysis time. This paper presents a performance enhancement of the in-memory computational model that selects the most important attributes after the data has been cached in Apache Spark. A performance analysis of the distributed K-Means clustering algorithm based on this in-memory computational model was conducted, and the results show an improvement in the model's performance.
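The abstract combines three steps: cache the data in memory, keep only the most important attributes, and run a distributed K-Means. The following is a minimal, dependency-free Python sketch of that pipeline under stated assumptions; it is not the authors' implementation. The variance-based attribute ranking is an assumption (the abstract does not say how attribute importance is measured), plain in-memory Python lists stand in for cached Spark partitions, and the helper names (`select_top_attributes`, `partial_stats`, `kmeans`) are hypothetical. In an actual Spark job the analogous pieces would be `DataFrame.cache()` and `pyspark.ml.clustering.KMeans`.

```python
# Minimal single-process sketch under stated assumptions; not the paper's code.
from statistics import pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def select_top_attributes(rows: Sequence[Point], n_keep: int) -> List[int]:
    """Hypothetical importance criterion: keep the n_keep highest-variance columns."""
    n_cols = len(rows[0])
    scored = [(pvariance([r[c] for r in rows]), c) for c in range(n_cols)]
    # Highest variance first; ties broken by column index for determinism.
    scored.sort(key=lambda vc: (-vc[0], vc[1]))
    return sorted(c for _, c in scored[:n_keep])


def project(rows: Sequence[Point], cols: List[int]) -> List[Point]:
    """Keep only the selected columns of every row."""
    return [tuple(r[c] for c in cols) for r in rows]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_stats(partition: Sequence[Point], centers: List[Point]):
    """'Map' step for one partition: per-cluster coordinate sums and point counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in partition:
        i = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]
    return sums, counts


def kmeans(partitions: List[List[Point]], init: List[Point],
           max_iter: int = 20, tol: float = 1e-12) -> List[Point]:
    """Lloyd's K-Means where each iteration merges per-partition statistics."""
    centers = [tuple(c) for c in init]
    k, dim = len(centers), len(centers[0])
    for _ in range(max_iter):
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        # 'Reduce' step: merge the statistics from every partition.
        for part in partitions:
            sums, counts = partial_stats(part, centers)
            for i in range(k):
                total_counts[i] += counts[i]
                for d in range(dim):
                    total_sums[i][d] += sums[i][d]
        # New center = mean of assigned points; an empty cluster keeps its old center.
        new_centers = [
            tuple(s / total_counts[i] for s in total_sums[i]) if total_counts[i] else centers[i]
            for i in range(k)
        ]
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


# Usage with made-up toy data: column 1 is nearly constant, so it is dropped.
rows = [(0.0, 5.0, 0.1), (0.2, 5.0, 0.0), (0.1, 5.1, 0.2),
        (9.0, 5.0, 9.1), (9.2, 5.1, 8.9), (8.9, 5.0, 9.0)]
selected = select_top_attributes(rows, n_keep=2)
data = project(rows, selected)
# Two in-memory lists stand in for cached partitions held by two workers.
partitions = [data[0::2], data[1::2]]
centers = kmeans(partitions, init=[data[0], data[3]])
print(selected, centers)
```

Dropping a near-constant column shortens every distance computation in every iteration, which is one plausible source of the kind of saving the abstract attributes to attribute selection; whether variance is a suitable importance measure depends on the data.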
Pages: 415-425
Number of pages: 11