Performance Enhancement of Distributed Clustering for Big Data Analytics

Cited by: 0

Authors:

Mohamed, Omar Hesham [1]

Shehab, Mohamed Elemam [2]

El Fakharany, Essam [3]

Affiliations:

[1] Arab Acad Sci Technol & Maritime Transport, Informat Syst Dept, Cairo, Egypt

[2] Arab Acad Sci Technol & Maritime Transport, Cairo, Egypt

[3] Arab Acad Sci Technol & Maritime Transport, Coll Comp & Informat Technol, Cairo, Egypt

Source:

INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018) | 2018 / Vol. 723

Keywords:

Big Data; Apache Spark; Machine learning algorithms; K-Means algorithm; In-memory computation; Big data analytic;

DOI:

10.1007/978-3-319-74690-6_41

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Big Data analytics has recently emerged as a prominent research area in data science. Apache Spark is an open-source distributed data processing platform that uses a distributed memory abstraction to process large volumes of streaming data efficiently. Improving the performance of analytic computational models for streaming big data is important for meeting the requirements of many real-time data analysis tasks. Researchers have focused on improving analytic algorithms to reduce analysis time. This paper presents a performance enhancement of the in-memory computational model that selects the most important attributes after caching the data in Apache Spark. A performance analysis of the distributed K-Means clustering algorithm based on the in-memory computational model has been conducted. The results show an improvement in the performance of the model.
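The idea in the abstract, keeping only the most informative attributes before running K-Means, can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: the abstract does not say how attributes are ranked, so ranking by variance is an assumption made here, and the function names (`select_top_attributes`, `project`, `kmeans`, `assign`) and the toy data are hypothetical.

```python
import random
from statistics import pvariance


def select_top_attributes(rows, keep):
    """Return the indices of the `keep` columns with the highest variance.

    Assumption: variance stands in for "importance"; the abstract does
    not specify the selection criterion.
    """
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: pvariance(columns[j]),
                    reverse=True)
    return sorted(ranked[:keep])


def project(rows, indices):
    """Keep only the selected columns of every row."""
    return [tuple(row[j] for j in indices) for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's K-Means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# Toy data (hypothetical): columns 1 and 3 are constant and carry no signal.
rows = [
    (1.0, 0.5, 10.0, 0.1),
    (1.2, 0.5, 10.5, 0.1),
    (0.8, 0.5, 9.5, 0.1),
    (8.0, 0.5, 1.0, 0.1),
    (8.2, 0.5, 1.5, 0.1),
    (7.8, 0.5, 0.5, 0.1),
]

selected = select_top_attributes(rows, keep=2)
reduced = project(rows, selected)
centroids = kmeans(reduced, k=2)
labels = assign(reduced, centroids)
```

Here the distance computation in each K-Means iteration touches two attributes instead of four. In a distributed setting the same reduction would also shrink the cached dataset, which is presumably where the reported gain comes from, although the abstract gives no figures.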

Pages: 415-425

Page count: 11
