Performance Optimization of Machine Learning Algorithms Based on Spark

Cited by: 0
Authors
Luo W. [1]
Zhang S. [2]
Xu Y. [1]
Affiliations
[1] School of Information Management, Jiangxi University of Finance and Economics, Nanchang, Jiangxi
[2] College of Software Engineering, Guangxi Normal University, Guilin, Guangxi
Keywords
Machine learning algorithm; RDD; Shuffle; Spark
DOI
10.2478/amns-2024-0416
Abstract
This paper proposes a performance optimization strategy for Spark-based machine learning algorithms that targets two modules: Shuffle and in-memory data management. The Shuffle module is optimized by introducing an Observer monitoring module into the Spark cluster, which monitors task status and generates ShuffleWrite tasks dynamically. In addition, an adaptive caching mechanism for RDD data addresses shortcomings in in-memory data caching. In experiments, the optimized algorithm achieves a clustering accuracy of 89% and a response time 5% shorter than that of the Random Forest algorithm. For road network traffic state discrimination, the optimized algorithm reaches a classification F-measure of 99.53%, 5.32% higher than before optimization, and when processing about 6,880,000 records its running time is 767 seconds shorter than that of the unoptimized algorithm, substantially improving both efficiency and accuracy. © 2023 Weikang Luo, Shenglin Zhang and Yinggen Xu, published by Sciendo.
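The abstract does not say how the adaptive RDD caching mechanism decides which RDDs to keep in memory. The following is a minimal sketch of one common cost-benefit policy, assuming hypothetical per-RDD statistics (size, recomputation cost, remaining references) that a monitoring component could collect; it is not the authors' mechanism. `RDDStats`, `cache_weight` and `choose_cached` are illustrative names, and no Spark API is called. Each RDD is scored by the recomputation time it saves per MB of memory, and the highest-scoring RDDs are kept until the memory budget is used up.

```python
from dataclasses import dataclass, replace


@dataclass
class RDDStats:
    """Hypothetical per-RDD statistics (not a Spark API)."""
    rdd_id: int
    size_mb: float         # estimated in-memory size of the cached partitions (> 0)
    compute_cost_s: float  # time to recompute the RDD from its lineage
    ref_count: int         # remaining stages that will read this RDD


def cache_weight(s: RDDStats) -> float:
    # Recomputation time saved over all remaining reads, per MB of memory used.
    return s.compute_cost_s * s.ref_count / s.size_mb


def choose_cached(candidates: list[RDDStats], capacity_mb: float) -> list[int]:
    """Greedily keep the highest-weight RDDs that still fit in the memory budget."""
    chosen: list[int] = []
    used = 0.0
    for s in sorted(candidates, key=cache_weight, reverse=True):
        # RDDs that will never be read again are not worth caching.
        if s.ref_count > 0 and used + s.size_mb <= capacity_mb:
            chosen.append(s.rdd_id)
            used += s.size_mb
    return chosen


candidates = [
    RDDStats(rdd_id=1, size_mb=400, compute_cost_s=12.0, ref_count=3),
    RDDStats(rdd_id=2, size_mb=300, compute_cost_s=2.0, ref_count=1),
    RDDStats(rdd_id=3, size_mb=500, compute_cost_s=20.0, ref_count=2),
    RDDStats(rdd_id=4, size_mb=100, compute_cost_s=5.0, ref_count=0),
]
print(choose_cached(candidates, capacity_mb=1000))  # [1, 3]

# Once every stage that reads RDD 1 has run, its reference count drops to 0,
# and re-running the selection hands its memory to RDD 2.
candidates[0] = replace(candidates[0], ref_count=0)
print(choose_cached(candidates, capacity_mb=1000))  # [3, 2]
```

The "adaptive" part comes from re-running the selection whenever the statistics change. The greedy ordering is a fast heuristic for what is really a knapsack problem, so it is not guaranteed to be optimal. In a real PySpark job, the selected RDDs would be kept with `RDD.persist()`, and the others released with `RDD.unpersist()`.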
Related papers (50 in total)
  • [1] Investigating the performance of Hadoop and Spark platforms on machine learning algorithms
    Mostafaeipour, Ali
    Rafsanjani, Amir Jahangard
    Ahmadi, Mohammad
    Dhanraj, Joshuva Arockia
    The Journal of Supercomputing, 2021, 77(2): 1273-1300
  • [2] Parallelization of a Series of Extreme Learning Machine Algorithms Based on Spark
    Liu, Tiantian
    Fang, Zhiyi
    Zhao, Chen
    Zhou, Yingmin
    2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 2016: 1075-1079
  • [3] Performance Evaluation of Machine Learning Algorithms in Apache Spark for Intrusion Detection
    Dobson, Anthony
    Roy, Kaushik
    Yuan, Xiaohong
    Xu, Jinsheng
    2018 28th International Telecommunication Networks and Applications Conference (ITNAC), 2018: 374-379
  • [4] A Novel Approach for Evaluating Web Page Performance Based on Machine Learning Algorithms and Optimization Algorithms
    Ghattas, Mohammad
    Mora, Antonio M.
    Odeh, Suhail
    AI, 2025, 6(2)
  • [5] Survey of Machine Learning Algorithms on Spark Over DHT-based Structures
    Sioutas, Spyros
    Mylonas, Phivos
    Panaretos, Alexandros
    Gerolymatos, Panagiotis
    Vogiatzis, Dimitrios
    Karavaras, Eleftherios
    Spitieris, Thomas
    Kanavos, Andreas
    Algorithmic Aspects of Cloud Computing, ALGOCLOUD 2016, 2017, 10230: 146-156
  • [6] Usages of Spark Framework with Different Machine Learning Algorithms
    Mohamed, Mohamed Ali
    El-Henawy, Ibrahim Mahmoud
    Salah, Ahmad
    Computational Intelligence and Neuroscience, 2021, 2021
  • [7] Automotive Performance Tests Based on Machine Learning Algorithms
    Geissler, M.
    Kunisch, J.
    Oikonomopoulos-Zachos, C.
    Friedrich, A.
    2022 16th European Conference on Antennas and Propagation (EuCAP), 2022
  • [8] Performance evaluation of genetic algorithms and evolutionary programming in optimization and machine learning
    Abu-Zitar, R.
    Nuseirat, A. M. A.
    Cybernetics and Systems, 2002, 33(3): 203-223
  • [9] Optimization and comparison of machine learning algorithms for the prediction of the performance of football players
    Morciano, Gianluca
    Zingoni, Andrea
    Calabrò, Giuseppe
    Neural Computing and Applications, 2024, 36(31): 19653-19666