Performance Optimization of Machine Learning Algorithms Based on Spark

Authors
Luo W. [1]
Zhang S. [2]
Xu Y. [1]
Affiliations
[1] School of Information Management, Jiangxi University of Finance and Economics, Jiangxi, Nanchang
[2] College of Software Engineering, Guangxi Normal University, Guangxi, Guilin
Keywords
Machine learning algorithm; RDD; Shuffle; Spark
DOI
10.2478/amns-2024-0416
Abstract
This paper proposes a performance optimization strategy for Spark-based machine learning algorithms that targets the Shuffle module and the in-memory data management module. The Shuffle module is optimized by introducing an Observer monitoring module into the Spark cluster, which monitors task status and dynamically generates ShuffleWrite tasks. Meanwhile, an adaptive caching mechanism for RDD data addresses the lack of in-memory data caching. The performance-optimized algorithm performs well in the experiments, achieving a clustering accuracy of 89% and a response time 5% faster than that of the Random Forest algorithm. In road network traffic state discrimination, the optimized algorithm's classification F-measure reaches 99.53%, which is 5.32% higher than before optimization, and when processing about 6,880,000 data records its running time is 767 seconds shorter than that of the unoptimized algorithm, significantly improving both efficiency and accuracy. © 2023 Weikang Luo, Shenglin Zhang and Yinggen Xu, published by Sciendo.
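The abstract does not specify how the adaptive RDD caching decision is made. As a rough illustration only, the Python sketch below shows one plausible policy, assuming the cache can estimate, for each RDD, its in-memory size, the time needed to recompute it from lineage, and how many more times it will be used. It greedily caches the RDDs that save the most recomputation time per megabyte of storage memory. The class `RddStats`, the function `choose_rdds_to_cache`, and this cost model are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RddStats:
    """Hypothetical per-RDD statistics an adaptive cache might collect."""
    rdd_id: int
    size_mb: float           # estimated in-memory size of the materialized RDD
    recompute_cost_s: float  # estimated time to recompute it from its lineage
    future_uses: int         # remaining downstream uses of this RDD

    def benefit(self) -> float:
        # Seconds saved by caching: every use after the first would
        # otherwise recompute the RDD from its lineage.
        return max(self.future_uses - 1, 0) * self.recompute_cost_s


def choose_rdds_to_cache(stats: List[RddStats], budget_mb: float) -> List[int]:
    """Hypothetical greedy policy: pick RDDs by saved seconds per MB under a memory budget."""
    # Drop RDDs that are never reused (no benefit) or have no measurable size.
    candidates = [s for s in stats if s.benefit() > 0 and s.size_mb > 0]
    # Highest benefit density first (a knapsack-style greedy heuristic).
    candidates.sort(key=lambda s: s.benefit() / s.size_mb, reverse=True)
    chosen: List[int] = []
    used = 0.0
    for s in candidates:
        if used + s.size_mb <= budget_mb:
            chosen.append(s.rdd_id)
            used += s.size_mb
    return chosen


if __name__ == "__main__":
    stats = [
        RddStats(rdd_id=1, size_mb=400, recompute_cost_s=30, future_uses=5),
        RddStats(rdd_id=2, size_mb=100, recompute_cost_s=20, future_uses=3),
        RddStats(rdd_id=3, size_mb=300, recompute_cost_s=50, future_uses=1),
        RddStats(rdd_id=4, size_mb=600, recompute_cost_s=10, future_uses=10),
    ]
    print(choose_rdds_to_cache(stats, budget_mb=512))  # prints [2, 1]
```

With a 512 MB budget, RDD 2 (0.4 s saved per MB) and RDD 1 (0.3 s per MB) fit together, RDD 3 is skipped because it is never reused, and RDD 4 no longer fits. In a PySpark job the selected RDDs could then be marked with `persist()` and the rest released with `unpersist()`; both methods exist on `pyspark.RDD`, but driving them from a policy like this one is an assumption, not the authors' design.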