Using machine learning to optimize parallelism in big data applications

Cited by: 47
Authors
Brandon Hernandez, Alvaro [1]
Perez, Maria S. [1]
Gupta, Smrati [2]
Muntes-Mulero, Victor [2]
Affiliations
[1] Univ Politecn Madrid, Ontol Engn Grp, Calle Ciruelos, E-28660 Madrid, Spain
[2] CA Technol, Pl Pau,WTC Almeda Pk Edif 2 Planta 4, Barcelona 08940, Spain
Funding
EU Horizon 2020
Keywords
Machine learning; Spark; Parallelism; Big data
DOI
10.1016/j.future.2017.07.003
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In-memory cluster computing platforms have gained momentum in recent years because of their ability to analyse large amounts of data in parallel. These platforms are complex and difficult to manage. In addition, there is a lack of tools for understanding and optimizing such platforms, which form the backbone of big data infrastructure and technologies. This leads directly to underutilization of available resources and to application failures in such environments. One key way to address this problem is to optimize the task parallelism of applications in these environments. In this paper, we propose a machine learning based method that recommends optimal parameters for task parallelization in big data workloads. By monitoring and gathering metrics at the system and application levels, we are able to find statistical correlations that allow us to characterize and predict the effect of different parallelism settings on performance. These predictions are used to recommend an optimal configuration to users before they launch their workloads in the cluster, avoiding possible failures, performance degradation and waste of resources. We evaluate our method with a benchmark of 15 Spark applications on the Grid5000 testbed. We observe a performance gain of up to 51% when using the recommended parallelism settings. The model is also interpretable and can give the user insight into how different metrics and parameters affect performance. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 1076-1092
Page count: 17