Big data execution time based on Spark Machine Learning Libraries

Cited by: 1
Authors
Garate-Escamilla, Anna Karen [1]
Hajjam El Hassani, Amir [1]
Andres, Emmanuel [1, 2]
Affiliations
[1] Univ Bourgogne Franche Comte, UTBM, Nanomed Lab, 12 Rue Thierry Mieg,Rue Edouard Branly, F-90000 Belfort, France
[2] CHRU Strasbourg, Serv Med Interne Diabet & Malad Metab Clin Med B, 5 Ave Moliere, F-67200 Strasbourg, France
Keywords
Machine Learning; Apache Spark; Performance prediction model; Execution time prediction
DOI
10.1145/3358505.3358519
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper explores the execution time of supervised and unsupervised models in the Apache Spark framework on massive datasets. Big data analytics has become relevant in industry because of the need to convert information into knowledge. Among the challenges of big data is the creation of strategies to reduce the execution cost of running machine learning models to make predictions. Apache Spark is a powerful in-memory platform that offers an extensive machine learning library for regression, classification, clustering, and rule extraction. From a computational cost perspective, this investigation performs several experiments on real datasets. The main contribution of the paper is a comparison of the execution time of different machine learning models, such as random forests, decision trees, logistic regression, linear support vector machines, and kNN. The present work aims to combine the areas of big data and machine learning, comparing results across different configurations and with the optimization methods cache and persist. The evaluation experiments show that logistic regression had the shortest execution time among the Spark MLlib models.
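This record does not describe the paper's measurement procedure, so the following is only a minimal sketch of how fit times might be compared with and without caching. The helper names `timed`, `fastest`, and `time_fit` are hypothetical and do not come from the paper. The sketch assumes an estimator object with a `fit` method and a DataFrame-like object with `cache`, `count`, and `unpersist` methods, as Spark DataFrames provide.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fastest(timings: Dict[str, float]) -> str:
    """Return the model name with the smallest recorded time."""
    return min(timings, key=timings.get)


def time_fit(estimator: Any, train_df: Any, use_cache: bool = False) -> float:
    """Time estimator.fit(train_df); hypothetical helper, not the paper's code.

    Assumption: train_df already has the columns the estimator expects.
    """
    if use_cache:
        train_df.cache()   # mark the data for in-memory storage
        train_df.count()   # run an action so the cache is filled before timing
    try:
        _, elapsed = timed(lambda: estimator.fit(train_df))
    finally:
        if use_cache:
            train_df.unpersist()  # release the cached data afterwards
    return elapsed
```

With PySpark installed, `estimator` could be, for example, `LogisticRegression(featuresCol="features", labelCol="label")` from `pyspark.ml.classification`, and the times collected for several estimators could be passed to `fastest` as a dictionary. A single run is noisy, so repeating each measurement and averaging would be a sensible refinement.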
Pages: 78-83
Number of pages: 6