Big Data Machine Learning using Apache Spark MLlib

Cited by: 0
Authors
Assefi, Mehdi [1]
Behravesh, Ehsun [2]
Liu, Guangchi [3]
Tafti, Ahmad P. [4]
Affiliations
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
[2] IEEE Member, Kuala Lumpur, Malaysia
[3] Stratifyd Inc, Charlotte, NC 28208 USA
[4] Marshfield Clin Res Inst, Biomed Informat Res Ctr, Marshfield, WI 54449 USA
Keywords
Apache Spark MLlib; Big Data Machine Learning; Big Data Analytics; Machine Learning
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial intelligence, and particularly machine learning, has been used in many ways by the research community to turn a variety of diverse and even heterogeneous data sources into high-quality facts and knowledge, providing premier capabilities for accurate pattern discovery. However, applying machine learning strategies to big and complex datasets is computationally expensive, and it consumes a very large amount of logical and physical resources, such as data file space, CPU, and memory. A sophisticated platform for efficient big data analytics is becoming more important as the amount of data generated on a daily basis exceeds a quintillion bytes. Apache Spark MLlib is one of the most prominent platforms for big data analysis, offering a set of excellent functionalities for different machine learning tasks, ranging from regression, classification, and dimension reduction to clustering and rule extraction. In this contribution, we explore, from the computational perspective, the expanding body of Apache Spark MLlib 2.0 as an open-source, distributed, scalable, and platform-independent machine learning library. Specifically, we perform several real-world machine learning experiments to examine the qualitative and quantitative attributes of the platform. Furthermore, we highlight current trends in big data machine learning research and provide insights for future work.
Pages: 3492 - 3498
Number of pages: 7