Big Data Machine Learning using Apache Spark MLlib

Cited by: 0
Authors
Assefi, Mehdi [1]
Behravesh, Ehsun [2]
Liu, Guangchi [3]
Tafti, Ahmad P. [4]
Affiliations
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
[2] IEEE Member, Kuala Lumpur, Malaysia
[3] Stratifyd Inc, Charlotte, NC 28208 USA
[4] Marshfield Clin Res Inst, Biomed Informat Res Ctr, Marshfield, WI 54449 USA
Keywords
Apache Spark MLlib; Big Data Machine Learning; Big Data Analytics; Machine Learning
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Artificial intelligence, and machine learning in particular, has been used in many ways by the research community to turn diverse and even heterogeneous data sources into high-quality facts and knowledge, providing strong capabilities for accurate pattern discovery. However, applying machine learning strategies to big and complex datasets is computationally expensive and consumes a very large amount of logical and physical resources, such as data file space, CPU, and memory. A sophisticated platform for efficient big data analytics is becoming more important as the amount of data generated on a daily basis exceeds a quintillion bytes. Apache Spark MLlib is one of the most prominent platforms for big data analysis, offering a set of functionalities for machine learning tasks ranging from regression, classification, and dimension reduction to clustering and rule extraction. In this contribution, we explore, from a computational perspective, the expanding body of Apache Spark MLlib 2.0 as an open-source, distributed, scalable, and platform-independent machine learning library. Specifically, we perform several real-world machine learning experiments to examine the qualitative and quantitative attributes of the platform. Furthermore, we highlight current trends in big data machine learning research and provide insights for future work.
Pages: 3492-3498
Page count: 7
Related papers
50 in total
  • [21] Big Data Network Flow Processing Using Apache Spark
    Jerabek, Kamil
    Rysavy, Ondrej
    PROCEEDINGS OF THE 6TH CONFERENCE ON THE ENGINEERING OF COMPUTER BASED SYSTEMS (ECBS 2019), 2020,
  • [22] Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark
    Alomari, Ebtesam
    Mehmood, Rashid
    Katib, Iyad
    2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, : 1888 - 1895
  • [23] MLlib*: Fast Training of GLMs using Spark MLlib
    Zhang, Zhipeng
    Jiang, Jiawei
    Wu, Wentao
    Zhang, Ce
    Yu, Lele
    Cui, Bin
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 1778 - 1789
  • [24] Big Spatial Data Processing With Apache Spark
    Shangguan, Boyi
    Yue, Peng
    Wu, Zhaoyan
    Jiang, Liangcun
    2017 6TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS, 2017, : 239 - 242
  • [25] Big Data Software Analytics with Apache Spark
    Gousios, Georgios
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING - COMPANION (ICSE-COMPANION, 2018, : 542 - 543
  • [26] Apache Spark: A Big Data Processing Engine
    Shaikh, Eman
    Mohiuddin, Iman
    Alufaisan, Yasmeen
    Nahvi, Irum
    2019 2ND IEEE MIDDLE EAST AND NORTH AFRICA COMMUNICATIONS CONFERENCE (IEEEMENACOMM'19), 2019, : 220 - 225
  • [28] Testing of algorithms for anomaly detection in Big data using apache spark
    Lighari, Sheeraz Niaz
    Hussain, Dil Muhammad Akbar
    2017 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2017, : 97 - 100
  • [29] Efficient Big Data Analysis on a Single Machine using Apache Spark and Self-Organizing Map Libraries
    Andresic, David
    Saloun, Petr
    Anagnostopoulos, Ioannis
    2017 12TH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION AND PERSONALIZATION (SMAP 2017), 2017, : 1 - 5
  • [30] Predicting Diabetes using Distributed Machine Learning based on Apache Spark
    Ahmed, Hager
    Younis, Eman M. G.
    Ali, Abdelmgeid A.
    PROCEEDINGS OF 2020 INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN COMMUNICATION AND COMPUTER ENGINEERING (ITCE), 2020, : 44 - 49