Big data Predictive Analytics for Apache Spark using Machine Learning

Authors
Junaid, Muhammad [1]
Wagan, Shiraz Ali [1]
Qureshi, Nawab Muhammad Faseeh [2]
Nam, Choon Sung [1]
Shin, Dong Ryeol [1]
Affiliations
[1] Sungkyunkwan Univ, Elect & Comp Engn, Suwon, South Korea
[2] Sungkyunkwan Univ, Dept Comp Educ, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
apache-spark; clusters; predictive analysis; Mllib; pandas; 5Vs of big data; PLACEMENT STRATEGY; HADOOP;
DOI
10.1109/GCWOT49901.2020.9391620
Chinese Library Classification number
O43 [Optics];
Discipline classification code
070207; 0803;
Abstract
In today's digital world, data is produced at a rapid pace, and handling this massive, diverse data has become more challenging. A big data environment can handle data efficiently, both from data warehouses and in real time. Within such an environment, Apache Spark is a cluster-based, open-source computing technology designed explicitly for handling large volumes of data. Apache Spark performs complex analytics through in-memory processing, which plays an active role in enabling meaningful exploration of large amounts of data through machine learning. Its machine learning API, known as MLlib, is prominent and efficient on big data platforms and offers excellent functionality. In this paper, we perform an experiment to examine the analytical qualities of MLlib in the Apache Spark environment. We also highlight recent trends in machine learning for big data studies and provide an outlook on upcoming work.
Pages: 7