Big data Predictive Analytics for Apache Spark using Machine Learning

Cited by: 0
Authors
Junaid, Muhammad [1]
Wagan, Shiraz Ali [1]
Qureshi, Nawab Muhammad Faseeh [2]
Nam, Choon Sung [1]
Shin, Dong Ryeol [1]
Affiliations
[1] Sungkyunkwan Univ, Elect & Comp Engn, Suwon, South Korea
[2] Sungkyunkwan Univ, Dept Comp Educ, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
apache-spark; clusters; predictive analysis; Mllib; pandas; 5Vs of big data; PLACEMENT STRATEGY; HADOOP;
DOI
10.1109/GCWOT49901.2020.9391620
Chinese Library Classification number
O43 [Optics];
Discipline classification code
070207 ; 0803 ;
Abstract
In today's digital world, data is being produced at a rapid pace, and handling this massive, diverse data has become increasingly challenging. A big data environment can handle data efficiently, both from data warehouses and in real time. Within this environment, Apache Spark is an open-source, cluster-based computing technology designed explicitly for handling large volumes of data. Spark performs complex analytics through in-memory processing, which plays an active role in enabling meaningful exploration of large amounts of data through machine learning. Its machine learning API, known as MLlib, is prominent and efficient on big data platforms and offers extensive functionality. In this paper, we perform an experiment to examine the analytical capabilities of MLlib in the Apache Spark environment. We also highlight current trends in machine learning for big data research and provide an outlook on future work.
Pages: 7
Related papers
50 in total
  • [31] Big Data Network Flow Processing Using Apache Spark
    Jerabek, Kamil
    Rysavy, Ondrej
    [J]. PROCEEDINGS OF THE 6TH CONFERENCE ON THE ENGINEERING OF COMPUTER BASED SYSTEMS (ECBS 2019), 2020,
  • [32] Using Machine Learning and Big Data Analytics to Prioritize Outpatients in HetNets
    Hadi, Mohammed
    Lawey, Ahmed
    El-Gorashi, Taisir
    Elmirghani, Jaafar
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM 2019 WKSHPS), 2019, : 726 - 731
  • [33] Big data analytics and classification of cardiovascular disease using machine learning
    Narejo, Sanam
    Shaikh, Anoud
    Memon, Mehak Maqbool
    Mahar, Kainat
    Aleem, Zonera
    Zardari, Bisharat
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (02) : 2025 - 2033
  • [34] A Theoretical Model for Big Data Analytics using Machine Learning Algorithms
    Sheshasaayee, Ananthi
    Lakshmi, J. V. N.
    [J]. PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 635 - 639
  • [35] Using Semantics in Predictive Big Data Analytics
    Nural, Mustafa V.
    Cotterell, Michael E.
    Miller, John A.
    [J]. 2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 254 - 261
  • [36] MLlib: Machine Learning in Apache Spark
    Meng, Xiangrui
    Bradley, Joseph
    Yavuz, Burak
    Sparks, Evan
    Venkataraman, Shivaram
    Liu, Davies
    Freeman, Jeremy
    Tsai, D. B.
    Amde, Manish
    Owen, Sean
    Xin, Doris
    Xin, Reynold
    Franklin, Michael J.
    Zadeh, Reza
    Zaharia, Matei
    Talwalkar, Ameet
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [37] Road Traffic Event Detection Using Twitter Data, Machine Learning, and Apache Spark
    Alomari, Ebtesam
    Mehmood, Rashid
    Katib, Iyad
    [J]. 2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, : 1888 - 1895
  • [38] Predictive Analytics of Sensor Data Using Distributed Machine Learning Techniques
    Kejela, Girma
    Esteves, Rui Maximo
    Rong, Chunming
    [J]. 2014 IEEE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2014, : 626 - 631
  • [39] Big Spatial Data Processing With Apache Spark
    Shangguan, Boyi
    Yue, Peng
    Wu, Zhaoyan
    Jiang, Liangcun
    [J]. 2017 6TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS, 2017, : 239 - 242
  • [40] Apache Spark: A Big Data Processing Engine
    Shaikh, Eman
    Mohiuddin, Iman
    Alufaisan, Yasmeen
    Nahvi, Irum
    [J]. 2019 2ND IEEE MIDDLE EAST AND NORTH AFRICA COMMUNICATIONS CONFERENCE (IEEEMENACOMM'19), 2019, : 220 - 225