Big data processing with Apache Spark in university institutions: spark streaming and machine learning algorithm

Cited by: 1
Authors
Boachie, Emmanuel [1, 2]
Li, Chunlin [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430063, Hubei, Peoples R China
[2] Kumasi Tech Univ, Dept Comp Sci, Box 854, Kumasi, Ghana
Keywords
spark streaming; big data processing; machine learning algorithm
DOI
10.1504/IJCEELL.2019.099217
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Data processing is an effective tool for the educational sector and can improve admission selection procedures and decisions. Most research papers focus on the computational and theoretical aspects of education, while little effort has been put into the technological aspect of applying data mining techniques to the student admission process. We therefore design a simple Spark Streaming framework, together with a machine learning algorithm, to guide admission processing. We implement the Spark Streaming model and the proposed machine learning algorithm at a selected university using its admissions data. The focus is on the number of students who can be admitted and those who should be rejected, in order to reduce time and cost. The case study we evaluated shows the practical usefulness of Spark Streaming and the machine learning algorithm for real-time data processing to reduce time and cost. The experimental results also confirm that Spark Streaming and the machine learning algorithm yield a meaningful graphical interpretation of the data for selecting students for admission.
Pages: 5-20
Number of pages: 16