Big data processing with Apache Spark in university institutions: spark streaming and machine learning algorithm

Cited by: 1

Authors:

Boachie, Emmanuel [1,2]

Li, Chunlin [1]

Affiliations:

[1] Wuhan University of Technology, School of Computer Science and Technology, Wuhan 430063, Hubei, People's Republic of China

[2] Kumasi Technical University, Department of Computer Science, Box 854, Kumasi, Ghana

Source:

INTERNATIONAL JOURNAL OF CONTINUING ENGINEERING EDUCATION AND LIFE-LONG LEARNING | 2019 / Vol. 29 / Issue 1-2

Keywords:

spark streaming; big data processing; machine learning algorithm;

DOI:

10.1504/IJCEELL.2019.099217

Chinese Library Classification number:

G40 [Education];

Discipline classification codes:

040101; 120403

Abstract:

Data processing is an effective tool for the educational sector and can improve admission selection procedures and decisions. Most research papers focus on the computational and theoretical aspects of education, while little effort has been put into the technological aspect of applying data mining techniques to the student admission process. We therefore design a simple Spark Streaming framework, together with a machine learning algorithm, to guide admission processing. We implement the Spark Streaming model and the proposed machine learning algorithm at a selected university using its admissions data. The focus is on determining the number of students who can be admitted and those who should be rejected, in order to reduce time and cost. The case study we evaluated shows the practical usefulness of Spark Streaming and the machine learning algorithm for real-time data processing that reduces time and cost. The experimental results also confirm that Spark Streaming and the machine learning algorithm provide a meaningful graphical interpretation of the data for selecting students for admission.


Pages: 5-20

Number of pages: 16
