Big data analytics on Apache Spark

Cited by: 1

Authors:

Salloum S. [1]; Dautov R. [1]; Chen X. [1]; Peng P.X. [1]; Huang J.Z. [1]

Affiliation:

[1] Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong

Source:

International Journal of Data Science and Analytics, 2016, Vol. 1, No. 3-4

Keywords:

Apache Spark; Big data; Cluster computing; Data analysis; Distributed and parallel computing; Graph analysis; Machine learning; Resilient Distributed Datasets; Stream processing

DOI:

10.1007/s41060-016-0027-9

Abstract:

Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. Because Spark is a rapidly evolving open source project with an increasing number of contributors from both academia and industry, researchers, especially beginners in this area, find it difficult to comprehend the full body of development and research behind it. In this paper, we present a technical review of big data analytics using Apache Spark. The review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark offers for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics. © 2016, Springer International Publishing Switzerland.

Pages: 145-164

Page count: 19

