Big data analytics on Apache Spark

Cited by: 1
Authors
Salloum S. [1]
Dautov R. [1]
Chen X. [1]
Peng P.X. [1]
Huang J.Z. [1]
Affiliations
[1] Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong
Keywords
Apache Spark; Big data; Cluster computing; Data analysis; Distributed and parallel computing; Graph analysis; Machine learning; Resilient Distributed Datasets; Stream processing
DOI
10.1007/s41060-016-0027-9
Abstract
Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project, with an increasing number of contributors from both academia and industry, it is difficult for researchers to comprehend the full body of development and research behind Apache Spark, especially those who are beginners in this area. In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics. © 2016, Springer International Publishing Switzerland.
Pages: 145-164
Page count: 19
Related papers
50 in total
  • [1] Big Data Software Analytics with Apache Spark
    Gousios, Georgios
    [J]. PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING - COMPANION (ICSE-COMPANION), 2018, : 542 - 543
  • [2] Big Data Analytics for the ATLAS EventIndex Project with Apache Spark
    Casani, Alvaro Fernandez
    Montoro, Carlos Garcia
    de la Hoz, Santiago Gonzalez
    Salt, Jose
    Sanchez, Javier
    Perez, Miguel Villaplana
    [J]. COMPUTATIONAL AND MATHEMATICAL METHODS, 2023, 2023
  • [3] Apache Spark a Big Data Analytics Platform for Smart Grid
    Shyam, R.
    Ganesh, Bharathi H. B.
    Kumar, Sachin S.
    Poornachandran, Prabaharan
    Soman, K. P.
    [J]. SMART GRID TECHNOLOGIES (ICSGT- 2015), 2015, 21 : 171 - 178
  • [4] BigDebug: Interactive Debugger for Big Data Analytics in Apache Spark
    Gulzar, Muhammad Ali
    Interlandi, Matteo
    Condie, Tyson
    Kim, Miryung
    [J]. FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING, 2016, : 1033 - 1037
  • [5] Mobile Big Data Analytics Using Deep Learning and Apache Spark
    Abu Alsheikh, Mohammad
    Niyato, Dusit
    Lin, Shaowei
    Tan, Hwee-Pink
    Han, Zhu
    [J]. IEEE NETWORK, 2016, 30 (03): : 22 - 29
  • [6] Big data Predictive Analytics for Apache Spark using Machine Learning
    Junaid, Muhammad
    Wagan, Shiraz Ali
    Qureshi, Nawab Muhammad Faseeh
    Nam, Choon Sung
    Shin, Dong Ryeol
    [J]. 2020 GLOBAL CONFERENCE ON WIRELESS AND OPTICAL TECHNOLOGIES (GCWOT), 2020,
  • [7] Predictors of outpatients' no-show: big data analytics using apache spark
    Daghistani, Tahani
    AlGhamdi, Huda
    Alshammari, Riyad
    AlHazme, Raed H.
    [J]. JOURNAL OF BIG DATA, 2020, 7 (01)
  • [8] Predictors of outpatients’ no-show: big data analytics using apache spark
    Tahani Daghistani
    Huda AlGhamdi
    Riyad Alshammari
    Raed H. AlHazme
    [J]. Journal of Big Data, 7
  • [9] Effective Selection of Machine Learning Algorithms for Big Data Analytics Using Apache Spark
    Hafez, Manar Mohamed
    Shehab, Mohamed Elemam
    El Fakharany, Essam
    Hegazy, Abd El Ftah Abdel Ghfar
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 692 - 704
  • [10] Efficient Incremental Data Analytics with Apache Spark
    Gholamian, Sina
    Golab, Wojciech
    Ward, Paul A. S.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 2859 - 2868