Unified Programming Model and Software Framework for Big Data Machine Learning and Data Analytics

Cited by: 3
Authors
Gu, Rong [1]
Tang, Yun [1]
Dong, Qianhao [1]
Wang, Zhaokang [1]
Liu, Zhiqiang [1]
Wang, Shuai [1]
Yuan, Chunfeng [1]
Huang, Yihua [1]
Affiliation
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210093, Jiangsu, Peoples R China
Keywords
big data analysis; ease-to-use; matrix computation; parallel algorithm; matrix multiplication
DOI
10.1109/COMPSAC.2015.275
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In the new era of big data, the rapid growth of applications such as social media and web search requires efficient and scalable machine learning and statistical analysis algorithms. However, there is a lack of easy-to-use and efficient software frameworks or systems that support fast development of such big data analytical algorithms. To solve this problem, we propose Octopus, an easy-to-use and efficient analytical system for big data. Octopus allows data analysts to conduct complex analytics on big data easily and efficiently using traditional programming languages and methods. To achieve ease of use, we propose a unified programming model based on matrix computation, which is the core of many data-intensive statistical applications such as numerical analysis and data mining. To improve performance, the Octopus software framework adopts several distributed computing platforms, including Hadoop MapReduce, Spark, and MPI. On these platforms, we design several parallel matrix computation algorithms suited to different scenarios. Finally, the features of Octopus are encapsulated in a library with matrix-based APIs and exposed to users as an R package; R is a widely used statistical programming language that supports diverse data analysis tasks through extension packages. Experimental results show that Octopus achieves efficient performance and near-linear scalability.
Pages: 562-567
Page count: 6