Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing

Cited by: 2
Authors
Choi, Unho [1]
Lee, Kyungyong [1]
Affiliation
[1] Kookmin Univ, Dept Comp Sci, Seoul 02707, South Korea
Funding
National Research Foundation of Singapore
Keywords
Sparse matrices; Indexes; Sparks; Predictive models; Machine learning algorithms; Task analysis; Cluster computing; Sparse matrix multiplication; spark optimization; optimal SPMM recommendation; NEURAL-NETWORKS; PARALLEL;
DOI
10.1109/TBDATA.2022.3199197
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Many real-world graph datasets can be represented in a sparse matrix format, and they are widely used in big-data applications. The multiplication of two sparse matrices (SPMM) is a major kernel of many machine learning algorithms that operate on sparsely expressed datasets. Apache Spark, a general-purpose big-data processing engine, includes the SPMM operation in its linear algebra package. The default Spark SPMM implementation, however, always converts the right sparse matrix to a dense format before performing the multiplication, which can result in significant performance overhead in diverse SPMM scenarios. To address this limitation of the current Spark implementation, we describe an SPMM implementation that keeps the right matrix in Compressed Sparse Column (CSC) format, and we propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using the SPMM latency prediction model, we implement an elastic SPMM implementation recommendation service, which we name DoS (Dense or Sparse). DoS recommends the optimal SPMM implementation method: either transforming the right matrix to a dense format or keeping it in a sparse format during the multiplication. Thorough evaluation of the proposed system using a real-world graph reveals that the proposed service can improve the SPMM latency of the default Spark implementation by 2.2 times while shortening the overall execution time.
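The two multiplication strategies contrasted in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' Spark implementation: the `CSCMatrix` class and the function names `dense_to_csc`, `csc_to_dense`, `multiply_dense_right`, and `multiply_sparse_right` are hypothetical, and the left matrix is assumed to be stored as one `{column: value}` dictionary per row purely for illustration.

```python
from typing import Dict, List


class CSCMatrix:
    """Hypothetical Compressed Sparse Column container (illustration only)."""

    def __init__(self, n_rows: int, n_cols: int, col_ptrs: List[int],
                 row_indices: List[int], values: List[float]) -> None:
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.col_ptrs = col_ptrs        # column j owns entries col_ptrs[j]..col_ptrs[j+1]-1
        self.row_indices = row_indices  # row index of each stored value
        self.values = values            # nonzero values, stored column by column


def dense_to_csc(dense: List[List[float]]) -> CSCMatrix:
    """Build a CSC matrix from a list of rows, skipping zero entries."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    col_ptrs, row_indices, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                row_indices.append(i)
                values.append(dense[i][j])
        col_ptrs.append(len(values))
    return CSCMatrix(n_rows, n_cols, col_ptrs, row_indices, values)


def csc_to_dense(b: CSCMatrix) -> List[List[float]]:
    """Expand a CSC matrix into a full list of rows."""
    dense = [[0.0] * b.n_cols for _ in range(b.n_rows)]
    for j in range(b.n_cols):
        for p in range(b.col_ptrs[j], b.col_ptrs[j + 1]):
            dense[b.row_indices[p]][j] = b.values[p]
    return dense


def multiply_dense_right(a_rows: List[Dict[int, float]], b: CSCMatrix) -> List[List[float]]:
    """Strategy 1: densify the right matrix first, then multiply."""
    b_dense = csc_to_dense(b)
    out = [[0.0] * b.n_cols for _ in a_rows]
    for i, row in enumerate(a_rows):
        for k, a_val in row.items():
            for j in range(b.n_cols):  # touches every column, zeros included
                out[i][j] += a_val * b_dense[k][j]
    return out


def multiply_sparse_right(a_rows: List[Dict[int, float]], b: CSCMatrix) -> List[List[float]]:
    """Strategy 2: keep the right matrix in CSC and visit only its nonzeros."""
    out = [[0.0] * b.n_cols for _ in a_rows]
    for j in range(b.n_cols):
        for p in range(b.col_ptrs[j], b.col_ptrs[j + 1]):
            k, b_val = b.row_indices[p], b.values[p]
            for i, row in enumerate(a_rows):
                a_val = row.get(k)
                if a_val is not None:
                    out[i][j] += a_val * b_val
    return out


# Toy data: A is 2x3 (row-wise sparse), B is 3x2.
a_rows = [{0: 1.0, 2: 2.0}, {1: 3.0}]
b = dense_to_csc([[0, 4], [5, 0], [0, 6]])
result_dense = multiply_dense_right(a_rows, b)
result_sparse = multiply_sparse_right(a_rows, b)
```

Both functions compute the same product. They differ only in how much work is spent on zero entries of the right matrix, which is the trade-off that a latency-based recommender would have to weigh.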
Pages: 637-652
Page count: 16