Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing

Cited by: 2
Authors
Choi, Unho [1]
Lee, Kyungyong [1]
Affiliations
[1] Kookmin Univ, Dept Comp Sci, Seoul 02707, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Sparse matrices; Indexes; Sparks; Predictive models; Machine learning algorithms; Task analysis; Cluster computing; Sparse matrix multiplication; spark optimization; optimal SPMM recommendation; NEURAL-NETWORKS; PARALLEL;
DOI
10.1109/TBDATA.2022.3199197
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many real-world graph datasets can be represented in a sparse matrix format, and they are widely used in big-data applications. The multiplication of two sparse matrices (SPMM) is a major kernel of many machine learning algorithms that operate on sparsely expressed datasets. Apache Spark, a general-purpose big-data processing engine, includes the SPMM operation in its linear algebra package. The default Spark SPMM implementation, however, always converts the right sparse matrix to a dense format before performing the multiplication, which can result in significant performance overhead in diverse SPMM scenarios. To address this limitation of the current Spark implementation, we describe an SPMM implementation that keeps the right matrix in a Compressed Sparse Column (CSC) format and propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using the SPMM latency prediction model, we build an elastic recommendation service for SPMM implementations, which we name DoS (Dense or Sparse). DoS recommends the optimal SPMM implementation method: either transforming the right matrix to a dense format or keeping it in a sparse format during the multiplication. Evaluation of the proposed system using a real-world graph dataset reveals that the service can improve the SPMM latency of the default Spark implementation by a factor of 2.2 while shortening the overall execution time.
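The two multiplication strategies named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' Spark implementation: the function names, the dense list-of-rows left matrix, and the fixed density threshold in `choose_strategy` are all assumptions made for illustration. In particular, the threshold rule is only a stand-in for the DNN latency prediction model, whose inputs and architecture are not given in this record.

```python
from typing import List, Tuple

# Hypothetical CSC layout: (n_rows, n_cols, col_ptr, row_idx, values).
CSC = Tuple[int, int, List[int], List[int], List[float]]
Dense = List[List[float]]


def csc_to_dense(b: CSC) -> Dense:
    """Expand a CSC matrix into a dense list of rows."""
    n_rows, n_cols, col_ptr, row_idx, values = b
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            dense[row_idx[k]][j] = values[k]
    return dense


def multiply_dense_right(a: Dense, b: CSC) -> Dense:
    """Strategy 1: convert the right matrix to dense, then multiply."""
    b_dense = csc_to_dense(b)
    n_cols = b[1]
    out = [[0.0] * n_cols for _ in a]
    for i, row in enumerate(a):
        for k, a_ik in enumerate(row):
            if a_ik != 0.0:  # skip zeros of the left matrix
                for j in range(n_cols):
                    out[i][j] += a_ik * b_dense[k][j]
    return out


def multiply_sparse_right(a: Dense, b: CSC) -> Dense:
    """Strategy 2: keep the right matrix in CSC and visit only its nonzeros."""
    _, n_cols, col_ptr, row_idx, values = b
    out = [[0.0] * n_cols for _ in a]
    for i, row in enumerate(a):
        for j in range(n_cols):
            acc = 0.0
            for k in range(col_ptr[j], col_ptr[j + 1]):
                acc += row[row_idx[k]] * values[k]
            out[i][j] = acc
    return out


def choose_strategy(b: CSC, density_threshold: float = 0.3) -> str:
    """Toy recommender (assumption): pick a strategy from the density of b.

    The paper uses a DNN latency predictor; this threshold is a placeholder.
    """
    n_rows, n_cols, _, _, values = b
    density = len(values) / (n_rows * n_cols)
    return "dense" if density >= density_threshold else "sparse"


# Example: A is 2x3, B is 3x2 with nonzeros (0,0)=4, (2,0)=6, (1,1)=5.
A: Dense = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
B: CSC = (3, 2, [0, 2, 3], [0, 2, 1], [4.0, 6.0, 5.0])
strategy = choose_strategy(B)
product = multiply_dense_right(A, B) if strategy == "dense" else multiply_sparse_right(A, B)
```

Both strategies return the same product; they differ only in how much work and memory the right matrix costs, which is the trade-off a latency predictor would have to weigh.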
Pages: 637-652
Number of pages: 16
Related papers
50 results in total
  • [31] A high-throughput big-data orchestration and processing system for the High Energy Photon Source
    Li, Xiang
    Zhang, Yi
    Liu, Yu
    Li, Pengcheng
    Hu, Hao
    Wang, Liwen
    He, Ping
    Dong, Yuhui
    Zhang, Chenglong
    JOURNAL OF SYNCHROTRON RADIATION, 2023, 30 (Pt 6) : 1086 - 1091
  • [32] Implementation of Message Passing Interface in MANETs in Processing of Big Data
    Sinha, Vishal
    Raj, Tilak
    2015 INTERNATIONAL CONFERENCE ON COMPUTERS, COMMUNICATIONS, AND SYSTEMS (ICCCS), 2015, : 113 - 117
  • [33] Bi-SON: Big-Data Self Organizing Network for Energy Efficient Ultra-Dense Small Cells
    Wang, Li-Chun
    Cheng, Shao-Hung
    Tsai, Ang-Hsun
    2016 IEEE 84TH VEHICULAR TECHNOLOGY CONFERENCE (VTC FALL), 2016,
  • [34] Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud
    Chao-Tung Yang
    Jung-Chun Liu
    Shuo-Tsung Chen
    Hsin-Wen Lu
    Journal of Medical Systems, 2017, 41
  • [35] Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud
    Yang, Chao-Tung
    Liu, Jung-Chun
    Chen, Shuo-Tsung
    Lu, Hsin-Wen
    JOURNAL OF MEDICAL SYSTEMS, 2017, 41 (10)
  • [36] Fast Band-Limited Sparse Signal Reconstruction Algorithms for Big Data Processing
    Wang, Longhui
    Wang, Qiexiang
    Wang, Jian
    Zhang, Xudong
    IEEE SENSORS JOURNAL, 2023, 23 (12) : 13084 - 13099
  • [37] Data-Oriented Language Implementation of the Lattice-Boltzmann Method for Dense and Sparse Geometries
    Tomczak, Tadeusz
    APPLIED SCIENCES-BASEL, 2021, 11 (20):
  • [38] Optimal operator deployment and replication for elastic distributed data stream processing
    Cardellini, Valeria
    Lo Presti, Francesco
    Nardelli, Matteo
    Russo, Gabriele Russo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (09):
  • [39] Effective-Capacity Based Gaming for Optimal Power and Spectrum Allocations Over Big-Data Virtual Wireless Networks
    Zhu, Qixuan
    Zhang, Xi
    2015 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2015,
  • [40] Research and Implementation of Efficient Parallel Processing of Big Data at TELBE User Facility
    Bawatna, Mohammed
    Green, Bertram
    Kovalev, Sergey
    Deinert, Jan-Christoph
    Knodel, Oliver
    Spallek, Rainer G.
    PROCEEDINGS OF THE 2019 SUMMER SIMULATION CONFERENCE (SUMMERSIM '19), 2019,