Dense or Sparse : Elastic SPMM Implementation for Optimal Big-Data Processing

Cited by: 2
Authors
Choi, Unho [1]
Lee, Kyungyong [1]
Affiliation
[1] Kookmin Univ, Dept Comp Sci, Seoul 02707, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Sparse matrices; Indexes; Sparks; Predictive models; Machine learning algorithms; Task analysis; Cluster computing; Sparse matrix multiplication; spark optimization; optimal SPMM recommendation; NEURAL-NETWORKS; PARALLEL;
DOI
10.1109/TBDATA.2022.3199197
CLC classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many real-world graph datasets can be represented in a sparse matrix format, and they are widely used in big-data applications. The multiplication of two sparse matrices (SPMM) is a core kernel of many machine learning algorithms that operate on sparsely expressed datasets. Apache Spark, a general-purpose big-data processing engine, includes an SPMM operation in its linear algebra package. However, the default Spark SPMM implementation always converts the right sparse matrix to a dense format before performing the multiplication, which can incur significant performance overhead across diverse SPMM scenarios. To address this limitation, we describe an SPMM implementation that keeps the right matrix in Compressed Sparse Column (CSC) format, and we propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using this latency prediction model, we build an elastic SPMM implementation recommendation service named DoS (Dense or Sparse). DoS recommends the optimal SPMM implementation: either converting the right matrix to a dense format or keeping it in a sparse format during the multiplication. An evaluation using a real-world graph shows that the proposed service can improve SPMM latency over the default Spark implementation by a factor of 2.2 while also shortening overall execution time.
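The abstract contrasts two ways of multiplying a sparse left matrix by a sparse right matrix: converting the right matrix to dense before multiplying, or keeping it in CSC format. The following is a minimal pure-Python sketch, not the authors' Spark implementation, of why that choice matters. The dense path visits every entry of the right matrix, zeros included, while the CSC path visits only its stored nonzeros. The `CSC` tuple and all function names are hypothetical, and both operands are assumed to be stored in CSC.

```python
from typing import List, NamedTuple


class CSC(NamedTuple):
    """Hypothetical CSC container: column j's nonzeros are
    row_idx[col_ptr[j]:col_ptr[j + 1]] with matching values."""
    n_rows: int
    n_cols: int
    col_ptr: List[int]
    row_idx: List[int]
    values: List[float]


def from_dense(m: List[List[float]]) -> CSC:
    """Build a CSC matrix from a row-major list of lists."""
    n_rows, n_cols = len(m), len(m[0])
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if m[i][j] != 0:
                row_idx.append(i)
                values.append(m[i][j])
        col_ptr.append(len(values))
    return CSC(n_rows, n_cols, col_ptr, row_idx, values)


def to_dense(a: CSC) -> List[List[float]]:
    """Expand a CSC matrix into a row-major list of lists."""
    d = [[0.0] * a.n_cols for _ in range(a.n_rows)]
    for j in range(a.n_cols):
        for p in range(a.col_ptr[j], a.col_ptr[j + 1]):
            d[a.row_idx[p]][j] = a.values[p]
    return d


def spmm_dense_right(a: CSC, b: CSC):
    """Dense path: densify B, then visit every B[k][j], zeros included.
    Returns (C, multiply-add count); the count is n_cols(B) * nnz(A)."""
    assert a.n_cols == b.n_rows, "inner dimensions must match"
    b_dense = to_dense(b)
    c = [[0.0] * b.n_cols for _ in range(a.n_rows)]
    ops = 0
    for j in range(b.n_cols):
        for k in range(a.n_cols):
            b_kj = b_dense[k][j]
            for p in range(a.col_ptr[k], a.col_ptr[k + 1]):
                c[a.row_idx[p]][j] += a.values[p] * b_kj
                ops += 1
    return c, ops


def spmm_sparse_right(a: CSC, b: CSC):
    """Sparse path: keep B in CSC and visit only its stored nonzeros.
    Returns (C, multiply-add count)."""
    assert a.n_cols == b.n_rows, "inner dimensions must match"
    c = [[0.0] * b.n_cols for _ in range(a.n_rows)]
    ops = 0
    for j in range(b.n_cols):
        for q in range(b.col_ptr[j], b.col_ptr[j + 1]):
            k, b_kj = b.row_idx[q], b.values[q]
            for p in range(a.col_ptr[k], a.col_ptr[k + 1]):
                c[a.row_idx[p]][j] += a.values[p] * b_kj
                ops += 1
    return c, ops


if __name__ == "__main__":
    a = from_dense([[1, 0, 2], [0, 3, 0]])
    b = from_dense([[0, 4], [0, 0], [5, 0]])
    c_d, ops_d = spmm_dense_right(a, b)
    c_s, ops_s = spmm_sparse_right(a, b)
    print(c_s)           # [[10.0, 4.0], [0.0, 0.0]]
    print(ops_d, ops_s)  # 6 2
```

Both paths produce the same product; only the amount of work differs. Multiply-add counts alone always favor the sparse path, so they are not a latency model. According to the abstract, DoS instead uses a DNN to predict SPMM task latency and recommends either the dense or the sparse method on that basis.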
Pages: 637-652
Page count: 16
Related papers
50 in total
  • [1] Optimal Least-Squares Design of Sparse FIR Filters for Big-Data Signal Processing
    Nakamoto, Masayoshi
    Itani, Taro
    Konishi, Katsumi
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [2] Implementation of a Distributed Processing Engine for Spatial Big-Data Processing based on Batch and Stream
    Kim, Sang-Su
    Song, Kwaun-Sik
    2017 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC), 2017, : 1196 - 1198
  • [3] Data Modifications in Blockchain Architecture for Big-Data Processing
    Tulkinbekov, Khikmatullo
    Kim, Deok-Hwan
    SENSORS, 2023, 23 (21)
  • [4] A big-data processing framework for uncertainties in transportation data
    Yang, Jie
    Ma, Jun
    2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [5] Analysis and Optimization of Big-Data Stream Processing
    Vakilinia, Shahin
    Zhang, Xinyao
    Qiu, Dongyu
    2016 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2016,
  • [6] Enabling Scientific Data Storage and Processing on Big-data Systems
    Biookaghazadeh, Saman
    Xu, Yiqi
    Zhou, Shujia
    Zhao, Ming
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 1978 - 1984
  • [7] SPBD:Streamlining Big-Data Processing in Cloud Environments
    Tung Nguyen
    Jingwen Zhang
    Weisong Shi
    ZTE Communications, 2013, 11 (02) : 30 - 37
  • [8] Big-Data Processing Techniques and Their Challenges in Transport Domain
    Aftab Ahmed Chandio
    Nikos Tziritas
    Cheng-Zhong Xu
    ZTE Communications, 2015, 13 (01) : 50 - 59
  • [9] Kaleido: Enabling Efficient Scientific Data Processing on Big-Data Systems
    Biookaghazadeh, Saman
    Zhou, Shujia
    Zhao, Ming
    2017 INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE, AND STORAGE (NAS), 2017, : 121 - 130
  • [10] A Novel Big-Data Processing Framwork for Healthcare Applications Big-Data-Healthcare-in-a-Box
    Rahman, Fuad
    Slepian, Marvin
    Mitra, Ari
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3548 - 3555