Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing

Times cited: 2
Authors
Choi, Unho [1]
Lee, Kyungyong [1]
Affiliation
[1] Kookmin Univ, Dept Comp Sci, Seoul 02707, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Sparse matrices; Indexes; Sparks; Predictive models; Machine learning algorithms; Task analysis; Cluster computing; Sparse matrix multiplication; spark optimization; optimal SPMM recommendation; NEURAL-NETWORKS; PARALLEL;
DOI
10.1109/TBDATA.2022.3199197
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Many real-world graph datasets can be represented in a sparse matrix format, and such datasets are widely used in big-data applications. The multiplication of two sparse matrices (SPMM) is a major kernel of many machine learning algorithms that operate on sparsely expressed datasets. Apache Spark, a general-purpose big-data processing engine, includes the SPMM operation in its linear algebra package. The default Spark SPMM implementation, however, always converts the right sparse matrix to a dense format before performing the multiplication, which can incur significant performance overhead across diverse SPMM scenarios. To address this limitation of the current Spark implementation, we describe an SPMM implementation that keeps the right matrix in Compressed Sparse Column (CSC) format, and we propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using this latency prediction model, we implement an elastic SPMM implementation recommendation service named DoS (Dense or Sparse). DoS recommends the optimal SPMM method: either transforming the right matrix to a dense format or keeping it in sparse format during the multiplication. An evaluation of the proposed system using a real-world graph reveals that the proposed service can improve the SPMM latency of the default Spark implementation by 2.2 times while shortening the overall execution time.
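The abstract contrasts two ways to compute A × B when B is sparse: densifying B before multiplying (the default Spark behavior it describes) or keeping B in CSC format. Below is a minimal pure-Python sketch of both strategies plus a selector that picks whichever a supplied predictor rates as cheaper. This is not Spark's code or the authors' implementation. Every name here (`SparseRows`, `spmm_dense_right`, `spmm_sparse_right`, `toy_cost`, `choose_impl`) is hypothetical, and `toy_cost` is a simple operation-count heuristic standing in for the paper's DNN latency predictor, not a reproduction of it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation of the left operand: one {column: value} dict per row.
SparseRows = List[Dict[int, float]]


@dataclass
class CSC:
    """Compressed Sparse Column matrix: the entries of column j are
    row_idx[col_ptr[j]:col_ptr[j + 1]] paired with the same slice of values."""
    n_rows: int
    n_cols: int
    col_ptr: List[int]
    row_idx: List[int]
    values: List[float]


def csc_from_dense(d: List[List[float]]) -> CSC:
    """Build a CSC matrix from a dense row-major list of lists."""
    n_rows, n_cols = len(d), (len(d[0]) if d else 0)
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if d[i][j] != 0:
                row_idx.append(i)
                values.append(d[i][j])
        col_ptr.append(len(values))
    return CSC(n_rows, n_cols, col_ptr, row_idx, values)


def spmm_dense_right(left: SparseRows, right: CSC) -> List[List[float]]:
    """Strategy 1: materialize the right matrix densely, then multiply."""
    dense = [[0.0] * right.n_cols for _ in range(right.n_rows)]
    for j in range(right.n_cols):
        for p in range(right.col_ptr[j], right.col_ptr[j + 1]):
            dense[right.row_idx[p]][j] = right.values[p]
    out = [[0.0] * right.n_cols for _ in left]
    for i, row in enumerate(left):
        for k, a in row.items():           # each nonzero A[i][k] ...
            for j in range(right.n_cols):  # ... scans a full row of dense B
                out[i][j] += a * dense[k][j]
    return out


def spmm_sparse_right(left: SparseRows, right: CSC) -> List[List[float]]:
    """Strategy 2: keep the right matrix in CSC and take sparse dot products."""
    out = [[0.0] * right.n_cols for _ in left]
    for i, row in enumerate(left):
        if not row:
            continue  # an empty left row yields an all-zero output row
        for j in range(right.n_cols):
            s = 0.0
            for p in range(right.col_ptr[j], right.col_ptr[j + 1]):
                a = row.get(right.row_idx[p])
                if a is not None:
                    s += a * right.values[p]
            out[i][j] = s
    return out


def toy_cost(method: str, left: SparseRows, right: CSC) -> float:
    """Hypothetical operation count standing in for a learned latency model."""
    if method == "dense":
        nnz_left = sum(len(r) for r in left)
        # cells in the dense copy of B + multiply-adds in spmm_dense_right
        return right.n_rows * right.n_cols + nnz_left * right.n_cols
    nonempty = sum(1 for r in left if r)
    # column visits + CSC entry lookups per non-empty left row
    return nonempty * (right.n_cols + len(right.values))


def choose_impl(left: SparseRows, right: CSC,
                predict: Callable[[str, SparseRows, CSC], float]) -> str:
    """Return "dense" or "sparse", whichever the predictor rates as cheaper."""
    return min(("dense", "sparse"), key=lambda m: predict(m, left, right))
```

Both kernels return the same product and differ only in how much work and memory the right operand costs. For `left = [{0: 1.0, 2: 2.0}, {}, {1: 3.0}]` and a 3×3 right matrix with three nonzeros, `toy_cost` gives 18 for `"dense"` and 12 for `"sparse"`, so `choose_impl` returns `"sparse"`; with a fully dense 3×3 right matrix and a diagonal left matrix the counts flip to 18 versus 36 and it returns `"dense"`. That input-dependent choice is the decision the DoS service makes, using its DNN predictor rather than a fixed formula.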
Pages: 637-652
Number of pages: 16