Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing

Times cited: 2

Authors:

Choi, Unho [1]

Lee, Kyungyong [1]

Affiliation:

[1] Kookmin Univ, Dept Comp Sci, Seoul 02707, South Korea

Source:

IEEE TRANSACTIONS ON BIG DATA | 2023 / Vol. 9 / No. 02

Funding:

National Research Foundation of Singapore;

Keywords:

Sparse matrices; Indexes; Sparks; Predictive models; Machine learning algorithms; Task analysis; Cluster computing; Sparse matrix multiplication; spark optimization; optimal SPMM recommendation; NEURAL-NETWORKS; PARALLEL;

DOI:

10.1109/TBDATA.2022.3199197

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Many real-world graph datasets can be represented in a sparse matrix format, and they are widely used in various big-data applications. The multiplication of two sparse matrices (SPMM) is a major kernel in various machine learning algorithms that operate on sparsely expressed datasets. Apache Spark, a general-purpose big-data processing engine, includes the SPMM operation in its linear algebra package. The default Spark SPMM implementation, however, always converts the right sparse matrix to a dense format before performing the multiplication, which can cause significant performance overhead in diverse SPMM scenarios. To address this limitation of the current Spark implementation, we describe an SPMM implementation that keeps the right matrix in Compressed Sparse Column (CSC) format, and we propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using the SPMM latency prediction model, we implement an elastic SPMM implementation recommendation service, which we name DoS (Dense or Sparse). DoS recommends the optimal SPMM implementation method: either transforming the right matrix to a dense format or keeping it in a sparse format during the multiplication. Evaluation of the proposed system using a real-world graph reveals that the proposed service can improve the SPMM latency of the default Spark implementation by 2.2 times while shortening the overall execution time.


Pages: 637-652

Page count: 16
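To make the two strategies in the abstract concrete, here is a toy, single-machine Python sketch; it is not the authors' Spark code. It stores matrices in CSC form using the same three arrays as Spark's `SparseMatrix` (column pointers, row indices, values) and computes C = A * B either after densifying B (the default behavior the abstract attributes to Spark) or while keeping B in CSC. All helper names (`dense_to_csc`, `spmm_dense_right`, `spmm_sparse_right`, `pick_strategy`) are hypothetical. `pick_strategy` is a naive multiply-add count comparison swapped in purely for illustration; the paper's DoS service instead relies on a DNN-based latency prediction model.

```python
from typing import List, Tuple

# CSC matrix as (n_rows, n_cols, col_ptrs, row_idx, values); Spark's
# SparseMatrix uses the same three-array layout (colPtrs, rowIndices, values).
CSC = Tuple[int, int, List[int], List[int], List[float]]
Dense = List[List[float]]


def dense_to_csc(m: Dense) -> CSC:
    """Build a CSC matrix from a row-major list of lists."""
    n_rows, n_cols = len(m), len(m[0])
    col_ptrs, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if m[i][j] != 0:
                row_idx.append(i)
                values.append(m[i][j])
        col_ptrs.append(len(values))
    return n_rows, n_cols, col_ptrs, row_idx, values


def csc_to_dense(a: CSC) -> Dense:
    """Expand a CSC matrix to dense form (the right-matrix conversion)."""
    n_rows, n_cols, col_ptrs, row_idx, values = a
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k in range(col_ptrs[j], col_ptrs[j + 1]):
            out[row_idx[k]][j] = values[k]
    return out


def spmm_dense_right(a: CSC, b: CSC) -> Tuple[Dense, int]:
    """C = A * B after densifying B; work is nnz(A) * cols(B)."""
    m, k, a_ptr, a_idx, a_val = a
    assert k == b[0], "inner dimensions must match"
    bd, n = csc_to_dense(b), b[1]
    c = [[0.0] * n for _ in range(m)]
    ops = 0
    for p in range(k):  # column p of A pairs with row p of B
        for ka in range(a_ptr[p], a_ptr[p + 1]):
            i, v = a_idx[ka], a_val[ka]
            for j in range(n):  # also visits the zeros of B
                c[i][j] += v * bd[p][j]
                ops += 1
    return c, ops


def spmm_sparse_right(a: CSC, b: CSC) -> Tuple[Dense, int]:
    """C = A * B keeping B in CSC; only nonzero pairs are multiplied."""
    m, k, a_ptr, a_idx, a_val = a
    k_b, n, b_ptr, b_idx, b_val = b
    assert k == k_b, "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    ops = 0
    for j in range(n):
        for kb in range(b_ptr[j], b_ptr[j + 1]):
            p, bv = b_idx[kb], b_val[kb]  # B[p, j] is nonzero
            for ka in range(a_ptr[p], a_ptr[p + 1]):
                c[a_idx[ka]][j] += a_val[ka] * bv
                ops += 1
    return c, ops


def pick_strategy(a: CSC, b: CSC) -> str:
    """Hypothetical stand-in for DoS: compare multiply-add counts, not a DNN."""
    a_ptr = a[2]
    nnz_a_col = [a_ptr[p + 1] - a_ptr[p] for p in range(a[1])]
    dense_ops = len(a[4]) * b[1]
    sparse_ops = sum(nnz_a_col[p] for p in b[3])
    return "sparse" if sparse_ops < dense_ops else "dense"


if __name__ == "__main__":
    a = dense_to_csc([[1, 0, 2], [0, 3, 0]])
    b = dense_to_csc([[0, 4], [0, 0], [5, 0]])
    print(spmm_dense_right(a, b))   # ([[10.0, 4.0], [0.0, 0.0]], 6)
    print(spmm_sparse_right(a, b))  # ([[10.0, 4.0], [0.0, 0.0]], 2)
    print(pick_strategy(a, b))      # sparse
```

Both paths return the same product, but on this very sparse B the CSC path performs 2 multiply-adds against 6 for the densified path. Operation counts like these ignore the cost of densification, memory access patterns, and distributed-execution overheads, so they are at best a rough proxy for the measured task latencies a learned prediction model can capture.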
