Dense or Sparse: Elastic SPMM Implementation for Optimal Big-Data Processing

Cited by: 2

Authors:

Choi, Unho [1]

Lee, Kyungyong [1]

Affiliations:

[1] Kookmin Univ, Dept Comp Sci, Seoul 02707, South Korea

Source:

IEEE TRANSACTIONS ON BIG DATA | 2023 / Vol. 9 / Issue 02

Funding:

National Research Foundation, Singapore;

Keywords:

Sparse matrices; Indexes; Sparks; Predictive models; Machine learning algorithms; Task analysis; Cluster computing; Sparse matrix multiplication; spark optimization; optimal SPMM recommendation; NEURAL-NETWORKS; PARALLEL;

DOI:

10.1109/TBDATA.2022.3199197

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Many real-world graph datasets can be represented using a sparse matrix format, and they are widely used for various big-data applications. The multiplication of two sparse matrices (SPMM) is a major kernel for various machine learning algorithms when using a sparsely expressed dataset. Apache Spark, a general-purpose big-data processing engine, includes the SPMM operation in its linear algebra package. The default Spark SPMM implementation, however, always converts the right sparse matrix to a dense format before performing multiplication, which can result in significant performance overhead for diverse SPMM scenarios. To address this limitation of the current Spark implementation, we describe an SPMM implementation that keeps the right matrix in a Compressed Sparse Column (CSC) format and propose an SPMM task latency prediction model based on a Deep Neural Network (DNN) architecture. Using the SPMM latency prediction model, we implement an elastic SPMM implementation recommendation service, which we name DoS (Dense or Sparse). The proposed DoS recommends an optimal SPMM implementation method: either transforming the right matrix to a dense format or keeping it in a sparse format during the multiplication. Evaluation of the proposed system using a real-world graph reveals that the proposed service can improve the SPMM latency of the default Spark implementation by 2.2 times while shortening the overall execution time.
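
The two options the abstract contrasts (densifying the right matrix versus keeping it in CSC) can be illustrated with a minimal pure-Python sketch. This is not the paper's code or Spark's implementation: the function names `dense_to_csc`, `multiply_right_dense`, and `multiply_right_csc` are hypothetical, the left matrix is held as a plain list of lists for simplicity, and the DNN latency predictor is not reproduced. The sketch only shows that both paths yield the same product while the CSC path visits just the stored nonzeros of the right matrix.

```python
# Illustrative sketch only; names and data are assumptions, not from the paper.

def dense_to_csc(dense):
    """Convert a row-major dense matrix (list of lists) to CSC arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, row_indices, col_ptrs = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_indices.append(i)
        # col_ptrs[j + 1] marks where column j ends in `values`.
        col_ptrs.append(len(values))
    return values, row_indices, col_ptrs


def multiply_right_dense(left, right):
    """Right matrix is dense: every entry of each needed row is visited."""
    n, k, m = len(left), len(right), len(right[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = left[i][p]
            if a == 0:
                continue
            for j in range(m):
                out[i][j] += a * right[p][j]
    return out


def multiply_right_csc(left, csc, n_cols):
    """Right matrix stays in CSC: only its stored nonzeros are visited."""
    values, row_indices, col_ptrs = csc
    out = [[0.0] * n_cols for _ in left]
    for j in range(n_cols):
        for idx in range(col_ptrs[j], col_ptrs[j + 1]):
            p, b = row_indices[idx], values[idx]
            for i, row in enumerate(left):
                if row[p] != 0:
                    out[i][j] += row[p] * b
    return out


left = [[1.0, 0.0, 2.0],
        [0.0, 0.0, 3.0]]
right = [[0.0, 4.0],
         [5.0, 0.0],
         [0.0, 6.0]]

csc = dense_to_csc(right)
dense_result = multiply_right_dense(left, right)
sparse_result = multiply_right_csc(left, csc, n_cols=2)
print(dense_result == sparse_result)
```

Which path is faster depends on matrix shape and density, which is the decision the abstract says DoS makes with its latency prediction model; the sketch makes no claim about that trade-off.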

Pages: 637 - 652

Page count: 16
