Improving Execution Concurrency of Large-Scale Matrix Multiplication on Distributed Data-Parallel Platforms

Cited by: 16
Authors
Gu, Rong [1 ]
Tang, Yun [1 ]
Tian, Chen [1 ]
Zhou, Hucheng [2 ]
Li, Guanru [2 ]
Zheng, Xudong [2 ]
Huang, Yihua [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210000, Jiangsu, Peoples R China
[2] Microsoft Res, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Parallel matrix multiplication; data-parallel algorithms; machine learning library
DOI
10.1109/TPDS.2017.2686384
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline Classification Code
081202
Abstract
Matrix multiplication is a dominant but very time-consuming operation in many big data analytic applications. Thus, its performance optimization is an important and fundamental research issue. The performance of large-scale matrix multiplication on distributed data-parallel platforms is determined by both computation and IO costs. For existing matrix multiplication execution strategies, when the execution concurrency scales above a threshold, performance deteriorates quickly because the increase in IO cost outweighs the decrease in computation cost. This paper presents a novel parallel execution strategy, CRMM (Concurrent Replication-based Matrix Multiplication), along with a parallel algorithm, Marlin, for large-scale matrix multiplication on data-parallel platforms. The CRMM strategy achieves higher execution concurrency for sub-block matrix multiplication at the same IO cost. To further improve the performance of Marlin, we also propose a number of novel system-level optimizations, including increasing the concurrency of local data exchange by calling the native library in batches, reducing the overhead of block matrix transformation, and reducing heavy disk shuffle operations by exploiting the semantics of matrix computation. We have implemented Marlin as a library, along with a set of related matrix operations, on Spark, and have contributed Marlin to the open-source community. For large-sized matrix multiplication, Marlin outperforms existing systems including Spark MLlib, SystemML, and SciDB, with about 1.29x, 3.53x, and 2.21x speedup on average, respectively. The evaluation on a real-world DNN workload also indicates that Marlin outperforms the above systems by about 12.8x, 5.1x, and 27.2x, respectively.
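The core idea behind a replication-based blocked multiply can be sketched very loosely as a map/reduce-style computation; this is an illustrative single-process sketch, not the authors' actual Spark implementation, and all function names here (`crmm_style_multiply`, `to_blocks`, etc.) are hypothetical. Each sub-block task (i, j, k) computes one partial product A[i][k] * B[k][j] independently; on a data-parallel platform, block replication lets all such tasks run concurrently, and the partial products sharing an (i, j) key are then summed in a reduce phase.

```python
from collections import defaultdict

def matmul(A, B):
    """Plain dense multiply on nested lists (reference implementation)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def to_blocks(M, bs):
    """Partition square matrix M into a (row, col) -> bs x bs block grid."""
    nb = len(M) // bs
    return {(i, j): [row[j * bs:(j + 1) * bs]
                     for row in M[i * bs:(i + 1) * bs]]
            for i in range(nb) for j in range(nb)}

def crmm_style_multiply(A, B, bs):
    """Blocked multiply in the replication-based style: one independent
    sub-block task per (i, j, k) triple, then a reduce-by-key over (i, j)."""
    Ab, Bb = to_blocks(A, bs), to_blocks(B, bs)
    nb = len(A) // bs
    # "Map" phase: every (i, j, k) sub-block product is an independent task;
    # on a cluster, replicated blocks let these all execute concurrently.
    tasks = [((i, j), matmul(Ab[(i, k)], Bb[(k, j)]))
             for i in range(nb) for j in range(nb) for k in range(nb)]
    # "Reduce" phase: sum the partial products that share an (i, j) key.
    out = defaultdict(lambda: None)
    for key, part in tasks:
        out[key] = part if out[key] is None else add(out[key], part)
    # Stitch the result blocks back into a full matrix.
    C = [[0] * len(A) for _ in range(len(A))]
    for (i, j), blk in out.items():
        for r, row in enumerate(blk):
            C[i * bs + r][j * bs:(j + 1) * bs] = row
    return C
```

The trade-off the abstract describes lives in the map phase: replicating blocks raises how many (i, j, k) tasks can run at once, but each replica also has to be shipped over the network, so the IO cost grows with the replication factor.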
Pages: 2539-2552
Page count: 14