Optimizing Sparse Tensor Times Matrix on Multi-core and Many-core Architectures

Cited by: 0

Authors:

Li, Jiajia [1]

Ma, Yuchen [2]

Yan, Chenggang [2]

Vuduc, Richard [1]

Affiliations:

[1] Georgia Inst Technol, Computat Sci & Engn, Atlanta, GA 30332 USA

[2] Hangzhou Dianzi Univ, Inst Informat & Control, Hangzhou, Zhejiang, Peoples R China

Source:

PROCEEDINGS OF 2016 6TH WORKSHOP ON IRREGULAR APPLICATIONS: ARCHITECTURE AND ALGORITHMS (IA3) | 2016

Funding:

US National Science Foundation;

Keywords:

DECOMPOSITIONS;

DOI:

10.1109/IA3.2016.10

Chinese Library Classification:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement a sequential SpTTM that avoids the explicit data transformations between a tensor and a matrix used in the conventional approach. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5x faster than the SpTTM in Tensor Toolbox and 1.5x faster than that in Cyclops Tensor Framework. Relative to our sequential SpTTM, our parallel algorithms achieve a 4.1x speedup on a multicore Intel Core i7 and an 18.8x speedup on an NVIDIA K40c GPU.
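To make the operation in the abstract concrete, here is a minimal sequential sketch of a mode-n sparse tensor times dense matrix product that works directly on the nonzeros, without first unfolding the tensor into a matrix. It is an illustration under stated assumptions, not the authors' implementation: the tensor is assumed to be stored in coordinate (COO) form as index tuples plus values, the matrix `U` is assumed to have shape I_n x R, and the result is returned as a dictionary mapping the remaining indices to a dense length-R fiber. The name `spttm_coo` is hypothetical.

```python
from collections import defaultdict


def spttm_coo(indices, values, U, mode):
    """Mode-`mode` product of a sparse COO tensor with a dense matrix.

    indices : list of index tuples, one per nonzero (assumed COO layout)
    values  : list of nonzero values, aligned with `indices`
    U       : dense matrix as a list of rows, shape I_mode x R (assumed convention)
    mode    : 0-based mode along which to multiply

    Returns a dict mapping each index tuple with `mode` removed
    to a dense list of R accumulated values.
    """
    R = len(U[0])
    out = defaultdict(lambda: [0.0] * R)
    for idx, val in zip(indices, values):
        # Nonzeros that differ only in the product mode share one output fiber.
        key = idx[:mode] + idx[mode + 1:]
        row = U[idx[mode]]
        fiber = out[key]
        for r in range(R):
            fiber[r] += val * row[r]
    return dict(out)


# A 2 x 2 x 2 tensor with three nonzeros, multiplied along mode 1.
indices = [(0, 0, 0), (0, 1, 0), (1, 0, 1)]
values = [1.0, 2.0, 3.0]
U = [[1.0, 2.0],
     [3.0, 4.0]]
result = spttm_coo(indices, values, U, mode=1)
```

The output is dense along the product mode and sparse in the others, which is why each output entry is a full length-R fiber. A parallel version would need to handle concurrent updates to the same fiber; how the paper avoids locks is not described in this record.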


Pages: 26-33

Page count: 8

