Optimizing Sparse Tensor Times Matrix on Multi-core and Many-core Architectures

Cited by: 0
Authors
Li, Jiajia [1]
Ma, Yuchen [2]
Yan, Chenggang [2]
Vuduc, Richard [1]
Affiliations
[1] Georgia Inst Technol, Computat Sci & Engn, Atlanta, GA 30332 USA
[2] Hangzhou Dianzi Univ, Inst Informat & Control, Hangzhou, Zhejiang, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
DECOMPOSITIONS;
DOI
10.1109/IA3.2016.10
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement a sequential SpTTM that avoids the explicit data transformations between a tensor and a matrix used in the conventional approach. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5x faster than the SpTTM from Tensor Toolbox and 1.5x faster than that from Cyclops Tensor Framework. Relative to our sequential SpTTM, our parallel algorithms achieve a 4.1x speedup on a multicore Intel Core i7 and an 18.8x speedup on an NVIDIA K40c GPU.
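To make the primitive concrete, the following is a minimal sequential sketch of a sparse tensor-times-dense matrix product that works directly on coordinate (COO) nonzeros, with no conversion of the tensor to a matrix. It is an illustration only and not the paper's implementation: the function name `spttm`, the COO input layout, the dictionary output, and the convention that `U` has shape (I_mode, R) are all assumptions made for this example.

```python
from collections import defaultdict

def spttm(coords, vals, U, mode):
    """Multiply a COO sparse tensor by a dense matrix U along `mode`.

    coords: list of index tuples, one per nonzero
    vals:   list of nonzero values, same length as coords
    U:      dense matrix as a list of rows, shape (I_mode, R) (assumed layout)
    Returns a dict mapping each index tuple with `mode` removed to a dense
    length-R list, i.e. the output fiber along `mode`.
    """
    R = len(U[0])
    out = defaultdict(lambda: [0.0] * R)
    for idx, v in zip(coords, vals):
        fiber = idx[:mode] + idx[mode + 1:]  # all indices except `mode`
        row = U[idx[mode]]                   # row of U picked by the mode index
        acc = out[fiber]
        for r in range(R):
            acc[r] += v * row[r]             # accumulate into the dense fiber
    return dict(out)

# A 2 x 2 x 3 tensor with three nonzeros, multiplied along mode 2.
coords = [(0, 0, 0), (0, 0, 2), (1, 1, 1)]
vals = [1.0, 2.0, 3.0]
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # shape (3, 2)
Y = spttm(coords, vals, U, mode=2)
```

Nonzeros that share all indices except the chosen mode accumulate into the same output fiber. In a parallel version that shared accumulation would need care, for example by grouping nonzeros by fiber so that each fiber has a single writer; this is offered as one possible approach, not as a description of the paper's method.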
Pages: 26-33
Page count: 8