Memory-efficient Parallel Tensor Decompositions

Cited by: 0
Authors
Baskaran, Muthu [1]
Henretty, Tom [1]
Pradelle, Benoit [1]
Langston, M. Harper [1]
Bruns-Smith, David [1]
Ezick, James [1]
Lethin, Richard [1]
Affiliations
[1] Reservoir Labs Inc, New York, NY 10012 USA
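Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract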
Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it is becoming increasingly important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems. In this paper, we present multiple optimizations that are targeted towards faster and memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques achieve reduction in memory usage and execution time of tensor decomposition methods when they are applied on multiple datasets of varied size and structure from different application domains. We achieve up to 11x reduction in memory usage and up to 7x improvement in performance. More importantly, we enable the application of large tensor decompositions on some important datasets on a multi-core system that would not have been feasible without our optimization.
Pages: 7