Memory-efficient Parallel Tensor Decompositions

Cited by: 0

Authors:

Baskaran, Muthu [1]

Henretty, Tom [1]

Pradelle, Benoit [1]

Langston, M. Harper [1]

Bruns-Smith, David [1]

Ezick, James [1]

Lethin, Richard [1]

Affiliations:

[1] Reservoir Labs Inc, New York, NY 10012 USA

Source:

2017 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC) | 2017

Keywords:

DOI: not available

Chinese Library Classification (CLC):

TP3 [Computing technology, computer technology];

Discipline classification code:

0812

Abstract:

Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it is becoming increasingly important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems. In this paper, we present multiple optimizations that are targeted towards faster and memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques achieve reduction in memory usage and execution time of tensor decomposition methods when they are applied on multiple datasets of varied size and structure from different application domains. We achieve up to 11x reduction in memory usage and up to 7x improvement in performance. More importantly, we enable the application of large tensor decompositions on some important datasets on a multi-core system that would not have been feasible without our optimization.

Pages: 7

