Parallel Tensor Compression for Large-Scale Scientific Data

Cited by: 84
Authors
Austin, Woody [1]
Ballard, Grey [2]
Kolda, Tamara G. [2]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Sandia Natl Labs, Livermore, CA USA
Keywords
Tucker tensor decomposition; compression; singular-value decomposition; collective communication; algorithms
DOI
10.1109/IPDPS.2016.67
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
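The storage figures in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper's software: the helper names and the reduced dimensions in `ranks` are hypothetical, and it assumes the usual Tucker storage count (one core tensor plus one factor matrix per mode). The 8 TB figure comes out exactly when a terabyte is counted as 2^40 bytes.

```python
from math import prod

BYTES_PER_DOUBLE = 8  # double precision, as the abstract assumes


def dense_bytes(dims):
    """Bytes needed to store a dense tensor with the given mode sizes."""
    return prod(dims) * BYTES_PER_DOUBLE


def tucker_bytes(dims, ranks):
    """Bytes for a Tucker representation: the core tensor plus one
    (dims[n] x ranks[n]) factor matrix per mode."""
    core = prod(ranks)
    factors = sum(i * r for i, r in zip(dims, ranks))
    return (core + factors) * BYTES_PER_DOUBLE


def compression_ratio(dims, ranks):
    """Dense storage divided by Tucker storage."""
    return dense_bytes(dims) / tucker_bytes(dims, ranks)


# Example from the abstract: 512^3 grid, 64 variables, 128 time steps.
dims = (512, 512, 512, 64, 128)
print(dense_bytes(dims) / 2**40)  # 8.0 (terabytes, counted as 2**40 bytes)

# Hypothetical reduced dimensions, chosen only to illustrate the formula.
ranks = (64, 64, 64, 16, 32)
print(round(compression_ratio(dims, ranks)))
```

The ratio depends entirely on how small the reduced dimensions can be made for a given error tolerance, which is data dependent; the value printed for `ranks` above says nothing about the results reported in the paper.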
Pages: 912-922
Page count: 11