Parallel Tensor Compression for Large-Scale Scientific Data

Cited by: 84
Authors
Austin, Woody [1 ]
Ballard, Grey [2 ]
Kolda, Tamara G. [2 ]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Sandia Natl Labs, Livermore, CA USA
Keywords
Tucker tensor decomposition; compression; singular-value decomposition; collective communication; algorithms
DOI
10.1109/IPDPS.2016.67
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
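The storage arithmetic in the abstract can be checked with a short sketch. It reproduces the 8 TB figure (counting 1 TB as 2^40 bytes) and estimates the storage of a Tucker representation, which consists of one core tensor plus one factor matrix per mode. The function names and the `ranks` values are hypothetical and chosen only for illustration; they are not taken from the paper, so the resulting ratio is not one of the paper's reported results.

```python
from math import prod

def raw_bytes(dims, bytes_per_value=8):
    """Bytes needed to store a dense tensor with the given mode sizes."""
    return prod(dims) * bytes_per_value

def tucker_bytes(dims, ranks, bytes_per_value=8):
    """Bytes needed for a Tucker representation: a core tensor of size
    ranks[0] x ... x ranks[-1] plus one dims[k] x ranks[k] factor matrix per mode."""
    core_entries = prod(ranks)
    factor_entries = sum(n * r for n, r in zip(dims, ranks))
    return (core_entries + factor_entries) * bytes_per_value

# Five-way tensor from the abstract: 512^3 spatial grid, 64 variables, 128 time steps.
dims = (512, 512, 512, 64, 128)

# Hypothetical core sizes (assumption for illustration, not from the paper).
ranks = (64, 64, 64, 16, 32)

size = raw_bytes(dims)
size_tib = size / 2**40                  # size in units of 2^40 bytes
ratio = size / tucker_bytes(dims, ranks)  # compression ratio for these assumed ranks

print(f"raw size: {size_tib} TiB")
print(f"compression ratio for assumed ranks: {ratio:.0f}")
```

The ratio depends entirely on the ranks that the data admit at a given error tolerance; because the core has `prod(ranks)` entries, it usually dominates the compressed size.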
Pages: 912-922
Number of pages: 11