Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

Cited by: 0
Authors
Hye-Kyung Yang [1]
Hwan-Seung Yong [2]
Institutions
[1] Department of Computer Software, Korean Bible University
[2] Department of Computer Science and Engineering, Ewha Womans University
Funding
National Research Foundation of Singapore;
Keywords
PARAFAC; Tensor decomposition; Incremental tensor decomposition; Apache Spark; Big data;
DOI
Not available
CLC Classification Number
TP311.13; O183.2 [Tensor Analysis];
Discipline Classification Code
1201;
Abstract
Purpose: We propose InParTen2, a multi-aspect parallel factor analysis (PARAFAC) three-dimensional tensor decomposition algorithm based on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.
Design/methodology/approach: Considering that tensor addition increases the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results without generating sub-tensors. Additionally, InParTen2 avoids the calculation of Khatri–Rao products and minimizes shuffling by using the Apache Spark platform.
Findings: The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.
Research limitations: There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods. However, the former require longer iteration times, so their execution time cannot be fairly compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data.
Practical implications: The proposed algorithm reduces re-decomposition cost when new tensors are added to a given tensor by decomposing them based on existing decomposition results instead of re-decomposing the entire tensor.
Originality/value: The proposed method can handle large tensors and runs fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle both static and incremental tensor decomposition.
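For readers unfamiliar with the incremental setting described in the abstract, the sketch below illustrates the general idea behind incremental CP/PARAFAC decomposition on a single machine: when new slices arrive along one mode, the existing factor matrices are reused to fit only the new factor rows instead of re-decomposing the whole tensor. This is a minimal NumPy illustration of the generic technique, not the authors' InParTen2 algorithm, which grows the tensor along all axes, avoids materializing Khatri–Rao products, and runs distributed on Apache Spark; the function names, shapes, and the choice of the third mode are illustrative assumptions.

    # Minimal single-machine sketch of incremental CP/PARAFAC (illustrative only;
    # not the authors' InParTen2 algorithm).  New frontal slices arrive along the
    # third mode; the existing factors A and B are held fixed and only the new
    # rows of the third-mode factor C are fitted by least squares.
    import numpy as np

    def khatri_rao(B, A):
        # Column-wise Khatri-Rao product of B (J x R) and A (I x R) -> (I*J) x R.
        J, R = B.shape
        I, _ = A.shape
        return (B[:, None, :] * A[None, :, :]).reshape(I * J, R)

    def append_slices(A, B, C, new_slices):
        # Fit factor rows for new I x J frontal slices with A and B fixed,
        # then append them to the third-mode factor C.
        kr = khatri_rao(B, A)                        # (I*J) x R
        gram = (A.T @ A) * (B.T @ B)                 # R x R Gram of the Khatri-Rao product
        # Mode-3 unfolding of the incoming slices: one vectorized slice per row.
        X3_new = np.stack([s.reshape(-1, order='F') for s in new_slices])
        C_new = X3_new @ kr @ np.linalg.pinv(gram)   # least-squares rows for the new slices
        return A, B, np.vstack([C, C_new])

In a fuller incremental scheme, A and B would also be refreshed from accumulated statistics, and a distributed variant would partition the unfolded tensor across workers; the paper's contribution lies in doing this on Spark without the costly Khatri–Rao materialization and shuffling shown above.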
Pages: 13-32
Number of pages: 20
Related Papers
50 records in total
  • [31] LSTM-based Memory Profiling for Predicting Data Attacks in Distributed Big Data Systems
    Aditham, Santosh
    Ranganathan, Nagarajan
    Katkoori, Srinivas
    2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 1259 - 1267
  • [32] PERFORMANCE ANALYSIS ON SPC-MAB BASED MULTI-ASPECT DATA ACQUISITION MODE
    Shen, Wenjie
    Lin, Yun
    Zheng, Baowen
    Tan, Weixian
    Hong, Wen
    Yu, Lingjuan
    2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 1114 - 1117
  • [33] Power Distribution System Stream Data Compression Based on Incremental Tensor Decomposition
    Zhao, Hongshan
    Ma, Libo
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (04) : 2469 - 2476
  • [34] ADTT: A Highly Efficient Distributed Tensor-Train Decomposition Method for IIoT Big Data
    Wang, Xiaokang
    Yang, Laurence T.
    Wang, Yihao
    Ren, Lei
    Deen, M. Jamal
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (03) : 1573 - 1582
  • [35] Bridging High Velocity and High Volume Industrial Big Data Through Distributed In-Memory Storage & Analytics
    Williams, Jenny Weisenberg
    Aggour, Kareem S.
    Interrante, John
    McHugh, Justin
    Pool, Eric
    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014, : 932 - 941
  • [36] DISTIL: A Distributed In-Memory Data Processing System for Location-Based Services
    Patrou, Maria
    Alam, Md Mahbub
    Memarzia, Puya
    Ray, Suprio
    Bhavsar, Virendra C.
    Kent, Kenneth B.
    Dueck, Gerhard W.
    26TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2018), 2018, : 496 - 499
  • [37] Distributed Range-Based Meta-Data Management for an In-Memory Storage
    Klein, Florian
    Beineke, Kevin
    Schoettner, Michael
    EURO-PAR 2015: PARALLEL PROCESSING WORKSHOPS, 2015, 9523 : 3 - 15
  • [38] Performance Enhancement of Distributed K-Means Clustering for Big Data Analytics Through In-memory Computation
    Ketu, Shwet
    Agarwal, Sonali
    2015 EIGHTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2015, : 318 - 324
  • [39] Target Scattering Feature Extraction Based on Parametric Model Using Multi-Aspect SAR Data
    Yue, Xiaoyang
    Teng, Fei
    Lin, Yun
    Hong, Wen
    REMOTE SENSING, 2023, 15 (07)
  • [40] Big data cleaning model of smart grid based on Tensor Tucker decomposition
    Yin, Jun
    Zhang, Jianye
    Li, Degao
    Wang, Tianjun
    Jing, Kang
    2020 INTERNATIONAL CONFERENCE ON BIG DATA & ARTIFICIAL INTELLIGENCE & SOFTWARE ENGINEERING (ICBASE 2020), 2020, : 166 - 169