Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems

Authors
Hye-Kyung Yang [1]
Hwan-Seung Yong [2]
Affiliations
[1] Department of Computer Software, Korean Bible University
[2] Department of Computer Science and Engineering, Ewha Womans University
Funding
National Research Foundation of Singapore
Keywords
PARAFAC; Tensor decomposition; Incremental tensor decomposition; Apache Spark; Big data
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13; O183.2 [Tensor analysis]
Discipline classification code
1201
Abstract
Purpose: We propose InParTen2, a multi-aspect parallel factor analysis (PARAFAC) algorithm for three-dimensional tensor decomposition, based on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.
Design/methodology/approach: Because tensor addition increases the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results, without generating sub-tensors. Additionally, InParTen2 avoids the calculation of Khatri–Rao products and minimizes shuffling on the Apache Spark platform.
Findings: The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost.
Research limitations: There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods. However, the former require longer iteration times, so their execution time cannot be compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data.
Practical implications: When tensors are added to a given tensor, the proposed algorithm reduces re-decomposition cost by decomposing them based on existing decomposition results, without re-decomposing the entire tensor.
Originality/value: The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle static as well as incremental tensor decomposition.
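The abstract does not say how the Khatri–Rao product is avoided, so the following is only a minimal single-machine sketch of one standard way to do it, not the authors' Spark implementation. In PARAFAC alternating least squares, the mode-1 update needs the product of the unfolded tensor with the Khatri–Rao product of the other two factor matrices. For a sparse tensor, the same result can be accumulated directly from the nonzero entries. The function names and the coordinate-list input format are assumptions made for illustration.

```python
def mttkrp_sparse(entries, B, C, n_rows):
    """Mode-1 product X_(1) (C kr B) computed from nonzeros only.

    entries: iterable of (i, j, k, value) for a 3-way tensor (assumed format).
    B, C: factor matrices as lists of rows, each row of length `rank`.
    The Khatri-Rao product is never materialized.
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(n_rows)]
    for i, j, k, v in entries:
        for r in range(rank):
            M[i][r] += v * B[j][r] * C[k][r]
    return M


def khatri_rao(C, B):
    """Explicit column-wise Kronecker product, built for comparison only."""
    rank = len(B[0])
    return [[C[k][r] * B[j][r] for r in range(rank)]
            for k in range(len(C)) for j in range(len(B))]


def mttkrp_dense(entries, B, C, n_rows):
    """Reference version: unfold the tensor, then multiply by khatri_rao(C, B)."""
    J, K, rank = len(B), len(C), len(B[0])
    X1 = [[0.0] * (J * K) for _ in range(n_rows)]
    for i, j, k, v in entries:
        X1[i][k * J + j] = v  # column order matches the rows of khatri_rao(C, B)
    KR = khatri_rao(C, B)
    return [[sum(X1[i][c] * KR[c][r] for c in range(J * K)) for r in range(rank)]
            for i in range(n_rows)]


entries = [(0, 0, 0, 1.0), (0, 1, 1, 2.0), (1, 0, 1, 3.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
result = mttkrp_sparse(entries, B, C, n_rows=2)
```

The sparse version does work proportional to the number of nonzeros times the rank, while the explicit version builds a matrix with J × K rows first. In a distributed setting, the per-entry contributions could be computed independently and summed by row index, but how InParTen2 arranges this is not described in the abstract.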
Pages: 13-32
Page count: 20
Related papers
50 in total
  • [1] Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems
    Yang, Hye-Kyung
    Yong, Hwan-Seung
    JOURNAL OF DATA AND INFORMATION SCIENCE, 2020, 5 (02) : 13 - 32
  • [2] DisMASTD: An Efficient Distributed Multi-Aspect Streaming Tensor Decomposition
    Yang, Keyu
    Gao, Yunjun
    Shen, Yifeng
    Zheng, Baihua
    Chen, Lu
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 1080 - 1091
  • [3] Multi-Aspect Streaming Tensor Ring Completion for Dynamic Incremental Data
    Huang, Zhenhao
    Qiu, Yuning
    Yu, Jinshi
    Zhou, Guoxu
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2657 - 2661
  • [4] Distributed PARAFAC Decomposition Method Based on In-memory Big Data System
    Yang, Hye-Kyung
    Yong, Hwan-Seung
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2019, 11448 : 292 - 295
  • [5] Scalable Tensor Decompositions for Multi-aspect Data Mining
    Kolda, Tamara G.
    Sun, Jimeng
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 363 - +
  • [6] Distributed In-Memory Analytics for Big Temporal Data
    Yao, Bin
    Zhang, Wei
    Wang, Zhi-Jie
    Chen, Zhongpu
    Shang, Shuo
    Zheng, Kai
    Guo, Minyi
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2018, PT I, 2018, 10827 : 549 - 565
  • [7] Research on Tensor Multi-Clustering Distributed Incremental Updating Method for Big Data
    Zhang, Hongjun
    Zhang, Zeyu
    Ruan, Yilong
    Ye, Hao
    Li, Peng
    Shi, Desheng
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 81 (01): : 1409 - 1432
  • [8] Neural Tensor Model for Learning Multi-Aspect Factors in Recommender Systems
    Chen, Huiyuan
    Li, Jing
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2449 - 2455
  • [9] LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data
    Tang, Mingjie
    Yu, Yongyang
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    Aref, Walid G.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (13): : 1565 - 1568
  • [10] Incremental QR-Based Tensor-Train Decomposition for Industrial Big Data
    Yanping C.
    Xiaodong J.
    Hong X.
    Zhongmin W.
    Journal of China Universities of Posts and Telecommunications, 2021, 28 (01): : 10 - 23