All-at-once Decomposition of Coupled Billion-scale Tensors in Apache Spark

Authors
Gudibanda, Aditya [1]
Henretty, Tom [1]
Baskaran, Muthu [1]
Ezick, James [1]
Lethin, Richard [1]
Affiliation
[1] Reservoir Labs, 632 Broadway Suite 803, New York, NY 10012 USA
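Keywords
DATA FUSION; MATRIX
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract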
As the volume of unlabeled data grows, scalable, unsupervised data analysis becomes increasingly valuable. Tensor decompositions, which have been empirically successful at finding meaningful cross-dimensional patterns in multidimensional data, are a natural candidate for testing both scalability and meaningful pattern discovery on these massive real-world datasets. Furthermore, because big data is produced in many different forms, patterns must be mined across disparate sources. The coupled tensor decomposition framework captures this idea by decomposing several tensors from different data sources together. We present a scalable implementation of coupled tensor decomposition on Apache Spark. We introduce nonnegativity and sparsity constraints and perform all-at-once quasi-Newton optimization of all factor matrix parameters. Our results show that this novel implementation scales to billion-scale tensors and that the components it produces are highly interpretable, suggesting that coupled, all-at-once tensor decompositions on Apache Spark are a promising framework for large-scale, unsupervised pattern discovery.
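The abstract combines three ingredients: coupling several tensors through a shared factor, nonnegativity and sparsity constraints, and all-at-once quasi-Newton optimization of every factor matrix. The following is a minimal single-machine sketch of those ideas with NumPy and SciPy; it is not the authors' Spark implementation. Assumptions not taken from the paper: two small dense third-order tensors `X` and `Y` coupled along mode 1 (shared factor `A`), a squared-error CP loss, an L1 sparsity penalty with hypothetical weight `lam`, and SciPy's L-BFGS-B with zero lower bounds standing in for the quasi-Newton solver. The helper names (`cp_full`, `coupled_objective`, `coupled_cp`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def cp_full(A, B, C):
    """Rebuild a dense 3-way tensor from CP factors: sum_r a_r o b_r o c_r."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)


def unpack(x, shapes):
    """Split the flat parameter vector back into factor matrices (in order)."""
    mats, start = [], 0
    for rows, cols in shapes:
        size = rows * cols
        mats.append(x[start:start + size].reshape(rows, cols))
        start += size
    return mats


def coupled_objective(x, X, Y, shapes, lam):
    """Loss and gradient for two CP models that share the mode-1 factor A.

    f = 0.5*||X - [[A,B,C]]||^2 + 0.5*||Y - [[A,D,E]]||^2 + lam * sum(x)
    The L1 penalty reduces to lam * sum(x) because the bounds keep x >= 0.
    """
    A, B, C, D, E = unpack(x, shapes)
    RX = cp_full(A, B, C) - X  # residual of the first tensor
    RY = cp_full(A, D, E) - Y  # residual of the second tensor
    f = 0.5 * np.sum(RX ** 2) + 0.5 * np.sum(RY ** 2) + lam * np.sum(x)
    # The shared factor A collects gradient contributions from both tensors.
    gA = (np.einsum('ijk,jr,kr->ir', RX, B, C)
          + np.einsum('ilm,lr,mr->ir', RY, D, E))
    gB = np.einsum('ijk,ir,kr->jr', RX, A, C)
    gC = np.einsum('ijk,ir,jr->kr', RX, A, B)
    gD = np.einsum('ilm,ir,mr->lr', RY, A, E)
    gE = np.einsum('ilm,ir,lr->mr', RY, A, D)
    # Concatenate in the same order used by unpack; add the L1 gradient.
    g = np.concatenate([m.ravel() for m in (gA, gB, gC, gD, gE)]) + lam
    return f, g


def coupled_cp(X, Y, rank, lam=1e-3, seed=0):
    """Fit all five factor matrices at once with bound-constrained L-BFGS."""
    I, J, K = X.shape
    I2, L, M = Y.shape
    assert I == I2, "coupled mode must have the same size in both tensors"
    shapes = [(I, rank), (J, rank), (K, rank), (L, rank), (M, rank)]
    rng = np.random.default_rng(seed)
    x0 = rng.random(sum(r * c for r, c in shapes))  # nonnegative start
    res = minimize(coupled_objective, x0, args=(X, Y, shapes, lam),
                   jac=True, method='L-BFGS-B',
                   bounds=[(0.0, None)] * x0.size)  # nonnegativity
    return unpack(res.x, shapes), res


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((6, 2))  # synthetic shared factor
    X = cp_full(A, rng.random((5, 2)), rng.random((4, 2)))
    Y = cp_full(A, rng.random((3, 2)), rng.random((7, 2)))
    (A_hat, B_hat, C_hat, D_hat, E_hat), res = coupled_cp(X, Y, rank=2)
    rel = np.linalg.norm(cp_full(A_hat, B_hat, C_hat) - X) / np.linalg.norm(X)
    print("relative error on X:", rel)
```

Because every factor lives in one parameter vector, each L-BFGS-B step updates `A` through `E` jointly rather than alternating over modes as ALS does. A billion-scale version would need sparse, distributed computation of the same gradients in place of these dense `einsum` calls.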
Pages: 8