Mining billion-scale tensors: algorithms and discoveries

Cited by: 23
Authors
Jeon, Inah [1]
Papalexakis, Evangelos E. [2, 3]
Faloutsos, Christos [2, 3]
Sael, Lee [4 ]
Kang, U. [5 ]
Affiliations
[1] LG Elect, Seoul, South Korea
[2] CMU, Dept Comp Sci, Pittsburgh, PA USA
[3] CMU, iLab, Pittsburgh, PA USA
[4] SUNY, Dept Comp Sci, Inchon, South Korea
[5] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul, South Korea
Source
VLDB JOURNAL | 2016 / Vol. 25 / Issue 04
Funding
National Research Foundation, Singapore;
Keywords
Tensor; Distributed computing; Big data; MapReduce; Hadoop;
DOI
10.1007/s00778-016-0427-4
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
How can we analyze large-scale real-world data with various attributes? Many real-world datasets with multiple attributes (e.g., network traffic logs, web data, social networks, knowledge bases, and sensor streams) are represented as multi-dimensional arrays, called tensors. Tensor decompositions are widely used to analyze such tensors in data mining applications: detecting malicious attackers in network traffic logs (with source IP, destination IP, port number, timestamp), finding telemarketers in a phone call history (with sender, receiver, date), and identifying interesting concepts in a knowledge base (with subject, object, relation). However, current tensor decomposition methods do not scale to large, sparse real-world tensors with millions of rows, columns, and 'fibers.' In this paper, we propose HaTen2, a distributed method for large-scale tensor decompositions that runs on the MapReduce framework. Our careful design and implementation of HaTen2 dramatically reduce the size of intermediate data and the number of jobs, achieving higher scalability than the state-of-the-art method. Thanks to HaTen2, we analyze big real-world sparse tensors that cannot be handled by the current state of the art, and discover hidden concepts.
Pages: 519-544
Number of pages: 26
Related papers
50 in total
  • [1] Mining billion-scale tensors: algorithms and discoveries
    Inah Jeon
    Evangelos E. Papalexakis
    Christos Faloutsos
    Lee Sael
    U. Kang
    [J]. The VLDB Journal, 2016, 25 : 519 - 544
  • [2] Spectral Analysis for Billion-Scale Graphs: Discoveries and Implementation
    Kang, U.
    Meeder, Brendan
    Faloutsos, Christos
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6635 : 13 - 25
  • [3] PEGASUS: MINING BILLION-SCALE GRAPHS IN THE CLOUD
    Kang, U.
    Chau, Duen Horng Polo
    Faloutsos, Christos
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5341 - 5344
  • [4] BIGtensor: Mining Billion-Scale Tensor Made Easy
    Park, Namyong
    Jeon, Byungsoo
    Lee, Jungwoo
    Kang, U.
    [J]. CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 2457 - 2460
  • [5] Billion-Scale Matrix Compression and Multiplication with Implications in Data Mining
    Nelson, Michael
    Radhakrishnan, Sridhar
    Sekharan, Chandra N.
    [J]. 2019 IEEE 20TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2019), 2019, : 395 - 402
  • [6] All-at-once Decomposition of Coupled Billion-scale Tensors in Apache Spark
    Gudibanda, Aditya
    Henretty, Tom
    Baskaran, Muthu
    Ezick, James
    Lethin, Richard
    [J]. 2018 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2018,
  • [7] Efficient MapReduce algorithms for triangle listing in billion-scale graphs
    Zhu, Yuanyuan
    Zhang, Hao
    Qin, Lu
    Cheng, Hong
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2017, 35 (02) : 149 - 176
  • [8] Efficient MapReduce algorithms for triangle listing in billion-scale graphs
    Yuanyuan Zhu
    Hao Zhang
    Lu Qin
    Hong Cheng
    [J]. Distributed and Parallel Databases, 2017, 35 : 149 - 176
  • [9] Synchronizing billion-scale automata
    Tas, Mustafa Kemal
    Kaya, Kamer
    Yenigun, Husnu
    [J]. INFORMATION SCIENCES, 2021, 574 : 162 - 175
  • [10] Scalable and Adaptive Algorithms for the Triangle Interdiction Problem on Billion-Scale Networks
    Kuhnle, Alan
    Crawford, Victoria G.
    Thai, My T.
    [J]. 2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 237 - 246