PEGASUS: mining peta-scale graphs

Authors
U Kang
Charalampos E. Tsourakakis
Christos Faloutsos
Affiliation
Carnegie Mellon University, School of Computer Science, Computer Science Department
Keywords
PEGASUS; Graph mining; GIM-V; Generalized iterative matrix-vector multiplication; Hadoop
DOI
Not available
Abstract
In this paper, we describe PeGaSus, an open-source Peta Graph Mining library that performs typical graph mining tasks such as computing the diameter of a graph, computing the radius of each node, finding connected components, and computing the importance scores of nodes. As graphs grow to several gigabytes, terabytes, or even petabytes, the need for such a library grows as well. To the best of our knowledge, PeGaSus is the first such library, implemented on top of the Hadoop platform, the open-source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially repeated matrix-vector multiplications. In this paper, we describe a very important primitive of PeGaSus, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up with the number of available machines, (b) running time linear in the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with ≈ 6.7 billion edges.
Pages: 303-325
Number of pages: 22