PEGASUS: mining peta-scale graphs

Cited by: 0
Authors
U Kang
Charalampos E. Tsourakakis
Christos Faloutsos
Affiliation
[1] Carnegie Mellon University, School of Computer Science, Department of Computer Science
Source
Keywords
PEGASUS; Graph mining; GIM-V; Generalized iterative matrix-vector multiplication; Hadoop
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
In this paper, we describe PeGaSus, an open source Peta Graph Mining library that performs typical graph mining tasks such as computing the diameter of a graph, computing the radius of each node, finding the connected components, and computing the importance score of nodes. As graphs reach several giga-, tera-, or petabytes in size, the need for such a library grows. To the best of our knowledge, PeGaSus is the first such library implemented on top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially repeated matrix-vector multiplications. In this paper, we describe a very important primitive for PeGaSus, called GIM-V (generalized iterated matrix-vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up with the number of available machines, (b) running time linear in the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with ≈ 6.7 billion edges.
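The abstract's claim that many graph mining operations reduce to repeated matrix-vector multiplication can be illustrated with a small single-machine sketch. It is not the PeGaSus implementation and does not use Hadoop. The three pluggable operations (`combine2`, `combine_all`, `assign`) and every function name below are assumptions chosen for illustration, loosely following the idea of generalizing the multiply, sum, and write-back steps of an ordinary matrix-vector product.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# One nonzero matrix entry: (row i, column j, value m_ij).
Edge = Tuple[int, int, float]


def gim_v_step(
    edges: Iterable[Edge],
    v: Dict[int, float],
    combine2: Callable[[float, float], float],
    combine_all: Callable[[List[float]], float],
    assign: Callable[[float, float], float],
) -> Dict[int, float]:
    """One generalized matrix-vector multiplication (hypothetical helper)."""
    partials: Dict[int, List[float]] = {i: [] for i in v}
    for i, j, m in edges:
        # Generalized "multiply": combine matrix entry m_ij with v_j.
        partials[i].append(combine2(m, v[j]))
    # Generalized "sum" over each row, then decide how to update v_i.
    return {i: assign(v[i], combine_all(partials[i])) for i in v}


def connected_components(
    num_nodes: int, undirected_edges: Iterable[Tuple[int, int]]
) -> Dict[int, int]:
    """Label each node with the smallest node id in its component."""
    edges: List[Edge] = []
    for a, b in undirected_edges:
        edges.append((a, b, 1.0))
        edges.append((b, a, 1.0))
    v = {i: i for i in range(num_nodes)}
    while True:
        new_v = gim_v_step(
            edges,
            v,
            combine2=lambda m, vj: vj,  # pass the neighbor's label
            combine_all=lambda xs: min(xs, default=float("inf")),
            assign=min,  # keep the smaller of old and new label
        )
        if new_v == v:  # fixed point: no label changed
            return new_v
        v = new_v


# Ordinary matrix-vector product as a special case: [[1, 2], [3, 4]] @ [1, 1].
plain = gim_v_step(
    [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)],
    {0: 1.0, 1: 1.0},
    combine2=lambda m, vj: m * vj,
    combine_all=sum,
    assign=lambda old, new: new,
)
components = connected_components(5, [(0, 1), (1, 2), (3, 4)])
```

Swapping the three operations changes the task without touching the iteration loop: multiplication and summation give a plain matrix-vector product, while passing labels and taking minima propagates component ids until nothing changes.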
Pages: 303-325
Page count: 22
Related papers
50 in total
  • [1] PEGASUS: mining peta-scale graphs
    Kang, U.
    Tsourakakis, Charalampos E.
    Faloutsos, Christos
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 27 (02) : 303 - 325
  • [2] PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations
    Kang, U.
    Tsourakakis, Charalampos E.
    Faloutsos, Christos
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 229 - 238
  • [3] Designing for peta-scale in the LSST database
    Kantor, Jeffrey
    Axelrod, Tim
    Becla, Jacek
    Cook, Kem
    Nikolaev, Sergei
    Gray, Jim
    Plante, Ray
    Nieto-Santisteban, Maria
    Szalay, Alex
    Thakar, Ani
    [J]. ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XVI, 2007, 376 : 3 - +
  • [4] Peta-Scale Data Warehousing at Yahoo!
    Ahuja, Mona
    Chen, Cheng Che
    Gottapu, Ravi
    Hallmann, Joerg
    Hasan, Waqar
    Johnson, Richard
    Kozyrczak, Maciek
    Pabbati, Ramesh
    Pandit, Neeta
    Pokuri, Sreenivasulu
    Uppala, Krishna
    [J]. ACM SIGMOD/PODS 2009 CONFERENCE, 2009, : 855 - 861
  • [5] Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers
    Fujisawa, Katsuki
    Suzumura, Toyotaro
    Sato, Hitoshi
    Ueno, Koji
    Yasui, Yuichiro
    Iwabuchi, Keita
    Endo, Toshio
    [J]. OPTIMIZATION IN THE REAL WORLD: TOWARD SOLVING REAL-WORLD OPTIMIZATION PROBLEMS, 2016, 13 : 1 - 13
  • [6] Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers
    Fujisawa, Katsuki
    Endo, Toshio
    Yasui, Yuichiro
    [J]. MATHEMATICAL SOFTWARE, ICMS 2016, 2016, 9725 : 265 - 274
  • [7] PEGASUS: MINING BILLION-SCALE GRAPHS IN THE CLOUD
    Kang, U.
    Chau, Duen Horng Polo
    Faloutsos, Christos
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5341 - 5344
  • [8] In-situ visualization for Peta-scale scientific computation
    [J]. Shan, G. (sgh@sccas.cn), 1600, Institute of Computing Technology (25):
  • [9] Programming for scientific computing on peta-scale heterogeneous parallel systems
    Yang Can-qun
    Wu Qiang
    Tang Tao
    Wang Feng
    Xue Jing-ling
    [J]. JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2013, 20 (05) : 1189 - 1203
  • [10] Programming for scientific computing on peta-scale heterogeneous parallel systems
    杨灿群
    吴强
    唐滔
    王锋
    薛京灵
    [J]. Journal of Central South University, 2013, 20 (05) : 1189 - 1203