PEGASUS: mining peta-scale graphs

Authors
U Kang
Charalampos E. Tsourakakis
Christos Faloutsos
Affiliation
Carnegie Mellon University, School of Computer Science, Computer Science Department
Keywords
PEGASUS; Graph mining; GIM-V; Generalized iterated matrix-vector multiplication; Hadoop
DOI
Not available
Abstract
In this paper, we describe PeGaSus, an open-source Peta Graph Mining library that performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, finding the connected components, and computing the importance score of nodes. As graphs grow to several gigabytes, terabytes, or petabytes, the need for such a library grows too. To the best of our knowledge, PeGaSus is the first such library; it is implemented on top of Hadoop, the open-source implementation of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially repeated matrix-vector multiplications. In this paper, we describe a very important primitive of PeGaSus, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up with the number of available machines, (b) running time linear in the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with ≈ 6.7 billion edges.
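To make "repeated matrix-vector multiplication" concrete, here is a minimal single-machine sketch in Python; it is not the PEGASUS Hadoop code. It follows the GIM-V definition from the full paper, where one step computes v'_i = assign(v_i, combineAll_i({combine2(m_ij, v_j)})), and it shows how choosing different combine2, combineAll and assign operations turns the same loop into connected components or PageRank. The helper names gim_v, iterate, CC_OPS and pagerank_ops, the damping value 0.85, and the toy graphs are hypothetical illustrations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: a sparse matrix is a list of (i, j, m_ij) triples (the
# edge-file form a MapReduce job would scan); a vector maps node id -> value.
Edge = Tuple[int, int, float]
Combine2 = Callable[[float, float], float]
CombineAll = Callable[[int, List[float]], float]
Assign = Callable[[float, float], float]


def gim_v(edges: List[Edge], v: Dict[int, float], combine2: Combine2,
          combine_all: CombineAll, assign: Assign) -> Dict[int, float]:
    """One step: v'_i = assign(v_i, combineAll_i({combine2(m_ij, v_j)}))."""
    partial: Dict[int, List[float]] = {i: [] for i in v}
    for i, j, m_ij in edges:
        partial[i].append(combine2(m_ij, v[j]))
    return {i: assign(v[i], combine_all(i, xs)) for i, xs in partial.items()}


def iterate(edges: List[Edge], v: Dict[int, float],
            ops: Tuple[Combine2, CombineAll, Assign],
            max_iter: int = 200, tol: float = 0.0) -> Dict[int, float]:
    """Repeat GIM-V until no entry moves by more than tol (or max_iter steps)."""
    for _ in range(max_iter):
        new_v = gim_v(edges, v, *ops)
        if all(abs(new_v[i] - v[i]) <= tol for i in v):
            return new_v
        v = new_v
    return v


# Connected components: every node ends up holding the smallest node id
# in its component.
CC_OPS = (
    lambda m, vj: m * vj,                          # combine2: multiply
    lambda i, xs: min(xs, default=float("inf")),   # combineAll: MIN
    lambda vi, vnew: min(vi, vnew),                # assign: MIN
)


def pagerank_ops(n: int, c: float = 0.85) -> Tuple[Combine2, CombineAll, Assign]:
    """PageRank with damping c; the matrix holds m_ij = 1/outdeg(j) for j -> i."""
    return (
        lambda m, vj: c * m * vj,                  # combine2: damped product
        lambda i, xs: (1 - c) / n + sum(xs),       # combineAll: teleport + sum
        lambda vi, vnew: vnew,                     # assign: overwrite
    )


# Usage: undirected edges 0-1, 1-2, 3-4 stored in both directions.
undirected = [(0, 1), (1, 2), (3, 4)]
cc_edges = [(a, b, 1.0) for x, y in undirected for a, b in ((x, y), (y, x))]
print(iterate(cc_edges, {i: float(i) for i in range(5)}, CC_OPS))
# {0: 0.0, 1: 0.0, 2: 0.0, 3: 3.0, 4: 3.0}

# Directed graph 0->1, 0->2, 1->2, 2->0, column-normalized by out-degree.
pr_edges = [(1, 0, 0.5), (2, 0, 0.5), (2, 1, 1.0), (0, 2, 1.0)]
pr = iterate(pr_edges, {i: 1 / 3 for i in range(3)}, pagerank_ops(3), tol=1e-12)
print(pr)
```

Only the three operations change between the two tasks; the scan over the edge list stays the same, which is the property that lets a single optimized primitive serve many graph mining algorithms. Since this toy graph has no dangling nodes, the PageRank scores stay summed to 1 at every step.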
Pages: 303-325
Number of pages: 22