An I/O-Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs

Cited by: 4
Authors
Li, Hongzheng [1]
Shao, Yingxia [1]
Du, Junping [1]
Cui, Bin [2, 3, 4]
Chen, Lei [5]
Affiliations
[1] BUPT, Natl Pilot Software Engn Sch, Sch Comp Sci, Beijing, Peoples R China
[2] Peking Univ, Sch CS, Beijing, Peoples R China
[3] Peking Univ, Key Lab High Confidence Software Technol, MOE, Beijing, Peoples R China
[4] Peking Univ Qingdao, Inst Computat Social Sci, Qingdao, Peoples R China
[5] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 08
Funding
National Natural Science Foundation of China;
Keywords
ALGORITHMS;
DOI
10.14778/3529337.3529346
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Random walk is widely used in many graph analysis tasks, especially the first-order random walk. However, as a simplification of real-world problems, the first-order random walk is poor at modeling higher-order structures in the data. Recently, second-order random walk-based applications (e.g., Node2vec, Second-order PageRank) have become attractive. Due to the complexity of the second-order random walk models and memory limitations, it is not scalable to run second-order random walk-based applications on a single machine. Existing disk-based graph systems are only friendly to the first-order random walk models and suffer from expensive disk I/Os when executing the second-order random walks. This paper introduces an I/O-efficient disk-based graph system for the scalable second-order random walk of large graphs, called GraSorw. First, to eliminate massive light vertex I/Os, we develop a bi-block execution engine that converts random I/Os into sequential I/Os by applying a new triangular bi-block scheduling strategy, the bucket-based walk management, and the skewed walk storage. Second, to improve the I/O utilization, we design a learning-based block loading model to leverage the advantages of the full-load and on-demand load methods. Finally, we conducted extensive experiments on six large real datasets as well as several synthetic datasets. The empirical results demonstrate that the end-to-end time cost of popular tasks in GraSorw is reduced by more than one order of magnitude compared to the existing disk-based graph systems.
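As a rough illustration of what makes a walk "second-order", the sketch below computes one Node2vec-style step on a small in-memory graph. It is not GraSorw's implementation: the function names (`node2vec_weights`, `second_order_step`), the dictionary-of-sets adjacency format, and the toy graph are all assumptions made for this example, and edges are assumed unweighted. The point it shows is that choosing the next vertex requires the neighbor set of the previous vertex as well as that of the current one.

```python
import random

def node2vec_weights(adj, prev, cur, p=1.0, q=1.0):
    """Unnormalized Node2vec weights for leaving `cur` after arriving from `prev`.

    Hypothetical helper; `adj` maps each vertex to a set of neighbors.
    """
    prev_neighbors = adj[prev]  # the previous vertex's adjacency is needed too
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p      # return to the previous vertex
        elif nxt in prev_neighbors:
            weights[nxt] = 1.0          # stay at distance 1 from prev
        else:
            weights[nxt] = 1.0 / q      # move to distance 2 from prev
    return weights

def second_order_step(adj, prev, cur, p=1.0, q=1.0, rng=random):
    """Sample the next vertex in proportion to the weights above."""
    weights = node2vec_weights(adj, prev, cur, p, q)
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]

# Toy undirected graph (assumed for illustration only).
adj = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1},
    3: {1},
}

w = node2vec_weights(adj, prev=0, cur=1, p=2.0, q=4.0)
nxt = second_order_step(adj, prev=0, cur=1, p=2.0, q=4.0)
```

In a disk-based setting the adjacency lists of `prev` and `cur` may sit in different blocks, so each step can touch two blocks rather than one. This is one way to read the extra disk I/O that the abstract attributes to second-order walks.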
Pages: 1619-1631
Page count: 13