Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing

Cited by: 23
Authors
Wang, Zhigang [1]
Gu, Yu [1]
Bao, Yubin [1]
Yu, Ge [1]
Yu, Jeffrey Xu [2]
Affiliations
[1] Northeastern Univ, Shenyang, Liaoning, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
I/O-Efficient; Distributed Graph Computing; Push; Pull; FRAMEWORK;
DOI
10.1145/2882903.2882938
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these distributed systems with disk-resident data take a push-based approach to handle messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing memory-resident pull-based approach individually pulls messages for each vertex on demand. Although it can be used to avoid disk operations regarding messages, expensive I/O costs are incurred by random and frequent access to vertices. This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios. We first employ a new block-centric technique (b-pull) to improve the I/O-performance of pulling messages, although the iterative computation is vertex-centric. I/O costs of data accesses are shifted from the receiver side where messages are written/read by push to the sender side where graph data are read by b-pull. Graph data are organized by clustering vertices and edges to achieve high I/O efficiency in b-pull. Second, we design a seamless switching mechanism and a prominent performance prediction method to guarantee efficiency when switching between push and b-pull. We conduct extensive performance studies to confirm the effectiveness of our proposals over existing up-to-date solutions using a broad spectrum of real-world graphs.
Pages: 479-494
Page count: 16
Related papers
50 in total
  • [21] I/O-Efficient Contour Tree Simplification
    Arge, Lars
    Revsbaek, Morten
    [J]. ALGORITHMS AND COMPUTATION, PROCEEDINGS, 2009, 5878 : 1155 - 1165
  • [22] I/O-Efficient Range Minima Queries
    Afshani, Peyman
    Sitchinava, Nodari
    [J]. ALGORITHM THEORY - SWAT 2014, 2014, 8503 : 1 - +
  • [23] I/O-Efficient Bundled Range Aggregation
    Tao, Yufei
    Sheng, Cheng
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (06) : 1521 - 1531
  • [24] I/O-efficient Hierarchical Diameter Approximation
    Ajwani, Deepak
    Meyer, Ulrich
    Veith, David
    [J]. ALGORITHMS - ESA 2012, 2012, 7501 : 72 - 83
  • [25] I/O-efficient undirected shortest paths
    Meyer, U
    Zeh, N
    [J]. ALGORITHMS - ESA 2003, PROCEEDINGS, 2003, 2832 : 434 - 445
  • [26] An I/O-Efficient Buffer Batch Replacement Policy for Update-Intensive Graph Databases
    Zhou, Ningnan
    Zhou, Xuan
    Zhang, Xiao
    Wang, Shan
    Liu, Ling
    [J]. DATA SCIENCE AND ENGINEERING, 2016, 1 (04) : 231 - 241
  • [27] I/O-efficient algorithms for sparse graphs
    Toma, L
    Zeh, N
    [J]. ALGORITHMS FOR MEMORY HIERARCHIES: ADVANCED LECTURES, 2003, 2625 : 85 - 109
  • [28] I/O-Efficient Algorithms on Triangle Listing and Counting
    Hu, Xiaocheng
    Tao, Yufei
    Chung, Chin-Wan
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2014, 39 (04):
  • [29] I/O-Efficient Algorithms for Graphs of Bounded Treewidth
    Maheshwari, Anil
    Zeh, Norbert
    [J]. ALGORITHMICA, 2009, 54 (03) : 413 - 469
  • [30] I/O-Efficient flow Modeling on fat terrains
    de Berg, Mark
    Cheong, Otfried
    Haverkort, Herman
    Lim, Jung Gun
    Toma, Laura
    [J]. ALGORITHMS AND DATA STRUCTURES, PROCEEDINGS, 2007, 4619 : 239 - +