Scalable subgraph enumeration in MapReduce: a cost-oriented approach

Cited by: 16
Authors
Lai, Longbin [1]
Qin, Lu [2]
Lin, Xuemin [1]
Chang, Lijun [1]
Affiliations
[1] Univ New South Wales, Sydney, NSW, Australia
[2] Univ Technol, Ctr QCIS, Sydney, NSW, Australia
Source
VLDB JOURNAL | 2017 / Vol. 26 / Issue 03
Funding
Australian Research Council; National Natural Science Foundation of China;
Keywords
MapReduce; Subgraph enumeration; Random graph; Power-law graph; ISOMORPHISM;
DOI
10.1007/s00778-017-0459-4
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Subgraph enumeration, which aims to find all subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs because they rely on computationally intensive subgraph isomorphism operations. Recent research has therefore focused on solving the problem using MapReduce. Nevertheless, existing MapReduce approaches do not scale to very large graphs, since they either produce a huge number of partial results or consume a large amount of memory. Motivated by this, we propose a new algorithm, TwinTwigJoin, based on a left-deep-join framework in MapReduce, in which the basic join unit is a TwinTwig (an edge or two incident edges of a node). We show that in the Erdős-Rényi random graph model, TwinTwigJoin is instance optimal in the left-deep-join framework under reasonable assumptions, and we devise an algorithm to compute the optimal join plan. We further discuss how our approach can be adapted to handle the power-law random graph model. Three optimization strategies are explored to improve our algorithm. Finally, by aggregating equivalent nodes into a compressed node, we construct a compressed graph, on which subgraph enumeration is further improved. We conduct extensive performance studies on several real graphs, one of which contains billions of edges. Our approach significantly outperforms existing solutions in all tests.
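As a rough illustration of the join idea in the abstract, the following is a minimal single-machine sketch in Python, not the paper's MapReduce algorithm. It enumerates triangles by first generating units made of two edges incident to one node (the join unit the abstract describes) and then joining each unit with the edge that closes it. All function names (`adjacency`, `two_edge_units`, `enumerate_triangles`) are hypothetical and chosen only for this example.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_edge_units(adj):
    """Yield (center, a, b): two edges (center, a) and (center, b) sharing a node."""
    for center, neighbors in adj.items():
        for a, b in combinations(sorted(neighbors), 2):
            yield center, a, b


def enumerate_triangles(edges):
    """Join each two-edge unit with the single edge (a, b) that closes it."""
    adj = adjacency(edges)
    edge_set = {frozenset(e) for e in edges}
    triangles = set()
    for center, a, b in two_edge_units(adj):
        # Join step: the partial result survives only if edge (a, b) exists.
        if frozenset((a, b)) in edge_set:
            # Sort the nodes so each triangle is reported once.
            triangles.add(tuple(sorted((center, a, b))))
    return sorted(triangles)


if __name__ == "__main__":
    sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(enumerate_triangles(sample_edges))
```

In this sketch every two-edge unit is a partial result, so the number of units generated before the join gives a feel for the intermediate-result cost that the abstract says existing approaches struggle with.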
Pages: 421-446
Number of pages: 26