A Learned Sketch for Subgraph Counting

Cited by: 22
Authors
Zhao, Kangfei [1]
Yu, Jeffrey Xu [1]
Zhang, Hao [1]
Li, Qiyan [2]
Rong, Yu [3]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Wuhan Univ, Wuhan, Peoples R China
[3] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
Subgraph counting; Deep learning; Cardinality estimation; Prediction; Algorithm; Graphlets; Networks; Queries; Bounds; Order
DOI
10.1145/3448016.3457289
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Subgraph counting, a fundamental problem in network analysis, is to count the number of subgraphs in a data graph that match a given query graph by either homomorphism or subgraph isomorphism. Its importance derives from the fact that it provides insight into a large graph, in particular a labeled graph, when a collection of query graphs with different sizes and labels is issued. The problem is challenging. On one hand, exact counting by enumerating subgraphs is NP-hard. On the other hand, approximate counting by subgraph isomorphism can only support 3/5-node query graphs over unlabeled graphs. Another way to count subgraphs is to express the task as an SQL query and estimate the cardinality of the query in an RDBMS. Existing approaches for cardinality estimation can only support subgraph counting by homomorphism to a limited extent, as it is difficult to deal with sampling failure when a query graph becomes large. A question that arises is whether subgraph counting can be supported by machine learning (ML) and deep learning (DL). The existing DL approach for subgraph isomorphism can only support small data graphs. The ML/DL approaches proposed in the RDBMS context for approximate query processing and cardinality estimation cannot be used, as subgraph counting performs complex self-joins over one relation, whereas existing approaches focus on multiple relations. In this paper, we propose an Active Learned Sketch for Subgraph Counting (ALSS) with two main components: a learned sketch (LSS) and an active learner (AL). The sketch is learned by a neural network regression model, and the active learner performs model updates based on newly arriving test query graphs. We conduct extensive experimental studies to confirm the effectiveness and efficiency of ALSS using large real labeled graphs. Moreover, we show that ALSS can assist query optimizers in finding a better query plan for complex multi-way self-joins.
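The two matching semantics named in the abstract can be illustrated with a minimal brute-force sketch. This is not the ALSS method and not code from the paper; it is an exhaustive counter for tiny undirected, unlabeled graphs, and the function name `count_matches` and its parameters are hypothetical. It counts edge-preserving mappings from query nodes to data nodes: every such mapping under homomorphism, and only the injective ones under subgraph isomorphism.

```python
from itertools import product


def count_matches(data_edges, query_edges, injective):
    """Count edge-preserving mappings from query nodes to data nodes.

    Illustrative assumption: graphs are undirected and unlabeled, given as
    lists of node pairs. injective=False counts homomorphisms;
    injective=True counts subgraph-isomorphism embeddings.
    """
    # Store each undirected data edge in both directions for O(1) lookup.
    adjacent = set()
    for u, v in data_edges:
        adjacent.add((u, v))
        adjacent.add((v, u))

    data_nodes = sorted({n for edge in data_edges for n in edge})
    query_nodes = sorted({n for edge in query_edges for n in edge})

    count = 0
    # Enumerate every assignment of query nodes to data nodes.
    for image in product(data_nodes, repeat=len(query_nodes)):
        # Subgraph isomorphism requires distinct query nodes to map to
        # distinct data nodes; homomorphism does not.
        if injective and len(set(image)) != len(image):
            continue
        mapping = dict(zip(query_nodes, image))
        if all((mapping[a], mapping[b]) in adjacent for a, b in query_edges):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]   # data graph
path = [("a", "b"), ("b", "c")]       # query graph: a 3-node path

print(count_matches(triangle, path, injective=False))
print(count_matches(triangle, path, injective=True))
```

On this example the homomorphism count exceeds the injective count because the two endpoints of the path may map to the same data node. The enumeration grows as the number of data nodes raised to the number of query nodes, which is why exhaustive counting does not scale and estimation approaches are of interest.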
Pages: 2142-2155
Page count: 14