MANIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling

Cited by: 5
Authors
Preti, Giulia [1]
Morales, Gianmarco De Francisci [1]
Riondato, Matteo [2]
Affiliations
[1] CENTAI, Corso Inghilterra 3, I-10138 Turin, Italy
[2] Amherst Coll, Dept Comp Sci, Box 2232, Amherst, MA 01002 USA
Funding
US National Science Foundation
Keywords
Minimum Node Image; pattern mining; VC-dimension; graphlets
DOI
10.1145/3587254
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present MANIACS, a sampling-based randomized algorithm for computing high-quality approximations of the collection of the subgraph patterns that are frequent in a single, large, vertex-labeled graph, according to the Minimum Node Image-based (MNI) frequency measure. The output of MANIACS comes with strong probabilistic guarantees, obtained by using the empirical Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory, together with strong probabilistic tail bounds on the difference between the frequency of a pattern in the sample and its exact frequency. MANIACS leverages properties of the MNI-frequency to aggressively prune the pattern search space, and thus to reduce the time spent in exploring subspaces that contain no frequent patterns. In turn, this pruning leads to better bounds to the maximum frequency estimation error, which leads to increased pruning, resulting in a beneficial feedback effect. The results of our experimental evaluation of MANIACS on real graphs show that it returns high-quality collections of frequent patterns in large graphs up to two orders of magnitude faster than the exact algorithm.
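To make the Minimum Node Image (MNI) measure mentioned in the abstract concrete, the following is a minimal brute-force sketch, not the MANIACS algorithm itself. It assumes the classic per-pattern-vertex form of MNI support: for each pattern vertex, collect the distinct graph vertices it is mapped to across all label-preserving embeddings, then take the smallest such count. The function name `mni_support`, the input representation, and the use of non-induced embeddings are assumptions made for illustration; the paper's exact definition, its sampling step, and its pruning are not reproduced here.

```python
from itertools import permutations


def mni_support(graph_edges, graph_labels, pattern_edges, pattern_labels):
    """Brute-force MNI support (hypothetical helper, illustration only).

    graph_labels / pattern_labels: dict mapping vertex -> label.
    graph_edges / pattern_edges: iterable of undirected (u, v) pairs.
    Assumption: embeddings are non-induced and label-preserving.
    """
    # Undirected adjacency sets for the data graph.
    adj = {v: set() for v in graph_labels}
    for u, v in graph_edges:
        adj[u].add(v)
        adj[v].add(u)

    p_nodes = list(pattern_labels)
    # images[p] = set of graph vertices that pattern vertex p maps to.
    images = {p: set() for p in p_nodes}

    # Try every injective assignment of graph vertices to pattern vertices.
    for cand in permutations(graph_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, cand))
        if any(pattern_labels[p] != graph_labels[mapping[p]] for p in p_nodes):
            continue  # labels must match
        if all(mapping[b] in adj[mapping[a]] for a, b in pattern_edges):
            for p in p_nodes:
                images[p].add(mapping[p])

    # MNI support is the size of the smallest image set.
    return min(len(s) for s in images.values())


# Pattern: a single edge between an 'A' vertex and a 'B' vertex.
pattern_labels = {"x": "A", "y": "B"}
pattern_edges = [("x", "y")]

# Star graph: one 'A' center with three 'B' leaves (three embeddings).
star_labels = {1: "A", 2: "B", 3: "B", 4: "B"}
star_edges = [(1, 2), (1, 3), (1, 4)]
star_support = mni_support(star_edges, star_labels, pattern_edges, pattern_labels)

# Path-like graph: two 'A' vertices and two 'B' vertices (also three embeddings).
path_labels = {1: "A", 2: "B", 3: "B", 4: "A"}
path_edges = [(1, 2), (1, 3), (4, 3)]
path_support = mni_support(path_edges, path_labels, pattern_edges, pattern_labels)
```

Both example graphs contain three embeddings of the pattern, yet the star yields a support of 1 because every embedding reuses the same 'A' vertex, while the second graph yields 2. Dividing the support by the number of graph vertices gives a relative frequency. The enumeration is exponential in the pattern size, so this sketch is only suitable for tiny inputs.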
Pages: 29
Related papers
50 in total
  • [21] Which Is Better for Frequent Pattern Mining: Approximate Counting or Sampling?
    Ng, Willie
    Dash, Manoranjan
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2009, 5691 : 151 - 162
  • [22] Ap-FSM: A parallel algorithm for approximate frequent subgraph mining using Pregel
    Bhatia, Vandana
    Rani, Rinkle
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 106 : 217 - 232
  • [23] A Parallel Algorithm for Frequent Subgraph Mining
    Bay Vo
    Dang Nguyen
    Thanh-Long Nguyen
    ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING, 2015, 358 : 163 - 173
  • [24] Frequent subgraph mining in outerplanar graphs
    Tamás Horváth
    Jan Ramon
    Stefan Wrobel
    Data Mining and Knowledge Discovery, 2010, 21 : 472 - 508
  • [25] Frequent Subgraph Mining Based on Pregel
    Zhao, Xiang
    Chen, Yifan
    Xiao, Chuan
    Ishikawa, Yoshiharu
    Tang, Jiuyang
    COMPUTER JOURNAL, 2016, 59 (08): : 1113 - 1128
  • [26] A survey of frequent subgraph mining algorithms
    Jiang, Chuntao
    Coenen, Frans
    Zito, Michele
    KNOWLEDGE ENGINEERING REVIEW, 2013, 28 (01): : 75 - 105
  • [27] The Gaston Tool for Frequent Subgraph Mining
    Nijssen, Siegfried
    Kok, Joost N.
    ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2005, 127 (01) : 77 - 87
  • [28] Differentially Private Frequent Subgraph Mining
    Xu, Shengzhi
    Su, Sen
    Xiong, Li
    Cheng, Xiang
    Xiao, Ke
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 229 - 240
  • [29] Efficient frequent subgraph mining algorithm
    Li, Xian-Tong
    Li, Jian-Zhong
    Gao, Hong
    Ruan Jian Xue Bao/Journal of Software, 2007, 18 (10): : 2469 - 2480
  • [30] A qualitative survey on frequent subgraph mining
    Guvenoglu, Busra
    Bostanoglu, Belgin Ergenc
    OPEN COMPUTER SCIENCE, 2018, 8 (01) : 194 - 209