MANIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling

Cited by: 5
Authors
Preti, Giulia [1]
Morales, Gianmarco De Francisci [1]
Riondato, Matteo [2]
Affiliations
[1] CENTAI, Corso Inghilterra 3, I-10138 Turin, Italy
[2] Amherst Coll, Dept Comp Sci, Box 2232, Amherst, MA 01002 USA
Funding
National Science Foundation (USA)
Keywords
Minimum Node Image; pattern mining; VC-dimension; graphlets
DOI
10.1145/3587254
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present MANIACS, a sampling-based randomized algorithm for computing high-quality approximations of the collection of subgraph patterns that are frequent in a single, large, vertex-labeled graph, according to the Minimum Node Image-based (MNI) frequency measure. The output of MANIACS comes with strong probabilistic guarantees, obtained by using the empirical Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory, together with strong probabilistic tail bounds on the difference between the frequency of a pattern in the sample and its exact frequency. MANIACS leverages properties of the MNI frequency to aggressively prune the pattern search space, and thus to reduce the time spent exploring subspaces that contain no frequent patterns. In turn, this pruning leads to tighter bounds on the maximum frequency estimation error, which enables further pruning, resulting in a beneficial feedback effect. The results of our experimental evaluation of MANIACS on real graphs show that it returns high-quality collections of frequent patterns in large graphs up to two orders of magnitude faster than the exact algorithm.
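The MNI frequency measure named in the abstract can be illustrated with a minimal Python sketch. It assumes the embeddings (matches) of one pattern have already been enumerated as dictionaries from pattern vertices to graph vertices, and it does not reproduce MANIACS's search, pruning, or VC-dimension bounds. All function names are hypothetical, and the sample-based estimate (the minimum, over pattern vertices, of the fraction of a vertex sample that falls in each image set) is an assumption of this sketch, not necessarily the paper's exact definition.

```python
from typing import Dict, Hashable, Iterable, Set

# Hypothetical representation: an embedding maps each pattern vertex to a graph vertex.
Embedding = Dict[Hashable, Hashable]


def mni_images(pattern_vertices: Iterable[Hashable],
               embeddings: Iterable[Embedding]) -> Dict[Hashable, Set[Hashable]]:
    """For each pattern vertex, collect the distinct graph vertices it is mapped to."""
    images: Dict[Hashable, Set[Hashable]] = {u: set() for u in pattern_vertices}
    for emb in embeddings:
        for u, v in emb.items():
            images[u].add(v)  # raises KeyError if emb uses a vertex not in the pattern
    return images


def mni_support(pattern_vertices: Iterable[Hashable],
                embeddings: Iterable[Embedding]) -> int:
    """MNI support: the size of the smallest image set over all pattern vertices."""
    images = mni_images(pattern_vertices, embeddings)
    return min((len(s) for s in images.values()), default=0)


def sample_mni_frequency(pattern_vertices: Iterable[Hashable],
                         embeddings: Iterable[Embedding],
                         sample: Set[Hashable]) -> float:
    """Assumed sample-based estimate: for each pattern vertex, the fraction of
    sampled graph vertices lying in its image set; return the minimum fraction."""
    if not sample:
        raise ValueError("sample must be non-empty")
    images = mni_images(pattern_vertices, embeddings)
    return min((len(s & sample) for s in images.values()), default=0) / len(sample)


# Star graph: centre 0 joined to leaves 1, 2, 3. The pattern is a single edge x-y
# whose (assumed) labels force x onto the centre and y onto a leaf.
embs = [{"x": 0, "y": 1}, {"x": 0, "y": 2}, {"x": 0, "y": 3}]
print(mni_support(["x", "y"], embs))                          # 1
print(sample_mni_frequency(["x", "y"], embs, {0, 1, 4, 5}))   # 0.25
```

Taking the minimum over pattern vertices is what makes MNI support anti-monotone: extending a pattern cannot increase it. That is the kind of property that allows a miner to stop exploring a branch of the search space as soon as a pattern falls below the frequency threshold.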
Pages: 29
Related papers
50 in total
  • [31] Frequent Subgraph Mining Algorithms - A Survey
    Ramraj, T.
    Prabhakar, R.
    GRAPH ALGORITHMS, HIGH PERFORMANCE IMPLEMENTATIONS AND ITS APPLICATIONS (ICGHIA 2014), 2015, 47: 197-204
  • [32] A Fast Frequent Subgraph Mining Algorithm
    Wu, Jia
    Chen, Ling
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008: 82-87
  • [33] Frequent subgraph mining in outerplanar graphs
    Horvath, Tamas
    Ramon, Jan
    Wrobel, Stefan
    DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 21(3): 472-508
  • [34] MARGIN: Maximal Frequent Subgraph Mining
    Thomas, Lini T.
    Valluri, Satyanarayana R.
    Karlapalem, Kamalakar
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2010, 4(3)
  • [35] MARGIN: Maximal frequent subgraph mining
    Thomas, Lini T.
    Valluri, Satyanarayana R.
    Karlapalem, Kamalakar
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006: 1097-+
  • [36] gApprox: Mining frequent approximate patterns from a massive network
    Chen, Chen
    Yan, Xifeng
    Zhu, Feida
    Han, Jiawei
    ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007: 445-+
  • [37] Extension of Canonical Adjacency Matrices for Frequent Approximate Subgraph Mining on Multi-Graph Collections
    Acosta-Mendoza, Niusvel
    Gago-Alonso, Andres
    Carrasco-Ochoa, Jesus Ariel
    Martinez-Trinidad, Jose Fco
    Medina-Pagola, Jose E.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31(8)
  • [38] Discriminative frequent subgraph mining with optimality guarantees
    Thoma, M.
    Cheng, H.
    Gretton, A.
    Han, J.
    Kriegel, H.-P.
    Smola, A.
    Song, L.
    Yu, P.S.
    Yan, X.
    Borgwardt, K.M.
    Statistical Analysis and Data Mining, 2010, 3(5): 302-318
  • [39] A New Framework of Frequent Uncertain Subgraph Mining
    Moussaoui, Mohamed
    Zaghdoud, Montaceur
    Akaichi, Jalel
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES-2018), 2018, 126: 413-422
  • [40] Grasping frequent subgraph mining for bioinformatics applications
    Mrzic, Aida
    Meysman, Pieter
    Bittremieux, Wout
    Moris, Pieter
    Cule, Boris
    Goethals, Bart
    Laukens, Kris
    BioData Mining, 11