Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs

Cited by: 12
Authors
Meng, Jinghan [1]
Tu, Yi-Cheng [1, 2]
Affiliations
[1] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
[2] Univ S Florida, IDSC, Tampa, FL 33620 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Data mining; graph mining; support measures; hypergraph; SUBGRAPH;
DOI
10.1145/3035918.3035936
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, the popularity of graph databases has grown rapidly. This paper focuses on the single graph as an effective model for representing information, and on the graph mining techniques related to it. Frequent pattern mining in a single-graph setting poses two main problems: the support measure and the search scheme. In this paper, we propose a novel framework for constructing support measures that brings together the existing minimum-image-based and overlap-graph-based support measures. Our framework is built on the concept of occurrence/instance hypergraphs. On that basis, we present two new support measures, the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, which combine the advantages of existing measures. In particular, we show that the existing minimum-image-based support measure is an upper bound of the MI measure, which is also linear-time computable and yields counts that are close to the number of instances of a pattern. Although computing the MVC measure is NP-hard, it can be approximated to within a constant factor in polynomial time. We also provide polynomial-time relaxations for both measures and bounding theorems for all presented support measures in the hypergraph setting. We further show that the hypergraph-based framework can unify all support measures studied in this paper. The framework is also flexible in that more variants of support measures can be defined and profiled in it.
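As an illustration of two ideas named in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes the standard definition of minimum-image-based support (the smallest number of distinct data vertices that any single pattern node is mapped to) and assumes that an occurrence hypergraph has one hyperedge per occurrence, containing the data vertices that occurrence uses. The vertex cover routine is the textbook maximal-matching heuristic, whose result is at most k times the optimum when every hyperedge has at most k vertices; the abstract does not say which approximation the paper uses. All function names and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

# An occurrence maps each pattern node to the data-graph vertex it is matched to.
Occurrence = Dict[str, Hashable]


def mni_support(occurrences: List[Occurrence]) -> int:
    """Minimum-image-based support: min over pattern nodes of distinct images."""
    if not occurrences:
        return 0
    pattern_nodes = occurrences[0].keys()
    return min(len({occ[v] for occ in occurrences}) for v in pattern_nodes)


def occurrence_hyperedges(occurrences: List[Occurrence]) -> List[FrozenSet[Hashable]]:
    """Assumed hypergraph view: one hyperedge per occurrence."""
    return [frozenset(occ.values()) for occ in occurrences]


def approx_vertex_cover(hyperedges: List[FrozenSet[Hashable]]) -> Set[Hashable]:
    """Maximal-matching heuristic: take every vertex of each uncovered hyperedge.

    The chosen hyperedges are pairwise disjoint, so any cover needs at least
    one vertex per chosen hyperedge; the result is therefore at most k times
    optimal, where k is the largest hyperedge size.
    """
    cover: Set[Hashable] = set()
    for edge in hyperedges:
        if cover.isdisjoint(edge):
            cover |= edge
    return cover


# Toy data (hypothetical): pattern a-b, data vertex 1 (label a) is adjacent
# to data vertices 2, 3 and 4 (label b), giving three occurrences.
occs = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 1, "b": 4}]
edges = occurrence_hyperedges(occs)
cover = approx_vertex_cover(edges)

print(mni_support(occs))  # 1: pattern node "a" has a single image
print(len(cover))         # 2: within a factor of 2 of the optimal cover {1}
```

In this toy case the three occurrences all share data vertex 1, so both the minimum-image count and the optimal vertex cover size are 1, while the raw occurrence count is 3.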
Pages: 391 - 402
Number of pages: 12
Related papers
50 in total
  • [31] Mining maximum frequent access patterns in web logs based on unique labeled tree
    Zhang, Ling
    Yin, Ran-ping
    Zhan, Yu-bin
    WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 73 - 82
  • [32] An Efficient Algorithm for Mining Maximal Frequent Sequential Patterns in Large Databases
    Su, Qiu-bin
    Lu, Lu
    Cheng, Bin
    2018 INTERNATIONAL CONFERENCE ON COMMUNICATION, NETWORK AND ARTIFICIAL INTELLIGENCE (CNAI 2018), 2018, : 404 - 410
  • [33] A New Polynomial-time Support Measure for Counting Frequent Patterns in Graphs
    Meng, Jinghan
    Tu, Yi-Cheng
    Pitaksirianan, Napath
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT (SSDBM 2019), 2019, : 214 - 217
  • [34] Mining top-k frequent patterns without minimum support threshold
    Salam, Abdus
    Khayal, M. Sikandar Hayat
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 30 (01) : 57 - 86
  • [35] Mining top-k frequent closed patterns without minimum support
    Han, JW
    Wang, JY
    Lu, Y
    Tzvetkov, P
    2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 211 - 218
  • [36] A NOVEL ALGORITHM FOR FAST MINING FREQUENT PATTERNS BASED ON SUPPORT LIST STRUCTURE
    Zhu, Xiaolin
    JOURNAL OF NONLINEAR AND CONVEX ANALYSIS, 2022, 23 (09) : 1943 - 1966
  • [37] Mining frequent patterns with multiple item support thresholds in tourism information databases
    Chen, Yi-Chun
    Lin, Grace
    Chan, Ya-Hui
    Shih, Meng-Jung
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8916 : 89 - 98
  • [38] Efficient mining of long frequent patterns from very large dense datasets
    Gopalan, RP
    Sucahyo, YG
    DESIGN AND APPLICATION OF HYBRID INTELLIGENT SYSTEMS, 2003, 104 : 652 - 661
  • [39] An Algorithm Based on Dataflow Model for Mining Frequent Patterns from a Large Graph
    Tang X.-C.
    Fan X.-F.
    Zhou J.-W.
    Li Z.-H.
    Jisuanji Xuebao/Chinese Journal of Computers, 2020, 43 (07) : 1293 - 1311
  • [40] Verifying and mining frequent patterns from large windows over data streams
    Mozafari, Barzan
    Thakkar, Hetal
    Zaniolo, Carlo
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 179 - 188