Counting frequent patterns in large labeled graphs: a hypergraph-based approach

Cited by: 1
Authors
Meng, Jinghan [1]
Pitaksirianan, Napath [1]
Tu, Yi-Cheng [1]
Affiliations
[1] Univ S Florida, 4202 E Fowler Ave, Tampa, FL 33620 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Data mining; Graph mining; Support measure; Hypergraph; Efficient algorithm; Subgraph
DOI
10.1007/s10618-020-00686-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the popularity of graph databases has grown rapidly. This paper focuses on the single graph as an effective model for representing information, and on the graph mining techniques related to it. Frequent pattern mining in a single-graph setting poses two main problems: the support measure and the search scheme. In this paper, we propose a novel framework for designing support measures that brings together existing minimum-image-based and overlap-graph-based support measures. Our framework is built on the concept of occurrence/instance hypergraphs. On this basis, we design a series of new support measures, namely the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, that combine the advantages of existing measures. More importantly, we show that the existing minimum-image-based support measure is an upper bound of the MI measure, which is also linear-time computable and yields counts that are close to the number of instances of a pattern. We show not only that most major existing support measures and the new measures proposed in this paper can be mapped into the new framework, but also that they occupy different locations of the frequency spectrum. By taking advantage of the new framework, we discover that MVC can be approximated to a constant factor (in terms of the number of pattern nodes) in polynomial time. Contrary to common belief, we demonstrate that the state-of-the-art overlap-graph-based maximum independent set (MIS) measure also has constant-factor approximation algorithms. We further show that, using standard linear programming and semidefinite programming techniques, polynomial-time relaxations of both the MVC and MIS measures can be developed, and that their counts lie between MVC and MIS. In addition, we point out that MVC, MIS, and their relaxations are bounded within a constant factor of one another. In summary, all major support measures are unified in the new hypergraph-based framework, which helps reveal their bounding relations and hardness properties.
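As an illustration only, the following Python sketch shows two ideas the abstract mentions: a minimum-image-based support count, and a polynomial-time vertex cover whose size is within a factor tied to the number of pattern nodes. It is not the authors' algorithm. It assumes that each occurrence is a mapping from pattern nodes to data-graph vertices and that each occurrence forms one hyperedge over the vertices it uses, which is a reading of the abstract and not a definition stated there. The function names and the star-shaped example are hypothetical. The cover routine is the textbook maximal-matching heuristic, swapped in here as a stand-in for whatever approximation the paper develops.

```python
from typing import Dict, Hashable, List, Set

# Assumption: an occurrence maps each pattern node to one data-graph vertex.
Occurrence = Dict[str, Hashable]


def minimum_image_support(occurrences: List[Occurrence]) -> int:
    """Minimum-image-based support: for each pattern node, count the distinct
    data-graph vertices it is mapped to, then take the minimum over nodes."""
    if not occurrences:
        return 0
    images: Dict[str, Set[Hashable]] = {}
    for occ in occurrences:
        for node, vertex in occ.items():
            images.setdefault(node, set()).add(vertex)
    return min(len(vertices) for vertices in images.values())


def matching_based_vertex_cover(occurrences: List[Occurrence]) -> Set[Hashable]:
    """Textbook heuristic (not taken from the paper): scan the hyperedges and,
    whenever one is not yet covered, add all of its vertices to the cover.
    The chosen hyperedges are pairwise disjoint, so with at most k vertices
    per hyperedge the cover is at most k times the minimum vertex cover."""
    cover: Set[Hashable] = set()
    for occ in occurrences:
        edge = set(occ.values())
        if cover.isdisjoint(edge):
            cover |= edge
    return cover


# Hypothetical example: pattern A-B matched in a star whose center (vertex 1)
# is labeled A and whose leaves (vertices 2, 3, 4) are labeled B.
star_occurrences = [
    {"A": 1, "B": 2},
    {"A": 1, "B": 3},
    {"A": 1, "B": 4},
]

if __name__ == "__main__":
    print(minimum_image_support(star_occurrences))
    print(len(matching_based_vertex_cover(star_occurrences)))
```

In the star example the pattern node A has a single image, so the minimum-image count is 1 even though there are three occurrences. The heuristic returns a cover of two vertices, while vertex 1 alone covers every hyperedge, which is consistent with the factor-of-k bound for a two-node pattern.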
Pages: 980-1021
Page count: 42
Related papers
50 in total
  • [41] IndexedFCP - An Index based approach to identify Frequent Contiguous Patterns (FCP) in Big Data
    Rajasekaran, S.
    Rubi, R. Devika
    Arockiam, L.
    2014 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING APPLICATIONS (ICICA 2014), 2014, : 27 - 31
  • [42] An N-List-Based Approach for Mining Frequent Inter-Transaction Patterns
    Thanh-Ngo Nguyen
    Nguyen, Loan T. T.
    Vo, Bay
    Ngoc-Thanh Nguyen
    Nguyen, Trinh D. D.
    IEEE ACCESS, 2020, 8 : 116840 - 116855
  • [43] Efficient k-Clique Counting on Large Graphs: The Power of Color-Based Sampling Approaches
    Ye, Xiaowei
    Li, Rong-Hua
    Dai, Qiangqiang
    Chen, Hongzhi
    Wang, Guoren
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (04) : 1518 - 1536
  • [44] Using a projection-based approach to mine frequent inter-transaction patterns
    Wang, Chun-Sheng
    Chu, Kuo-Chung
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (09) : 11024 - 11031
  • [45] An Efficient Interval-Based Approach to Mining Frequent Patterns in a Time Series Database
    Phan Thi Bao Tran
    Vo Thi Ngoc Chau
    Duong Tuan Anh
    MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE, 2013, 8271 : 211 - 222
  • [46] A fine-grained approach for Android taint analysis based on labeled taint value graphs
    Xiang, Dongming
    Lin, Shuai
    Huang, Ke
    Ding, Zuohua
    Liu, Guanjun
    Li, Xiaofeng
    COMPUTERS & SECURITY, 2025, 148
  • [47] VColor: A Practical Vertex-cut Based Approach for Coloring Large Graphs
    Peng, Yun
    Choi, Byron
    He, Bingsheng
    Zhou, Shuigeng
    Xu, Ruzhi
    Yu, Xiaohui
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 97 - 108
  • [48] An efficient approach for outlier detection from uncertain data streams based on maximal frequent patterns
    Cai, Saihua
    Li, Li
    Li, Sicong
    Sun, Ruizhi
    Yuan, Gang
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 160
  • [49] An efficient approach for mining fault-tolerant frequent patterns based on bit vector representations
    Koh, JL
    Yo, PW
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2005, 3453 : 568 - 575
  • [50] A Sliding Window-Based Approach for Mining Frequent Weighted Patterns Over Data Streams
    Bui, Huong
    Nguyen-Hoang, Tu-Anh
    Vo, Bay
    Nguyen, Ham
    Le, Tuong
    IEEE ACCESS, 2021, 9 : 56318 - 56329