Discovering closed and maximal embedded patterns from large tree data

Authors
Wu, Xiaoying [1]
Theodoratos, Dimitri [2]
Mamoulis, Nikos [3]
Affiliations
[1] Wuhan Univ, Sch Comp, Wuhan, Peoples R China
[2] New Jersey Inst Technol, Newark, NJ 07102 USA
[3] Univ Ioannina, Ioannina, Greece
Funding
National Natural Science Foundation of China;
Keywords
Hierarchical graph data; Frequent pattern mining; Embedded tree pattern; Pattern summarization; Maximal and closed frequent pattern;
DOI
10.1016/j.datak.2021.101890
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many current applications and systems produce large tree datasets and export, exchange, and represent data in tree-structured form. Extracting informative patterns from large data trees is an important research direction with many practical applications. Pattern mining research initially focused on mining induced patterns and gradually evolved into mining embedded patterns. A well-known problem of frequent pattern mining is the huge number of patterns it produces. This affects not only the efficiency but also the effectiveness of mining. A typical solution to this problem is to summarize frequent patterns through closed and maximal patterns. No previous work addresses the problem of mining closed and/or maximal embedded tree patterns, not even in the framework of mining multiple small trees. We address the problem of summarizing embedded tree patterns extracted from large data trees by defining and mining closed and maximal embedded unordered tree patterns. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called closedEmbTM-eager, as it eagerly eliminates non-closed patterns. To mitigate the generation of intermediate patterns, we devise pruning rules that proactively detect and prune branches of the pattern search space which do not correspond to closed patterns. The pruning rules are incorporated into the extended embedded pattern miner to produce a new algorithm, called closedEmbTM-prune, for mining all the closed and maximal embedded frequent patterns. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, closedEmbTM-prune not only generates a complete closed and maximal pattern set that is substantially smaller than the set generated by the embedded pattern miner, but also runs much faster, with negligible pattern-pruning overhead.
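To illustrate the summarization idea in the abstract, here is a minimal Python sketch of the usual definitions: a frequent pattern is closed if no proper super-pattern has the same support, and maximal if no proper super-pattern is frequent at all. It is not the closedEmbTM-eager or closedEmbTM-prune algorithm, which check closedness during mining; it only post-filters an already mined set. As a simplifying assumption, patterns are restricted to path-shaped trees (tuples of labels from ancestor to descendant), so that embedded inclusion reduces to a subsequence test. The names `embeds_in` and `closed_and_maximal` and the support values are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a pattern is a path-shaped tree, written as labels from
# ancestor to descendant. General unordered tree patterns need a richer test.
Pattern = Tuple[str, ...]


def embeds_in(small: Pattern, big: Pattern) -> bool:
    """True if `small` is a subsequence of `big`, i.e. every
    ancestor-descendant relationship of `small` is preserved in `big`."""
    remaining = iter(big)
    # `label in remaining` advances the iterator, so order is respected.
    return all(label in remaining for label in small)


def closed_and_maximal(
    frequent: Dict[Pattern, int],
) -> Tuple[List[Pattern], List[Pattern]]:
    """Post-filter a set of frequent patterns (pattern -> support)."""
    closed, maximal = [], []
    for pattern, support in frequent.items():
        # Proper frequent super-patterns of `pattern`.
        supers = [q for q in frequent if q != pattern and embeds_in(pattern, q)]
        if not supers:
            maximal.append(pattern)
        if all(frequent[q] != support for q in supers):
            closed.append(pattern)
    return closed, maximal


# Hypothetical supports, chosen only to exercise both definitions.
frequent = {
    ("a",): 5, ("b",): 4, ("c",): 3,
    ("a", "b"): 4, ("a", "c"): 3, ("b", "c"): 3,
    ("a", "b", "c"): 3,
}
closed, maximal = closed_and_maximal(frequent)
print(closed)   # [('a',), ('a', 'b'), ('a', 'b', 'c')]
print(maximal)  # [('a', 'b', 'c')]
```

In this toy input, seven frequent patterns shrink to three closed patterns and one maximal pattern. Every maximal pattern is also closed, so the maximal set is never larger than the closed set.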
Pages: 18