From Homomorphisms to Embeddings: A Novel Approach for Mining Embedded Patterns from Large Tree Data

Times cited: 1
Authors
Wu, Xiaoying [1]
Theodoratos, Dimitri [2]
Sellis, Timos [3]
Affiliations
[1] Wuhan Univ, Sch Comp, Wuhan, Hubei, Peoples R China
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[3] Swinburne Univ Technol, Data Sci Res Inst, Melbourne, Vic, Australia
Keywords
Embedded tree pattern mining; Homomorphism; Bitmap view; Holistic twig-join algorithm; Frequent trees; Efficient; Queries
DOI
10.1016/j.bdr.2018.08.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many modern applications and systems represent and exchange data in tree-structured form, and they process and produce large tree datasets. Discovering informative patterns in large tree datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns can reveal useful relationships hidden deep in the data that induced patterns cannot capture. Unfortunately, previous embedded tree pattern mining approaches do not scale well as the size of the dataset increases. As a consequence, they focus almost exclusively on mining patterns from collections of small trees and cannot mine patterns from large data trees. Given the ubiquitous use of tree data, however, this pattern mining problem needs efficient solutions. In this paper, we address the problem of mining frequent unordered embedded tree patterns from large data trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by previous approaches. To reduce space consumption, the matching information of already computed patterns is materialized as bitmaps. We further optimize our basic support computation method with an algorithm that incrementally generates the bitmaps of the embeddings of a new candidate pattern without first explicitly computing those embeddings. Extensive experiments on real and synthetic large-tree datasets show that our approach achieves orders-of-magnitude performance improvements over a state-of-the-art tree mining algorithm and a recent graph mining algorithm. (C) 2018 Elsevier Inc. All rights reserved.
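To make the homomorphism/embedding distinction above concrete, here is a minimal Python sketch. It is not the authors' algorithm, and every detail below is our own assumption. Trees are stored in DFS preorder, so each subtree is a contiguous id range. Pattern edges are matched as ancestor-descendant edges. An embedding is taken to be an injective homomorphism (some definitions add further constraints that this sketch does not check). Support is assumed to be root-based, i.e. the number of distinct data nodes the pattern root maps to. Python integers stand in for the paper's bitmaps. The names `Tree`, `desc_mask`, `hom_bitmaps`, `has_embedding_at` and `support` are hypothetical. Homomorphic matches are computed bottom-up as bitmaps, and a brute-force injectivity check then filters them down to embeddings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tree:
    """Hypothetical labeled rooted tree; nodes are numbered 0..n-1 in DFS
    preorder (root = 0), so every subtree is a contiguous id range."""
    labels: list[str]
    parent: list[int]  # parent[0] == -1 and parent[v] < v for v > 0
    children: list[list[int]] = field(init=False)
    size: list[int] = field(init=False)  # subtree sizes

    def __post_init__(self) -> None:
        n = len(self.labels)
        self.children = [[] for _ in range(n)]
        for v in range(1, n):
            self.children[self.parent[v]].append(v)
        self.size = [1] * n
        for v in range(n - 1, 0, -1):  # children have larger ids than parents
            self.size[self.parent[v]] += self.size[v]


def desc_mask(t: Tree, v: int) -> int:
    """Bitmap of the proper descendants of v (ids v+1 .. v+size[v]-1)."""
    return ((1 << (t.size[v] - 1)) - 1) << (v + 1)


def hom_bitmaps(p: Tree, t: Tree) -> list[int]:
    """bm[q] has bit v set iff the pattern subtree at q maps homomorphically
    into t with q -> v (labels preserved, pattern edges -> ancestor-descendant)."""
    bm = [0] * len(p.labels)
    for q in range(len(p.labels) - 1, -1, -1):  # children before parents
        for v in range(len(t.labels)):
            if t.labels[v] == p.labels[q] and all(
                desc_mask(t, v) & bm[c] for c in p.children[q]
            ):
                bm[q] |= 1 << v
    return bm


def has_embedding_at(p: Tree, t: Tree, v: int) -> bool:
    """Brute force: is there an injective homomorphism with p's root -> v?"""
    if t.labels[v] != p.labels[0]:
        return False
    image = [v] + [-1] * (len(p.labels) - 1)
    used = {v}

    def assign(q: int) -> bool:  # map pattern nodes q, q+1, ... in preorder
        if q == len(p.labels):
            return True
        anc = image[p.parent[q]]
        for w in range(anc + 1, anc + t.size[anc]):  # descendants of anc
            if w not in used and t.labels[w] == p.labels[q]:
                image[q] = w
                used.add(w)
                if assign(q + 1):
                    return True
                used.discard(w)  # backtrack
        return False

    return assign(1)


def support(p: Tree, t: Tree) -> tuple[int, int]:
    """(homomorphism support, embedding support), ASSUMING root-based support:
    the number of distinct data nodes the pattern root can map to."""
    root_bm = hom_bitmaps(p, t)[0]
    hom_roots = [v for v in range(len(t.labels)) if (root_bm >> v) & 1]
    # Every embedding is a homomorphism, so only homomorphic roots are checked.
    emb_roots = [v for v in hom_roots if has_embedding_at(p, t, v)]
    return len(hom_roots), len(emb_roots)


# Data tree A(B, A(B)) in preorder: 0:A, 1:B, 2:A, 3:B
data = Tree(["A", "B", "A", "B"], [-1, 0, 0, 2])
pattern = Tree(["A", "B", "B"], [-1, 0, 0])  # an A with two B descendants
print(support(pattern, data))  # prints (2, 1)
```

In the example, the second `A` (node 2) has only one `B` descendant, so a homomorphism can send both pattern `B` nodes to node 3, but no embedding rooted at node 2 exists; the homomorphism count (2) therefore overestimates the embedding count (1). The brute-force check is exponential in the worst case, which illustrates the kind of enumeration cost the abstract says the proposed approach avoids.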
Pages: 37-53
Number of pages: 17