Simple yet efficient approach for maximal frequent subtrees extraction from a collection of XML documents

Cited by: 0
Authors
Paik, Juryon [1]
Kim, Ung Mo [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Comp Engn, Suwon 440746, Gyeonggi Do, South Korea
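Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract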
XML has recently penetrated virtually all areas of computer science and information technology, bringing about an unprecedented level of data exchange among heterogeneous data storage systems. With the continuous growth of online information stored, presented, and exchanged using XML, the discovery of useful information from a collection of XML documents has become one of the main research areas in the data mining community. The most commonly used approach to this task is to extract frequently occurring subtree patterns from trees. However, the number of frequent subtrees usually grows exponentially with tree size, so mining all frequent subtrees becomes infeasible for large trees. A more practical and scalable approach is to use maximal frequent subtrees, which are far fewer in number than frequent subtrees. Handling maximal frequent subtrees is an interesting challenge and is the core of this paper. We present a novel, conceptually simple, yet effective approach that discovers maximal frequent subtrees from a database of XML trees without generating candidate subtrees. Our approach not only significantly reduces the number of rounds of infrequent-tree pruning but also eliminates every round of candidate generation, avoiding time-consuming tree join operations and tree enumeration.
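As a toy illustration of the distinction the abstract draws between frequent and maximal frequent subtrees, the Python sketch below may help. It is not the paper's method: it enumerates candidate patterns by brute force, which is precisely the step the paper reports avoiding. Everything in it is an assumption made for illustration: each XML tree is modeled as a set of root-to-node label paths (unordered, with unique sibling labels and a shared root), and the names `docs`, `min_support`, `frequent_patterns`, and `maximal_only` are hypothetical.

```python
from itertools import combinations


def is_prefix_closed(paths):
    # A path set describes a rooted subtree only if every node's parent is present.
    return all("/" not in p or p.rsplit("/", 1)[0] in paths for p in paths)


def frequent_patterns(docs, min_support):
    # Brute-force enumeration (assumption: tiny inputs only; cost is exponential).
    universe = sorted(set().union(*docs))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            pattern = frozenset(combo)
            if not is_prefix_closed(pattern):
                continue
            # Support = number of documents that contain the whole pattern.
            support = sum(pattern <= doc for doc in docs)
            if support >= min_support:
                found.append(pattern)
    return found


def maximal_only(patterns):
    # Keep a pattern only if no other frequent pattern strictly contains it.
    return [p for p in patterns if not any(p < q for q in patterns)]


# Hypothetical documents, each flattened to its set of label paths.
docs = [
    {"book", "book/title", "book/author", "book/year"},
    {"book", "book/title", "book/author", "book/price"},
    {"book", "book/title", "book/isbn"},
]
min_support = 2

frequent = frequent_patterns(docs, min_support)
maximal = maximal_only(frequent)
print(len(frequent), "frequent patterns;", len(maximal), "maximal")
```

On this toy input there are four frequent patterns but only one maximal pattern (`book` with children `title` and `author`), since the other three are contained in it.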
Pages: 94-103
Page count: 10
Related papers
50 records in total
  • [1] Clustering XML Documents Using Frequent Subtrees
    Kutty, Sangeetha
    Tran, Tien
    Nayak, Richi
    Li, Yuefeng
    [J]. ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 436 - 445
  • [2] Efficient extraction of maximally common subtrees from XML documents for web services
    Paik, J
    Song, YJ
    Fotouhi, F
    Kim, U
    [J]. 7TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, VOLS 1 AND 2, PROCEEDINGS, 2005, : 1371 - 1375
  • [3] Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach
    Kutty, Sangeetha
    Tran, Tien
    Nayak, Richi
    Li, Yuefeng
    [J]. FOCUSED ACCESS TO XML DOCUMENTS, 2008, 4862 : 183 - 194
  • [4] EXiT-B: A new approach for extracting maximal frequent subtrees from XML data
    Paik, J
    Won, D
    Fotouhi, F
    Kim, UM
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 1 - 8
  • [5] Efficient schema extraction from a large collection of XML documents
    Xing, Guangming
    Parthepan, Vijayeandra
    [J]. PROCEEDINGS OF THE 49TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE (ACMSE '11), 2011, : 92 - 96
  • [6] Efficient data mining for maximal frequent subtrees
    Xiao, YQ
    Yao, JF
    Li, ZG
    Dunham, MH
    [J]. THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 379 - +
  • [7] Fast Extraction of Maximal Frequent Subtrees Using Bits Representation
    Paik, Juryon
    Nam, Junghyun
    Won, Dongho
    Kim, Ung Mo
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2009, 25 (02) : 435 - 464
  • [8] A novel method for mining frequent subtrees from XML data
    Zhang, WS
    Liu, DX
    Zhang, JP
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 300 - 305
  • [9] Efficient extraction of schemas for XML documents
    Min, JK
    Ahn, JY
    Chung, CW
    [J]. INFORMATION PROCESSING LETTERS, 2003, 85 (01) : 7 - 12