Simple yet efficient approach for maximal frequent subtrees extraction from a collection of XML documents

Cited by: 0

Authors:

Paik, Juryon [1]

Kim, Ung Mo [1]

Affiliation:

[1] Sungkyunkwan Univ, Dept Comp Engn, Suwon 440746, Gyeonggi Do, South Korea

Source:

WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS | 2006 / Vol. 4256

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

XML has recently penetrated virtually all areas of computer science and information technology, bringing about an unprecedented level of data exchange among heterogeneous data storage systems. With the continuous growth of online information stored, presented, and exchanged using XML, discovering useful information from a collection of XML documents has become one of the main research areas in the data mining community. The most common approach to this task is to extract frequently occurring subtree patterns from trees. However, the number of frequent subtrees usually grows exponentially with tree size, so mining all frequent subtrees becomes infeasible for large trees. A more practical and scalable approach is to use maximal frequent subtrees, which are far fewer in number than frequent subtrees. Handling maximal frequent subtrees is an interesting challenge and is the core of this paper. We present a novel, conceptually simple, yet effective approach that discovers maximal frequent subtrees from a database of XML trees without generating candidate subtrees. The benefit of our approach is that it not only significantly reduces the number of rounds of infrequent-tree pruning, but also entirely eliminates the rounds of candidate generation by avoiding time-consuming tree join operations and tree enumerations.
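
To make the notion of a maximal frequent subtree concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the method of the paper. Each XML tree is modeled, hypothetically, as a set of root-to-node label paths such as "a/b", a pattern is a parent-closed subset of such paths, and a pattern is frequent when at least `min_support` documents contain it. The helper names (`is_tree`, `frequent_patterns`, `maximal_patterns`) and the toy documents are illustrative only. The brute-force enumeration used here is exactly the kind of candidate generation the abstract says the paper avoids; it serves only to show that the maximal patterns are a small subset of the frequent ones.

```python
from itertools import combinations

def is_tree(paths):
    """True if every non-root path has its parent path in the set."""
    return all("/" not in p or p.rsplit("/", 1)[0] in paths for p in paths)

def frequent_patterns(docs, min_support):
    """Brute-force enumeration of frequent patterns (illustration only)."""
    universe = sorted(set().union(*docs))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            pattern = frozenset(combo)
            # Support = number of documents that contain every path of the pattern.
            support = sum(pattern <= d for d in docs)
            if is_tree(pattern) and support >= min_support:
                found.append(pattern)
    return found

def maximal_patterns(frequent):
    """Keep patterns that are not strictly contained in another frequent pattern."""
    return [p for p in frequent if not any(p < q for q in frequent)]

# Hypothetical toy collection: three documents as sets of label paths.
docs = [
    {"a", "a/b", "a/c", "a/b/d"},
    {"a", "a/b", "a/c"},
    {"a", "a/b", "a/e"},
]

frequent = frequent_patterns(docs, min_support=2)
maximal = maximal_patterns(frequent)
print(len(frequent), len(maximal))  # 4 frequent patterns, 1 maximal
```

In this toy collection the single maximal pattern {a, a/b, a/c} already implies the other three frequent patterns, which is why reporting only maximal patterns is more compact. The path-set model ignores sibling order and repeated labels, so it only approximates real subtree matching.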


Pages: 94-103

Number of pages: 10

50 related records in total

[1] Clustering XML Documents Using Frequent Subtrees
Kutty, Sangeetha
Tran, Tien
Nayak, Richi
Li, Yuefeng
[J]. ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 436 - 445
[2] Efficient extraction of maximally common subtrees from XML documents for web services
Paik, J
Song, YJ
Fotouhi, F
Kim, U
[J]. 7TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, VOLS 1 AND 2, PROCEEDINGS, 2005, : 1371 - 1375
[3] Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach
Kutty, Sangeetha
Tran, Tien
Nayak, Richi
Li, Yuefeng
[J]. FOCUSED ACCESS TO XML DOCUMENTS, 2008, 4862 : 183 - 194
[4] EXiT-B: A new approach for extracting maximal frequent subtrees from XML data
Paik, J
Won, D
Fotouhi, F
Kim, UM
[J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 1 - 8
[5] Efficient schema extraction from a large collection of XML documents
Xing, Guangming
Parthepan, Vijayeandra
[J]. PROCEEDINGS OF THE 49TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE (ACMSE '11), 2011, : 92 - 96
[6] Efficient data mining for maximal frequent subtrees
Xiao, YQ
Yao, JF
Li, ZG
Dunham, MH
[J]. THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 379 - +
[7] Fast Extraction of Maximal Frequent Subtrees Using Bits Representation
Paik, Juryon
Nam, Junghyun
Won, Dongho
Kim, Ung Mo
[J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2009, 25 (02) : 435 - 464
[8] A novel method for mining frequent subtrees from XML data
Zhang, WS
Liu, DX
Zhang, JP
[J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 300 - 305
[9] Efficient extraction of schemas for XML documents
Min, JK
Ahn, JY
Chung, CW
[J]. INFORMATION PROCESSING LETTERS, 2003, 85 (01) : 7 - 12
[10] Discovering Frequent Subtrees from XML Data Using Neural Networks
SUN Wei
[J]. Wuhan University Journal of Natural Sciences, 2006, (01) : 117 - 121
