Discovery of maximally frequent tag tree patterns with contractible variables from semistructured documents

Cited by: 0
Authors
Miyahara, T [1]
Suzuki, Y
Shoudai, T
Uchida, T
Takahashi, K
Ueda, H
Affiliations
[1] Hiroshima Univ, Fac Informat Sci, Hiroshima 7313194, Japan
[2] Kyushu Univ, Dept Informat, Kasuga, Fukuoka 8168580, Japan
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
In order to extract meaningful and hidden knowledge from semistructured documents such as HTML or XML files, methods for discovering frequent patterns or common characteristics in semistructured documents have become increasingly important. We propose new methods for discovering maximally frequent tree-structured patterns in semistructured Web documents by using tag tree patterns as hypotheses. A tag tree pattern is an edge-labeled tree that has ordered or unordered children and structured variables. An edge label is a tag or a keyword in such Web documents, and a variable can match an arbitrary subtree, which represents a field of a semistructured document. As a special case, a contractible variable can match an empty subtree, which represents a missing field in a semistructured document. Since semistructured documents have irregularities such as missing fields, a tag tree pattern with contractible variables is well suited to representing tree-structured patterns in such documents. First, we present an algorithm for generating all maximally frequent ordered tag tree patterns with contractible variables. Second, we give an algorithm for generating all maximally frequent unordered tag tree patterns with contractible variables.
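To make the role of a contractible variable concrete, the following is a minimal Python sketch of matching an ordered pattern against an edge-labeled tree. It is not the authors' algorithm, and it simplifies the definitions in the abstract: here a variable may only appear as a leaf and consumes exactly one child subtree, while a contractible variable consumes one child subtree or none. The names `Edge`, `Var`, `matches`, and `frequency` are hypothetical, and the frequency measure (the fraction of documents that a pattern matches) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A labeled edge (a tag or keyword) together with the child edges below it."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Var:
    """A variable leaf; a contractible one may also match a missing field."""
    contractible: bool = False


def matches(pattern, tree):
    """Return True if the ordered child list `pattern` matches the child list `tree`.

    Simplified semantics (an assumption of this sketch): every tree child
    must be consumed, in order, by exactly one pattern child.
    """
    if not pattern:
        return not tree
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Var):
        # A contractible variable may match the empty subtree (missing field).
        if head.contractible and matches(rest, tree):
            return True
        # Otherwise the variable consumes one arbitrary child subtree.
        return bool(tree) and matches(rest, tree[1:])
    # A labeled pattern edge needs an equally labeled tree edge in this position.
    return (bool(tree)
            and tree[0].label == head.label
            and matches(head.children, tree[0].children)
            and matches(rest, tree[1:]))


def frequency(pattern, documents):
    """Fraction of documents matched by the pattern (assumed frequency measure)."""
    return sum(matches(pattern, d) for d in documents) / len(documents)


# Two hypothetical documents; the second one lacks the "author" field.
doc_full = [Edge("book", [Edge("title", [Edge("T")]),
                          Edge("author", [Edge("A")]),
                          Edge("price", [Edge("10")])])]
doc_missing = [Edge("book", [Edge("title", [Edge("T")]),
                             Edge("price", [Edge("10")])])]

# The same pattern with an ordinary and with a contractible middle variable.
strict = [Edge("book", [Edge("title", [Var()]), Var(), Edge("price", [Var()])])]
relaxed = [Edge("book", [Edge("title", [Var()]), Var(contractible=True),
                         Edge("price", [Var()])])]
```

Under these assumptions, `relaxed` matches both documents, whereas `strict` matches only `doc_full`, because its middle variable cannot stand for the missing field.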
Pages: 133-144
Page count: 12
Related papers
24 in total
  • [1] Discovery of maximally frequent tag tree patterns with height-constrained variables from semistructured web documents
    Suzuki, Y
    Miyahara, T
    Shoudai, T
    Uchida, T
    Nakamura, Y
    [J]. INTERNATIONAL WORKSHOP ON CHALLENGES IN WEB INFORMATION RETRIEVAL AND INTEGRATION, PROCEEDINGS, 2005, : 104 - 112
  • [2] Extraction of tag tree patterns with contractible variables from irregular semistructured data
    Miyahara, T
    Suzuki, Y
    Shoudai, T
    Uchida, T
    Hirokawa, S
    Takahashi, K
    Ueda, H
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 430 - 436
  • [3] Evolution of characteristic tree structured patterns from semistructured documents
    Inata, Katsushi
    Miyahara, Tetsuhiro
    Ueda, Hiroaki
    Takahashi, Kenichi
    [J]. AI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4304 : 1201 - +
  • [4] Efficient learning of ordered and unordered tree patterns with contractible variables
    Suzuki, Y
    Shoudai, T
    Matsumoto, S
    Uchida, T
    Miyahara, T
    [J]. ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2003, 2842 : 114 - 128
  • [5] Computing frequent graph patterns from semistructured data
    Vanetik, N
    Gudes, E
    Shimony, SE
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 458 - 465
  • [6] From path tree to frequent patterns: A framework for mining frequent patterns
    Xu, YB
    Yu, JX
    Liu, GM
    Lu, HJ
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 514 - 521
  • [7] Discovery of temporal frequent patterns using TFP-tree
    Jin, Long
    Lee, Yongmi
    Seo, Sungbo
    Ryu, Kenn Ho
    [J]. ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2006, 4016 : 349 - 361
  • [8] Discovery of useful patterns from tree-structured documents with label-projected database
    Paik, Juryon
    Nam, Junghyun
    Youn, Hee Yong
    Kim, Ung Mo
    [J]. AUTONOMIC AND TRUSTED COMPUTING, PROCEEDINGS, 2008, 5060 : 264 - +
  • [9] Algorithm for Enumerating All Maximal Frequent Tree Patterns among Words in Tree-Structured Documents and Its Application
    Uchida, Tomoyuki
    Kawamoto, Kayo
    [J]. International Journal of Database Theory and Application, 2009, 2 (04): : 59 - 73
  • [10] Algorithm for Enumerating All Maximal Frequent Tree Patterns among Words in Tree-Structured Documents and Its Application
    Uchida, Tomoyuki
    Kawamoto, Kayo
    [J]. DATABASE THEORY AND APPLICATION, 2009, 64 : 107 - 114