Extraction of tag tree patterns with contractible variables from irregular semistructured data

Times cited: 0
Authors
Miyahara, T [1]
Suzuki, Y
Shoudai, T
Uchida, T
Hirokawa, S
Takahashi, K
Ueda, H
Affiliations
[1] Hiroshima City Univ, Fac Informat Sci, Hiroshima 7313194, Japan
[2] Kyushu Univ, Dept Informat, Kasuga, Fukuoka 8168580, Japan
[3] Kyushu Univ, Comp & Commun Ctr, Fukuoka 8128581, Japan
Keywords: none listed
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Information extraction from semistructured data is becoming increasingly important. To extract meaningful or interesting content from semistructured data, we need to find the common structured patterns it contains. Much semistructured data is irregular, for example because of missing or erroneous entries. A tag tree pattern is an edge-labeled tree with ordered children that combines a tree structure of tags with structured variables. Each edge label is a tag, a keyword, or a wildcard, and each variable can be replaced by an arbitrary tree. In particular, a contractible variable matches any subtree, including a singleton vertex. Tag tree patterns are therefore well suited to representing common tree-structured patterns in irregular semistructured data. We present a new method for extracting characteristic tag tree patterns from irregular semistructured data, based on an algorithm that finds a least generalized tag tree pattern explaining the given data. We also report experiments applying this method to extract characteristic tag tree patterns from irregular semistructured data.
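The abstract defines tag tree patterns only in words. The following Python sketch shows one simplified way to represent such a pattern and test whether it matches a data tree. It is not the authors' algorithm and does not implement least generalization. The names `Tree`, `Pattern`, `Edge`, `Var`, `WILDCARD` and `matches` are hypothetical, and the sketch assumes that every variable ends at a leaf of the pattern. Under that assumption an ordinary variable stands for one or more consecutive sibling subtrees, and a contractible variable stands for zero or more (the zero case corresponds to substituting a singleton vertex).

```python
from dataclasses import dataclass, field
from functools import lru_cache
from typing import List, Tuple, Union

WILDCARD = "?"  # hypothetical symbol for the wildcard edge label


@dataclass
class Tree:
    """Data tree vertex: ordered list of (edge label, child subtree)."""
    children: List[Tuple[str, "Tree"]] = field(default_factory=list)


@dataclass
class Edge:
    """Pattern edge labeled by a tag, a keyword, or WILDCARD."""
    label: str
    child: "Pattern"


@dataclass
class Var:
    """Variable edge; assumed (in this sketch only) to end at a pattern leaf."""
    contractible: bool = False


@dataclass
class Pattern:
    """Pattern vertex: ordered list of labeled edges and variables."""
    items: List[Union[Edge, Var]] = field(default_factory=list)


def matches(p: Pattern, t: Tree) -> bool:
    """True if pattern vertex p matches data vertex t, children in order."""
    items, kids = p.items, t.children

    @lru_cache(maxsize=None)
    def align(i: int, j: int) -> bool:
        # Can pattern items[i:] consume exactly the data children kids[j:]?
        if i == len(items):
            return j == len(kids)
        item = items[i]
        if isinstance(item, Var):
            # Ordinary variable: one or more sibling subtrees.
            # Contractible variable: zero or more (singleton-vertex case).
            first = j if item.contractible else j + 1
            return any(align(i + 1, k) for k in range(first, len(kids) + 1))
        if j == len(kids):
            return False
        label, sub = kids[j]
        if item.label not in (WILDCARD, label):
            return False
        return matches(item.child, sub) and align(i + 1, j + 1)

    return align(0, 0)


# Usage: a complete record and an irregular one with missing fields.
full = Tree([("book", Tree([
    ("title", Tree([("Algorithms", Tree())])),
    ("author", Tree()),
    ("price", Tree()),
]))])
partial = Tree([("book", Tree([("title", Tree([("Algorithms", Tree())]))]))])


def book_pattern(contractible: bool) -> Pattern:
    # book -> [title -> [?], variable]
    return Pattern([Edge("book", Pattern([
        Edge("title", Pattern([Edge(WILDCARD, Pattern())])),
        Var(contractible=contractible),
    ]))])


print(matches(book_pattern(True), full))      # True
print(matches(book_pattern(True), partial))   # True
print(matches(book_pattern(False), full))     # True
print(matches(book_pattern(False), partial))  # False
```

With a contractible variable, the pattern also accepts the irregular record whose author and price fields are missing. The same pattern with an ordinary variable rejects that record, which illustrates why contractible variables help with irregular data. The backtracking alignment is memoized per call but remains a naive matcher intended only for illustration.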
Pages: 430–436
Number of pages: 7