Discovery of maximally frequent tag tree patterns with contractible variables from semistructured documents

Cited by: 0
Authors
Miyahara, T [1]
Suzuki, Y
Shoudai, T
Uchida, T
Takahashi, K
Ueda, H
Affiliations
[1] Hiroshima Univ, Fac Informat Sci, Hiroshima 7313194, Japan
[2] Kyushu Univ, Dept Informat, Kasuga, Fukuoka 8168580, Japan
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To extract meaningful, hidden knowledge from semistructured documents such as HTML or XML files, methods for discovering frequent patterns or common characteristics in such documents have become increasingly important. We propose new methods for discovering maximally frequent tree-structured patterns in semistructured Web documents by using tag tree patterns as hypotheses. A tag tree pattern is an edge-labeled tree that has ordered or unordered children and structured variables. An edge label is a tag or a keyword in such Web documents, and a variable can match an arbitrary subtree, which represents a field of a semistructured document. As a special case, a contractible variable can match an empty subtree, which represents a missing field in a semistructured document. Since semistructured documents have irregularities such as missing fields, a tag tree pattern with contractible variables is well suited to representing tree-structured patterns in such documents. First, we present an algorithm for generating all maximally frequent ordered tag tree patterns with contractible variables. Second, we give an algorithm for generating all maximally frequent unordered tag tree patterns with contractible variables.
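The matching idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a deliberately simplified model that assumes ordered children only, variables that appear only as leaves, and a variable that consumes exactly one child subtree (a contractible variable may also consume none). The names Node, PNode, match_children, and frequency are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Document-tree node; `label` is the label of the edge leading into it."""
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class PNode:
    """Pattern node (hypothetical model): kind is 'tag', 'var', or 'cvar'."""
    kind: str
    label: str = ""
    children: List["PNode"] = field(default_factory=list)


def match_children(pats: List[PNode], docs: List[Node]) -> bool:
    """Ordered matching of a pattern child list against a document child list."""
    if not pats:
        return not docs  # every document child must be accounted for
    head, rest = pats[0], pats[1:]
    # Assumption: a contractible variable may match an empty subtree,
    # which models a missing field.
    if head.kind == "cvar" and match_children(rest, docs):
        return True
    if not docs:
        return False
    if head.kind in ("var", "cvar"):
        # Simplification: a variable consumes one arbitrary child subtree.
        return match_children(rest, docs[1:])
    # A tag edge must agree on the label and match recursively below it.
    return (head.label == docs[0].label
            and match_children(head.children, docs[0].children)
            and match_children(rest, docs[1:]))


def frequency(pats: List[PNode], corpus: List[List[Node]]) -> float:
    """Fraction of documents in the corpus that the pattern matches."""
    return sum(match_children(pats, d) for d in corpus) / len(corpus)


# Two toy documents: the second one lacks the "author" field.
doc_full = [Node("title"), Node("author"), Node("price")]
doc_missing = [Node("title"), Node("price")]
corpus = [doc_full, doc_missing]

# The same pattern with a contractible and with an ordinary variable.
contractible = [PNode("tag", "title"), PNode("cvar"), PNode("tag", "price")]
ordinary = [PNode("tag", "title"), PNode("var"), PNode("tag", "price")]
```

In this toy setting, the pattern with the contractible variable matches both documents, while the pattern with the ordinary variable fails on the document whose field is missing. That difference is the irregularity the abstract says contractible variables are meant to absorb.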
Pages: 133-144
Number of pages: 12