Web Contents Tracking by Learning of Page Grammars

Cited by: 1
Authors
Kukulenz, Dirk [1]
Reinke, Christoph [1]
Hoeller, Nils [1]
Affiliations
[1] Univ Lubeck, Inst Informat Syst, D-23538 Lubeck, Germany
DOI
10.1109/ICIW.2008.58
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A significant fraction of Web data is available only for short periods of time. We consider methods to track and record such dynamic information automatically. The main problems are finding adequate reload times for Web data in order to reduce network traffic, improving the freshness of the obtained data, and reducing the risk of losing information. Previous approaches usually improve reload strategies for Web data by considering the change dynamics of pages: they model the change behavior statistically and then apply suitable reload strategies. Building on this approach, we first give a precise definition of data changes on the Web. Page changes are described by a page decomposition that is based on the estimation of grammars. Based on this decomposition, segments of Web pages are identified. The change behavior of individual segments is recorded and used to optimize reload strategies. We show that the completeness of the obtained data and the network traffic may be improved significantly by applying our new reload strategy.
Pages: 416-425
Number of pages: 10
Related papers
50 in total
  • [1] Social Aspects of Web Page Contents
    Kudelka, Milos
    Snasel, Vaclav
    Horak, Zdenek
    Abraham, Ajith
    2009 INTERNATIONAL CONFERENCE ON COMPUTATIONAL ASPECTS OF SOCIAL NETWORKS, PROCEEDINGS, 2009, : 80 - +
  • [2] Improvement of web data clustering using web page contents
    Xu, Y
    Weng, LT
    INTELLIGENT INFORMATION PROCESSING II, 2005, 163 : 521 - 530
  • [3] Chinese web page classification based on text contents
    Liang, JZ
    ISTM/2003: 5TH INTERNATIONAL SYMPOSIUM ON TEST AND MEASUREMENT, VOLS 1-6, CONFERENCE PROCEEDINGS, 2003, : 4733 - 4736
  • [4] DESIGNING AN ENGLISH LEARNING WEB PAGE
    Wu Xiaozhen
    Teaching English in China, 1999, (03) : 50 - 53
  • [5] A Web Page Segmentation Method by using Headlines to Web Contents as Separators and its Evaluations
    Sano, Hiroyuki
    Swezey, Robin M. E.
    Shiramatsu, Shun
    Ozono, Tadachika
    Shintani, Toramatsu
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2013, 13 (01): : 1 - 6
  • [6] Evaluation of Search Engine Weight by Considering Repeated Web Page Contents
    Zhou, Hui
    Li, Chao
    Wang, Yimin
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2017, 23 (04): : 589 - 597
  • [7] Contents Page
    Lambert, Richard
    POETRY WALES, 2009, 44 (04): : 22 - 22
  • [8] Semantic knowledge building for image database by analyzing Web page contents
    Lai, YK
    Liu, S
    Chia, LT
    Chan, S
    2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, 2005, : 1283 - 1286
  • [9] Web contents extracting for web-based learning
    Qiu, Jiangtao
    Tang, Changjie
    Xu, Kaikuo
    Luo, Qian
    ADVANCES IN WEB BASED LEARNING - ICWL 2008, PROCEEDINGS, 2008, 5145 : 59 - +
  • [10] WEB GRAMMARS AND WEB AUTOMATA
    EZAWA, Y
    ABE, N
    MIZUMOTO, M
    TOYODA, J
    TANAKA, K
    ELECTRONICS & COMMUNICATIONS IN JAPAN, 1973, 56 (04): : 35 - 42