Web Contents Tracking by Learning of Page Grammars

Cited by: 1
Authors
Kukulenz, Dirk [1]
Reinke, Christoph [1]
Hoeller, Nils [1]
Affiliation
[1] Univ Lubeck, Inst Informat Syst, D-23538 Lubeck, Germany
DOI
10.1109/ICIW.2008.58
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A significant fraction of Web data is available only for short periods of time. We consider methods to track and record such dynamic information automatically. The main problem is to find adequate reload times for Web data in order to reduce network traffic, improve the freshness of the obtained data, and reduce the risk of losing information. Previous approaches usually improve reload strategies by considering the change dynamics of pages, modeling that behavior statistically, and then applying suitable reload strategies. Building on this approach, we first give a precise definition of data changes on the Web. Page changes are described by a page decomposition that is based on the estimation of grammars, and this decomposition is used to identify segments of Web pages. The change behavior of individual segments is recorded and used to optimize reload strategies. We show that both the completeness of the obtained data and the network traffic may be improved significantly by applying our new reload strategy.
Pages: 416-425
Page count: 10