Web Contents Tracking by Learning of Page Grammars

Cited by: 1
Authors
Kukulenz, Dirk [1]
Reinke, Christoph [1]
Hoeller, Nils [1]
Affiliations
[1] Univ Lubeck, Inst Informat Syst, D-23538 Lubeck, Germany
DOI
10.1109/ICIW.2008.58
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
A significant fraction of Web data is available only for short periods of time. We consider methods to track and record such dynamic information automatically. The main problems are to find adequate reload times for Web data in order to reduce network traffic, to improve the freshness of the obtained data, and to reduce the risk of losing information. Previous approaches usually improve reload strategies for Web data by considering the change dynamics of pages, modeling this behavior statistically, and then applying suitable reload strategies. Building on this approach, we first give a precise definition of data changes on the Web. Page changes are described by a page decomposition that is based on the estimation of grammars. Based on this decomposition, segments of Web pages are identified. The change behavior of individual segments is recorded and used to optimize reload strategies. We show that the completeness of the obtained data and the network traffic may be improved significantly by applying our new reload strategy.
Pages: 416-425
Page count: 10