Web Contents Tracking by Learning of Page Grammars

Cited by: 1
Authors
Kukulenz, Dirk [1]
Reinke, Christoph [1]
Hoeller, Nils [1]
Affiliations
[1] Univ Lubeck, Inst Informat Syst, D-23538 Lubeck, Germany
DOI
10.1109/ICIW.2008.58
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A significant fraction of Web data is available only for short periods of time. We consider methods to track and record such dynamic information automatically. The main problems are finding adequate reload times for Web data in order to reduce network traffic, improving the freshness of the obtained data, and reducing the risk of losing information. Previous approaches usually improve reload strategies by considering the change dynamics of pages: they model the change behavior statistically and then apply suitable reload strategies. Building on this approach, we first give a precise definition of data changes on the Web. Page changes are described by a page decomposition that is based on the estimation of grammars. Based on this decomposition, segments of Web pages are identified. The change behavior of individual segments is recorded and used to optimize reload strategies. We show that the completeness of the obtained data and the network traffic may be improved significantly by applying our new reload strategy.
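
The idea of recording per-segment change behavior and deriving reload times from it can be illustrated with a minimal Python sketch. This is not the authors' method: the grammar-based page decomposition is not reproduced, segments are assumed to be already extracted as strings, and the names `SegmentHistory` and `next_reload_interval`, the fixed polling interval, and the clipping bounds are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SegmentHistory:
    """Change record for one page segment (hypothetical structure)."""
    last_digest: str = ""
    polls: int = 0
    changes: int = 0

    def observe(self, content: str) -> bool:
        """Record one fetched version; return True if it differs from the previous one."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        changed = self.polls > 0 and digest != self.last_digest
        self.last_digest = digest
        self.polls += 1
        if changed:
            self.changes += 1
        return changed


def next_reload_interval(history, poll_interval,
                         min_interval=60.0, max_interval=86400.0):
    """Assumed heuristic: mean observed time between changes, clipped to bounds.

    poll_interval is the (assumed constant) number of seconds between fetches.
    """
    comparisons = history.polls - 1
    if comparisons < 1:
        # Not enough data yet: keep polling at the current rate.
        return poll_interval
    if history.changes == 0:
        # No change seen so far: back off to the slowest allowed rate.
        return max_interval
    mean_time_between_changes = comparisons * poll_interval / history.changes
    return max(min_interval, min(max_interval, mean_time_between_changes))


if __name__ == "__main__":
    ticker = SegmentHistory()
    for version in ["a", "a", "b", "b", "c"]:
        ticker.observe(version)
    print(next_reload_interval(ticker, poll_interval=300.0))
```

A segment that changes often receives a short reload interval, while a static segment backs off to `max_interval`, which is one simple way to trade network traffic against the risk of missing short-lived content.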
Pages: 416-425
Page count: 10