Web Contents Tracking by Learning of Page Grammars

Cited by: 1

Authors:

Kukulenz, Dirk [1]

Reinke, Christoph [1]

Hoeller, Nils [1]

Affiliations:

[1] Univ Lubeck, Inst Informat Syst, D-23538 Lubeck, Germany

Source:

2008 3RD INTERNATIONAL CONFERENCE ON INTERNET AND WEB APPLICATIONS AND SERVICES (ICIW 2008) | 2008

Keywords:

DOI:

10.1109/ICIW.2008.58

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

A significant fraction of Web data is available only for short periods of time. We consider methods to track and record such dynamic information automatically. The main problems are to find adequate reload times for Web data in order to reduce network traffic, to improve the freshness of the obtained data, and to reduce the risk of losing information. Previous approaches usually improve reload strategies for Web data by considering the change dynamics of pages: they model the change behavior statistically and then apply suitable reload strategies. Building on this approach, we first give a precise definition of data changes on the Web. Page changes are described by a page decomposition that is based on the estimation of grammars. Based on this decomposition, segments of Web pages are identified. The change behavior of individual segments is recorded and used to optimize reload strategies. We show that the completeness of the obtained data and the network traffic may be improved significantly by applying our new reload strategy.
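
The segment-level idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the grammar-based page decomposition is replaced here by pages that are assumed to be already split into named segments, and the Poisson change model, the naive rate estimate, and all function and parameter names (`segment_hashes`, `count_changes`, `reload_interval`, `change_probability`, `max_interval`) are assumptions made for illustration only.

```python
import hashlib
import math


def segment_hashes(segments):
    """Map each segment name to a SHA-256 digest of its text."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in segments.items()
    }


def count_changes(snapshots):
    """Count, per segment, how many consecutive snapshot pairs differ.

    `snapshots` is a list of dicts (segment name -> segment text),
    ordered by download time. A segment that is missing from the
    previous snapshot is counted as changed.
    """
    changes = {}
    previous = None
    for snapshot in snapshots:
        current = segment_hashes(snapshot)
        if previous is not None:
            for name, digest in current.items():
                changed = previous.get(name) != digest
                changes[name] = changes.get(name, 0) + int(changed)
        previous = current
    return changes


def reload_interval(change_count, observed_time,
                    change_probability=0.5, max_interval=float("inf")):
    """Interval t such that P(at least one change within t) equals
    `change_probability`, assuming Poisson-distributed changes.

    The rate is the naive estimate change_count / observed_time, which
    undercounts when several changes occur between two downloads.
    """
    if change_count == 0:
        return max_interval  # no change observed yet: reload rarely
    rate = change_count / observed_time
    return min(max_interval, -math.log(1.0 - change_probability) / rate)


# Hypothetical page downloaded once per hour, four times (3 hours observed).
snapshots = [
    {"header": "News", "ticker": "a"},
    {"header": "News", "ticker": "b"},
    {"header": "News", "ticker": "c"},
    {"header": "News", "ticker": "d"},
]
changes = count_changes(snapshots)
intervals = {
    name: reload_interval(count, observed_time=3.0, max_interval=24.0)
    for name, count in changes.items()
}
```

In this toy data the `ticker` segment changes at every download and receives a short interval, while the static `header` segment falls back to `max_interval`. A page-level strategy would instead treat the whole page as changing every hour, which is the inefficiency that per-segment tracking is meant to avoid.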


Pages: 416-425

Number of pages: 10

