A dataflow approach to efficient change detection of HTML/XML documents in WebVigiL

Cited by: 2
Authors
Sanka, Anoop [1]
Chamakura, Shravan [1]
Chakravarthy, Sharma [1]
Affiliations
[1] Univ Texas, Dept Comp Sci & Engn, Arlington, TX 76019 USA
Funding
US National Science Foundation
Keywords
change detection; customized web monitoring; dataflow approach; user-defined profiles; web pages and frames
DOI
10.1016/j.comnet.2005.10.016
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The burgeoning data on the Web makes it difficult to keep track of the changes that constantly occur to specific information of interest. Currently, the most widespread way of detecting changes to Web content is to periodically retrieve the pages of interest and check them for changes. This approach puts the burden on the user and wastes time and resources. Alternatively, systems that detect any change to a page are overkill, as they present information that may not be relevant. Timeliness of change detection is also an issue in this approach. In this paper, we present a change-monitoring system, WebVigiL, which efficiently monitors user-specified Web pages for customized changes and notifies the user in a timely manner. The focus of this paper is on the dataflow approach used for detecting multiple types of changes to a page and for monitoring changes to more than one page at a time. This approach has been optimized to group similar or identical specifications to reduce the computation of changes. Multiple changes to the same page as well as to different pages are handled in our approach. As a special case, this includes the monitoring of Web pages containing frames. We also provide the overall architecture and functionality of the WebVigiL system to highlight the role of the change detection graph (CDG), which forms the core of the system. © 2005 Elsevier B.V. All rights reserved.
Pages: 1547-1563
Page count: 17