Automatic fragment detection in dynamic Web pages and its impact on caching

Cited by: 31
Authors
Ramaswamy, L
Iyengar, A
Liu, L
Douglis, F
Affiliations
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[2] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Funding
U.S. National Science Foundation
Keywords
dynamic content caching; fragment-based caching; fragment detection;
DOI
10.1109/TKDE.2005.89
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Constructing Web pages from fragments has been shown to provide significant benefits for both content generation and caching. In order for a Web site to use fragment-based content generation, however, good methods are needed for fragmenting the Web pages. Manual fragmentation of Web pages is expensive, error prone, and unscalable. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in Web sites serving dynamic content. Our approach analyzes Web pages with respect to their information sharing behavior, personalization characteristics, and change patterns. We identify fragments which are shared among multiple documents or have different lifetime or personalization characteristics. Our approach has three unique features. First, we propose a framework for fragment detection, which includes a hierarchical and fragment-aware model for dynamic Web pages and a compact and effective data structure for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that effectively detects fragments based on their lifetime and personalization characteristics. This paper shows the results when the algorithms are applied to real Web sites. We evaluate the proposed scheme through a series of experiments, showing the benefits and costs of the algorithms. We also study the impact of using the fragments detected by our system on key parameters such as disk space utilization, network bandwidth consumption, and load on the origin servers.
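The idea of detecting maximal fragments shared among multiple documents can be illustrated with a small sketch. This is not the authors' algorithm. It assumes pages have already been parsed into a simple tree (the `Node` class is hypothetical), hashes each subtree bottom-up, and reports a subtree as a shared fragment when it appears unchanged in at least `min_pages` pages and is not always nested inside a larger shared subtree. The names `find_shared_fragments`, `min_pages`, and `min_size` are illustrative assumptions. Exact hashing also ignores near-duplicate content, which a practical detector would need to handle.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified page-tree node (hypothetical model, not the paper's)."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def _walk(node, page_id, occ, sizes, reps):
    """Post-order walk: return the subtree signature of `node` and record,
    for each child subtree, the page it occurs in and its parent's signature."""
    child_sigs = [_walk(c, page_id, occ, sizes, reps) for c in node.children]
    h = hashlib.sha1(node.tag.encode() + b"\x00" + node.text.encode())
    for s in child_sigs:
        h.update(b"\x01" + s.encode())
    sig = h.hexdigest()
    # Subtree size = characters of text in the node and all its descendants.
    sizes[sig] = len(node.text) + sum(sizes[s] for s in child_sigs)
    reps.setdefault(sig, node)
    for s in child_sigs:
        occ[s].append((page_id, sig))
    return sig


def find_shared_fragments(pages, min_pages=2, min_size=1):
    """Return (tag, pages) for maximal subtrees that appear unchanged in at
    least `min_pages` pages and hold at least `min_size` characters of text."""
    occ = defaultdict(list)  # signature -> [(page_id, parent signature or None)]
    sizes, reps = {}, {}
    for page_id, root in pages.items():
        root_sig = _walk(root, page_id, occ, sizes, reps)
        occ[root_sig].append((page_id, None))

    def shared(sig):
        return len({p for p, _ in occ.get(sig, ())}) >= min_pages

    fragments = []
    for sig, places in occ.items():
        if not shared(sig) or sizes[sig] < min_size:
            continue
        # Maximality: skip a subtree if every occurrence sits inside a larger
        # subtree that is itself shared (the larger one is reported instead).
        if all(parent is not None and shared(parent) for _, parent in places):
            continue
        fragments.append((reps[sig].tag, sorted({p for p, _ in places})))
    return fragments


if __name__ == "__main__":
    def make_nav():
        return Node("div", children=[Node("a", "Home"), Node("a", "About")])

    page1 = Node("html", children=[make_nav(), Node("p", "Story one")])
    page2 = Node("html", children=[make_nav(), Node("p", "Story two")])
    print(find_shared_fragments({"p1": page1, "p2": page2}))
    # prints [('div', ['p1', 'p2'])]
```

In the usage example, the navigation `div` is reported once as a shared fragment, while its `a` children are suppressed because they only ever occur inside that larger shared subtree. The differing `p` elements and page roots are not shared, so they are not flagged.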
Pages: 859-874
Page count: 16
Related papers (50 in total)
  • [1] Caching personalized and database-related dynamic web pages
    Chang, Yeim-Kuan
    Lin, Yu-Ren
    Ting, Yi-Wei
    NAS: 2006 INTERNATIONAL WORKSHOP ON NETWORKING, ARCHITECTURE, AND STORAGES, PROCEEDINGS, 2006, : 149 - +
  • [2] Finding shared fragments in large collections of web pages for fragment-based web caching
    Ma, Junchang
    Gu, Zhimin
    NCA 2006: FIFTH IEEE INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS, PROCEEDINGS, 2006, : 251 - +
  • [3] Model for Efficient Delivery of Dynamic Web Pages with Automatic Detection of Shared Fragments
    Zhang, Lingli
    2013 22ND WIRELESS AND OPTICAL COMMUNICATIONS CONFERENCE (WOCC 2013), 2013, : 475 - 480
  • [4] Automatic template detection for structured web pages
    Lo, Lawrence
    Ng, Vincent To-Yee
    Ng, Patrick
    Chan, Stephen C. F.
    2006 10TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, PROCEEDINGS, VOLS 1 AND 2, 2006, : 708 - 713
  • [5] Automatic data record detection in Web Pages
    Gao, Xiaoying
    Vuong, Le Phong Bao
    Zhang, Mengjie
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2007, 4798 : 349 - +
  • [6] Automatic detection of shared fragments in large collections of web pages and its applications
    Gu, Zhimin
    Ma, Junchang
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2007, 1 (02) : 215 - 250
  • [7] Dynamic Web pages: performance impact on Web servers
    Kothari, B
    Claypool, M
    INTERNET RESEARCH-ELECTRONIC NETWORKING APPLICATIONS AND POLICY, 2001, 11 (01): : 18 - 25
  • [8] Caching of dynamic pages based on HTTP
    Cao, Bin
    Zhang, Xia
    Liu, Jiren
    Dongbei Daxue Xuebao/Journal of Northeastern University, 1999, 20 (02): : 114 - 117
  • [9] A configuration tool for caching dynamic pages
    Chabbouh, I
    Makpangou, M
    WEB CONTENT CACHING AND DISTRIBUTION, PROCEEDINGS, 2004, 3293 : 219 - 231
  • [10] Automatic Role Detection of Visual Elements of Web Pages for Automatic Accessibility Evaluation
    Duarte, Carlos
    Salvado, Ana
    Akpinar, M. Elgin
    Yesilada, Yeliz
    Carrico, Luis
    15TH INTERNATIONAL WEB FOR ALL CONFERENCE (W4A) 2018, 2018,