Extracting content structure for web pages based on visual representation

Times cited: 0
Authors
Cai, D
Yu, SP
Wen, JR
Ma, WY
Affiliations
[1] Tsinghua Univ, Beijing 100084, Peoples R China
[2] Peking Univ, Beijing 100871, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A new web content structure based on visual representation is proposed in this paper. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure. This paper presents an automatic, top-down, tag-tree-independent approach to detecting web content structure. It simulates how a user understands the layout structure of a web page through visual perception. Compared with existing techniques, our approach is independent of the underlying document representation, such as HTML, and works well even when the HTML structure differs greatly from the layout structure. Experiments show satisfactory results.
Pages: 406-417
Number of pages: 12
Related papers
50 in total (items 31-40 shown)
  • [31] Wordrank: A method for ranking web pages based on content similarity
    Kritikopoulos, Apostolos
    Sideri, Martha
    Varlamis, Iraklis
    [J]. WORKSHOPS OF THE TWENTY FOURTH BRITISH NATIONAL CONFERENCE ON DATABASES, WORKSHOP PROCEEDINGS, 2007, : 92 - +
  • [32] A Visual Technique for Web Pages Comparison
    Alpuente, Maria
    Romero, Daniel
    [J]. ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2009, 235 : 3 - 18
  • [33] Visual literacy and the design of Web pages
    Surprenant, TT
    Blake, VL
    [J]. IOLS '97: INTEGRATED ONLINE LIBRARY SYSTEMS, PROCEEDINGS - 1997: EXPANDING EXPECTATIONS, 1997, : 131 - 143
  • [34] CLASSIFYING WEB PAGES WITH VISUAL FEATURES
    de Boer, Viktor
    van Someren, Maarten
    Lupascu, Tiberiu
    [J]. WEBIST 2010: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGY, VOL 1, 2010, : 245 - 252
  • [35] Measuring the Visual Complexities of Web Pages
    Wu, Ou
    Hu, Weiming
    Shi, Lei
    [J]. ACM TRANSACTIONS ON THE WEB, 2013, 7 (01)
  • [36] Similarity among web pages based on their link structure
    Quevedo, JU
    Huang, SHS
    [J]. IKE'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2003, : 232 - 238
  • [37] Clustering Web Pages Based on Structure and Style Similarity
    Gowda, Thamme
    Mattmann, Chris
    [J]. PROCEEDINGS OF 2016 IEEE 17TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI), 2016, : 175 - 180
  • [38] Universal Web Pages Content Parser
    Pawlas, Piotr
    Domanski, Adam
    Domanska, Joanna
    [J]. COMPUTER NETWORKS, 2012, 291 : 130 - 138
  • [39] Cleaning web pages for effective web content mining
    Li, Jing
    Ezeife, C. I.
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, 4080 : 560 - 571
  • [40] An Improved VIPS-based Algorithm of Extracting Web Content
    Li, Long
    Zhou, Anmin
    Fang, Yong
    Liu, Liang
    Wu, Qian
    [J]. MATERIAL SCIENCE, CIVIL ENGINEERING AND ARCHITECTURE SCIENCE, MECHANICAL ENGINEERING AND MANUFACTURING TECHNOLOGY II, 2014, 651-653 : 1806 - 1810