Extracting content structure for web pages based on visual representation

Cited by: 0

Authors:

Cai, D

Yu, SP

Wen, JR

Ma, WY

Affiliations:

[1] Tsinghua Univ, Beijing 100084, Peoples R China

[2] Peking Univ, Beijing 100871, Peoples R China

Source:

WEB TECHNOLOGIES AND APPLICATIONS | 2003 / Vol. 2642

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper proposes a new web content structure based on visual representation. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure. The paper presents an automatic, top-down, tag-tree-independent approach to detecting web content structure. The approach simulates how a user understands web layout structure based on visual perception. Compared with existing techniques, it is independent of the underlying document representation, such as HTML, and works well even when the HTML structure differs greatly from the layout structure. Experiments show satisfactory results.


Pages: 406-417

Page count: 12
