Heading-based sectional hierarchy identification for HTML documents

Cited by: 0

Authors:

Pembe, F. Canan [1]

Gungor, Tunga [1]

Affiliations:

[1] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey

Source:

2007 22ND INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES | 2007

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Most documents on the Web are prepared in HTML, a format designed primarily for presenting data. As a result, limitations arise when these documents are accessed automatically for semantic interpretation of their content. One such limitation lies in representing the sectional hierarchy (i.e., sections and subsections) of these documents and the headings in this hierarchy. Obtaining this information automatically is difficult because of the underlying format and the cluttered structure of most Web pages. In this paper, we propose a novel approach to extracting heading-based sectional hierarchies from HTML documents. This is the first part of our research, in which we aim to use this information in automatic summaries to improve the Web search experience of Internet users.

Citation

Pages: 75 - 80

Number of pages: 6

50 records in total

[41] Contextual weighted representations and indexing models for the retrieval of HTML documents
Pereira, RAM
Molinari, A
Pasi, G
SOFT COMPUTING, 2005, 9 (07) : 481 - 492
[42] Using Semantic-Level Tags in HTML/XML Documents
Henschen, Lawrence J.
Lee, Julia C.
UNIVERSAL ACCESS IN HUMAN-COMPUTER INTERACTION: APPLICATIONS AND SERVICES, PT III, 2009, 5616 : 683 - 692
[43] Digital architectures: SGML, HTML, multimedia and the structure of electronic documents
Heba, GM
STC 1996 PROCEEDINGS - 43RD ANNUAL CONFERENCE: EVOLUTION/REVOLUTION, 1996, : 213 - 216
[44] Classification of HTML documents by Hidden Tree-Markov Models
Diligenti, M
Gori, M
Maggini, M
Scarselli, F
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 2001, : 849 - 853
[45] A resource for transforming HTML and molfile documents to XML compliant form
Gkoutos, GV
Kenway, PR
Murray-Rust, P
Rzepa, HS
Wright, M
INTERNET JOURNAL OF CHEMISTRY, 2001, 4 (05):
[46] Effectively retrieve HTML documents
Liu, Fang
Lu, Zhengding
Xiaoxing Weixing Jisuanji Xitong/Mini-Micro Systems, 2000, 21 (09) : 986 - 988
[47] Bootstrapping semantic annotation for content-rich HTML documents
Mukherjee, S
Ramakrishnan, IV
Singh, A
ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 583 - 593
[48] Study on Text Information Extraction Model and Algorithm of HTML Documents
Li Chunyan
Jiang Ilaiyang
PROCEEDINGS OF 2010 CROSS-STRAIT CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY, 2010, : 399 - 403
[49] An integrated system of mining HTML texts and filtering structured documents
Yun, BH
Lim, ME
Park, SH
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 350 - 355
[50] STRUCTURING DOCUMENTS WITH NEW HTML5 SEMANTIC ELEMENTS
Fulanovic, Bojan
Kucak, Danijell
Djambic, Goran
ANNALS OF DAAAM FOR 2012 & PROCEEDINGS OF THE 23RD INTERNATIONAL DAAAM SYMPOSIUM - INTELLIGENT MANUFACTURING AND AUTOMATION - FOCUS ON SUSTAINABILITY, 2012, 23 : 723 - 726
