Heading-based sectional hierarchy identification for HTML documents

Cited by: 0
Authors
Pembe, F. Canan [1]
Gungor, Tunga [1]
Affiliation
[1] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
DOI
Not available
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most documents found on the Web are prepared in HTML, a format that was designed primarily for the presentation of data. As a result, some limitations arise when these documents are accessed automatically for a semantic interpretation of their content. One such inadequacy lies in representing the sectional hierarchy (i.e., sections and subsections) of these documents and the headings within this hierarchy. Obtaining this information automatically is difficult because of the underlying format and the cluttered structure of most Web pages. In this paper, we propose a novel approach to extract heading-based sectional hierarchies of HTML documents. This is the first part of our research, in which we aim to use this information in automatic summaries to improve the Web search experience of Internet users.
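The abstract does not describe how the proposed approach works. As a point of reference only, the sketch below shows what a heading-based sectional hierarchy looks like as a tree, built by a naive baseline that trusts explicit h1 to h6 tags and nests each heading under the nearest preceding shallower one. This is not the authors' method: the abstract stresses that real pages are cluttered and often do not mark headings this way, which is exactly where a tag-only baseline breaks down. The names Section, HeadingTreeBuilder, build_hierarchy and outline are hypothetical; only Python's standard html.parser module is used.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Section:
    level: int                      # 0 for the document root, 1-6 for <h1>-<h6>
    heading: str
    children: list = field(default_factory=list)


class HeadingTreeBuilder(HTMLParser):
    """Hypothetical naive baseline: nests sections purely by <h1>-<h6> level."""

    HEADING_TAGS = {f"h{i}": i for i in range(1, 7)}

    def __init__(self):
        super().__init__()
        self.root = Section(0, "")
        self.stack = [self.root]    # path from the root to the current section
        self._level = None          # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <H2> matches "h2".
        if tag in self.HEADING_TAGS:
            self._level = self.HEADING_TAGS[tag]
            self._text = []

    def handle_data(self, data):
        # Collect text inside a heading, including text in nested inline tags.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._text).split())  # normalise whitespace
            node = Section(self._level, text)
            # Close every open section at the same or a deeper level.
            while self.stack[-1].level >= node.level:
                self.stack.pop()
            self.stack[-1].children.append(node)
            self.stack.append(node)
            self._level = None


def build_hierarchy(html: str) -> Section:
    parser = HeadingTreeBuilder()
    parser.feed(html)
    parser.close()
    return parser.root


def outline(section: Section, depth: int = 0) -> list:
    """Flatten the tree into indented heading lines (two spaces per level)."""
    lines = []
    for child in section.children:
        lines.append("  " * depth + child.heading)
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    doc = ("<h1>Intro</h1><p>text</p><h2>Background</h2><h2>Scope</h2>"
           "<h1>Method</h1><h3>Step</h3>")
    print("\n".join(outline(build_hierarchy(doc))))
```

For the sample page this prints "Intro" with "Background" and "Scope" indented beneath it, followed by "Method" with "Step" beneath it. Note that the h3 under "Method" is attached directly to its h1, since no h2 intervenes; a skipped level is simply absorbed by the stack rule.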
Pages: 75-80
Number of pages: 6
Related papers (50 in total)
  • [31] WebVigiL: User profile-based change detection for HTML/XML documents
    Pandrangi, N
    Jacob, J
    Sanka, A
    Chakravarthy, S
    NEW HORIZONS IN INFORMATION MANAGEMENT, 2003, 2712 : 38 - 57
  • [32] A fuzzy representation of HTML documents for information retrieval systems
    Molinari, A
    Pasi, G
    FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996, : 107 - 112
  • [33] An automated change-detection algorithm for HTML documents based on semantic hierarchies
    Lim, SJ
    Ng, YK
    17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, : 303 - 312
  • [34] Charset Encoding Detection of HTML Documents: A Practical Experience
    Faghani, Shabanali
    Hadian, Ali
    Minaei-Bidgoli, Behrouz
    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2015, 2015, 9460 : 215 - 226
  • [35] The hwriter Package: Composing HTML documents with R objects
    Pau, Gregoire
    Huber, Wolfgang
    R JOURNAL, 2009, 1 (01) : 22 - 24
  • [36] Categorizing and extracting information from multilingual HTML documents
    Lim, SJ
    Ng, YK
    9TH INTERNATIONAL DATABASE ENGINEERING & APPLICATION SYMPOSIUM, PROCEEDINGS, 2005, : 415 - 422
  • [37] Toward a retrieval of HTML documents using a semantic approach
    Ferri, F
    Ghiselli, C
    Grifoni, P
    Padula, M
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1571 - 1574
  • [38] Visual Analytics presentation tools applied in HTML documents
    Jern, Mikael
    Rogstadius, Jakob
    Astroem, Tobias
    Ynnerman, Anders
    PROCEEDINGS OF THE 12TH INTERNATIONAL INFORMATION VISUALISATION, 2008, : 200 - 207
  • [39] Fine-grained transclusions of multimedia documents in HTML
    Kolbitsch, J
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2005, 11 (06) : 926 - 943
  • [40] Static Validation of Dynamically Generated HTML Documents Based on Abstract Parsing and Semantic Processing
    Kim, Hyunha
    Doh, Kyung-Goo
    Schmidt, David A.
    STATIC ANALYSIS, SAS 2013, 2013, 7935 : 194 - 214