Heading-based sectional hierarchy identification for HTML documents

Cited by: 0
Authors
Pembe, F. Canan [1 ]
Gungor, Tunga [1 ]
Affiliations
[1] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Most documents found on the Web are prepared in HTML format, which was designed primarily for the presentation of data. As a result, some limitations are encountered when these documents are accessed automatically for a semantic interpretation of their content. One such inadequacy is in representing the sectional hierarchy (i.e., sections and subsections) of these documents and the headings in this hierarchy. Automatically obtaining this information is a difficult task due to the underlying format and the cluttered structure encountered in most Web pages. In this paper, we propose a novel approach to extract heading-based sectional hierarchies of HTML documents. This is the first part of the research, where we aim to use this information in automatic summaries to improve the Web search experience of Internet users.
Pages: 75-80
Page count: 6
Related papers
50 in total
  • [21] Using Combined List Hierarchy and Headings of HTML Documents for Learning Domain-Specific Ontology
    Raza, Muhammad Ahsan
    Raza, Binish
    Jabeen, Taiba
    Raza, Sehrish
    Abbas, Munnawar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (04) : 233 - 239
  • [22] Generating structured documents from HTML tables
    Kim, Yeon-Seok
    Lee, Kyong-Ho
    2006 INTERNATIONAL CONFERENCE ON HYBRID INFORMATION TECHNOLOGY, VOL 2, PROCEEDINGS, 2006, : 605 - +
  • [23] Automatic discovery of semantic structures in HTML documents
    Mukherjee, S
    Yang, GZ
    Tan, WF
    Ramakrishnan, IV
    SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003, : 245 - 249
  • [24] From HTML documents to web tables and rules
    Simon, Kai
    Lausen, Georg
    Boley, Harold
    2006 ICEC: EIGHTH INTERNATIONAL CONFERENCE ON ELECTRONIC COMMERCE, PROCEEDINGS: THE NEW E-COMMERCE: INNOVATIONS FOR CONQUERING CURRENT BARRIERS, OBSTACLES AND LIMITATIONS TO CONDUCTING SUCCESSFUL BUSINESS ON THE INTERNET, 2006, : 125 - 131
  • [25] Hierarchies in HTML documents: Linking text to concepts
    Burget, R
    15TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, : 186 - 190
  • [26] USING COOLLISTS TO INDEX HTML DOCUMENTS IN THE WEB
    LIM, JG
    COMPUTER NETWORKS AND ISDN SYSTEMS, 1995, 28 (1-2): : 147 - 154
  • [27] Using the structure of HTML documents to improve retrieval
    Cutler, M
    Shih, YM
    Meng, WY
    PROCEEDINGS OF THE USENIX SYMPOSIUM ON INTERNET TECHNOLOGIES AND SYSTEMS, 1997, : 241 - 251
  • [28] A concurrent neural classifier for HTML documents retrieval
    Pilato, G
    Vitabile, S
    Vassallo, G
    Conti, V
    Sorbello, F
    NEURAL NETS, 2003, 2859 : 210 - 217
  • [29] A typed representation for HTML and XML documents in Haskell
    Thiemann, P
    JOURNAL OF FUNCTIONAL PROGRAMMING, 2002, 12 (4-5) : 435 - 468
  • [30] A semi-automatic indexing system based on embedded information in HTML documents
    Vallez, Mari
    Pedraza-Jimenez, Rafael
    Codina, Lluis
    Blanco, Saul
    Rovira, Cristofol
    LIBRARY HI TECH, 2015, 33 (02) : 195 - 210