Automatic generation of structured hyperdocuments from document images

Cited by: 4
Authors
Lee, JY
Park, JS
Byun, H
Moon, J
Lee, SW [1]
Affiliations
[1] Korea Univ, Ctr Artificial Vis Res, Dept Comp Sci & Engn, Seongbuk Ku, Seoul 136701, South Korea
[2] Yonsei Univ, Dept Comp Sci, Seodaemoon Ku, Seoul 120749, South Korea
[3] Korea Univ, Dept Elect & Informat Engn, Chungnam 339800, South Korea
Keywords
structured hyperdocument; multi-column document; document conversion; document image understanding; logical structure analysis;
DOI
10.1016/S0031-3203(01)00026-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As document sharing through the World Wide Web continues to grow, so does the need to create hyperdocuments, in formats such as HTML and SGML/XML, that make documents accessible and retrievable via the Internet. Nevertheless, little work has been done on converting paper documents into hyperdocuments, and most existing studies have concentrated on the direct conversion of single-column document images that contain only text and image objects. In this paper, we propose two methods for converting complex multi-column document images into HTML documents, and a method for generating a structured table-of-contents page based on logical structure analysis of the document image. Experiments with various kinds of multi-column document images show that the proposed methods generate HTML documents with the same visual layout as the document images, and produce a structured table-of-contents page whose hierarchically ordered section titles are hyperlinked to the contents. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Pages: 485-503
Page count: 19