A fast algorithm for bottom-up document layout analysis

Cited by: 85
Authors
Simon, A
Pret, JC
Johnson, AP
Affiliation
[1] Institute for Computer Applications in Molecular Sciences, School of Chemistry, University of Leeds, Leeds
Keywords
document analysis; physical page layout; bottom-up layout analysis; Kruskal's algorithm; spanning tree; chemical documents
DOI
10.1109/34.584106
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system (http://chem.leeds.ac.uk/ICAMS/CLiDE.html), but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from different text spacing and independence from different block alignments. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.
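The abstract names Kruskal's algorithm and path compression but does not give the distance metric or the heuristics. The following is only a minimal sketch of the general idea, not the authors' method. It assumes components are axis-aligned bounding boxes, uses the Euclidean gap between boxes as a stand-in metric, and stops merging at a hypothetical `max_gap` threshold. The names `box_gap`, `find`, and `group_components` are illustrative. It builds all pairwise edges, so it is quadratic and does not reproduce the linear complexity claimed for the paper's algorithm.

```python
from itertools import combinations
from math import hypot


def box_gap(a, b):
    """Euclidean gap between two boxes (x0, y0, x1, y1); 0 if they overlap.

    Assumed stand-in for the paper's special distance metric.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def find(parent, i):
    """Return the root of i's set, compressing the path along the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    while parent[i] != root:
        # Point every visited node directly at the root.
        parent[i], i = root, parent[i]
    return root


def group_components(boxes, max_gap):
    """Kruskal-style grouping: join closest components first.

    Returns (groups, tree), where groups is a list of index lists and
    tree holds the accepted spanning-forest edges as (i, j, weight).
    """
    parent = list(range(len(boxes)))
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    tree = []
    for weight, i, j in edges:
        if weight > max_gap:  # assumed stopping rule, not from the paper
            break
        root_i, root_j = find(parent, i), find(parent, j)
        if root_i != root_j:  # edge joins two different groups
            parent[root_j] = root_i
            tree.append((i, j, weight))
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values()), tree


# Two pairs of nearby boxes, far apart vertically.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (0, 100, 10, 110), (13, 100, 23, 110)]
groups, tree = group_components(boxes, max_gap=5)
print(groups)
```

Here the accepted edges form a spanning forest, and each tree in the forest corresponds to one candidate layout block. A real system would restrict candidate edges to nearby components instead of enumerating every pair.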
Pages: 273-277
Page count: 5