Chinese document layout analysis using an adaptive regrouping strategy

Cited by: 9
Authors
Chang, F [1]
Chu, SY [1]
Chen, CY [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Keywords
layout analysis; Chinese documents; adaptive regrouping strategy; reading order; spacing information
DOI
10.1016/j.patcog.2004.05.010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In document layout analysis, the defining conditions for textlines and text regions involve certain numerical parameters (e.g. inter-character spacing and inter-textline spacing) whose values can only be estimated once textlines and text regions have already been formed. This seemingly chicken-and-egg problem can be solved through an adaptive regrouping strategy, which consists of three operations. First, we group basic ingredients into preliminary textlines and text regions according to crude parametric values. Second, we refine our estimates of the parametric values based on the groups thus formed. Third, we form new groups by splitting and merging existing groups based on the newly estimated values. This paper applies the above strategy to Chinese documents, whose complexity derives from the coexistence of horizontal and vertical textlines. Successful results are obtained using this approach. The accuracy rates for identifying text regions and textlines are above 98% in a test database consisting of over one thousand document samples and various layout structures. © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
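The three operations described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it reduces the problem to one dimension (character positions along a single axis), uses the median gap as the refined spacing estimate, and regroups from scratch with the new threshold, which in 1-D has the same effect as splitting and merging existing groups. The function names and the `tolerance` and `max_rounds` parameters are hypothetical choices made for illustration only.

```python
from statistics import median


def group_by_gap(positions, max_gap):
    """Split sorted 1-D positions into runs whose consecutive gaps are <= max_gap."""
    if not positions:
        return []
    groups = [[positions[0]]]
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev <= max_gap:
            groups[-1].append(cur)   # close enough: same group
        else:
            groups.append([cur])     # gap too large: start a new group
    return groups


def estimate_gap(groups):
    """Median gap between neighbours inside the current groups (None if no gaps)."""
    gaps = [b - a for g in groups for a, b in zip(g, g[1:])]
    return median(gaps) if gaps else None


def adaptive_regroup(positions, crude_gap, tolerance=1.5, max_rounds=10):
    """Hypothetical 1-D illustration of group -> re-estimate -> regroup."""
    positions = sorted(positions)
    # Operation 1: preliminary grouping with a crude parameter value.
    groups = group_by_gap(positions, crude_gap)
    for _ in range(max_rounds):
        # Operation 2: refine the spacing estimate from the groups just formed.
        est = estimate_gap(groups)
        if est is None:
            break
        # Operation 3: form new groups using the refined value. In 1-D,
        # regrouping with the new threshold both splits over-merged groups
        # and merges over-split ones.
        new_groups = group_by_gap(positions, tolerance * est)
        if new_groups == groups:
            break                    # grouping is stable
        groups = new_groups
    return groups


if __name__ == "__main__":
    # Crude threshold is too loose: the first two runs start out merged.
    print(adaptive_regroup([0, 10, 20, 30, 60, 70, 80, 130, 140], crude_gap=35))
    # Crude threshold is too tight: one run starts out split in two.
    print(adaptive_regroup([0, 10, 22, 32], crude_gap=10))
```

The two calls show the strategy correcting a crude initial parameter in both directions: an overly loose threshold leads to a split, and an overly tight one leads to a merge. A real layout analyzer would work on 2-D bounding boxes and would need to handle horizontal and vertical textlines separately, as the abstract notes.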
Pages: 261-271
Number of pages: 11