Chinese document layout analysis using an adaptive regrouping strategy

Cited by: 9
Authors
Chang, F [1]
Chu, SY [1]
Chen, CY [1]
Affiliation
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Keywords
layout analysis; Chinese documents; adaptive regrouping strategy; reading order; spacing information
DOI
10.1016/j.patcog.2004.05.010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In document layout analysis, the defining conditions for textlines and text regions involve certain numerical parameters (e.g. inter-character spacing and inter-textline spacing) whose values can only be estimated when textlines and text regions have already been formed. This seemingly chicken-and-egg problem can be solved through an adaptive regrouping strategy, which consists of three operations. First, we group basic ingredients into preliminary textlines and text regions according to crude parametric values. Second, we refine our estimate of the parametric values based on the groups thus formed. Third, we form new groups by splitting and merging existing groups based on the newly estimated values. This paper applies the above strategy to Chinese documents whose complexity derives from the coexistence of horizontal and vertical textlines. Successful results are obtained using this approach. The accuracy rates for identifying text regions and textlines are above 98% in a test database consisting of over one thousand document samples and various layout structures. © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
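The three operations named in the abstract (crude grouping, parameter re-estimation, split-and-merge regrouping) can be illustrated with a minimal one-dimensional sketch. This is not the authors' algorithm: the abstract does not give their grouping rules, so the gap-threshold rule, the median estimator, the `slack` factor, and all function names below are assumptions made purely for illustration. Character boxes are reduced to `(left, right)` intervals on a single horizontal line.

```python
from statistics import median

def group_by_gap(boxes, max_gap):
    """Chain boxes (sorted by left edge) into one group while the
    horizontal gap to the previous box is at most max_gap."""
    groups = [[boxes[0]]]
    for prev, cur in zip(boxes, boxes[1:]):
        if cur[0] - prev[1] <= max_gap:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups

def estimate_gap(groups):
    """Estimate the typical inter-character gap from the current groups.
    Returns None when no group holds two or more boxes."""
    gaps = [b[0] - a[1] for g in groups for a, b in zip(g, g[1:])]
    return median(gaps) if gaps else None

def adaptive_regroup(boxes, crude_gap, slack=2.0, max_rounds=5):
    """Hypothetical sketch: group with a crude threshold, re-estimate
    the spacing from the groups, regroup, and repeat until stable."""
    boxes = sorted(boxes)
    if not boxes:
        return []
    # Operation 1: preliminary groups from a crude parameter value.
    groups = group_by_gap(boxes, crude_gap)
    for _ in range(max_rounds):
        # Operation 2: refine the parameter estimate from the groups.
        est = estimate_gap(groups)
        if est is None:
            break
        # Operation 3: regroup with the refined threshold. In one
        # dimension this equals splitting groups at gaps above the
        # threshold and merging neighbours whose gap is within it.
        new_groups = group_by_gap(boxes, slack * est)
        if new_groups == groups:
            break
        groups = new_groups
    return groups

# Six boxes with inner gaps of 2 and one wider gap of 10 in the middle.
boxes = [(0, 10), (12, 22), (24, 34), (44, 54), (56, 66), (68, 78)]
lines = adaptive_regroup(boxes, crude_gap=15)
```

Here the crude threshold of 15 first lumps all six boxes together; the re-estimated spacing then splits the group at the wide gap. A real layout analyser would work on two-dimensional boxes, handle both horizontal and vertical textlines, and estimate inter-textline spacing as well, none of which this sketch attempts.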
Pages: 261-271
Number of pages: 11