Empirical performance evaluation of page segmentation algorithms

Times cited: 0
Authors
Mao, S [1]
Kanungo, T [1]
Affiliation
[1] Univ Maryland, Language & Media Proc Lab, Ctr Automat Res, College Pk, MD 20742 USA
Source
Keywords
document page segmentation; OCR; comparative evaluation; performance metric; X-Y cut; Docstrum; Voronoi diagram; performance evaluation; statistical significance; paired model;
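DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract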
Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous segmentation algorithms have been proposed, relatively little literature addresses the comparative evaluation, empirical or theoretical, of these algorithms. We use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) we first create mutually exclusive training and test datasets with ground truth; 2) we then select a meaningful and computable performance metric; 3) an optimization procedure automatically searches for the optimal parameter values of each segmentation algorithm; 4) the segmentation algorithms are evaluated on the test dataset; and finally 5) a statistical error analysis is performed to establish the statistical significance of the experimental results. We apply this methodology to five segmentation algorithms: three representative research algorithms and two well-known commercial products. The three research algorithms evaluated are Nagy's X-Y cut, O'Gorman's Docstrum, and Kise's Voronoi-diagram-based algorithm. The two commercial products evaluated are the segmentation algorithms of Caere Corporation and ScanSoft Corporation. The evaluations are conducted on 978 images from the University of Washington III dataset. We find that the Voronoi-based, Docstrum, and Caere segmentation algorithms do not differ significantly from one another in performance, but all three are significantly better than ScanSoft's segmentation algorithm, which in turn is significantly better than the X-Y cut algorithm. Furthermore, the commercial and research segmentation algorithms show comparable performance.
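The abstract reports which differences are statistically significant, and the keywords mention a "paired model", but the exact test is not given here. The sketch below is an assumption, not the authors' procedure: it supposes every algorithm receives a per-image error score on the same test images and compares two algorithms with a paired t statistic, approximating the two-sided p-value with the standard normal distribution (reasonable only for large test sets such as several hundred images). The function name `paired_comparison` and the toy scores are hypothetical.

```python
import math
import statistics


def paired_comparison(errors_a, errors_b):
    """Hypothetical paired comparison of two segmentation algorithms.

    errors_a[i] and errors_b[i] are assumed per-image error scores of
    algorithms A and B on the same test image i. Returns
    (mean difference, t statistic, two-sided p-value). The p-value uses a
    normal approximation to the t distribution, valid only for large n.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both algorithms must be scored on the same images")
    n = len(errors_a)
    if n < 2:
        raise ValueError("need at least two paired observations")

    # Pairing: per-image differences cancel out how hard each image is.
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)

    if sd_diff == 0:
        # Every image shows the same difference: no spread to test against.
        if mean_diff == 0:
            return 0.0, 0.0, 1.0
        return mean_diff, math.copysign(math.inf, mean_diff), 0.0

    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    # P(|Z| > |t|) for a standard normal Z equals erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return mean_diff, t_stat, p_value


if __name__ == "__main__":
    # Toy scores for six images, only to show the call; a real study would
    # use the full test set (e.g. hundreds of images).
    algo_a = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
    algo_b = [0.14, 0.15, 0.10, 0.19, 0.13, 0.12]
    print(paired_comparison(algo_a, algo_b))
```

Pairing matters because all algorithms are run on the same images: per-image differences remove variation due to image difficulty, which typically makes the comparison more sensitive than treating the two sets of scores as independent samples. For small test sets, an exact t distribution (or a nonparametric test such as the sign test) would be more appropriate than the normal approximation used here.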
Pages: 303-314
Number of pages: 12