Performance evaluation and benchmarking of six-page segmentation algorithms

Cited by: 113
Authors
Shafait, Faisal [1]
Keysers, Daniel [1]
Breuel, Thomas M. [2]
Affiliations
[1] DFKI GmbH, German Research Center for Artificial Intelligence, Image Understanding and Pattern Recognition Research Group, D-67663 Kaiserslautern, Germany
[2] Technical University of Kaiserslautern, Department of Computer Science, D-67663 Kaiserslautern, Germany
Keywords
document page segmentation; OCR; performance evaluation; performance metric
DOI
10.1109/TPAMI.2007.70837
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Informative benchmarks are crucial for optimizing the page segmentation step of an OCR system, which is frequently the performance-limiting step for overall OCR system performance. We show that current evaluation scores are insufficient for diagnosing specific errors in page segmentation and fail to identify some classes of serious segmentation errors altogether. This paper introduces a vectorial score that is sensitive to, and identifies, the most important classes of segmentation errors (over-, under-, and mis-segmentation) and which page components (lines, blocks, etc.) are affected. Unlike previous schemes, our evaluation method has a canonical representation of ground-truth data and guarantees pixel-accurate evaluation results for arbitrary region shapes. We present the results of evaluating widely used segmentation algorithms (x-y cut, smearing, whitespace analysis, constrained text-line finding, docstrum, and Voronoi) on the UW-III database and demonstrate that the new evaluation scheme permits the identification of several specific flaws in individual segmentation methods.
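The abstract describes the vectorial score only at a high level. The following is a minimal sketch, not the authors' implementation, of one way to count segmentation error classes at the pixel level: it builds a bipartite overlap graph between ground-truth regions and hypothesized regions and reads error counts off the node degrees. The label-image representation, the function names `overlap_graph` and `segmentation_errors`, and the significance rule (an overlap counts if it covers at least `rel_thresh` of either region, default 0.1) are hypothetical choices for illustration. Mis-segmentation is not modeled; the sketch reports over-segmentation, under-segmentation, missed regions, and false alarms.

```python
from collections import Counter


def overlap_graph(gt, seg):
    """Count shared pixels for every (ground-truth label, segmentation label) pair.

    gt and seg are equally sized 2-D label images (lists of lists of ints);
    label 0 is treated as background and ignored.
    """
    pair_counts = Counter()
    gt_sizes = Counter()
    seg_sizes = Counter()
    for gt_row, seg_row in zip(gt, seg):
        for g, s in zip(gt_row, seg_row):
            if g:
                gt_sizes[g] += 1
            if s:
                seg_sizes[s] += 1
            if g and s:
                pair_counts[(g, s)] += 1
    return pair_counts, gt_sizes, seg_sizes


def segmentation_errors(gt, seg, rel_thresh=0.1):
    """Return counts of over-/under-segmentation, missed regions and false alarms."""
    pair_counts, gt_sizes, seg_sizes = overlap_graph(gt, seg)
    # Hypothetical significance rule: keep an edge if the shared pixels cover
    # at least rel_thresh of the ground-truth region or of the hypothesis region.
    edges = [
        (g, s)
        for (g, s), n in pair_counts.items()
        if n >= rel_thresh * gt_sizes[g] or n >= rel_thresh * seg_sizes[s]
    ]
    gt_degree = Counter(g for g, _ in edges)   # hypotheses touching each GT region
    seg_degree = Counter(s for _, s in edges)  # GT regions touching each hypothesis
    return {
        # a ground-truth region split across several hypothesis regions
        "over": sum(1 for g in gt_sizes if gt_degree[g] > 1),
        # a hypothesis region that merges several ground-truth regions
        "under": sum(1 for s in seg_sizes if seg_degree[s] > 1),
        # a ground-truth region with no significant match
        "missed": sum(1 for g in gt_sizes if gt_degree[g] == 0),
        # a hypothesis region with no significant match
        "false_alarm": sum(1 for s in seg_sizes if seg_degree[s] == 0),
    }


# Two ground-truth text lines (labels 1 and 2) separated by a blank row.
gt = [[1, 1, 1, 1],
      [0, 0, 0, 0],
      [2, 2, 2, 2]]
# A hypothesis that merges both lines into a single region.
merged = [[1, 1, 1, 1],
          [0, 0, 0, 0],
          [1, 1, 1, 1]]
print(segmentation_errors(gt, merged))
# → {'over': 0, 'under': 1, 'missed': 0, 'false_alarm': 0}
```

Keeping the counts separate, rather than collapsing them into one number, reflects the abstract's point that a vectorial score can say which kind of error an algorithm makes; a single overlap percentage would not distinguish the merged hypothesis above from one that splits a line.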
Pages: 941-954
Page count: 14