Empirical performance evaluation of page segmentation algorithms

Cited by: 0
Authors
Mao, S [1]
Kanungo, T [1]
Affiliations
[1] Univ Maryland, Language & Media Proc Lab, Ctr Automat Res, College Pk, MD 20742 USA
Keywords
document page segmentation; OCR; comparative evaluation; performance metric; X-Y cut; Docstrum; Voronoi diagram; performance evaluation; statistical significance; paired model;
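DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract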
Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous segmentation algorithms have been proposed, there is relatively little literature on comparative evaluation, empirical or theoretical, of these algorithms. We use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) we first create mutually exclusive training and test datasets with groundtruth, 2) we then select a meaningful and computable performance metric, 3) an optimization procedure is then used to automatically search for the optimal parameter values of the segmentation algorithms, 4) the segmentation algorithms are then evaluated on the test dataset, and finally 5) a statistical error analysis is performed to give the statistical significance of the experimental results. We apply this methodology to five segmentation algorithms, three of which are representative research algorithms and the other two well-known commercial products. The three research algorithms evaluated are: Nagy's X-Y cut, O'Gorman's Docstrum and Kise's Voronoi-diagram-based algorithm. The two commercial products evaluated are: Caere Corporation's segmentation algorithm and ScanSoft Corporation's segmentation algorithm. The evaluations are conducted on 978 images from the University of Washington III dataset. It is found that the performances of the Voronoi-based, Docstrum and Caere's segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which in turn is significantly better than that of the X-Y cut algorithm. Furthermore, we see that the commercial segmentation algorithms and research segmentation algorithms have comparable performances.
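The abstract's fifth step, together with the "paired model" keyword, suggests comparing two algorithms on the same test images. The record does not give the paper's actual statistical procedure, so the following is only a minimal sketch of a generic paired t statistic, assuming each algorithm yields one accuracy score per image. The function name `paired_t_statistic` and the sample scores are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (mean difference, t statistic, degrees of freedom).

    scores_a[i] and scores_b[i] are assumed to be the scores of two
    algorithms on the same i-th test image (a paired design).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if s_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, t, n - 1


if __name__ == "__main__":
    # Made-up per-image accuracies for two hypothetical algorithms.
    algo_a = [0.90, 0.85, 0.80, 0.95, 0.70]
    algo_b = [0.80, 0.80, 0.70, 0.85, 0.65]
    d_bar, t, df = paired_t_statistic(algo_a, algo_b)
    print(f"mean difference = {d_bar:.3f}, t = {t:.3f}, df = {df}")
```

The resulting t value would be compared against a Student's t critical value for the given degrees of freedom. Pairing by image removes per-image difficulty from the comparison, which is presumably why a paired design suits an evaluation where every algorithm runs on the same test set.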
Pages: 303-314
Page count: 12
Related papers
50 in total
  • [21] PERFORMANCE OF REPLACEMENT ALGORITHMS WITH DIFFERENT PAGE SIZES
    CHU, WW
    OPDERBECK, H
    COMPUTER, 1974, 7 (11) : 14 - 21
  • [22] Extending Page Segmentation Algorithms for Mixed-Layout Document Processing
    Winder, Amy
    Andersen, Tim
    Smith, Elisa H. Barney
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 1245 - 1249
  • [23] Web Page Segmentation Revisited: Evaluation Framework and Dataset
    Kiesel, Johannes
    Kneist, Florian
    Meyer, Lars
    Komlossy, Kristof
    Stein, Benno
    Potthast, Martin
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 3047 - 3054
  • [24] An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers
    Liebl, Bernhard
    Burghardt, Manuel
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5153 - 5160
  • [25] Software architecture of PSET: A page segmentation evaluation toolkit
    Mao S.
    Kanungo T.
    International Journal on Document Analysis and Recognition, 2002, 4 (3) : 205 - 217
  • [26] PERFORMANCE EVALUATION OF REGION-GROWING BASED SEGMENTATION ALGORITHMS FOR SEGMENTING THE AORTA
    Rahman, Hussain
    Din, Fakhrud
    Rahmana, Sami Ur
    Sehatullah
    JURNAL TEKNOLOGI, 2016, 78 (4-3) : 9 - 15
  • [27] Performance bounds of algorithms for scheduling advertisements on a web page
    Dawande, M
    Kumar, S
    Sriskandarajah, C
    JOURNAL OF SCHEDULING, 2003, 6 (04) : 373 - 393
  • [28] An empirical comparison of predictive models for web page performance
    Ramakrishnan, Raghu
    Kaur, Arvinder
    INFORMATION AND SOFTWARE TECHNOLOGY, 2020, 123
  • [29] Performance Bounds of Algorithms for Scheduling Advertisements on a Web Page
    Milind Dawande
    Subodha Kumar
    Chelliah Sriskandarajah
    Journal of Scheduling, 2003, 6 : 373 - 394
  • [30] Evaluation of segmentation algorithms for medical imaging
    Fenster, Aaron
    Chiu, Bernard
    2005 27TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2005, : 7186 - 7189