Exploiting Stroke Orientation for CRF based Binarization of Historical Documents

Cited by: 5
Authors
Peng, Xujun [1]
Cao, Huaigu [1]
Subramanian, Krishna [1]
Prasad, Rohit [1]
Natarajan, Prem [1]
Affiliation
[1] Raytheon BBN Technologies, Cambridge, MA 02138 USA
DOI
10.1109/ICDAR.2013.207
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel binarization method that is especially effective on historical documents with the following characteristics: (a) free-form cursive handwritten text with significant but consistent slant; (b) scanning artifacts that leave text and background pixels with non-uniform intensity, even within a single page; and (c) significant bleed-through from the reverse side of the page. To handle non-uniform text and background intensity, we use a thresholding algorithm that works equally well on regions of the page that contain text and on regions that contain none. We then combine this algorithm with a CRF-based framework that handles bleed-through using a novel approach, further improving binarization quality. We compare the proposed algorithm against other popular binarization algorithms, both qualitatively using examples and quantitatively using the word error rate (WER) obtained by running optical character recognition (OCR) on the binarized text with the BBN Byblos offline handwritten text recognition (OHR) system.
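The quantitative comparison relies on WER, conventionally defined as the word-level edit distance between the recognizer output and the reference transcription, divided by the number of reference words: WER = (S + D + I) / N. The sketch below is illustrative only. It assumes whitespace tokenization and unit costs for substitutions, deletions, and insertions. The record does not state the exact scoring conventions used with BBN Byblos (tokenization, case, punctuation), and `word_error_rate` is a hypothetical helper, not part of that system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and unit edit costs; real scoring tools
    may normalize case or punctuation differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because WER is normalized by the reference length rather than the hypothesis length, it can exceed 1.0 when the recognizer inserts many spurious words. This is why lower WER after binarization is a direct proxy for cleaner text reaching the recognizer.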
Pages: 1034–1038
Number of pages: 5
Related papers (50 in total)
  • [1] A New Binarization Algorithm for Historical Documents
    Almeida, Marcos
    Lins, Rafael Dueire
    Bernardino, Rodrigo
    Jesus, Darlisson
    Lima, Bruno
    JOURNAL OF IMAGING, 2018, 4 (02)
  • [2] Hybrid Binarization Method for Historical Handwritten Documents
    Asatryan, D. G.
    Haroutunian, M. E.
    Sazhumyan, G. S.
    Kupriyanov, A. V.
    Paringer, R. A.
    Kirsh, D. V.
    PROGRAMMING AND COMPUTER SOFTWARE, 2023, 49 (SUPPL 1) : S45 - S50
  • [3] A comparison of binarization methods for historical archive documents
    He, J
    Do, QDM
    Downton, AC
    Kim, JH
    EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 538 - 542
  • [4] An Evaluation Survey of Binarization Algorithms on Historical Documents
    Stathis, Pavlos
    Kavallieratou, Ergina
    Papamarkos, Nikos
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 1953 - +
  • [6] An adaptive binarization technique for low quality historical documents
    Gatos, B
    Pratikakis, I
    Perantonis, SJ
    DOCUMENT ANALYSIS SYSTEMS VI, PROCEEDINGS, 2004, 3163 : 102 - 113
  • [7] An analysis of the transition proportion for binarization in handwritten historical documents
    Ramirez-Ortegon, Marte A.
    Ramirez-Ramirez, Lilia L.
    Maergner, Volker
    Ben Messaoud, Ines
    Cuevas, Erik
    Rojas, Raul
    PATTERN RECOGNITION, 2014, 47 (08) : 2635 - 2651
  • [8] Binarization, character extraction, and writer identification of historical Hebrew calligraphy documents
    Bar-Yosef, Itay
    Beckman, Isaac
    Kedem, Klara
    Dinstein, Itshak
    International Journal of Document Analysis and Recognition (IJDAR), 2007, 9 : 89 - 99
  • [9] A Robust Multi Stage Technique for Image Binarization of Degraded Historical Documents
    Boudraa, Omar
    Hidouci, Walid Khaled
    Michelucci, Dominique
    2017 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING - BOUMERDES (ICEE-B), 2017,
  • [10] A Morphology based Approach for Binarization of Handwritten Documents
    Papavassiliou, Vassilis
    Simistira, Fotini
    Katsouros, Vassilis
    Carayannis, George
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 577 - 581