Exploiting Stroke Orientation for CRF based Binarization of Historical Documents

Cited by: 5
Authors
Peng, Xujun [1]
Cao, Huaigu [1]
Subramanian, Krishna [1]
Prasad, Rohit [1]
Natarajan, Prem [1]
Affiliation
[1] Raytheon BBN Technologies, Cambridge, MA 02138 USA
DOI
10.1109/ICDAR.2013.207
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel binarization method that is especially effective on historical documents with the following characteristics: (a) the documents contain free-form cursive handwritten text with significant but consistent slant, (b) scanning artifacts leave the text and background pixels without uniform intensity even within the same page, and (c) the pages contain a significant amount of bleed-through from the other side of the page. To tackle the problem of non-uniform text and background intensity, we use a thresholding algorithm that works equally well for regions of the page that contain text and regions that contain none. We then combine this algorithm with a CRF-based framework that handles bleed-through using a novel approach, further improving the quality of binarization. We compare the proposed binarization algorithm against other popular binarization algorithms both qualitatively, using examples, and quantitatively, using the word error rate (WER) obtained by performing optical character recognition (OCR) on the binarized text with the BBN Byblos Offline Handwritten text recognition (OHR) system.
Pages: 1034-1038
Page count: 5
Related papers
50 in total
  • [21] A Polar Stroke Descriptor for Classification of Historical Documents
    He, Sheng
    Schomaker, Lambert
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 6 - 10
  • [22] Document image binarization based on stroke enhancement
    Zhu, Yuanping
    Wang, Chunheng
    Dai, Ruwei
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 955 - +
  • [23] A binarization method for scanned documents based on hidden markov model
    Huang, Songtao
    Sid-Ahmed, M. A.
    Ahmadi, Majid
    El-Feghi, Idris
    2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS, 2006, : 4309 - +
  • [24] A VOTING APPROACH FOR IMAGE BINARIZATION OF TEXT-BASED DOCUMENTS
    Boiangiu, Costin-Anton
    Vlasceanu, Giorgiana Violeta
    Atanasiu, Alexandru Marian
    Damian, Petrisor Alin
    Panaitescu, Cristian
    UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2019, 81 (03): : 53 - 64
  • [25] Region Based Local Binarization Approach for Handwritten Ancient Documents
    Ben Messaoud, Ines
    Amiri, Hamid
    El Abed, Haikal
    Maergner, Volker
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 633 - 638
  • [26] A voting approach for image binarization of text-based documents
    Boiangiu, Costin-Anton
    Vlăsceanu, Giorgiana Violeta
    Atanasiu, Alexandru Marian
    Damian, Petrișor Alin
    Panaitescu, Cristian
    UPB Scientific Bulletin, Series C: Electrical Engineering and Computer Science, 2019, 81 (03): : 53 - 64
  • [27] Binarization of Degraded Handwritten Documents Based on Morphological Contrast Intensification
    Mandal, Sekhar
    Das, Sugata
    Agarwal, Amrit
    Chanda, Bhabatosh
    2015 THIRD INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP), 2015, : 73 - 78
  • [28] Adaptive-interpolative binarization with stroke preservation for restoration of faint characters in degraded documents
    Bag, Soumen
    Bhowmick, Partha
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2015, 31 : 266 - 281
  • [29] AN ADAPTIVE LAYER-BASED LOCAL BINARIZATION TECHNIQUE FOR DEGRADED DOCUMENTS
    Makridis, Michael
    Papamarkos, N.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2010, 24 (02) : 245 - 279
  • [30] Historical Document Image Binarization Based on Edge Contrast Information
    Li, Zhenjiang
    Wang, Weilan
    Cai, Zhengqi
    ADVANCES IN COMPUTER VISION, CVC, VOL 1, 2020, 943 : 614 - 628