Exploiting Stroke Orientation for CRF based Binarization of Historical Documents

Cited by: 5
Authors
Peng, Xujun [1 ]
Cao, Huaigu [1 ]
Subramanian, Krishna [1 ]
Prasad, Rohit [1 ]
Natarajan, Prem [1 ]
Affiliations
[1] Raytheon BBN Technol, Cambridge, MA 02138 USA
DOI
10.1109/ICDAR.2013.207
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a novel binarization method that is especially effective on historical documents with the following characteristics: (a) the documents contain free-form cursive handwritten text with significant but consistent slant, (b) scanning artifacts cause the text and background pixels to have non-uniform intensity even within the same page, and (c) the pages contain a significant amount of bleed-through from the other side of the page. To tackle the problem of non-uniform text and background intensity, we use a thresholding algorithm that works equally well for regions of the page that contain text and regions that contain none. We then combine this algorithm with a CRF-based framework that handles bleed-through using a novel approach, further improving the quality of binarization. We compare the proposed binarization algorithm against other popular binarization algorithms both qualitatively, using examples, and quantitatively, using the word error rate (WER) obtained by performing optical character recognition (OCR) on the binarized text with the BBN Byblos Offline Handwritten text recognition (OHR) system.
Pages: 1034 - 1038
Page count: 5
Related papers
50 in total
  • [41] Named entity recognition for Chinese judgment documents based on BiLSTM and CRF
    Huang, Wenming
    Hu, Dengrui
    Deng, Zhenrong
    Nie, Jianyun
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2020, 2020 (01)
  • [42] Stroke Width-Based Contrast Feature for Document Image Binarization
    Le Thi Khue Van
    Lee, Gueesang
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2014, 10 (01) : 55 - 68
  • [43] Handheld Mobile Device Based Text Region Extraction and Binarization of Image Embedded Text Documents
    Mollah, Ayatullah
    Basu, Subhadip
    Nasipuri, Mita
    Basu, Dipak
    JOURNAL OF INTELLIGENT SYSTEMS, 2013, 22 (01) : 25 - 47
  • [44] A wavelet-transform-based binarization algorithm on dynamic threshold of vertical orientation of fingerprint
    Li, J
    Wu, HY
    Fang, KL
    Hu, LJ
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON INTELLIGENT MECHATRONICS AND AUTOMATION, 2004, : 877 - 880
  • [45] iDocChip - A Configurable Hardware Architecture for Historical Document Image Processing: Percentile Based Binarization
    Rybalkin, Vladimir
    Bukhari, Syed Saqib
    Ghaffar, Muhammad Mohsin
    Ghafoor, Aqib
    Wehn, Norbert
    Dengel, Andreas
    PROCEEDINGS OF THE ACM SYMPOSIUM ON DOCUMENT ENGINEERING (DOCENG 2018), 2018,
  • [46] A new method for writer identification based on historical documents
    Gattal, Abdeljalil
    Djeddi, Chawki
    Abbas, Faycel
    Siddiqi, Imran
    Bouderah, Brahim
    JOURNAL OF INTELLIGENT SYSTEMS, 2023, 32 (01)
  • [47] Towards style-based dating of historical documents
    He, Sheng
    Samara, Petros
    Burgers, Jan
    Schomaker, Lambert
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 265 - 270
  • [48] Ontology-Based Information Retrieval for Historical Documents
    Ramli, Fatihah
    Noah, Shahrul Azman
    Kurniawan, Tri Basuki
    2016 THIRD INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2016, : 55 - 59
  • [49] A Mask-based enhancement method for historical documents
    Smith, Elisa H. Barney
    Darbon, Jerome
    Likforman-Sulem, Laurence
    DOCUMENT RECOGNITION AND RETRIEVAL XVIII, 2011, 7874
  • [50] An Active Contour Based Method for Image Binarization: Application to degraded historical document images
    Hadjadj, Zineb
    Meziane, Abdelkrim
    Cheriet, Mohamed
    Cherfa, Yazid
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 655 - 660