Automatic Extraction of Numeric Strings in Unconstrained Handwritten Document Images

Cited by: 0
Authors
Haji, M. Mehdi [1]
Bui, Tien D. [1]
Suen, Ching Y. [1]
Affiliations
[1] Concordia Univ, Montreal, PQ, Canada
Source
DOCUMENT RECOGNITION AND RETRIEVAL XVIII | 2011 / Vol. 7874
Keywords
Numeric extraction; unconstrained handwritten documents; character segmentation; regularity measure; graph partitioning; pruning; recognition-based verification; TEXT LINE; SEGMENTATION; RECOGNITION; CHARACTER
DOI
10.1117/12.874706
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line, and then using a new regularity measure, it prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm which is based on analysis of skeletal graphs and a merging algorithm which is based on graph partitioning. All the candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for the processing of real-world documents where letters and digits may be connected or broken in a document. The effectiveness of the proposed approach is shown by extensive experiments done on a real-world database of 607 documents which contains handwritten, machine-printed and mixed documents with different types of layouts and levels of noise.
Pages: 9
Related papers
50 in total
  • [31] AUTOMATIC TEXT EXTRACTION, REMOVAL AND INPAINTING OF COMPLEX DOCUMENT IMAGES
    Chen, Yen-Lin
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2012, 8 (1A): : 303 - 327
  • [32] An automatic histogram detection and information extraction from document images
    Anagha, P. H.
    Baskar, A.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (01) : 77 - 85
  • [33] Segmentation algorithm for unconstrained handwritten numeral strings in bank check reader system
    Zhang, Chuang
    Lin, Zhi-Qing
    Xiao, Bo
    Guo, Jun
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2006, 29 (01): : 13 - 16
  • [34] Toward automatic construction of structural models for unconstrained handwritten characters
    Nishida, H
    HANDWRITING AND DRAWING RESEARCH: BASIC AND APPLIED ISSUES, 1996, : 359 - 372
  • [35] Transcript mapping for historic handwritten document images
    Tomai, CI
    Zhang, B
    Govindaraju, V
    EIGHTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION: PROCEEDINGS, 2002, : 413 - 418
  • [36] Visual Aesthetic Analysis for Handwritten Document Images
    Majumdar, Anshuman
    Krishnan, Praveen
    Jawahar, C. V.
    PROCEEDINGS OF 2016 15TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2016, : 423 - 428
  • [37] Script and language identification for handwritten document images
    Judith Hochberg
    Kevin Bowers
    Michael Cannon
    Patrick Kelly
    International Journal on Document Analysis and Recognition, 1999, 2 (2-3) : 45 - 52
  • [38] Named Entity Linking on Handwritten Document Images
    Tueselmann, Oliver
    Fink, Gernot A.
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 199 - 213
  • [39] Unconstrained Arabic Handwritten Word Feature Extraction: A Comparative Study
    AlKhateeb, Jawad H.
    Ren, Jinchang
    Jiang, Jianmin
    Ipson, Stan S.
    PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, : 1655 - 1656
  • [40] Binarization and Segmentation of Kannada Handwritten Document Images
    Vinod, H. C.
    Niranjan, S. K.
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND INTERNET OF THINGS (ICGCIOT 2018), 2018, : 488 - 493