Automatic Extraction of Numeric Strings in Unconstrained Handwritten Document Images

Citations: 0
Authors
Haji, M. Mehdi [1]
Bui, Tien D. [1]
Suen, Ching Y. [1]
Affiliation
[1] Concordia Univ, Montreal, PQ, Canada
Source
DOCUMENT RECOGNITION AND RETRIEVAL XVIII | 2011 / Vol. 7874
Keywords
Numeric extraction; unconstrained handwritten documents; character segmentation; regularity measure; graph partitioning; pruning; recognition-based verification; TEXT LINE; SEGMENTATION; RECOGNITION; CHARACTER;
DOI
10.1117/12.874706
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line and then, using a new regularity measure, prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm based on analysis of skeletal graphs, and a merging algorithm based on graph partitioning. All candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for the processing of real-world documents, where letters and digits may be connected or broken. The effectiveness of the proposed approach is shown by extensive experiments on a real-world database of 607 documents, which contains handwritten, machine-printed and mixed documents with different types of layouts and levels of noise.
Pages: 9
Related papers
50 items in total
  • [41] A combined approach for the binarization of handwritten document images
    Ntirogiannis, K.
    Gatos, B.
    Pratikakis, I.
    PATTERN RECOGNITION LETTERS, 2014, 35 : 3 - 15
  • [42] Information extraction from historical handwritten document images with a context-aware neural model
    Ignacio Toledo, J.
    Carbonell, Manuel
    Fornes, Alicia
    Llados, Josep
    PATTERN RECOGNITION, 2019, 86 : 27 - 36
  • [43] Automatic recognition of handwritten numerical strings: A recognition and verification strategy
    Oliveira, LS
    Sabourin, R
    Bortolozzi, F
    Suen, CY
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (11) : 1438 - 1454
  • [44] An Unconstrained Benchmark Urdu Handwritten Sentence Database with Automatic Line Segmentation
    Raza, Ahsen
    Siddiqi, Imran
    Abidi, Ali
    Arif, Fahim
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 491 - 496
  • [45] A hybrid fuzzy feature extraction framework for handwritten numeric fields recognition
    Chiang, JH
    Gader, P
    FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996, : 1881 - 1885
  • [46] Automatic recognition of unconstrained off-line Bangla handwritten numerals
    Pal, U
    Chaudhuri, BB
    ADVANCES IN MULTIMODAL INTERFACES - ICMI 2000, PROCEEDINGS, 2000, 1948 : 371 - 378
  • [47] Automatic line and word segmentation applied to densely line-skewed historical handwritten document images
    Sanchez, A.
    Mello, C. A. B.
    Suarez, P. D.
    Lopes, A.
    INTEGRATED COMPUTER-AIDED ENGINEERING, 2011, 18 (02) : 125 - 142
  • [48] Quadratic spline wavelet approach to automatic extraction of baselines from document images
    Tang, YY
    Yang, LH
    Liu, JM
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, 1997, : 693 - 696
  • [49] A probabilistic method for keyword retrieval in handwritten document images
    Cao, Huaigu
    Bhardwaj, Anurag
    Govindaraju, Venu
    PATTERN RECOGNITION, 2009, 42 (12) : 3374 - 3382
  • [50] A Novel Transcript Mapping Technique for Handwritten Document Images
    Stamatopoulos, Nikolaos
    Gatos, Basilis
    Louloudis, Georgios
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 41 - 46