Automatic Extraction of Numeric Strings in Unconstrained Handwritten Document Images

Cited by: 0
Authors
Haji, M. Mehdi [1]
Bui, Tien D. [1]
Suen, Ching Y. [1]
Affiliations
[1] Concordia Univ, Montreal, PQ, Canada
Source
DOCUMENT RECOGNITION AND RETRIEVAL XVIII | 2011 / Vol. 7874
Keywords
Numeric extraction; unconstrained handwritten documents; character segmentation; regularity measure; graph partitioning; pruning; recognition-based verification; TEXT LINE; SEGMENTATION; RECOGNITION; CHARACTER;
DOI
10.1117/12.874706
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line and then, using a new regularity measure, prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm based on the analysis of skeletal graphs, and a merging algorithm based on graph partitioning. All candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for the processing of real-world documents, in which letters and digits may be connected or broken. The effectiveness of the proposed approach is shown by extensive experiments on a real-world database of 607 documents that contains handwritten, machine-printed and mixed documents with different types of layouts and levels of noise.
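The two-phase control flow described in the abstract (prune, then verify) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Segment` representation, the `regularity` score (a simple size-uniformity heuristic), both thresholds, and the `classify` callable are hypothetical stand-ins. The abstract does not define the regularity measure, and the segment-merge procedure and the probabilistic RBF networks are not modelled here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Bounding box of one candidate character (hypothetical representation)."""
    x: float
    width: float
    height: float


def regularity(segments: Sequence[Segment]) -> float:
    """Stand-in regularity score in (0, 1]; NOT the measure from the paper.

    A sequence scores higher when its segments have similar widths and
    heights, measured by the coefficient of variation of each.
    Assumes all widths and heights are positive.
    """
    if len(segments) < 2:
        return 1.0
    widths = [s.width for s in segments]
    heights = [s.height for s in segments]
    cv_w = pstdev(widths) / mean(widths)
    cv_h = pstdev(heights) / mean(heights)
    return 1.0 / (1.0 + cv_w + cv_h)


# A classifier maps a segment to (label, confidence); assumed interface.
Classifier = Callable[[Segment], Tuple[str, float]]


def extract_numeric_strings(
    candidates: Sequence[Sequence[Segment]],
    classify: Classifier,
    min_regularity: float = 0.8,   # assumed threshold
    min_confidence: float = 0.5,   # assumed threshold
) -> List[str]:
    """Return the digit strings of candidates that survive both phases."""
    results = []
    for seq in candidates:
        # Phase 1 (pruning): drop sequences unlikely to be numeric strings.
        if regularity(seq) < min_regularity:
            continue
        # Phase 2 (verification): accept only if every segment is
        # recognised as a digit with sufficient confidence.
        labels = [classify(seg) for seg in seq]
        if all(lab.isdigit() and conf >= min_confidence for lab, conf in labels):
            results.append("".join(lab for lab, _ in labels))
    return results
```

The point of this ordering is that the cheap pruning test runs first, so the more expensive recognizer is only called on candidates that pass it.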
Pages: 9
Related papers
50 in total
  • [11] Study on segmentation algorithm for unconstrained handwritten numeral strings
    Zhang, C
    Lin, ZQ
    Guo, J
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 1242 - 1247
  • [12] A Hybrid Method for Text Line Extraction in Handwritten Document Images
    Kiumarsi, Ehsan
    Alaei, Alireza
    PROCEEDINGS 2018 16TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2018, : 241 - 246
  • [13] Unsupervised Page Area Detection Approach for the Unconstrained Chronic Handwritten Modi Document Images
    Deshmukh, Manisha S.
    Kolhe, Satish R.
    2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 130 - 135
  • [14] An Approach for Automatic Indic Script Identification from Handwritten Document Images
    Obaidullah, Sk. Md.
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    ADVANCED COMPUTING AND SYSTEMS FOR SECURITY, VOL 2, 2016, 396 : 37 - 51
  • [15] Algorithm of the length estimation of unconstrained handwritten connected numeral strings
    Zhang, Chuang
    Wu, Ming
    Guo, Jun
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2004, 27 (03): : 63 - 67
  • [16] A system for segmentation and recognition of totally unconstrained handwritten numeral strings
    Shi, Z
    Srihari, SN
    Shin, YC
    Ramanaprasad, V
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, 1997, : 455 - 458
  • [17] Indic Script Identification from Handwritten Document Images - An Unconstrained Block-level Approach
    Obaidullah, Sk Md
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    2015 IEEE 2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION SYSTEMS (RETIS), 2015, : 213 - 218
  • [18] Segmentation of unconstrained handwritten numeral strings using continuation property
    Yoon, S
    Kim, K
    Choi, Y
    Lee, Y
    DOCUMENT RECOGNITION AND RETRIEVAL VIII, 2001, 4307 : 353 - 362
  • [19] Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis
    Khandelwal, Abhishek
    Choudhury, Pritha
    Sarkar, Ram
    Basu, Subhadip
    Nasipuri, Mita
    Das, Nibaran
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 369 - +
  • [20] Recognition of unconstrained handwritten numeral strings by composite segmentation method
    Kim, KK
    Kim, JH
    Suen, CY
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS, 2000, : 594 - 597