Automatic Extraction of Numeric Strings in Unconstrained Handwritten Document Images

Cited by: 0
Authors
Haji, M. Mehdi [1]
Bui, Tien D. [1]
Suen, Ching Y. [1]
Affiliations
[1] Concordia Univ, Montreal, PQ, Canada
Source
DOCUMENT RECOGNITION AND RETRIEVAL XVIII | 2011 / Vol. 7874
Keywords
Numeric extraction; unconstrained handwritten documents; character segmentation; regularity measure; graph partitioning; pruning; recognition-based verification; TEXT LINE; SEGMENTATION; RECOGNITION; CHARACTER;
DOI
10.1117/12.874706
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line, and then using a new regularity measure, it prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm which is based on analysis of skeletal graphs and a merging algorithm which is based on graph partitioning. All the candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for the processing of real-world documents where letters and digits may be connected or broken in a document. The effectiveness of the proposed approach is shown by extensive experiments done on a real-world database of 607 documents which contains handwritten, machine-printed and mixed documents with different types of layouts and levels of noise.
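The two-phase flow described in the abstract (prune by regularity, then verify by recognition) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Box` representation, the `regularity` score (a simple coefficient-of-variation heuristic, not the paper's regularity measure, which the abstract does not define), the `min_regularity` threshold, and the `verify` callback, which stands in for the recognition-based verification stage. Character segmentation and merging are assumed to have already produced the candidate sequences.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Box:
    """Bounding box of one candidate character (hypothetical representation)."""
    x: int
    width: int
    height: int


def _cv(values: Sequence[float]) -> float:
    """Coefficient of variation: population std. deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def regularity(boxes: Sequence[Box]) -> float:
    """Toy regularity score in (0, 1]; an assumption, NOT the paper's measure.

    Sequences whose characters have similar heights and widths score near 1.
    Sequences with fewer than two characters score 0.
    """
    if len(boxes) < 2:
        return 0.0
    spread = _cv([b.height for b in boxes]) + _cv([b.width for b in boxes])
    return 1.0 / (1.0 + spread)


def extract_numeric_candidates(
    sequences: Sequence[Sequence[Box]],
    verify: Callable[[Sequence[Box]], bool],
    min_regularity: float = 0.8,  # hypothetical threshold
) -> List[Sequence[Box]]:
    """Phase 1: prune irregular sequences. Phase 2: keep those `verify` accepts."""
    pruned = [s for s in sequences if regularity(s) >= min_regularity]
    return [s for s in pruned if verify(s)]


# Usage: one evenly sized sequence and one irregular sequence.
uniform = [Box(0, 10, 20), Box(12, 10, 20), Box(24, 10, 20)]
irregular = [Box(0, 5, 10), Box(8, 30, 40)]
accepted = extract_numeric_candidates([uniform, irregular], verify=lambda s: True)
```

Pruning before verification keeps the more expensive recognizer from running on sequences that are unlikely to be numeric, which is the ordering the abstract describes.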
Pages: 9
Related papers
50 in total
  • [21] Indexing of handwritten document images
    SyedaMahmood, T
    WORKSHOP ON DOCUMENT IMAGE ANALYSIS (DIA'97), PROCEEDINGS: IN COOPERATION WITH CVPR '97, 1997, : 66 - 73
  • [22] Matching Handwritten Document Images
    Krishnan, Praveen
    Jawahar, C. V.
    COMPUTER VISION - ECCV 2016, PT I, 2016, 9905 : 766 - 782
  • [23] Automatic name extraction from degraded document images
    Laurence Likforman-Sulem
    Pascal Vaillant
    Aliette de Bodard de la Jacopière
    Pattern Analysis and Applications, 2006, 9 : 211 - 227
  • [24] Automatic keyword extraction from historical document images
    Terasawa, K
    Nagasaki, T
    Kawashima, T
    DOCUMENT ANALYSIS SYSTEMS VII, PROCEEDINGS, 2006, 3872 : 413 - 424
  • [25] Automatic name extraction from degraded document images
    Likforman-Sulem, Laurence
    Vaillant, Pascal
    de la Jacopiere, Aliette de Bodard
    PATTERN ANALYSIS AND APPLICATIONS, 2006, 9 (2-3) : 211 - 227
  • [26] Recognition of unconstrained handwritten numeral strings using decision value generator
    Kim, KK
    Chung, YK
    Kim, JH
    Suen, CY
    SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 2001, : 14 - 17
  • [27] Text-line extraction from handwritten document images using GAN
    Kundu, Soumyadeep
    Paul, Sayantan
    Bera, Suman Kumar
    Abraham, Ajith
    Sarkar, Ram
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 140
  • [28] Unconstrained logo detection in document images
    Pham, TD
    PATTERN RECOGNITION, 2003, 36 (12) : 3023 - 3025
  • [29] An integrated approach for automatic semantic structure extraction in document images
    Berardi, M
    Lapi, M
    Malerba, D
    DOCUMENT ANALYSIS SYSTEMS VI, PROCEEDINGS, 2004, 3163 : 179 - 190
  • [30] An automatic histogram detection and information extraction from document images
    P. H. Anagha
    A. Baskar
    International Journal of Speech Technology, 2021, 24 : 77 - 85