Automatic Extraction of Numeric Strings in Unconstrained Handwritten Document Images

Cited by: 0
Authors
Haji, M. Mehdi [1]
Bui, Tien D. [1]
Suen, Ching Y. [1]
Affiliation
[1] Concordia Univ, Montreal, PQ, Canada
Source
Keywords
Numeric extraction; unconstrained handwritten documents; character segmentation; regularity measure; graph partitioning; pruning; recognition-based verification; TEXT LINE; SEGMENTATION; RECOGNITION; CHARACTER;
DOI
10.1117/12.874706
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line and then, using a new regularity measure, prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm based on analysis of skeletal graphs, and a merging algorithm based on graph partitioning. All candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for the processing of real-world documents, where letters and digits may be connected or broken. The effectiveness of the proposed approach is shown by extensive experiments on a real-world database of 607 documents that contains handwritten, machine-printed and mixed documents with different types of layouts and levels of noise.
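The two-phase structure described in the abstract (prune, then verify) can be illustrated with a minimal sketch. The abstract does not define the regularity measure or the classifier interface, so everything below is an assumption made for illustration: `Box`, `regularity`, `extract_numeric`, and both thresholds are hypothetical names and values, the regularity score is a simple stand-in based on the variation of character heights and horizontal pitch, and the recognizer is an arbitrary callable rather than the probabilistic RBF networks used in the paper. Segmentation and merging are assumed to have already produced the candidate sequences.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box of one segmented character candidate (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


# A recognizer maps a character box to (label, confidence). Stand-in only.
Classifier = Callable[[Box], Tuple[str, float]]


def regularity(boxes: Sequence[Box]) -> float:
    """Assumed stand-in score in (0, 1]: 1.0 means equal heights and pitch.

    Boxes are expected to be sorted left to right with distinct x values.
    """
    if len(boxes) < 2:
        return 0.0
    heights = [b.h for b in boxes]
    pitches = [b2.x - b1.x for b1, b2 in zip(boxes, boxes[1:])]
    cv_height = pstdev(heights) / mean(heights)
    cv_pitch = pstdev(pitches) / mean(pitches)
    return 1.0 / (1.0 + cv_height + cv_pitch)


def extract_numeric(
    candidates: Sequence[Sequence[Box]],
    classify: Classifier,
    min_regularity: float = 0.8,  # assumed threshold
    min_conf: float = 0.9,        # assumed threshold
) -> List[str]:
    """Return the strings of candidates that pass both phases."""
    accepted = []
    for seq in candidates:
        # Phase 1 (pruning): drop sequences that look irregular.
        if regularity(seq) < min_regularity:
            continue
        # Phase 2 (verification): every character must be a confident digit.
        results = [classify(b) for b in seq]
        if all(label.isdigit() and conf >= min_conf for label, conf in results):
            accepted.append("".join(label for label, _ in results))
    return accepted
```

The point of the ordering is cost: the cheap geometric test runs on every candidate, while the more expensive recognizer is only called on the sequences that survive pruning.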
Pages: 9
Related papers
50 in total
  • [1] Automatic segmentation of unconstrained handwritten numeral strings
    Sadri, J
    Suen, CY
    Bui, TD
    NINTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, PROCEEDINGS, 2004, : 317 - 322
  • [2] Page-to-Word Extraction from Unconstrained Handwritten Document Images
    Singh, Pawan Kumar
    Chowdhury, Sagnik Pal
    Sinha, Shubham
    Eum, Sungmin
    Sarkar, Ram
    PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COMMUNICATION, 2017, 458 : 517 - 525
  • [3] Distance transform based text-line extraction from unconstrained handwritten document images
    Bera, Suman Kumar
    Kundu, Soumyadeep
    Kumar, Neeraj
    Sarkar, Ram
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 186
  • [4] Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images
    Sarkar, Ram
    Malakar, Samir
    Das, Nibaran
    Basu, Subhadip
    Kundu, Mahantapas
    Nasipuri, Mita
    JOURNAL OF INTELLIGENT SYSTEMS, 2011, 20 (03) : 227 - 260
  • [5] Unconstrained handwritten document retrieval
    Huaigu Cao
    Venu Govindaraju
    Anurag Bhardwaj
    International Journal on Document Analysis and Recognition (IJDAR), 2011, 14 : 145 - 157
  • [6] Unconstrained handwritten document retrieval
    Cao, Huaigu
    Govindaraju, Venu
    Bhardwaj, Anurag
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2011, 14 (02) : 145 - 157
  • [7] Word level Script and Language identification for Unconstrained handwritten document images
    Prasanthkumar, P., V
    Dileesh, E. D.
    2014 3RD INTERNATIONAL CONFERENCE ON ECO-FRIENDLY COMPUTING AND COMMUNICATION SYSTEMS (ICECCS 2014), 2014, : 14 - 18
  • [8] Integrated extraction of handwritten numeral strings in form document based on hybrid binarization
    Zheng, Tian-Xiang
    Xie, Liang
    Yang, Li-Hua
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2008, 21 (03) : 369 - 375
  • [9] Automatic localization and extraction of tables from handheld mobile-camera captured handwritten document images
    Amarnath, R.
    Sindhushree, G. S.
    Nagabhushan, P.
    Javed, Mohammed
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (03) : 2527 - 2544
  • [10] Study on segmentation algorithm for unconstrained handwritten numeral strings
    Chuang, Z
    Ming, W
    Jun, G
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2004, 3215 : 632 - 642