Automatic Extraction of Numeric Strings in Unconstrained Handwritten Document Images

Cited by: 0

Authors:

Haji, M. Mehdi [1]

Bui, Tien D. [1]

Suen, Ching Y. [1]

Affiliations:

[1] Concordia Univ, Montreal, PQ, Canada

Source:

DOCUMENT RECOGNITION AND RETRIEVAL XVIII | 2011 / Vol. 7874

Keywords:

Numeric extraction; unconstrained handwritten documents; character segmentation; regularity measure; graph partitioning; pruning; recognition-based verification; TEXT LINE; SEGMENTATION; RECOGNITION; CHARACTER;

DOI:

10.1117/12.874706

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line and then, using a new regularity measure, prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm based on the analysis of skeletal graphs, and a merging algorithm based on graph partitioning. All candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition follows a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for processing real-world documents, in which letters and digits may be connected or broken. The effectiveness of the proposed approach is shown by extensive experiments on a real-world database of 607 documents, which contains handwritten, machine-printed, and mixed documents with different types of layouts and levels of noise.


Pages: 9
