Character string extraction from color documents

Cited by: 35
Authors
Hase, H
Shinokawa, T
Yoneda, M
Suen, CY
Affiliations
[1] Toyama Univ, Fac Engn, Dept Intellectual Info Sys Eng, Toyama 9308555, Japan
[2] Toyama Natl Coll Maritime Technol, Toyama, Japan
[3] Concordia Univ, Ctr Pattern Recognit & Machine Intelligence, Montreal, PQ H3G 1M8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
color document; character string extraction; color segmentation; multi-stage relaxation; conflict resolution; likelihood of a character string;
DOI
10.1016/S0031-3203(00)00081-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A new algorithm for extracting character strings from color documents is proposed. We first divide a full-color image into several representative binary color images. Character strings are then nominated from each binary image using multi-stage relaxation. The nominated strings are not always characters, however: they may be part of the background, concatenated holes of characters, dotted lines, and so on. As a result, when the nominated strings of all binary images are superimposed, some strings overlap each other. We therefore select the appropriate strings from among them using the likelihood of a character string and two kinds of conflict resolution. In the experiments, we used color images such as magazine covers and posters. Color segmentation followed by multi-stage relaxation nominated many character strings, from which the adequate strings were then selected. Finally, we show the experimental results and discuss some problems of extracting character strings from a color document. (C) 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
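The abstract does not give the details of the color segmentation or of the two kinds of conflict resolution, so the following is only a minimal sketch of the overall flow under stated assumptions. It assumes a fixed palette of representative colors and assigns each pixel to its nearest palette color to obtain one binary image per color, then resolves overlapping candidate strings with a simple greedy rule that keeps the candidate with the higher likelihood. The function names, the palette, the bounding-box representation, and the greedy rule are all hypothetical stand-ins, not the authors' method, and the multi-stage relaxation step is not modeled.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def split_into_binary_images(
    image: Sequence[Sequence[Color]], palette: Sequence[Color]
) -> List[List[List[int]]]:
    """Return one binary mask per palette color (assumed nearest-color rule)."""
    height = len(image)
    width = len(image[0]) if height else 0
    masks = [[[0] * width for _ in range(height)] for _ in palette]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Index of the palette color with the smallest squared RGB distance.
            nearest = min(
                range(len(palette)),
                key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
            )
            masks[nearest][y][x] = 1
    return masks


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_strings(
    candidates: Sequence[Tuple[Box, float]]
) -> List[Tuple[Box, float]]:
    """Greedy stand-in for conflict resolution: visit candidates from the
    highest likelihood down and keep one only if it overlaps nothing kept."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(not boxes_overlap(box, other) for other, _ in kept):
            kept.append((box, score))
    return kept
```

In this sketch, candidates nominated from different binary images would be pooled into one list before calling `select_strings`, which mirrors the superimposition step described in the abstract; the likelihood values themselves are assumed to come from a separate scoring step.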
Pages: 1349-1365
Page count: 17