Character extraction from documents using wavelet maxima

Cited by: 13
Authors
Hwang, WL [1]
Chang, F [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Keywords
optical character recognition; thresholding; wavelet maxima
DOI
10.1016/S0262-8856(97)00063-2
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The extraction of character images is an important front-end processing task in optical character recognition (OCR) and other applications. This process is extremely important because OCR applications usually extract salient features and process them. The existence of noise not only destroys features of characters, but also introduces unwanted features. We propose a new algorithm which removes unwanted background noise from a textual image. Our algorithm is based on the observation that the magnitude of the intensity variation of character boundaries differs from that of noise at various scales of their wavelet transform. Therefore, most of the edges corresponding to the character boundaries at each scale can be extracted using a thresholding method. The internal region of a character is determined by a voting procedure, which uses the arguments of the remaining edges. The interior of the recovered characters is solid, containing no holes. The recovered characters tend to become fattened because of the smoothing applied in the calculation of the wavelet transform. To obtain a quality restoration of the character image, the precise locations of characters in the original image are then estimated using a Bayesian criterion. We also present some experimental results that suggest the effectiveness of our method. © 1998 Elsevier Science B.V. All rights reserved.
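The per-scale thresholding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian-smoothed gradient as a stand-in for the wavelet transform, thresholds the gradient modulus directly rather than its local maxima, and leaves out the voting and Bayesian steps. The function names, the scales in `sigmas`, and the `fraction` parameter are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Discrete 1-D Gaussian truncated at about 3 sigma, normalized to sum to 1.
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(image, sigma):
    # Separable Gaussian smoothing with zero padding at the borders.
    # Assumes each image side is at least as long as the kernel.
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def edge_mask(image, sigma, fraction):
    # Gradient of the smoothed image: a stand-in (assumption) for the
    # wavelet transform at one scale.
    s = smooth(image.astype(float), sigma)
    gy, gx = np.gradient(s)
    modulus = np.hypot(gx, gy)      # magnitude of the intensity variation
    argument = np.arctan2(gy, gx)   # gradient direction at each pixel
    # Keep only strong variations; `fraction` is an illustrative threshold.
    mask = modulus > fraction * modulus.max()
    return mask, argument

def multiscale_edges(image, sigmas=(1.0, 2.0, 4.0), fraction=0.3):
    # One (mask, argument) pair per scale.
    return [edge_mask(image, s, fraction) for s in sigmas]
```

Calling `multiscale_edges(image)` on a 2-D grayscale array returns one boolean edge mask and one direction map per scale. The direction map is returned because the abstract's voting procedure relies on the arguments of the surviving edges; how those votes are cast is not specified in this record and is not attempted here.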
Pages: 307 - 315
Number of pages: 9
Related papers
50 in total
  • [31] Classification of web documents using concept extraction from ontologies
    Litvak, Marina
    Last, Mark
    Kisilevich, Slava
    AUTONOMOUS INTELLIGENT SYSTEMS: AGENTS AND DATA MINING, PROCEEDINGS, 2007, 4476 : 287 - +
  • [32] Date Field Extraction from Handwritten Documents Using HMMs
    Mandal, Ranju
    Roy, Partha Pratim
    Pal, Umapada
    Blumenstein, Michael
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 866 - 870
  • [33] Using ontology to improve precision of terminology extraction from documents
    Zhang, Wen
    Yoshida, Taketoshi
    Tang, Xijin
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (05) : 9333 - 9339
  • [34] Molecular Structure Extraction from Documents Using Deep Learning
    Staker, Joshua
    Marshall, Kyle
    Abel, Robert
    McQuaw, Carolyn M.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2019, 59 (03) : 1017 - 1029
  • [35] Binarization, character extraction, and writer identification of historical Hebrew calligraphy documents
    Bar-Yosef, Itay
    Beckman, Isaac
    Kedem, Klara
    Dinstein, Itshak
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2007, 9 (2-4) : 89 - 99
  • [36] Binarization, character extraction, and writer identification of historical Hebrew calligraphy documents
    Itay Bar-Yosef
    Isaac Beckman
    Klara Kedem
    Itshak Dinstein
    International Journal of Document Analysis and Recognition (IJDAR), 2007, 9 : 89 - 99
  • [37] Restoration of archival documents using a wavelet technique
    Tan, CL
    Cao, RN
    Shen, PY
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (10) : 1399 - 1404
  • [38] A SURVEY ON CHARACTER RECOGNITION FROM HANDWRITTEN DOCUMENTS
    Kaur, Gagandeep
    Singh, Varinder
    Chawla, Sunil Kumar
    Bhasin, Mahima
    ADVANCES AND APPLICATIONS IN MATHEMATICAL SCIENCES, 2020, 19 (05) : 321 - 331
  • [39] Selective Extraction of Chlorophyll a/Photosystem Polypeptides from Spirulina maxima Using Aqueous Two Phase Extraction
    Yun Ji Cho
    Byung Man Lee
    Youngbin Baek
    Hwa Sung Shin
    Biotechnology and Bioprocess Engineering, 2022, 27 (6) : 1014 - 1021
  • [40] Wavelet extraction using cepstrum
    Kim, Young C.
    Jo, Yeonghwa
    Byun, Joongmoo
    GEOPHYSICS, 2023, 88 (05) : V403 - V413