Character extraction from documents using wavelet maxima

Cited by: 13
Authors
Hwang, WL [1]
Chang, F [1]
Affiliation
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Keywords
optical character recognition; thresholding; wavelet maxima
DOI
10.1016/S0262-8856(97)00063-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The extraction of character images is an important front-end processing task in optical character recognition (OCR) and other applications. It matters because OCR applications usually extract salient features and then process them: noise not only destroys features of the characters but also introduces unwanted ones. We propose a new algorithm that removes unwanted background noise from a textual image. The algorithm is based on the observation that, at the various scales of the wavelet transform, the magnitude of the intensity variation at character boundaries differs from that of noise. Therefore, most of the edges corresponding to character boundaries at each scale can be extracted by thresholding. The interior region of a character is then determined by a voting procedure that uses the arguments (directions) of the remaining edges. The recovered characters are solid, containing no holes, but they tend to be fattened by the smoothing applied in computing the wavelet transform. To obtain a high-quality restoration of the character image, the precise locations of the characters in the original image are then estimated using a Bayesian criterion. Experimental results suggest that the method is effective. (C) 1998 Elsevier Science B.V. All rights reserved.
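The thresholding step described in the abstract can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation. It swaps in a sampled Gaussian-derivative wavelet (similar in spirit to Mallat and Zhong's multiscale edge detector) and multiplies the response by the scale σ, so that an ideal step edge keeps roughly the same modulus at every scale while white-noise responses shrink as σ grows. It then keeps only local maxima of the modulus along the quantized gradient direction and discards maxima below a hand-picked threshold. The function names `gaussian_kernels`, `separable_conv`, and `wavelet_maxima`, the threshold 0.2, and the synthetic page are hypothetical. The voting and Bayesian relocation steps are not sketched.

```python
import numpy as np


def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian g and its derivative dg on [-3*sigma, 3*sigma]."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                      # unit DC gain for the smoothing kernel
    dg = -x / sigma**2 * g            # derivative of the sampled Gaussian
    return g, dg


def separable_conv(img, k_x, k_y):
    """Convolve along x (axis 1) with k_x, then along y (axis 0) with k_y."""
    r = len(k_x) // 2                 # both kernels have the same odd length
    padded = np.pad(img, r, mode="edge")
    tmp = np.apply_along_axis(np.convolve, 1, padded, k_x, mode="valid")
    return np.apply_along_axis(np.convolve, 0, tmp, k_y, mode="valid")


def wavelet_maxima(img, sigma, threshold):
    """Thresholded modulus maxima of a Gaussian-derivative wavelet transform.

    Returns (edges, modulus, angle); `angle` is the gradient argument.
    """
    g, dg = gaussian_kernels(sigma)
    # Hypothetical normalization: multiply by sigma so a step edge keeps a
    # roughly constant modulus across scales while white noise decays.
    wx = sigma * separable_conv(img, dg, g)   # derivative along x
    wy = sigma * separable_conv(img, g, dg)   # derivative along y
    modulus = np.hypot(wx, wy)
    angle = np.arctan2(wy, wx)

    # Quantize the gradient direction to 0, 45, 90 or 135 degrees.
    bins = np.round(angle / (np.pi / 4)).astype(int) % 4
    steps = [(0, 1), (1, 1), (1, 0), (1, -1)]  # (dy, dx) per direction bin
    h, w = img.shape
    p = np.pad(modulus, 1, mode="constant")
    is_max = np.zeros(img.shape, dtype=bool)
    for b, (dy, dx) in enumerate(steps):
        ahead = p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        behind = p[1 - dy:1 - dy + h, 1 - dx:1 - dx + w]
        # Keep a pixel if it is not smaller than its neighbours along the
        # gradient line (strict on one side to break plateau ties).
        is_max |= (bins == b) & (modulus >= ahead) & (modulus > behind)

    edges = is_max & (modulus > threshold)
    return edges, modulus, angle


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    page = np.ones((40, 40))
    page[12:28, 12:28] = 0.0          # dark synthetic "character" on light paper
    noisy = page + rng.normal(0.0, 0.05, page.shape)
    for s in (1.0, 2.0, 4.0):
        edges, _, _ = wavelet_maxima(noisy, s, threshold=0.2)
        print(f"sigma={s}: {int(edges.sum())} boundary maxima kept")
```

Because the noise modulus falls with σ while the step-edge modulus stays roughly level, a single fixed threshold tends to separate boundary maxima from noise more easily at coarser scales, at the cost of blurring nearby strokes together. The returned `angle` array holds the edge arguments that a voting step like the one in the abstract would consume.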
Pages: 307-315
Page count: 9