Hierarchical content classification and script determination for automatic document image processing

Cited by: 9
Authors
Chi, Z [1]
Wang, Q
Siu, WC
Affiliations
[1] Hong Kong Polytech Univ, Ctr Multimedia Signal Proc, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Northwestern Polytech Univ, Dept Comp Sci & Engn, Xian 710072, Peoples R China
Keywords
document image processing; page segmentation; content classification; script determination; background thinning; cross-correlation; Kolmogorov complexity; neural networks
DOI
10.1016/S0031-3203(03)00128-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Page segmentation and image content classification play an important role in automatic image processing with applications to mixed-type document image compression, form and check reading, and automatic mail sorting. In this paper, we first present an enhanced background thinning based approach for fast page segmentation. After the analysis of three different methods individually, a hierarchical approach for document content classification is proposed, which classifies a sub-image into one of two categories: text and halftone. Our approach combines a neural network model, cross-correlation metric, and Kolmogorov complexity measure in a hierarchical structure. Considering the necessity of a recognition system, we also propose using a three-layer feedforward neural network to classify text regions into Chinese and English scripts. The classification accuracy on a number of document images reaches 100% and 97.1% for halftone region and text region, respectively. Meanwhile, the system can achieve a correct rate of 92.3% and 95.0% for Chinese and alphabetic script determination, respectively. © 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
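The abstract names a cross-correlation metric and a Kolmogorov complexity measure as two stages of the hierarchy but does not give their formulas. The following is a minimal illustrative sketch, not the authors' method. It assumes that Kolmogorov complexity can be approximated by a zlib compression ratio, that cross-correlation is taken between adjacent scanlines, and that halftone blocks compress worse and have weaker scanline correlation than text. The function names and all thresholds are hypothetical, and the neural network stage is omitted.

```python
import math
import zlib


def pack_bits(rows):
    """Pack a bilevel block (rows of 0/1 pixels) into bytes, 8 pixels per byte."""
    bits = [1 if pixel else 0 for row in rows for pixel in row]
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def compression_ratio(rows):
    """Compressed size / raw size: a computable stand-in for Kolmogorov complexity.

    Assumes a non-empty block.
    """
    raw = pack_bits(rows)
    return len(zlib.compress(raw, 9)) / len(raw)


def row_correlation(a, b):
    """Normalized cross-correlation (Pearson coefficient) of two scanlines."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant scanline has no defined correlation; treat equal lines as 1.
        return 1.0 if list(a) == list(b) else 0.0
    return cov / math.sqrt(var_a * var_b)


def mean_adjacent_correlation(rows):
    """Average correlation over all pairs of neighbouring scanlines (needs >= 2 rows)."""
    pairs = list(zip(rows, rows[1:]))
    return sum(row_correlation(a, b) for a, b in pairs) / len(pairs)


def classify_block(rows, low=0.3, high=0.7, corr_threshold=0.5):
    """Hypothetical two-stage rule; the thresholds are arbitrary, not from the paper.

    Stage 1 decides only the clear cases from the compression ratio.
    Stage 2 resolves the remaining blocks with scanline cross-correlation.
    """
    ratio = compression_ratio(rows)
    if ratio <= low:       # assumption: highly compressible block -> text
        return "text"
    if ratio >= high:      # assumption: nearly incompressible block -> halftone
        return "halftone"
    if mean_adjacent_correlation(rows) >= corr_threshold:
        return "text"
    return "halftone"
```

The point of the hierarchy in this sketch is that the cheaper measure settles the clear cases and only ambiguous blocks reach the next stage; in a fuller system a trained classifier would sit behind these two stages.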
Pages: 2483-2500
Number of pages: 18