A document image segmentation system using analysis of connected components

Cited by: 9
Authors
Zirari, F. [1]
Ennaji, A. [1]
Nicolas, S. [1]
Mammass, D. [2]
Affiliations
[1] Univ Rouen, LITIS Lab, Rouen, France
[2] Ibn Zohr Univ, IRF SIC Lab, Agadir, Morocco
Keywords
text/non-text separating; connected components; graph; structural analysis; document image
DOI
10.1109/ICDAR.2013.154
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters because of the non-text elements present. This paper presents a method for separating the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and efficient, and it adequately separates the graphical and textual parts of a document. We evaluated the method on two well-known datasets: the UW-III dataset and the ICDAR 2009 page segmentation competition dataset. Comparisons with two state-of-the-art methods show that our method performs better on this task.
Pages: 753-757
Page count: 5