A document image segmentation system using analysis of connected components

Cited by: 9
Authors
Zirari, F. [1]
Ennaji, A. [1]
Nicolas, S. [1]
Mammass, D. [2]
Affiliations
[1] Univ Rouen, LITIS Lab, Rouen, France
[2] Ibn Zohr Univ, IRF SIC Lab, Agadir, Morocco
Keywords
text/non-text separation; connected components; graph; structural analysis; document image
DOI
10.1109/ICDAR.2013.154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, the OCR classification engine produces garbage characters because non-text elements are present in its input. This paper presents a method that separates the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and separates the graphical and textual parts of a document effectively. We evaluated it on two well-known datasets: the UW-III dataset and the ICDAR 2009 page segmentation competition dataset. Comparisons with two state-of-the-art methods show that our method performs better on this task.
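The abstract does not describe the graph model in detail, so the following is only a minimal sketch, under our own assumptions, of the general ingredients it names: extracting connected components from a binary page image, linking nearby components into a neighbourhood graph, and applying a simple structural rule to label each component as text or non-text. The function names, the same-line/`max_gap` linking rule and the median-height `size_ratio` test are all hypothetical illustrations, not the authors' method.

```python
from collections import deque
from statistics import median

# Illustrative sketch only: the linking and classification rules below are
# hypothetical and are NOT the graph model used in the paper.


def label_components(img):
    """Label 8-connected foreground components of a binary image.

    img is a list of equal-length rows of 0/1 ints (1 = ink). Returns one
    dict per component with inclusive bounds x0, y0, x1, y1 and pixel area.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from the first unvisited ink pixel.
            box = {"x0": x, "y0": y, "x1": x, "y1": y, "area": 0}
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                box["area"] += 1
                box["x0"], box["x1"] = min(box["x0"], cx), max(box["x1"], cx)
                box["y0"], box["y1"] = min(box["y0"], cy), max(box["y1"], cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(box)
    return comps


def build_neighbour_graph(comps, max_gap=2):
    """Hypothetical rule: link components that share rows and whose
    horizontal gap is at most max_gap pixels (overlaps count as linked)."""
    adj = [[] for _ in comps]
    for i, a in enumerate(comps):
        for j in range(i + 1, len(comps)):
            b = comps[j]
            same_line = min(a["y1"], b["y1"]) >= max(a["y0"], b["y0"])
            gap = max(a["x0"], b["x0"]) - min(a["x1"], b["x1"]) - 1
            if same_line and gap <= max_gap:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def classify(comps, adj, size_ratio=2.0):
    """Hypothetical rule: text = character-sized and has at least one neighbour."""
    heights = [c["y1"] - c["y0"] + 1 for c in comps]
    ref = median(heights)  # median height stands in for the body-text size
    return ["text" if hgt <= size_ratio * ref and adj[i] else "non-text"
            for i, hgt in enumerate(heights)]


if __name__ == "__main__":
    img = [[0] * 20 for _ in range(12)]

    def fill(top, bottom, left, right):
        for yy in range(top, bottom + 1):
            for xx in range(left, right + 1):
                img[yy][xx] = 1

    for left in (1, 4, 7):     # three "characters" on one line
        fill(1, 3, left, left + 1)
    fill(0, 11, 14, 18)        # a tall graphic block
    fill(10, 10, 1, 1)         # an isolated speck of noise
    comps = label_components(img)
    print(classify(comps, build_neighbour_graph(comps)))
    # -> ['non-text', 'text', 'text', 'text', 'non-text']
```

The design choice here is deliberately simple: a component is kept as text only if it is both of typical character height and structurally embedded among neighbours on the same line, so that large isolated graphics and stray specks fall out as non-text. A real system would need a richer graph and more robust features.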
Pages: 753-757
Number of pages: 5
Related papers
50 in total
  • [21] Document zone classification using sizes of connected-components
    Liang, JS
    Phillips, IT
    Ha, JK
    Haralick, RM
    DOCUMENT RECOGNITION III, 1996, 2660 : 150 - 157
  • [22] Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis
    Khandelwal, Abhishek
    Choudhury, Pritha
    Sarkar, Ram
    Basu, Subhadip
    Nasipuri, Mita
    Das, Nibaran
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 369 - +
  • [23] NEW SEGMENTATION TECHNIQUES FOR DOCUMENT IMAGE-ANALYSIS
    VENKATESWARLU, NB
    BOYLE, RD
    IMAGE AND VISION COMPUTING, 1995, 13 (07) : 573 - 583
  • [24] Document image segmentation through clustering and connectivity analysis
    Ilie, Mihai Bogdan
    Advances in Intelligent Systems and Computing, 2015, 314
  • [25] Color images segmentation using new definition of connected components
    Sun, Y
    Sun, CY
    Wang, WZ
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 863 - 868
  • [26] A Document Layout Analysis Method Based on Morphological Operators and Connected Components
    Alarcon Arenas, Sebastian W.
    Meza-Lovon, Graciela L.
    Yari, Yessenia
    2018 XLIV LATIN AMERICAN COMPUTER CONFERENCE (CLEI 2018), 2018, : 622 - 631
  • [27] Improved Document Image Segmentation Algorithm using Multiresolution Morphology
    Bukhari, Syed Saqib
    Shafait, Faisal
    Breuel, Thomas M.
    DOCUMENT RECOGNITION AND RETRIEVAL XVIII, 2011, 7874
  • [28] Document Image Segmentation using Averaging Filtering and Mathematical Morphology
    Polyakova, Marina
    Ishchenko, Alesya
    Huliaieva, Natallia
    2018 14TH INTERNATIONAL CONFERENCE ON ADVANCED TRENDS IN RADIOELECTRONICS, TELECOMMUNICATIONS AND COMPUTER ENGINEERING (TCSET), 2018, : 966 - 969
  • [29] Image skew detection for formulas without fraction bars using connected components analysis
    Zhang, Lichun
    Lu, Yue
    Chen, Guoyue
    Wang, Patrick S. P.
    2007 CIT: 7TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 680 - +
  • [30] Scanned color document image segmentation using the EM algorithm
    Handley, John C.
    ICIS '06: INTERNATIONAL CONGRESS OF IMAGING SCIENCE, FINAL PROGRAM AND PROCEEDINGS: LINKING THE EXPLOSION OF IMAGING APPLICATIONS WITH THE SCIENCE AND TECHNOLOGY OF IMAGING, 2006, : 675 - 678