A simple text/graphic separation method for document image segmentation

Cited by: 0
Authors
Zirari, F. [1]
Ennaji, A. [1]
Nicolas, S. [1]
Mammass, D. [2]
Affiliations
[1] Univ Rouen, LITIS Lab, Rouen, France
[2] Ibn Zohr Univ, IRF SIC Lab, Agadir, Morocco
Keywords
text/non-text separating; connected components; graph; structural analysis; document image
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Segmenting a page into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters because of the non-text elements left in the input. This paper presents a method for separating the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and efficient, and it adequately separates the graphical and textual parts of a document. We evaluated it on two well-known data sets: the UW-III dataset and the ICDAR 2009 page segmentation competition dataset. Comparisons with two state-of-the-art methods show that our method performs better on this task.
Pages: 4