A simple text/graphic separation method for document image segmentation

Cited by: 0
Authors
Zirari, F. [1]
Ennaji, A. [1]
Nicolas, S. [1]
Mammass, D. [2]
Affiliations
[1] Univ Rouen, LITIS Lab, Rouen, France
[2] Ibn Zohr Univ, IRF SIC Lab, Agadir, Morocco
Keywords
text/non-text separating; connected components; graph; structural analysis; document image
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters because non-text elements are present. This paper presents a method for separating the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and efficient, and it adequately separates the graphical and textual parts of a document. We evaluated the method on two well-known datasets: the UW-III dataset and the ICDAR 2009 page segmentation competition dataset. Comparisons with two state-of-the-art methods show that our method performs better on this task.
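The abstract names connected components and graph-based modeling but gives no algorithmic detail, so the following is only a minimal illustrative sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the function names (`connected_components`, `build_graph`, `classify`), the thresholds (`max_gap`, `height_ratio`, `min_group`), and the rule that a chain of similar-height, horizontally adjacent components is treated as text.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 4-connected foreground blobs."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def build_graph(boxes, max_gap=2, height_ratio=1.5):
    """Link components of similar height that overlap vertically and lie close horizontally.

    The thresholds are hypothetical values chosen for the toy page below.
    """
    edges = {i: set() for i in range(len(boxes))}
    for i, a in enumerate(boxes):
        for j in range(i + 1, len(boxes)):
            b = boxes[j]
            ha, hb = a[2] - a[0] + 1, b[2] - b[0] + 1
            similar = max(ha, hb) <= height_ratio * min(ha, hb)
            v_overlap = min(a[2], b[2]) >= max(a[0], b[0])
            gap = max(a[1], b[1]) - min(a[3], b[3]) - 1  # empty columns between boxes
            if similar and v_overlap and gap <= max_gap:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def classify(boxes, edges, min_group=3):
    """Label each component 'text' if its graph group has at least min_group members."""
    labels = [None] * len(boxes)
    for start in range(len(boxes)):
        if labels[start] is not None:
            continue
        group, stack = {start}, [start]
        while stack:  # depth-first walk over one group of linked components
            n = stack.pop()
            for m in edges[n]:
                if m not in group:
                    group.add(m)
                    stack.append(m)
        kind = "text" if len(group) >= min_group else "non-text"
        for n in group:
            labels[n] = kind
    return labels


# Toy binary page: three small "characters" on one line and one large "graphic".
page = [[0] * 12 for _ in range(10)]
for r in range(2):
    for c in (0, 1, 3, 4, 6, 7):
        page[r][c] = 1
for r in range(4, 9):
    for c in range(1, 9):
        page[r][c] = 1

boxes = connected_components(page)
labels = classify(boxes, build_graph(boxes))
print(labels)  # ['text', 'text', 'text', 'non-text']
```

On the toy page, the three small blobs form a chain in the graph and are labeled as text, while the isolated large blob is labeled as non-text. A real system would work on binarized scans and would need far more robust structural rules than this single grouping heuristic.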
Pages: 4