An effective method for text line segmentation in historical document images

Cited by: 3
Authors
Nguyen, Tien-Nam [1]
Burie, Jean-Christophe [1]
Le, Thi-Lan [2]
Schweyer, Anne-Valerie [3]
Affiliations
[1] La Rochelle Univ, Lab Informat Image Interact L3i, La Rochelle, France
[2] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn SEEE, Hanoi, Vietnam
[3] CNRS, Ctr Asie Sud Est CASE, Paris, France
DOI
10.1109/ICPR56361.2022.9956617
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a text line segmentation method for historical documents. Historical documents are challenging because of heavy degradation, variation in writing style, and diacritics. Based on these observations, we propose an effective approach to text line segmentation that analyses the properties of document layouts. We combine the seam carving method with novel cost functions to split text lines accurately. Experiments were conducted on two challenging datasets of historical documents: the DIVA-HisDB dataset and our ChamDoc dataset. Our method achieved 99.36% Line IU and 98.86% Pixel IU on the DIVA-HisDB dataset. On the ChamDoc dataset, the proposed method outperformed two baseline approaches, one based on seam carving and one on A* path planning, by a large margin.
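The abstract names seam carving as the basis of the method but does not give the cost functions. As a rough illustration only, the sketch below computes a single minimum-cost left-to-right seam by dynamic programming over a generic energy map. The function name `horizontal_seam` and the toy energy (1 for ink, 0 for background) are assumptions made for this example, not the authors' cost functions.

```python
import numpy as np

def horizontal_seam(energy):
    """Return, per column, the row index of a minimum-cost left-to-right seam.

    Generic seam carving by dynamic programming; the energy map is supplied
    by the caller and is NOT the cost function proposed in the paper.
    """
    energy = np.asarray(energy, dtype=float)
    rows, cols = energy.shape
    cost = energy.copy()                      # cumulative cost table
    back = np.zeros((rows, cols), dtype=int)  # best predecessor row per cell
    row_idx = np.arange(rows)
    for x in range(1, cols):
        prev = cost[:, x - 1]
        # Costs of arriving from row y-1, y, and y+1 of the previous column.
        up = np.concatenate(([np.inf], prev[:-1]))
        down = np.concatenate((prev[1:], [np.inf]))
        stacked = np.stack([up, prev, down])
        choice = np.argmin(stacked, axis=0)   # 0: y-1, 1: y, 2: y+1
        cost[:, x] += stacked[choice, row_idx]
        back[:, x] = row_idx + choice - 1
    # Backtrack from the cheapest cell in the last column.
    seam = np.empty(cols, dtype=int)
    seam[-1] = int(np.argmin(cost[:, -1]))
    for x in range(cols - 1, 0, -1):
        seam[x - 1] = back[seam[x], x]
    return seam

# Toy energy map (assumed): ink everywhere except a blank gap that sits on
# row 2 for the first four columns and on row 3 for the last four.
toy_energy = np.ones((6, 8))
toy_energy[2, :4] = 0.0
toy_energy[3, 4:] = 0.0
seam = horizontal_seam(toy_energy)
```

On this toy input the seam follows the blank gap, stepping down one row where the gap shifts. A real text line separator would need an energy map that penalises crossing ink while tolerating touching strokes and diacritics, which is where method-specific cost functions come in.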
Pages: 1593-1599
Page count: 7