An effective method for text line segmentation in historical document images

Cited by: 3
Authors
Nguyen, Tien-Nam [1]
Burie, Jean-Christophe [1]
Le, Thi-Lan [2]
Schweyer, Anne-Valerie [3]
Affiliations
[1] La Rochelle Univ, Lab Informat Image Interact L3i, La Rochelle, France
[2] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn SEEE, Hanoi, Vietnam
[3] CNRS, Ctr Asie Sud Est CASE, Paris, France
DOI
10.1109/ICPR56361.2022.9956617
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a text-line segmentation method for historical documents. Historical documents are challenging because of their heavy degradation, variation in writing style, and diacritics. Based on these observations, we propose an effective approach to text line segmentation that analyses the properties of document layouts. We combine the seam carving method with novel cost functions to split text lines accurately. Experiments were conducted on two challenging datasets of historical documents, namely the DIVA-HisDB dataset and our ChamDoc dataset. Our method achieved good results on the DIVA-HisDB dataset, with 99.36% Line IU and 98.86% Pixel IU. On the ChamDoc dataset, the proposed method outperformed the two baseline approaches, i.e. seam carving-based and A* path planning, by a large margin.
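The abstract names seam carving but does not state the paper's cost functions. The sketch below shows only the generic dynamic-programming seam search that seam carving builds on, not the authors' method. It assumes a hypothetical cost grid in which ink pixels cost 1 and background pixels cost 0; the function name `find_horizontal_seam` and the toy grid are illustrative assumptions.

```python
from typing import List


def find_horizontal_seam(cost: List[List[float]]) -> List[int]:
    """Return one row index per column for a minimum-cost left-to-right seam.

    The seam may move at most one row up or down between adjacent columns.
    `cost` is a hypothetical per-pixel cost grid (rows x columns); the paper's
    own cost functions are not reproduced here.
    """
    rows, cols = len(cost), len(cost[0])
    # acc[r][c]: cheapest total cost of any seam that ends at pixel (r, c).
    acc = [[0.0] * cols for _ in range(rows)]
    # back[r][c]: row chosen in column c - 1 on that cheapest seam.
    back = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        acc[r][0] = cost[r][0]
    for c in range(1, cols):
        for r in range(rows):
            neighbours = [p for p in (r - 1, r, r + 1) if 0 <= p < rows]
            best = min(neighbours, key=lambda p: acc[p][c - 1])
            acc[r][c] = cost[r][c] + acc[best][c - 1]
            back[r][c] = best
    # Start from the cheapest end point and walk back to the first column.
    r = min(range(rows), key=lambda i: acc[i][cols - 1])
    seam = [r]
    for c in range(cols - 1, 0, -1):
        r = back[r][c]
        seam.append(r)
    seam.reverse()
    return seam


# Toy example: two text lines (rows 0 and 3) and a stroke hanging below the
# upper line at column 2. The seam has to bend around the stroke.
ink = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
seam = find_horizontal_seam(ink)
seam_cost = sum(ink[r][c] for c, r in enumerate(seam))
```

On this toy grid the seam separates the two text lines without crossing any ink pixel. A practical cost function would be computed from the document image itself, which is where the paper's contribution lies.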
Pages: 1593-1599
Number of pages: 7