A two-step framework for text line segmentation in historical Arabic and Latin document images

Authors
Olfa Mechi
Maroua Mehri
Rolf Ingold
Najoua Essoukri Ben Amara
Affiliations
[1] ENISo-National Engineering School of Sousse,LATIS
[2] Sousse University,Laboratory of Advanced Technology and Intelligent Systems
[3] University of Fribourg,DIVA Group
Keywords
Historical documents; Text line segmentation; Pixel-wise classification; Benchmark; FCN architectures; Topological structural analysis
Abstract
Text line segmentation is one of the most important preliminary tasks in a transcription system for historical document images. Nevertheless, this task remains complex due to the idiosyncrasies of ancient document images. In this article, we present a complete framework for text line segmentation in historical Arabic or Latin document images. A two-step procedure is described. First, a deep fully convolutional network (FCN) architecture is applied to extract the main area covering the text core. To select the highest-performing FCN architecture, we conducted a thorough benchmark of the most recent and widely used FCN architectures for segmenting text lines in historical Arabic or Latin document images. Then, a post-processing step based on topological structure analysis is introduced to extract complete text lines (including the ascender and descender components). This second step aims at refining the FCN results and at providing sufficient information for text recognition. Our experiments were carried out on a large number of Arabic and Latin document images collected from the Tunisian national archives as well as on other benchmark datasets. Quantitative and qualitative assessments are reported, first to pinpoint the strengths and weaknesses of the different FCN architectures and second to illustrate the effectiveness of the proposed post-processing method.
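As a rough illustration of the two-step idea in the abstract, the sketch below takes a binary "text core" mask (standing in for a thresholded FCN output, which is not implemented here) and extends each core line to the full ink strokes that touch it, so that ascenders and descenders are recovered. This is an assumption-laden simplification: the overlap rule is a stand-in chosen for brevity, not the authors' topological structure analysis, and the names `label_components` and `extend_lines` are hypothetical.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary grid (lists of 0/1).

    Returns (labels, count); labels uses 0 for background and 1..count
    for the regions, numbered in row-major order of first appearance.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def extend_lines(core_mask, ink_mask):
    """Assign every ink component to the core line it overlaps most.

    core_mask: binary grid of text-core pixels (assumed step-1 output).
    ink_mask:  binary grid of all ink pixels (assumed binarized page).
    Returns a grid of line ids (0 = unassigned or background).
    """
    core_labels, n_lines = label_components(core_mask)
    ink_labels, n_ink = label_components(ink_mask)
    h, w = len(ink_mask), len(ink_mask[0])

    # overlap[i][j]: number of pixels of ink component i inside core line j
    overlap = [[0] * (n_lines + 1) for _ in range(n_ink + 1)]
    for r in range(h):
        for c in range(w):
            i, j = ink_labels[r][c], core_labels[r][c]
            if i and j:
                overlap[i][j] += 1

    owner = [0] * (n_ink + 1)
    for i in range(1, n_ink + 1):
        best = max(range(1, n_lines + 1), key=lambda j: overlap[i][j], default=0)
        owner[i] = best if best and overlap[i][best] > 0 else 0

    return [[owner[ink_labels[r][c]] for c in range(w)] for r in range(h)]
```

Ink components that touch no core line stay unassigned (id 0) in this sketch. A fuller pipeline would need a rule for such components and for strokes that bridge two lines, which the overlap vote resolves only crudely.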
Pages: 197 - 218
Number of pages: 21
Related papers
50 in total
  • [1] A two-step framework for text line segmentation in historical Arabic and Latin document images
    Mechi, Olfa
    Mehri, Maroua
    Ingold, Rolf
    Essoukri Ben Amara, Najoua
    [J]. INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2021, 24 (03) : 197 - 218
  • [2] An effective method for text line segmentation in historical document images
    Tien-Nam Nguyen
    Burie, Jean-Christophe
    Thi-Lan Le
    Schweyer, Anne-Valerie
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1593 - 1599
  • [3] Two-Step CNN Framework for Text Line Recognition in Camera-Captured Images
    Chernyshova, Yulia S.
    Sheshkus, Alexander V.
    Arlazarov, Vladimir V.
    [J]. IEEE ACCESS, 2020, 8 : 32587 - 32600
  • [4] Text Line segmentation of historical Arabic documents
    Zahour, Abderrazak
    Likforman-Sulem, Laurence
    Boussalaa, Wafa
    Taconet, Bruno
    [J]. ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007, : 138 - +
  • [5] Text segmentation in degraded historical document images
    Kavitha, A. S.
    Shivakumara, P.
    Kumar, G. H.
    Lu, Tong
    [J]. EGYPTIAN INFORMATICS JOURNAL, 2016, 17 (02) : 189 - 197
  • [6] Text line extraction for historical document images
    Saabni, Raid
    Asi, Abedelkadir
    El-Sana, Jihad
    [J]. PATTERN RECOGNITION LETTERS, 2014, 35 : 23 - 33
  • [7] A Two-Step Dewarping of Camera Document Images
    Stamatopoulos, N.
    Gatos, B.
    Pratikakis, I.
    Perantonis, S. J.
    [J]. PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 2008, : 209 - 216
  • [8] Text Line Segmentation in Images of Handwritten Historical Documents
    Sanchez, A.
    Suarez, P. D.
    Melloz, C. A. B.
    Oliveira, A. L. I.
    Alves, V. M. O.
    [J]. 2008 FIRST INTERNATIONAL WORKSHOPS ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2008, : 232 - +
  • [9] Dense Prediction for Text Line Segmentation in Handwritten Document Images
    Quang Nhat Vo
    Lee, GueeSang
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3264 - 3268
  • [10] A Multilevel Text line Segmentation Framework for Handwritten Historical Documents
    Ben Messaoud, Ines
    Amiri, Hamid
    El Abed, Haikal
    Maergner, Volker
    [J]. 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 515 - 520