Text Line Segmentation in Historical Newspapers

Cited by: 0
Authors
Lenc, Ladislav [1, 2]
Martinek, Jiri [1, 2]
Kral, Pavel [1, 2]
Affiliations
[1] Univ West Bohemia, Fac Sci Appl, Dept Comp Sci & Engn, Plzen, Czech Republic
[2] Univ West Bohemia, Fac Sci Appl, NTIS New Technol Informat Soc, Plzen, Czech Republic
Keywords
Document image segmentation; Layout analysis; Fully convolutional network; FCN; Document layout
DOI
10.1007/978-3-031-23480-4_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper deals with the segmentation of pages into individual text lines, which serve as input to a line-based OCR system. This task is usually solved in one step that directly identifies text lines in whole documents. However, a direct approach may jeopardize the reading order of the lines and thus degrade the overall transcription result. We propose a novel approach which decomposes this problem into two steps: text-block segmentation and text-line segmentation. Both tasks are handled by algorithms based on fully convolutional neural networks. The proposed method is evaluated on two standard corpora, Europeana and RDCL 2019, and on a novel dataset created from data available in the Porta fontium portal. This dataset is freely available for research purposes.
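The reading-order argument in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper uses fully convolutional networks for both steps, whereas here hand-written bounding boxes stand in for the network output. The `Box` type, the helper names, and the rule of sorting blocks left to right are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical stand-in for a segmented region)."""
    x: int
    y: int
    w: int
    h: int


def contains(block: Box, line: Box) -> bool:
    """True if the centre of `line` falls inside `block`."""
    cx = line.x + line.w / 2
    cy = line.y + line.h / 2
    return (block.x <= cx < block.x + block.w
            and block.y <= cy < block.y + block.h)


def one_step_order(lines):
    """Direct approach: order all lines on the page top to bottom, then left to right."""
    return sorted(lines, key=lambda l: (l.y, l.x))


def two_step_order(blocks, lines):
    """Block-first approach: order blocks (assumed here: left to right, then
    top to bottom), then order the lines inside each block top to bottom."""
    ordered = []
    for block in sorted(blocks, key=lambda b: (b.x, b.y)):
        in_block = [l for l in lines if contains(block, l)]
        ordered.extend(sorted(in_block, key=lambda l: l.y))
    return ordered


# A two-column page: the right column is shifted down by two pixels.
left, right = Box(0, 0, 100, 200), Box(120, 0, 100, 200)
l1, l2 = Box(0, 10, 100, 20), Box(0, 40, 100, 20)
r1, r2 = Box(120, 12, 100, 20), Box(120, 42, 100, 20)
lines = [r2, l1, r1, l2]

print(one_step_order(lines) == [l1, r1, l2, r2])                  # columns interleaved
print(two_step_order([left, right], lines) == [l1, l2, r1, r2])   # columns kept apart
```

In this toy layout the one-step ordering interleaves lines from the two columns, while the block-first ordering reads each column in full before moving to the next, which is the kind of reading-order error the abstract refers to.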
Pages: 35-48
Number of pages: 14