Text line and word segmentation of handwritten documents

Cited by: 128
Authors
Louloudis, G. [1]
Gatos, B. [2]
Pratikakis, I. [2]
Halatsis, C. [1]
Affiliations
[1] Univ Athens, Dept Informat & Telecommun, GR-10679 Athens, Greece
[2] Natl Ctr Sci Res Demokritos, Inst Informat & Telecommun, Computat Intelligence Lab, Athens 15310, Greece
Keywords
Handwritten document image analysis; Hough transform; Text line segmentation; Word segmentation; Gaussian mixture modeling; EXTRACTION; RECOGNITION;
DOI
10.1016/j.patcog.2008.12.016
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a methodology for segmenting handwritten documents into their distinct entities, namely, text lines and words. Text line segmentation is achieved by applying the Hough transform on a subset of the document image connected components. A post-processing step includes the correction of possible false alarms, the detection of text lines that the Hough transform failed to create and, finally, the efficient separation of vertically connected characters using a novel method based on skeletonization. Word segmentation is addressed as a two-class problem. The distances between adjacent overlapped components in a text line are calculated using the combination of two distance metrics, and each of them is categorized as either an inter-word or an intra-word distance in a Gaussian mixture modeling framework. The performance of the proposed methodology is assessed with a consistent and concrete evaluation methodology that uses suitable performance measures to compare the text line segmentation and word segmentation results against the corresponding ground truth annotation. The efficiency of the proposed methodology is demonstrated by experimentation conducted on two different datasets: (a) the test set of the ICDAR2007 handwriting segmentation competition and (b) a set of historical handwritten documents. © 2009 Elsevier Ltd. All rights reserved.
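As an illustration only, the two-class treatment of gap distances described in the abstract might be sketched as follows. This is not the authors' implementation: it assumes a single hypothetical gap value per pair of adjacent components (the abstract combines two distance metrics, which are not specified here), fits a two-component one-dimensional Gaussian mixture with a plain EM loop, and labels each gap as inter-word when the component with the larger mean is the more likely one. The function names `fit_two_gaussians` and `classify_gaps` and the sample gap values are made up for this sketch.

```python
import numpy as np


def _weighted_densities(x, weight, mu, var):
    """Return an (n, 2) array of weight_k * N(x_i | mu_k, var_k)."""
    diff = x[:, None] - mu
    return weight * np.exp(-diff ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)


def fit_two_gaussians(gaps, iters=100):
    """Fit a two-component 1-D Gaussian mixture to gap distances with EM."""
    x = np.asarray(gaps, dtype=float)
    # Start the means at the extremes: one component for small gaps, one for large.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + 1e-6)
    weight = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each gap.
        dens = _weighted_densities(x, weight, mu, var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and variances.
        n_k = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k + 1e-6
        weight = n_k / x.size
    return weight, mu, var


def classify_gaps(gaps, weight, mu, var):
    """Return a boolean array: True where a gap is labelled inter-word."""
    x = np.asarray(gaps, dtype=float)
    dens = _weighted_densities(x, weight, mu, var)
    inter_word = int(np.argmax(mu))  # component with the larger mean gap
    return dens.argmax(axis=1) == inter_word


# Hypothetical gap distances (in pixels) along one text line.
sample_gaps = [2, 3, 1, 2, 14, 3, 2, 15, 1, 2, 13, 3]
params = fit_two_gaussians(sample_gaps)
is_word_break = classify_gaps(sample_gaps, *params)
```

Each `True` entry in `is_word_break` marks a gap at which the line would be split into words. A library mixture model could replace the hand-written EM loop; the loop is kept here so the sketch depends only on NumPy.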
Pages: 3169-3183
Page count: 15