Script-independent text line segmentation in freestyle handwritten documents

Cited by: 120
Authors
Li, Yi [1]
Zheng, Yefeng [2]
Doermann, David
Jaeger, Stefan [1, 3]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Language & Media Proc Lab, College Pk, MD 20742 USA
[2] Siemens Corp Res, Princeton, NJ 08540 USA
[3] Partner Inst Computat Biol, Grp Syst Bioinformat, CAS MPG, Shanghai 200031, Peoples R China
Keywords
handwritten text line segmentation; document image analysis; density estimation; level set methods
DOI
10.1109/TPAMI.2007.70792
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine-printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map where each element represents the probability of the underlying pixel belonging to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected component-based methods ([1] and [2], for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts such as Arabic, Chinese, Korean, and Hindi demonstrate that our algorithm consistently outperforms previous methods [1], [2], [3]. Further experiments show that the proposed algorithm is robust to scale change, rotation, and noise.
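The first stage described in the abstract (a per-pixel text-line probability map, followed by an initial boundary estimate) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the anisotropic Gaussian kernel, the values of `sigma_x` and `sigma_y`, the threshold, and all function names are assumptions made here, and the level set evolution that refines the initial estimate is not implemented.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_1d(values, kernel):
    """Convolve a list with a kernel; samples outside the list count as 0."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
        out.append(acc)
    return out

def text_line_density(binary, sigma_x=4.0, sigma_y=1.0):
    """Smooth a binary ink image (1 = ink) with a separable Gaussian.

    Assumption: the kernel is wider along x than along y, so ink is
    spread along the writing direction more than across it.
    """
    kx = gaussian_kernel(sigma_x)
    ky = gaussian_kernel(sigma_y)
    height, width = len(binary), len(binary[0])
    # Horizontal pass over each row.
    rows = [convolve_1d(row, kx) for row in binary]
    # Vertical pass over each column of the row-smoothed image.
    cols = [convolve_1d([rows[y][x] for y in range(height)], ky)
            for x in range(width)]
    # Transpose back to row-major order.
    return [[cols[x][y] for x in range(width)] for y in range(height)]

def initial_line_mask(density, threshold):
    """Threshold the density map to get a rough initial text-line region."""
    return [[1 if v > threshold else 0 for v in row] for row in density]

# Toy page: two dashed horizontal "text lines" on rows 2 and 6,
# each with word gaps at columns 5-7 and 13-15.
INK_COLUMNS = set(range(0, 5)) | set(range(8, 13)) | set(range(16, 20))
page = [[1 if (y in (2, 6) and x in INK_COLUMNS) else 0 for x in range(20)]
        for y in range(9)]

density = text_line_density(page)
mask = initial_line_mask(density, threshold=0.15)
```

On this toy page the smoothed map bridges the word gaps within each line while the region between the two lines stays below the threshold. In the approach the abstract describes, an initial estimate of this kind would then be evolved by a level set method to locate the boundaries between neighboring lines.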
Pages: 1313-1329
Page count: 17
Related papers
50 in total
  • [31] Language-Independent Text-Line Extraction Algorithm for Handwritten Documents
    Ryu, Jewoong
    Koo, Hyung Il
    Cho, Nam Ik
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (09) : 1115 - 1119
  • [32] Segmentation of Historical Handwritten Documents into Text Zones and Text Lines
    Gatos, Basilis
    Louloudis, Georgios
    Stamatopoulos, Nikolaos
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 464 - 469
  • [33] Statistical script independent word spotting in offline handwritten documents
    Wshah, Safwan
    Kumar, Gaurav
    Govindaraju, Venu
    PATTERN RECOGNITION, 2014, 47 (03) : 1039 - 1050
  • [34] Statistical Text Line Analysis in Handwritten Documents
    Bosch, Vicente
    Hector Toselli, Alejandro
    Vidal, Enrique
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 201 - 206
  • [35] Text Line Extraction in Handwritten Historical Documents
    Capobianco, Samuele
    Marinai, Simone
    DIGITAL LIBRARIES AND ARCHIVES, IRCDL 2017, 2017, 733 : 68 - 79
  • [36] Line and word Segmentation of Kannada Handwritten Text documents using Projection Profile Technique
    Banumathi, K. L.
    Chandra, Jagadeesh A. P.
    2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2016, : 196 - 201
  • [37] A* Path Planning for Line Segmentation of Handwritten Documents
    Surinta, Olarik
    Holtkamp, Michiel
    Karabaa, Fait
    van Oosten, Jean-Paul
    Schomaker, Lambert
    Wiering, Marco
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 175 - 180
  • [38] A Hybrid Approach for Line Segmentation in Handwritten Documents
    Adiguzel, Hande
    Sahin, Emre
    Duygulu, Pinar
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 503 - 508
  • [39] A Statistical approach to line segmentation in handwritten documents
    Arivazhagan, Manivannan
    Srinivasan, Harish
    Srihari, Sargur
    DOCUMENT RECOGNITION AND RETRIEVAL XIV, 2007, 6500
  • [40] A script-independent methodology for optical character recognition
    Makhoul, J
    Schwartz, R
    Lapre, C
    Bazzi, I
    PATTERN RECOGNITION, 1998, 31 (09) : 1285 - 1294