Script-independent text line segmentation in freestyle handwritten documents

Times cited: 120
Authors
Li, Yi [1]
Zheng, Yefeng [2]
Doermann, David
Jaeger, Stefan [1, 3]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Language & Media Proc Lab, College Pk, MD 20742 USA
[2] Siemens Corp Res, Princeton, NJ 08540 USA
[3] Partner Inst Computat Biol, Grp Syst Bioinformat, CAS MPG, Shanghai 200031, Peoples R China
Keywords
handwritten text line segmentation; document image analysis; density estimation; level set methods
DOI
10.1109/TPAMI.2007.70792
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine-printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map where each element represents the probability of the underlying pixel belonging to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected component-based methods ([1] and [2], for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts such as Arabic, Chinese, Korean, and Hindi demonstrate that our algorithm consistently outperforms previous methods [1], [2], [3]. Further experiments show that the proposed algorithm is robust to scale change, rotation, and noise.
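The abstract outlines a two-stage pipeline. First, a density estimate converts the ink pixels into a per-pixel text-line probability map. Second, a level-set evolution grows an initial estimate into text-line regions. The Python/NumPy sketch below illustrates that idea on a synthetic page. It is not the authors' implementation. The following are all hypothetical choices: the anisotropic Gaussian kernel, its bandwidths `sigma_x` and `sigma_y`, the speed function `F = p - tau`, the initial level `seed`, the time step, and every function name. The evolution uses only the standard Osher-Sethian upwind scheme, with no curvature term.

```python
import numpy as np
from collections import deque


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def smooth_axis(a: np.ndarray, k: np.ndarray, axis: int) -> np.ndarray:
    """Convolve every 1-D slice of `a` along `axis` with `k`, keeping the shape."""
    r = len(k) // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)  # zero padding so "valid" returns the original length
    padded = np.pad(a, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), axis, padded)


def probability_map(ink: np.ndarray, sigma_x: float = 4.0, sigma_y: float = 1.5) -> np.ndarray:
    """Kernel density estimate of ink pixels, rescaled to [0, 1].

    Assumption (hypothetical): an anisotropic Gaussian kernel, wider along x,
    so density spreads along the writing direction more than across lines.
    """
    dens = smooth_axis(ink.astype(float), gaussian_kernel_1d(sigma_x), axis=1)
    dens = smooth_axis(dens, gaussian_kernel_1d(sigma_y), axis=0)
    peak = dens.max()
    return dens / peak if peak > 0 else dens


def evolve_level_set(p, tau=0.2, seed=0.5, dt=0.5, iters=60):
    """Evolve phi (text region where phi < 0) under phi_t + F |grad phi| = 0.

    Hypothetical speed F = p - tau: the front expands where the density is
    above tau and retreats where it is below. No curvature term is used.
    """
    phi = seed - p  # initial estimate: zero level set where p == seed
    speed = p - tau
    for _ in range(iters):
        q = np.pad(phi, 1, mode="edge")
        c = q[1:-1, 1:-1]
        dxm, dxp = c - q[1:-1, :-2], q[1:-1, 2:] - c  # backward / forward in x
        dym, dyp = c - q[:-2, 1:-1], q[2:, 1:-1] - c  # backward / forward in y
        # Osher-Sethian upwind approximations of |grad phi|.
        grad_plus = np.sqrt(np.maximum(dxm, 0)**2 + np.minimum(dxp, 0)**2
                            + np.maximum(dym, 0)**2 + np.minimum(dyp, 0)**2)
        grad_minus = np.sqrt(np.minimum(dxm, 0)**2 + np.maximum(dxp, 0)**2
                             + np.minimum(dym, 0)**2 + np.maximum(dyp, 0)**2)
        # CFL: dt * max|F| * (1/dx + 1/dy) <= 1 holds for dt = 0.5 since |F| <= 1.
        phi = phi - dt * (np.maximum(speed, 0) * grad_plus
                          + np.minimum(speed, 0) * grad_minus)
    return phi


def label_regions(mask: np.ndarray):
    """4-connected component labeling by breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if not mask[i, j] or labels[i, j]:
                continue
            count += 1
            labels[i, j] = count
            queue = deque([(i, j)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


# Synthetic page: two rows of short strokes separated by a blank band.
page = np.zeros((40, 60), dtype=bool)
for row in (10, 28):
    for start in range(5, 55, 8):
        page[row:row + 2, start:start + 5] = True
p = probability_map(page)
labels, n_lines = label_regions(evolve_level_set(p) < 0)
print(n_lines)  # prints 2
```

`phi` only decreases where `p > tau`, and `p` is zero in the blank band between the two rows of strokes. The front therefore cannot cross that band, so each row ends up as its own labeled region.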
Pages: 1313-1329
Page count: 17