Script-independent text line segmentation in freestyle handwritten documents

Cited by: 120
Authors
Li, Yi [1]
Zheng, Yefeng [2]
Doermann, David
Jaeger, Stefan [1, 3]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Language & Media Proc Lab, College Pk, MD 20742 USA
[2] Siemens Corp Res, Princeton, NJ 08540 USA
[3] Partner Inst Computat Biol, Grp Syst Bioinformat, CAS MPG, Shanghai 200031, Peoples R China
Keywords
handwritten text line segmentation; document image analysis; density estimation; level set methods
DOI
10.1109/TPAMI.2007.70792
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine-printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map where each element represents the probability of the underlying pixel belonging to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected component-based methods ([1] and [2], for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts such as Arabic, Chinese, Korean, and Hindi demonstrate that our algorithm consistently outperforms previous methods [1], [2], [3]. Further experiments show that the proposed algorithm is robust to scale change, rotation, and noise.
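The two stages named in the abstract (a per-pixel probability map, then an initial region estimate that a level set method would refine) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, a wide box window stands in for the paper's density estimator, and a fixed threshold stands in for the initial estimate. The level set evolution itself is not implemented.

```python
def density_map(image, half_w=3, half_h=1):
    """Smooth a binary image (1 = ink) with a wide, short box window.

    Assumption: a box window replaces the paper's density estimator.
    The window is wider than tall, so ink on one text line reinforces
    itself horizontally. The result is scaled so the peak equals 1.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            # Sum ink inside the window, clipped at the image border.
            for rr in range(max(0, r - half_h), min(rows, r + half_h + 1)):
                for cc in range(max(0, c - half_w), min(cols, c + half_w + 1)):
                    total += image[rr][cc]
            out[r][c] = float(total)
    peak = max(max(row) for row in out)
    if peak > 0:
        out = [[v / peak for v in row] for row in out]
    return out


def initial_regions(prob, threshold=0.5):
    """Threshold the map to get a rough starting estimate of text-line regions.

    Assumption: a level set method would evolve this estimate further;
    that evolution is outside this sketch.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in prob]


def count_row_bands(mask):
    """Count maximal runs of consecutive rows that contain any foreground."""
    bands = 0
    inside = False
    for row in mask:
        has_ink = any(row)
        if has_ink and not inside:
            bands += 1
        inside = has_ink
    return bands
```

For a toy image with two broken rows of ink separated by blank rows, calling `count_row_bands(initial_regions(density_map(image)))` groups the word fragments on each row into one band per text line, without using any knowledge of the script.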
Pages: 1313-1329
Page count: 17
Related papers
50 in total
  • [1] A hybrid text line segmentation approach for the ancient handwritten unconstrained freestyle Modi script documents
    Deshmukh, Manisha S.
    Patil, Manoj P.
    Kolhe, Satish R.
    IMAGING SCIENCE JOURNAL, 2018, 66(07): 433-442
  • [2] Adaptive Script-Independent Text Line Extraction
    Ziaratban, Majid
    Faez, Karim
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D(04): 866-877
  • [3] Script-Independent Text Segmentation from Document Images
    Sahare P.
    Tembhurne J.V.
    Parate M.R.
    Diwan T.
    Dhok S.B.
    International Journal of Ambient Computing and Intelligence, 2022, 13(01)
  • [4] A Multi-scale Text Line Segmentation Method in Freestyle Handwritten Documents
    Gao, Yangdong
    Ding, Xiaoqing
    Liu, Changsong
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011: 643-647
  • [5] A generalized line segmentation method for multi-script handwritten text documents
    Rakshit, Payel
    Halder, Chayan
    Md Obaidullah, Sk
    Roy, Kaushik
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 212
  • [6] Script-independent, HMM-based text line finding for OCR
    Lu, ZD
    Schwartz, R
    Raphael, C
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS: APPLICATIONS, ROBOTICS SYSTEMS AND ARCHITECTURES, 2000: 551-554
  • [7] Text line and word segmentation of handwritten documents
    Louloudis, G.
    Gatos, B.
    Pratikakis, I.
    Halatsis, C.
    PATTERN RECOGNITION, 2009, 42(12): 3169-3183
  • [8] A Tracking Approach for Text Line Segmentation in Handwritten Documents
    Setitra, Insaf
    Hadjadj, Zineb
    Meziane, Abdelkrim
    ICPRAM: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2017: 193-198
  • [9] Text Line Segmentation in Images of Handwritten Historical Documents
    Sanchez, A.
    Suarez, P. D.
    Melloz, C. A. B.
    Oliveira, A. L. I.
    Alves, V. M. O.
    2008 FIRST INTERNATIONAL WORKSHOPS ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2008: 232-+
  • [10] A Novel Technique for Line Segmentation in Offline Handwritten Gurmukhi Script Documents
    Kumar, Munish
    Jindal, M. K.
    Sharma, R. K.
    NATIONAL ACADEMY SCIENCE LETTERS-INDIA, 2017, 40(04): 273-277