Text segmentation in degraded historical document images

Cited by: 12
Authors
Kavitha, A. S. [1]
Shivakumara, P. [2]
Kumar, G. H. [1]
Lu, Tong [3]
Affiliations
[1] Univ Mysore, Dept Studies Comp Sci, Mysore 570005, Karnataka, India
[2] Univ Malaya, Fac Comp Sci & Informat Technol, B-2-18, Kuala Lumpur, Malaysia
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Keywords
Text enhancement; Sobel and Laplacian operations; Indus document; Clustering; Text line segmentation
DOI
10.1016/j.eij.2015.11.003
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text segmentation in degraded historical Indus script images helps Optical Character Recognition (OCR) systems achieve good recognition rates for Indus scripts; however, it is challenging because of the complex backgrounds in such images. In this paper, we present a new method for segmenting text and non-text in Indus documents, based on the observation that text components are less cursive than non-text components. To achieve this, we propose a new combination of the Sobel and Laplacian operators for enhancing degraded, low-contrast pixels. The proposed method then generates skeletons of the text components in the enhanced images to reduce the computational burden, which in turn allows component structures to be studied efficiently. We propose to measure the cursiveness of components from their branch information in order to remove false text components. The proposed method introduces a nearest-neighbor criterion for grouping components that lie in the same line, which results in clusters. Furthermore, the proposed method classifies these clusters into text and non-text clusters based on the characteristics of text components. We evaluate the proposed method on a large dataset containing a variety of images. The results are compared with those of existing methods and show that the proposed method is effective in terms of recall and precision. (C) 2015 Production and hosting by Elsevier B.V. on behalf of Faculty of Computers and Information, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
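The pipeline outlined in the abstract (enhancement, skeletonization, branch-based cursiveness, nearest-neighbor grouping) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything below is an assumption rather than the authors' method. The Sobel/Laplacian combination is taken to be the pixel-wise product of the normalized gradient and Laplacian magnitudes. Branch points are found with the standard crossing-number test on a component that has already been skeletonized (for example with a thinning routine such as scikit-image's `skeletonize`). The grouping rule, which links each component to its nearest neighbor when their centroid rows differ by at most `max_dy`, is also an assumption. The names `enhance`, `branch_points`, `is_probably_text`, `group_by_nearest_neighbour`, `max_branches` and `max_dy` are hypothetical.

```python
import numpy as np

# Standard 3x3 Sobel kernels and the 4-neighbour Laplacian.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

# The 8 neighbours in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def correlate3x3(img, kernel, mode="edge"):
    """Same-size 3x3 correlation in pure NumPy (mode is passed to np.pad)."""
    padded = np.pad(img.astype(float), 1, mode=mode)
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def normalize(a):
    """Rescale to [0, 1]; a constant array maps to all zeros."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)


def enhance(gray):
    """ASSUMED combination: product of normalized Sobel and |Laplacian| responses."""
    sobel = np.hypot(correlate3x3(gray, SOBEL_X), correlate3x3(gray, SOBEL_Y))
    lap = np.abs(correlate3x3(gray, LAPLACIAN))
    return normalize(sobel) * normalize(lap)


def branch_points(skeleton):
    """Skeleton pixels with crossing number >= 3 (0->1 transitions around the ring)."""
    sk = np.pad(skeleton.astype(int), 1)  # zero padding outside the image
    h, w = skeleton.shape
    ring = [sk[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in RING]
    crossings = np.zeros((h, w), dtype=int)
    for k in range(8):
        crossings += ((ring[k] == 0) & (ring[(k + 1) % 8] == 1)).astype(int)
    return skeleton.astype(bool) & (crossings >= 3)


def is_probably_text(skeleton, max_branches=2):
    """HYPOTHETICAL rule: text is less cursive, so it has few branch points."""
    return int(branch_points(skeleton).sum()) <= max_branches


def group_by_nearest_neighbour(centroids, max_dy):
    """HYPOTHETICAL rule: union each (row, col) centroid with its nearest neighbour
    when their rows differ by at most max_dy; returns lists of component indices."""
    pts = np.asarray(centroids, dtype=float).reshape(-1, 2)
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pts)):
        dist = np.hypot(pts[:, 0] - pts[i, 0], pts[:, 1] - pts[i, 1])
        dist[i] = np.inf  # ignore the component itself
        j = int(np.argmin(dist))
        if abs(pts[j, 0] - pts[i, 0]) <= max_dy:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(pts)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

The crossing-number test was chosen over simply counting skeleton pixels with three or more neighbours. On a plus-shaped skeleton, `branch_points` flags only the centre pixel, whereas the naive neighbour count would also flag the four arm pixels that touch the junction diagonally.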
Pages: 189-197
Page count: 9