Text identification in noisy document images using Markov random field

Cited by: 0
Authors
Zheng, YF [1]
Li, HP [1]
Doermann, D [1]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Lab Language & Media Proc, College Pk, MD 20742 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we address the problem of identifying text in noisy documents. We segment handwriting from machine-printed text and identify each because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting are significantly different. Our novelty is that we treat noise as a separate class and model it using selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise. We further exploit context to refine the classification: a Markov Random Field (MRF) based approach models the geometrical structure of the printed text, handwriting and noise to rectify misclassifications. Experimental results show that our approach is promising and robust, and can significantly improve page segmentation results on noisy documents.
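The two-stage idea in the abstract (a Fisher classifier per block, then contextual refinement with an MRF) can be illustrated with a minimal sketch. The abstract does not give the features, the neighborhood system, or the potentials, so everything specific here is an assumption for illustration: the function names `fisher_direction` and `icm_refine`, the Potts-style pairwise penalty `beta`, and the use of iterated conditional modes (ICM) as the optimizer are not taken from the paper.

```python
import numpy as np

def fisher_direction(X0, X1, ridge=1e-6):
    """Two-class Fisher linear discriminant: w is proportional to Sw^-1 (m1 - m0).

    X0, X1: arrays of shape (n_samples, n_features), one per class.
    Returns the projection vector w and a midpoint decision threshold.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, summed over both classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # A small ridge term (an assumption) keeps the solve stable if Sw is singular.
    w = np.linalg.solve(Sw + ridge * np.eye(Sw.shape[0]), m1 - m0)
    threshold = 0.5 * (w @ m0 + w @ m1)
    return w, threshold

def icm_refine(unary, neighbors, beta=1.0, n_iter=10):
    """Refine per-block labels with a Potts-style MRF, optimized by ICM.

    unary[i, k]: cost of giving block i label k (e.g. derived from classifier scores).
    neighbors[i]: indices of the blocks adjacent to block i (hypothetical layout graph).
    beta: penalty paid for each neighbor that carries a different label.
    """
    labels = unary.argmin(axis=1)          # start from the classifier's decision
    label_ids = np.arange(unary.shape[1])
    for _ in range(n_iter):
        changed = False
        for i in range(len(labels)):
            costs = unary[i].astype(float)  # astype returns a copy
            for j in neighbors[i]:
                costs = costs + beta * (label_ids != labels[j])
            best = int(costs.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:                     # converged to a local minimum
            break
    return labels

# Toy example: five blocks in a row; labels 0 = printed, 1 = handwriting, 2 = noise.
# The middle block is weakly classified as noise but sits among printed text.
unary = np.array([[0.1, 1.0, 1.0],
                  [0.1, 1.0, 1.0],
                  [0.6, 1.0, 0.4],
                  [0.1, 1.0, 1.0],
                  [0.1, 1.0, 1.0]])
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
refined = icm_refine(unary, neighbors, beta=1.0)
```

In this toy setup the isolated "noise" decision is overruled by its printed-text neighbors, which is the kind of contextual correction the abstract describes; a real system would need neighborhoods and potentials that reflect actual page geometry.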
Pages: 599-603
Page count: 5
Related papers
50 in total
  • [1] ON MARKOV RANDOM FIELD MODELS FOR SEGMENTATION OF NOISY IMAGES
    Kuang Jinyu Zhu Junxiu (Department of Radio-Electronics
    Journal of Electronics(China), 1996, (01) : 31 - 39
  • [2] Machine printed text and handwriting identification in noisy document images
    Zheng, YF
    Li, HP
    Doermann, D
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (03) : 337 - 353
  • [3] Document ranking refinement using a Markov random field model
    Villatoro, Esau
    Juarez, Antonio
    Montes, Manuel
    Villasenor, Luis
    Sucar, L. Enrique
    NATURAL LANGUAGE ENGINEERING, 2012, 18 : 155 - 185
  • [4] The segmentation and identification of handwriting in noisy document images
    Zheng, YF
    Li, HP
    Doermann, D
    DOCUMENT ANALYSIS SYSTEM V, PROCEEDINGS, 2002, 2423 : 95 - 105
  • [6] UNSUPERVISED SEGMENTATION OF NOISY AND TEXTURED IMAGES USING MARKOV RANDOM-FIELDS
    WON, CS
    DERIN, H
    CVGIP-GRAPHICAL MODELS AND IMAGE PROCESSING, 1992, 54 (04) : 308 - 328
  • [7] Script and language identification in noisy and degraded document images
    Lu Shijian
    Tan, Chew Lim
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (01) : 14 - 24
  • [8] Noisy multimodal brain image registration using markov random field model
    Samant, Sunita
    Nanda, Pradipta Kumar
    Ghosh, Ashish
    Panda, Adya Kinkar
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2022, 73
  • [9] Binarization of low quality text using a Markov random field model
    Wolf, C
    Doermann, D
    16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL III, PROCEEDINGS, 2002, : 160 - 163
  • [10] Binarization of Degraded Document Image Using Gaussian Markov Random Field Model
    Lu, Shujing
    Lu, Yue
    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 272 - 276