Document Image OCR Accuracy Prediction via Latent Dirichlet Allocation

Authors
Peng, Xujun [1 ]
Cao, Huaigu [1 ]
Natarajan, Prem [2 ]
Affiliations
[1] Raytheon BBN Technol, Cambridge, MA 02138 USA
[2] Univ Southern Calif, ISI, Marina Del Rey, CA 90292 USA
Keywords
QUALITY ASSESSMENT;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Optical character recognition (OCR) accuracy on document images is an important factor in the success of many document processing and analysis tasks, especially for document images captured under unconstrained conditions. Although several methods for assessing the OCR capability of document images have been proposed, they mostly model the problem using empirically defined rules of image degradation, which makes them poorly suited to predicting OCR scores. In this paper, a computational model is presented that automatically predicts document image quality, with the aim of estimating OCR accuracy, without a reference image. Unlike conventional methods that use heuristically designed features, our work learns the raw features from training images and builds a generative quality model based on latent Dirichlet allocation, which is used to assess a document's OCR capability. We present evaluation results on a public dataset of document images captured using digital cameras with different levels of blur degradation. The experimental results show that the proposed method outperforms traditional document image quality assessment approaches.
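For orientation, the standard latent Dirichlet allocation model that the abstract builds on can be written as follows. This is the generic textbook formulation, not necessarily the paper's exact model, which the abstract does not spell out. Reading $w_n$ as a quantized local feature (a "visual word") of a document image and $z_n$ as its latent topic is an assumption made here for illustration only.

```latex
% Standard LDA generative process for one document with N words.
% Assumed for illustration: words are quantized image features.
\begin{align*}
\theta &\sim \mathrm{Dirichlet}(\alpha), \\
z_n \mid \theta &\sim \mathrm{Multinomial}(\theta), \quad n = 1, \dots, N, \\
w_n \mid z_n, \beta &\sim \mathrm{Multinomial}(\beta_{z_n}), \\
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  &= p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta).
\end{align*}
```

Under this reading, the inferred topic proportions $\theta$ would summarize an image; how they are mapped to an OCR score is left to the paper itself.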
Pages: 771-775
Page count: 5