A generative probabilistic OCR model for NLP applications

Authors
Kolak, O [1]
Byrne, W [1]
Resnik, P [1]
Affiliation
[1] Univ Maryland, College Pk, MD 20742 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks. We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
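The noisy-channel idea summarized in the abstract (pick the true text w that maximizes P(w) * P(o | w) for an observed OCR output o) can be illustrated with a small Python sketch. It is not the authors' finite-state implementation: the per-character edit probabilities, the toy unigram lexicon TOY_LEXICON, and the helper names are hypothetical, and the channel is scored with a single best edit alignment rather than a full weighted transducer. The same file computes character and word error rate as edit distance normalized by reference length.

```python
import math
from typing import Dict, Sequence

# Hypothetical per-character channel probabilities (assumptions, not from the paper).
P_MATCH = 0.95  # OCR copies a true character correctly
P_SUB = 0.02    # OCR replaces a true character with a different one
P_DEL = 0.01    # OCR drops a true character
P_INS = 0.01    # OCR emits a spurious character

# Hypothetical unigram prior P(w); a real system would use a full language model.
TOY_LEXICON: Dict[str, float] = {
    "the": 0.5, "toe": 0.05, "cat": 0.2, "sat": 0.2, "set": 0.05,
}


def channel_log_prob(true: str, observed: str) -> float:
    """Approximate log P(observed | true) by the best edit alignment (Viterbi-style DP)."""
    n, m = len(true), len(observed)
    neg_inf = float("-inf")
    dp = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    # Row-major order guarantees dp[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            cur = dp[i][j]
            if cur == neg_inf:
                continue
            if i < n and j < m:  # match or substitution
                p = P_MATCH if true[i] == observed[j] else P_SUB
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], cur + math.log(p))
            if i < n:  # true character deleted by the OCR system
                dp[i + 1][j] = max(dp[i + 1][j], cur + math.log(P_DEL))
            if j < m:  # spurious character inserted by the OCR system
                dp[i][j + 1] = max(dp[i][j + 1], cur + math.log(P_INS))
    return dp[n][m]


def correct_word(observed: str, lexicon: Dict[str, float]) -> str:
    """Noisy-channel decision: argmax over w of log P(w) + log P(observed | w)."""
    return max(lexicon, key=lambda w: math.log(lexicon[w]) + channel_log_prob(w, observed))


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Plain Levenshtein distance over sequences of characters or words."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """CER when given strings, WER when given word lists."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


if __name__ == "__main__":
    reference = ["the", "cat", "sat"]
    ocr_output = ["tbe", "cat", "sat"]  # hypothetical OCR error: h -> b
    corrected = [correct_word(w, TOY_LEXICON) for w in ocr_output]
    print(corrected)  # ['the', 'cat', 'sat']
    print(f"WER before: {error_rate(reference, ocr_output):.3f}")  # WER before: 0.333
    print(f"WER after: {error_rate(reference, corrected):.3f}")    # WER after: 0.000
```

Replacing the unigram prior with an n-gram language model and the fixed edit costs with probabilities estimated from aligned OCR output and ground truth would bring this sketch closer to the kind of end-to-end model the abstract describes.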
Pages: 134-141
Page count: 8