A generative probabilistic OCR model for NLP applications

Times cited: 0
Authors
Kolak, O [1]
Byrne, W [1]
Resnik, P [1]
Affiliation
[1] Univ Maryland, College Pk, MD 20742 USA
Source
HLT-NAACL 2003: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE | 2003
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks. We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
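The abstract frames OCR post-processing as noisy-channel decoding: given observed OCR output O, choose the true text Ŵ = argmax_W P(W) · P(O | W). The following is a deliberately simplified, hypothetical sketch of that idea, not the authors' finite-state implementation. It assumes a toy unigram lexicon (LEXICON) as the source model, a channel in which every character edit has a fixed assumed probability (P_KEEP, P_EDIT), and isolated-word decoding. It also includes an error-rate helper (Levenshtein distance divided by reference length), which gives character error rate on strings and word error rate on word lists.

```python
import math
from typing import Dict, Sequence

# Hypothetical source model P(W): a toy unigram lexicon with assumed values.
LEXICON: Dict[str, float] = {"the": 0.5, "model": 0.2, "modern": 0.2, "mode": 0.1}

# Hypothetical channel model P(O | W): each character passes through OCR
# unchanged with probability P_KEEP; each substitution, insertion, or
# deletion has the assumed probability P_EDIT.
P_KEEP = 0.95
P_EDIT = 0.01


def channel_log_prob(observed: str, true: str) -> float:
    """Approximate log P(observed | true) by the single best alignment
    under independent character edits (a Viterbi-style approximation)."""
    keep, edit = math.log(P_KEEP), math.log(P_EDIT)
    n, m = len(true), len(observed)
    # dp[i][j]: best log prob of producing observed[:j] from true[:i]
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # OCR deleted true[i-1]
                dp[i][j] = max(dp[i][j], dp[i - 1][j] + edit)
            if j > 0:  # OCR inserted observed[j-1]
                dp[i][j] = max(dp[i][j], dp[i][j - 1] + edit)
            if i > 0 and j > 0:  # character kept or substituted
                step = keep if true[i - 1] == observed[j - 1] else edit
                dp[i][j] = max(dp[i][j], dp[i - 1][j - 1] + step)
    return dp[n][m]


def correct(observed: str) -> str:
    """Noisy-channel decoding: argmax over W of log P(W) + log P(O | W)."""
    return max(
        LEXICON,
        key=lambda w: math.log(LEXICON[w]) + channel_log_prob(observed, w),
    )


def edit_distance(hyp: Sequence, ref: Sequence) -> int:
    """Levenshtein distance over characters (str) or words (list of str)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # hyp symbol unmatched
                           cur[j - 1] + 1,           # ref symbol unmatched
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]


def error_rate(hyp: Sequence, ref: Sequence) -> float:
    """Character error rate for strings, word error rate for word lists."""
    return edit_distance(hyp, ref) / len(ref)
```

With these assumed parameters, the common OCR confusion of "m" read as "rn" is undone: `correct("rnodel")` returns "model", and the word error rate on "the rnodel" versus "the model" drops from 0.5 to 0.0 after correcting each word. A real system would also model word segmentation and context, which this per-word toy deliberately omits.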
Pages: 134-141
Number of pages: 8
Related papers
50 in total
  • [21] A Generative Probabilistic Oriented Wavelet Model for Texture Segmentation
    Inna Stainvas
    David Lowe
    Neural Processing Letters, 2003, 17 : 217 - 238
  • [22] A segment-based probabilistic generative model of speech
    Achan, K
    Roweis, S
    Hertzmann, A
    Frey, B
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 221 - 224
  • [23] A Baseline Generative Probabilistic Model for Weakly Supervised Learning
    Papadopoulos, Georgios
    Silavong, Fran
    Moran, Sean
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VI, 2023, 14174 : 36 - 50
  • [24] Hippocampal formation-inspired probabilistic generative model
    Taniguchi, Akira
    Fukawa, Ayako
    Yamakawa, Hiroshi
    NEURAL NETWORKS, 2022, 151 : 317 - 335
  • [25] Automatic Image tagging via a generative probabilistic model
    Liu, Zheng
    FRONTIERS OF MANUFACTURING AND DESIGN SCIENCE, PTS 1-4, 2011, 44-47 : 3443 - 3447
  • [26] Information retrieval for OCR documents: A content-based probabilistic correction model
    Jin, R
    Zhai, CX
    Hauptmann, AG
    DOCUMENT RECOGNITION AND RETRIEVAL X, 2003, 5010 : 128 - 135
  • [27] A Generative Model for Dynamic Networks with Applications
    Gupta, Shubham
    Sharma, Gaurav
    Dukkipati, Ambedkar
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 7842 - 7849
  • [28] A Probabilistic Generative Model for Fault Analysis of a Transmission Line With SFCL
    Fahim, Shahriar Rahman
    Sarker, Subrata K.
    Das, Sajal K.
    Islam, Md. Rabiul
    Kouzani, Abbas Z.
    Mahmud, M. A. Parvez
    IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY, 2021, 31 (08)
  • [29] Integrated Probabilistic Generative Model for Detecting Smoke on Visual Images
    Vidal-Calleja, Teresa A.
    Agammenoni, Gabriel
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2012, : 2183 - 2188
  • [30] A Probabilistic Generative Model for Typographical Analysis of Early Modern Printing
    Goyal, Kartik
    Dyer, Chris
    Warren, Christopher
    G'Sell, Max
    Berg-Kirkpatrick, Taylor
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 2954 - 2960