A generative probabilistic OCR model for NLP applications

Cited by: 0
Authors
Kolak, O [1]
Byrne, W [1]
Resnik, P [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks. We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
Pages: 134-141
Page count: 8