OCR error correction of an inflectional Indian language using morphological parsing

Authors
Pal, U [1]
Kundu, PK [1]
Chaudhuri, BB [1]
Affiliation
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 700035, W Bengal, India
Keywords
OCR (Optical Character Recognition); error detection; error correction; Indian language; morphological parsing; suffix; inflectional language
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
This paper presents an OCR (Optical Character Recognition) error detection and correction technique for Bangla, a highly inflectional Indian language that is the second-most popular language in India and the fifth-most popular language in the world. The technique is based on morphological parsing: using two separate lexicons of root words and suffixes, the candidate root-suffix pairs of each input string are detected, their grammatical agreement is tested, and the part (root or suffix) in which the error has occurred is noted. The erroneous part of the input string is then corrected by means of a fast dictionary access technique. To do so, information about the error patterns generated by the OCR system is examined, and alternative strings are generated for each erroneous word. Among the alternative strings, those whose root and suffix satisfy grammatical agreement are chosen as suggested words. The desired word appears in the list of suggested words generated by the system in 84.22% of cases.
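The pipeline outlined in the abstract (split into root-suffix candidates, test agreement, locate the erroneous part, generate alternatives from OCR error patterns, keep those that agree) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the lexicon entries are transliterated placeholders rather than the paper's Bangla data, the `CONFUSIONS` table is invented, only single-character substitutions are tried, and plain `dict`/`set` lookups stand in for the fast dictionary access technique, which the abstract does not describe.

```python
# Toy lexicons (hypothetical placeholders, not the paper's data).
ROOTS = {"kar": "verb", "bal": "verb", "ghar": "noun"}          # root -> category
SUFFIXES = {"verb": {"", "chhe", "be"}, "noun": {"", "er", "e"}}  # category -> allowed suffixes
ALL_SUFFIXES = set().union(*SUFFIXES.values())

# Assumed OCR confusion pairs: recognized character -> characters it may stand for.
CONFUSIONS = {"h": ["b"], "b": ["h"], "c": ["e"], "e": ["c"]}


def agrees(root, suffix):
    """True if root is in the root lexicon and suffix is allowed for its category."""
    category = ROOTS.get(root)
    return category is not None and suffix in SUFFIXES[category]


def alternatives(part):
    """Yield strings obtained by undoing one assumed OCR confusion in `part`."""
    for i, ch in enumerate(part):
        for repl in CONFUSIONS.get(ch, []):
            yield part[:i] + repl + part[i + 1:]


def suggest(word):
    """Return [word] if some split is valid, else sorted corrected candidates."""
    splits = [(word[:i], word[i:]) for i in range(1, len(word) + 1)]
    if any(agrees(root, suffix) for root, suffix in splits):
        return [word]  # no error detected

    suggestions = set()
    for root, suffix in splits:
        if root in ROOTS:
            # Root is known, so the error is assumed to lie in the suffix part.
            suggestions.update(
                root + s for s in alternatives(suffix) if agrees(root, s)
            )
        elif suffix in ALL_SUFFIXES:
            # Suffix is known, so the error is assumed to lie in the root part.
            suggestions.update(
                r + suffix for r in alternatives(root) if agrees(r, suffix)
            )
    return sorted(suggestions)


if __name__ == "__main__":
    for w in ["karchhe", "karbc", "gbarer", "xyz"]:
        print(w, suggest(w))
```

In this sketch a word with no valid split and no agreeing alternative yields an empty suggestion list, which corresponds to the cases where the desired word is not offered.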
Pages: 903-922
Page count: 20