A JAPANESE OCR POST-PROCESSING APPROACH BASED ON DICTIONARY MATCHING

Cited by: 0
Authors
Guo, Chu-Yu [1]
Tang, Yuan-Yan [1]
Liu, Chang-Song [2]
Duan, Jia [1]
Affiliations
[1] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, State Key Lab Intelligent Technol & Syst, Beijing, Peoples R China
Keywords
OCR; Dictionary Matching; Bayesian Theory; Limited Length Segmentation; Japanese Character;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a dictionary-based post-processing approach for Japanese character recognition. Analysis of experimental data from the OCR process shows that some segmentation and recognition results do not conform to lexical rules and are produced from character shape alone. When a character to be recognized has a glyph shape similar to that of other characters, the OCR process is prone to errors. To address these errors, we propose a method based on Limited Length Segmentation Matching and a Bayesian statistical classifier. This method resolves most of the misrecognitions caused by similar glyph shapes. Experimental results indicate that it is an effective way to improve the recognition rate for Japanese characters.
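The abstract names two components, a length-limited dictionary segmentation and a Bayesian classifier, without giving details. The following is a minimal sketch of how such a combination could work, not the authors' implementation. The lexicon, its prior weights, the confusion probabilities, the `MAX_LEN` bound, and the function names are all hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical toy lexicon: word -> prior probability (assumed values).
LEXICON = {"日本": 0.5, "日本語": 0.3, "本": 0.1, "語": 0.1}
MAX_LEN = 3  # assumed upper bound on the length of a dictionary word

# Hypothetical confusion model: P(OCR output | true character).
CONFUSION = {
    ("日", "日"): 0.9, ("目", "日"): 0.1,  # 目 read where 日 was printed
    ("本", "本"): 0.9, ("木", "本"): 0.1,  # 木 read where 本 was printed
    ("語", "語"): 1.0,
}

def likelihood(observed, word):
    """P(observed | word), treating characters as independent."""
    return prod(CONFUSION.get((o, t), 0.0) for o, t in zip(observed, word))

def best_word(observed):
    """Dictionary word of the same length with the highest posterior score."""
    scored = [
        (LEXICON[w] * likelihood(observed, w), w)
        for w in LEXICON
        if len(w) == len(observed)
    ]
    score, word = max(scored, default=(0.0, None))
    return word if score > 0 else None

def correct(text):
    """Greedy longest-first matching with windows of at most MAX_LEN chars."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            word = best_word(text[i:i + n])
            if word is not None:
                out.append(word)
                i += n
                break
        else:
            # No dictionary word explains this position; keep the OCR output.
            out.append(text[i])
            i += 1
    return "".join(out)

print(correct("目木語"))
```

In this sketch the score is the unnormalized posterior (prior times likelihood), which is sufficient for ranking candidates. A greedy longest-first scan is only one possible segmentation strategy; the paper may use a different one.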
Pages: 22-26
Page count: 5
Related papers
50 in total
  • [1] An OCR post-processing approach based on multi-knowledge
    Zhuang, L
    Zhu, XY
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2005, 3681 : 346 - 352
  • [2] CONFUSION NETWORK BASED VIDEO OCR POST-PROCESSING APPROACH
    Liu, Anan
    Fei, Jinghao
    Fan, Jianping
    Pang, Lin
    Zhang, Yongdong
    Li, Jintao
    [J]. ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 137 - +
  • [3] A rule-based post-processing approach to improve Persian OCR performance
    Khosrobeigi, Z.
    Veisi, H.
    Ahmadi, H. R.
    Shabanian, H.
    [J]. SCIENTIA IRANICA, 2020, 27 (06) : 3019 - 3033
  • [4] A rule-based post-processing approach to improve Persian OCR performance
    Khosrobeigi Z.
    Veisi H.
    Ahmadi H.R.
    Shabanian H.
    [J]. Scientia Iranica, 2020, 27 (6 D) : 3019 - 3033
  • [5] OCR POST PROCESSING BASED ON CHARACTER PATTERN MATCHING
    Boiangiu, Anton
    Cananau, Dan-Cristian
    Petrescu, Serban
    Moldoveanu, Alin
    [J]. ANNALS OF DAAAM FOR 2009 & PROCEEDINGS OF THE 20TH INTERNATIONAL DAAAM SYMPOSIUM, 2009, 20 : 323 - 324
  • [6] MiBio: A dataset for OCR post-processing evaluation
    Mei, Jie
    Islam, Aminul
    Moh'd, Abidalrahman
    Wu, Yajing
    Milios, Evangelos E.
    [J]. DATA IN BRIEF, 2018, 21 : 251 - 255
  • [7] The Journal Fjolnir for Everyone: The Post-Processing of Historical OCR Texts
    Daoason, Jon Friorik
    Bjarnadottir, Kristin
    Runarsson, Kristjan
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [8] Research on Improved TBL Based Japanese NER Post-Processing
    Wang Jing
    Zheng Dequan
    Zhao Tiejun
    [J]. ALPIT 2008: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, PROCEEDINGS, 2008, : 145 - 149
  • [9] Research on OCR Post-processing Applications for Handwritten Recognition Based on Analysis of Scientific Materials
    Hu, Zhijuan
    Lin, Jie
    Wu, Lu
    [J]. ADVANCES IN COMPUTER SCIENCE, INTELLIGENT SYSTEM AND ENVIRONMENT, VOL 1, 2011, 104 : 131 - +
  • [10] Japanese NER Post-Processing Based on Improved TBL Method
    Zheng, Dequan
    Wang, Jing
    Zhao, Tiejun
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 161 - 165