Increasing robustness of handwriting recognition using character n-gram decoding on large lexica

Cited by: 1
Authors
Schall, Martin [1]
Schambach, Marc-Peter [2]
Franz, Matthias O. [1]
Affiliations
[1] Univ Appl Sci, Inst Opt Syst, Constance, Germany
[2] Siemens Postal Parcel & Airport Logist GmbH, Constance, Germany
Source
PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2016), 2016
Keywords
offline handwriting recognition; recurrent neural network; long-short-term-memory; connectionist temporal classification; n-gram index; lexicon based decoding
DOI
10.1109/DAS.2016.43
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Offline handwriting recognition systems often include a decoding step that retrieves the most likely character sequence from the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused, for example, by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long-Short-Term-Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms to retrieve the most likely literal string that they represent. The proposed algorithm decodes the network output while restricting the possible strings to a large lexicon. The lexicon is indexed with an n-gram index; tri-grams are used for the experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
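The lookup step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the tri-grams and their mean probabilities have already been extracted from the network output (the backtracking step is not shown), and the scoring rule (sum of matched probabilities divided by the number of distinct tri-grams in the lexicon entry) is an assumption made for illustration only. The names `char_ngrams`, `build_index`, and `decode`, as well as the example lexicon and probabilities, are hypothetical.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the set of distinct character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_index(lexicon, n=3):
    """Map each n-gram to the set of lexicon entries containing it (its hit list)."""
    index = defaultdict(set)
    for entry in lexicon:
        for gram in char_ngrams(entry, n):
            index[gram].add(entry)
    return index


def decode(ngram_probs, index, n=3):
    """Rank lexicon entries by the n-grams they share with the recognizer output.

    ngram_probs maps n-gram -> mean probability (assumed to be given).
    The score is an illustrative assumption, not the paper's formula:
    sum of matched probabilities / number of distinct n-grams in the entry.
    """
    totals = defaultdict(float)
    for gram, prob in ngram_probs.items():
        # Walk the hit list of this n-gram and accumulate its probability.
        for entry in index.get(gram, ()):
            totals[entry] += prob
    scored = {e: t / len(char_ngrams(e, n)) for e, t in totals.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical lexicon and hypothetical extracted tri-grams.
lexicon = ["konstanz", "konstant", "kontrast"]
index = build_index(lexicon)
ngram_probs = {"kon": 0.9, "ons": 0.8, "nst": 0.85, "anz": 0.7}
ranking = decode(ngram_probs, index)
print(ranking[0][0])
```

Accumulating scores over hit lists, rather than requiring an entry to appear in every hit list, lets an entry still be ranked when some tri-grams are missing from the output, for instance because a range of characters was weakly predicted. A strict intersection would discard such entries.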
Pages: 156-161
Number of pages: 6