Increasing robustness of handwriting recognition using character n-gram decoding on large lexica

Cited by: 1

Authors:

Schall, Martin [1]

Schambach, Marc-Peter [2]

Franz, Matthias O. [1]

Affiliations:

[1] Univ Appl Sci, Inst Opt Syst, Constance, Germany

[2] Siemens Postal Parcel & Airport Logist GmbH, Constance, Germany

Source:

PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, (DAS 2016) | 2016

Keywords:

offline handwriting recognition; recurrent neural network; long-short-term-memory; connectionist temporal classification; n-gram index; lexicon based decoding;

DOI:

10.1109/DAS.2016.43

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Offline handwriting recognition systems often include a decoding step that retrieves the most likely character sequence from the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused, for example, by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long-Short-Term-Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms in order to retrieve the most likely literal string that they represent. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used for this work is an n-gram index, with tri-grams used for experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
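The lexicon lookup described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the toy lexicon, and the probabilities are hypothetical. The backtracking extraction of n-grams from the network output is not shown, so the n-grams and their mean probabilities are assumed to be given. A simple sum of the matched n-gram probabilities stands in for the paper's total probability, whose exact definition the abstract does not state.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the set of character n-grams of `word` (no padding)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_index(lexicon, n=3):
    """Map each n-gram to the set of lexicon entries that contain it."""
    index = defaultdict(set)
    for entry in lexicon:
        for gram in char_ngrams(entry, n):
            index[gram].add(entry)
    return index


def decode(ngram_probs, index):
    """Rank lexicon entries by the summed probability of the n-grams they match.

    `ngram_probs` maps each n-gram extracted from the recognizer output to its
    mean probability (assumed input; the extraction step is not shown).
    Summing is an illustrative scoring choice, not the paper's formula.
    """
    scores = defaultdict(float)
    for gram, prob in ngram_probs.items():
        # Walk the hit list of this n-gram and credit every entry on it.
        for entry in index.get(gram, ()):
            scores[entry] += prob
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy lexicon and tri-gram index.
lexicon = ["constance", "constant", "instance", "distance"]
index = build_index(lexicon)

# Hypothetical extracted tri-grams: the middle of the word is assumed to be
# obstructed, so only the outer tri-grams were recovered.
ngram_probs = {"con": 0.9, "ons": 0.8, "nce": 0.7}

ranking = decode(ngram_probs, index)
best_entry = ranking[0][0]
print(best_entry)
```

In this toy run the entry that matches all three tri-grams ranks first even though no tri-gram from the middle of the word was available, which is the kind of robustness to weakly predicted character ranges that the abstract describes.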


Pages: 156-161

Page count: 6

