Large vocabulary recognition for online Turkish handwriting with sublexical units

Authors
Bilgin, Esma Fatima [1]
Yanikoglu Yesilyurt, Ayse Berrin [1]
Affiliations
[1] Sabanci Univ, Fac Engn & Nat Sci, Comp Sci & Engn Program, Istanbul, Turkey
Keywords
Online handwriting recognition; Turkish handwriting recognition; hidden Markov models; statistical language modeling; UNIPEN; grammatical sublexical units; delayed strokes; MARKOV-MODELS
DOI
10.3906/elk-1801-234
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for large vocabulary recognition of online Turkish handwriting, using hidden Markov models. While using a traditional approach for the recognizer, we have identified and developed solutions for the main problems specific to Turkish handwriting recognition. First, since large amounts of Turkish handwriting samples are not available, the system is trained and optimized using the large UNIPEN dataset of English handwriting, before extending it to Turkish using a small Turkish dataset. The delayed strokes, which pose a significant source of variation in writing order due to the large number of diacritical marks in Turkish, are removed during preprocessing. Finally, as a solution to the high out-of-vocabulary rates encountered when using a fixed size lexicon in general purpose recognition, a lexicon is constructed from sublexical units (stems and endings) learned from a large Turkish corpus. A statistical bigram language model learned from the same corpus is also applied during the decoding process. The system obtains a 91.7% word recognition rate when tested on a small Turkish handwritten word dataset using a medium sized (1950 words) lexicon corresponding to the vocabulary of the test set and 63.8% using a large, general purpose lexicon (130,000 words). However, with the proposed stem+ending lexicon (12,500 words) and bigram language model with lattice expansion, a 67.9% word recognition accuracy is obtained, surpassing the results obtained with the general purpose lexicon while using a much smaller one.
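To illustrate the idea of a bigram language model over sublexical units mentioned in the abstract, here is a minimal sketch. It is not the authors' implementation: the toy corpus, the `+` prefix marking endings, the function names `train_bigram` and `log_prob`, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count context units and unit bigrams over sequences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        contexts.update(padded[:-1])                  # every unit that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))  # adjacent (previous, current) pairs
    # Units that can be predicted: all stems and endings, plus the end marker.
    vocab = {unit for seq in sequences for unit in seq} | {"</s>"}
    return contexts, bigrams, vocab

def log_prob(seq, contexts, bigrams, vocab):
    """Log-probability of a unit sequence with add-one smoothing (an assumption here)."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    size = len(vocab)
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))
    return total

# Hypothetical toy corpus: each word is already split into a stem and an ending.
corpus = [
    ["ev", "+ler"],     # evler
    ["ev", "+de"],      # evde
    ["kitap", "+lar"],  # kitaplar
    ["okul", "+da"],    # okulda
]

contexts, bigrams, vocab = train_bigram(corpus)
seen = log_prob(["ev", "+ler"], contexts, bigrams, vocab)    # stem+ending pair in the corpus
unseen = log_prob(["ev", "+lar"], contexts, bigrams, vocab)  # pair never observed together
```

In this sketch the observed combination `ev` + `+ler` scores higher than the unobserved `ev` + `+lar`, while the unobserved one still receives a nonzero probability. That is the property that lets a small stem+ending lexicon cover word forms absent from a fixed word list.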
Pages: 2218-2233
Number of pages: 16