Large vocabulary recognition for online Turkish handwriting with sublexical units

Cited by: 0
Authors
Bilgin, Esma Fatima [1]
Yanikoglu Yesilyurt, Ayse Berrin [1]
Affiliations
[1] Sabanci Univ, Fac Engn & Nat Sci, Comp Sci & Engn Program, Istanbul, Turkey
Keywords
Online handwriting recognition; Turkish handwriting recognition; hidden Markov models; statistical language modeling; UNIPEN; grammatical sublexical units; delayed strokes; MARKOV-MODELS
DOI
10.3906/elk-1801-234
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for large vocabulary recognition of online Turkish handwriting, using hidden Markov models. While using a traditional approach for the recognizer, we have identified and developed solutions for the main problems specific to Turkish handwriting recognition. First, since large amounts of Turkish handwriting samples are not available, the system is trained and optimized using the large UNIPEN dataset of English handwriting, before extending it to Turkish using a small Turkish dataset. Second, delayed strokes, which are a significant source of variation in writing order because of the large number of diacritical marks in Turkish, are removed during preprocessing. Finally, as a solution to the high out-of-vocabulary rates encountered when using a fixed-size lexicon in general-purpose recognition, a lexicon is constructed from sublexical units (stems and endings) learned from a large Turkish corpus. A statistical bigram language model learned from the same corpus is also applied during the decoding process. The system obtains a 91.7% word recognition rate when tested on a small Turkish handwritten word dataset using a medium-sized lexicon (1950 words) corresponding to the vocabulary of the test set, and 63.8% using a large, general-purpose lexicon (130,000 words). With the proposed stem+ending lexicon (12,500 words) and a bigram language model with lattice expansion, a 67.9% word recognition accuracy is obtained, surpassing the result obtained with the general-purpose lexicon while using a much smaller one.
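The stem+ending idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the "+" marker on endings, the function names train_bigram and log_prob, and the add-one smoothing are all assumptions made for illustration, since the abstract does not state how units are segmented or how the bigram model is smoothed. The sketch only shows why a lexicon of sublexical units can cover a word that a whole-word lexicon has never seen.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each word is already split into a stem and an
# ending (the "+" prefix marks an ending). The segmentation is assumed here.
corpus = [
    ["ev", "+ler"],   # evler (houses)
    ["ev", "+de"],    # evde  (at home)
    ["el", "+ler"],   # eller (hands)
    ["ev", "+ler"],
]

def train_bigram(sequences):
    """Count contexts and bigrams over unit sequences with boundary tokens."""
    contexts = Counter()
    bigrams = Counter()
    for seq in sequences:
        units = ["<s>"] + seq + ["</s>"]
        contexts.update(units[:-1])
        bigrams.update(zip(units[:-1], units[1:]))
    # Everything that can be predicted: all units plus the end token.
    vocab = {u for seq in sequences for u in seq} | {"</s>"}
    return contexts, bigrams, vocab

def log_prob(seq, model):
    """Add-one smoothed bigram log-probability of a unit sequence."""
    contexts, bigrams, vocab = model
    units = ["<s>"] + seq + ["</s>"]
    total = 0.0
    for prev, cur in zip(units[:-1], units[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
    return total

# Whole-word lexicon versus unit lexicon built from the same corpus.
word_lexicon = {"".join(u.lstrip("+") for u in word) for word in corpus}
unit_lexicon = {u for word in corpus for u in word}

model = train_bigram(corpus)
seen = log_prob(["ev", "+ler"], model)    # combination observed in the corpus
unseen = log_prob(["el", "+de"], model)   # "elde" never occurs as a whole word
```

Here "elde" is out of vocabulary for the whole-word lexicon, yet both of its units are in the unit lexicon, so the bigram model still assigns it a finite score that is lower than the score of an observed combination.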
Pages: 2218-2233
Number of pages: 16