Handwritten word recognition using Web resources and recurrent neural networks

Cited by: 6
Authors
Oprean, Cristina [1]
Likforman-Sulem, Laurence [1]
Popescu, Adrian [2]
Mokbel, Chafic [3]
Affiliations
[1] Telecom Paristech, Paris, France
[2] CEA List, Paris, France
[3] Univ Balamand, Al Koura, Lebanon
Keywords
Handwritten word recognition; Out-Of-Vocabulary word; Web resources; Dynamic dictionary; Recurrent neural networks;
DOI
10.1007/s10032-015-0251-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Handwriting recognition systems usually rely on static dictionaries and language models. Full coverage of these dictionaries is generally not achieved when dealing with unrestricted document corpora due to the presence of Out-Of-Vocabulary (OOV) words. We propose an approach which uses the World Wide Web as a corpus to improve dictionary coverage. We exploit the very large and freely available Wikipedia corpus in order to obtain dynamic dictionaries on the fly. We rely on recurrent neural network (RNN) recognizers, with and without linguistic resources, to detect words that are not reliably recognized within a word sequence. Such words are labeled as non-anchor words (NAWs) and include OOVs and In-Vocabulary words recognized with low confidence. To recognize a non-anchor word, a dynamic dictionary is built by selecting words from the Web resource based on their string similarity with the NAW image, and their linguistic relevance in the NAW context. Similarity is evaluated by computing the edit distance between the sequence of characters generated by the RNN recognizer exploited as a filler model, and the Wikipedia words. Linguistic relevance is based on an N-gram language model estimated from the Wikipedia corpus. Experiments conducted on a word-segmented version of the publicly available RIMES database show that the proposed approach can improve recognition accuracy compared to systems based on static dictionaries only. The proposed approach shows even better behavior as the proportion of OOVs increases, in terms of both accuracy and dictionary coverage.
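The string-similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_distance` and `max_size` parameters, and the unit-cost Levenshtein distance are all assumptions made for illustration, and the N-gram linguistic-relevance step is left out entirely. The sketch only shows how a character sequence produced by a recognizer might be compared against candidate words to form a small dynamic dictionary.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed cost scheme)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def build_dynamic_dictionary(
    char_sequence: str,
    candidate_words: Iterable[str],
    max_distance: int = 2,   # hypothetical threshold, not from the paper
    max_size: int = 10,      # hypothetical dictionary size cap
) -> List[str]:
    """Keep the candidates closest to the recognizer's character sequence."""
    scored = [(edit_distance(char_sequence, w), w) for w in candidate_words]
    # Sort by distance, then alphabetically so ties are resolved deterministically.
    kept = sorted((d, w) for d, w in scored if d <= max_distance)
    return [w for _, w in kept[:max_size]]


if __name__ == "__main__":
    # Toy vocabulary standing in for words drawn from a Web resource.
    vocabulary = ["maison", "raison", "voiture", "maman"]
    print(build_dynamic_dictionary("maisan", vocabulary))
```

In a fuller pipeline, the surviving candidates would presumably be re-ranked by a language-model score computed from the surrounding anchor words before the recognizer is run again with the dynamic dictionary.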
Pages: 287-301
Number of pages: 15
Related papers
50 in total
  • [1] Handwritten word recognition using Web resources and recurrent neural networks
    Cristina Oprean
    Laurence Likforman-Sulem
    Adrian Popescu
    Chafic Mokbel
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2015, 18 : 287 - 301
  • [2] Unconstrained Handwritten Word Recognition Using a Combination of Neural Networks
    Luna-Perez, Rodolfo
    Gomez-Gil, Pilar
    [J]. WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, VOLS 1 AND 2, 2010, : 525 - 528
  • [3] Online Handwritten Mongolian Word Recognition Using a Novel Sliding Window Method with Recurrent Neural Networks
    Liu, Ji
    Ma, Long-Long
    Wu, Jian
    [J]. 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 189 - 194
  • [4] Holistic Handwritten Uyghur Word Recognition Using Convolutional Neural Networks
    Simayi, Wujiahemaiti
    Hamdulla, Askar
    Liu, Cheng-Lin
    [J]. PROCEEDINGS 2017 4TH IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR), 2017, : 846 - 851
  • [5] Isolated word recognition using modular recurrent neural networks
    Lee, T
    Ching, PC
    Chan, LW
    [J]. PATTERN RECOGNITION, 1998, 31 (06) : 751 - 760
  • [6] Sub-word Based Offline Handwritten Farsi Word Recognition Using Recurrent Neural Network
    Ghadikolaie, Mohammad Fazel Younessy
    Kabir, Ehsanolah
    Razzazi, Farbod
    [J]. ETRI JOURNAL, 2016, 38 (04) : 703 - 713
  • [7] Handwritten English Word Recognition based on Convolutional Neural Networks
    Yuan, Aiquan
    Bai, Gang
    Yang, Po
    Guo, Yanni
    Zhao, Xinting
    [J]. 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 207 - 212
  • [8] Feature extraction with convolutional neural networks for handwritten word recognition
    Bluche, Theodore
    Ney, Hermann
    Kermorvant, Christopher
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 285 - 289
  • [9] Unconstrained online handwritten Uyghur word recognition based on recurrent neural networks and connectionist temporal classification
    Ibrayim, Mayire
    Simayi, Wujiahemaiti
    Hamdulla, Askar
    [J]. INTERNATIONAL JOURNAL OF BIOMETRICS, 2021, 13 (01) : 51 - 63
  • [10] Character type based online handwritten Uyghur word recognition using recurrent neural network
    Simayi, Wujiahemaiti
    Ibrayim, Mayire
    Hamdulla, Askar
    [J]. WIRELESS NETWORKS, 2021,