HMM word graph based keyword spotting in handwritten document images

Cited by: 39
Authors
Toselli, Alejandro Hector [1]
Vidal, Enrique [1]
Romero, Veronica [1]
Frinken, Volkmar [2, 3, 4]
Affiliations
[1] Univ Politecn Valencia, Camino Vera S-N, E-46022 Valencia, Spain
[2] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka 812, Japan
[3] Univ Calif Davis, Elect & Comp Engn, Davis, CA 95616 USA
[4] ONU Technol Inc, San Jose, CA USA
Funding
European Union Horizon 2020;
Keywords
Keyword spotting; Handwritten text recognition; Word graph; Posterior probability; Confidence score; INTERACTIVE TRANSCRIPTION; HISTORICAL DOCUMENTS; CONFIDENCE MEASURES; SEGMENTATION; RECOGNITION; ALGORITHM; FILLER; MODEL;
DOI
10.1016/j.ins.2016.07.063
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Line-level keyword spotting (KWS) is presented on the basis of frame-level word posterior probabilities. These posteriors are obtained using word graphs derived from the recognition process of a full-fledged handwritten text recognizer based on hidden Markov models and N-gram language models. This approach has several advantages. First, since it uses a holistic, segmentation-free technology, it does not require any kind of word or character segmentation. Second, the use of language models allows the context of each spotted word to be taken into account, thereby considerably increasing KWS accuracy. Third, the proposed KWS scores are based on true posterior probabilities, taking into account all (or most) possible word segmentations of the input image. These scores are properly bounded and normalized. This mathematically clean formulation lends itself to smooth, threshold-based keyword queries which, in turn, permit comfortable trade-offs between search precision and recall. Experiments are carried out on several historic collections of handwritten text images, as well as a well-known data set of modern English handwritten text. According to the empirical results, the proposed approach achieves KWS results comparable to those obtained with the recently introduced "BLSTM neural networks KWS" approach and clearly outperforms the popular, state-of-the-art "Filler HMM" KWS method. Overall, the results clearly support all the above-claimed advantages of the proposed approach. (C) 2016 Elsevier Inc. All rights reserved.
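The idea of scoring a text line from frame-level word posteriors can be illustrated with a small Python sketch. It is not the authors' implementation: the `Edge` type, the function names, and the toy word graph are hypothetical, edge posteriors are assumed to be already computed (for instance by a forward-backward pass over the word graph), and taking the maximum over frames as the line-level score is an assumption that the abstract does not spell out.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One word-graph edge (hypothetical structure for illustration)."""
    word: str
    start: int        # first frame covered, inclusive
    end: int          # last frame covered, inclusive
    posterior: float  # edge posterior, assumed precomputed


def frame_posteriors(edges, num_frames):
    """For each word, sum the posteriors of all edges covering each frame."""
    post = defaultdict(lambda: [0.0] * num_frames)
    for e in edges:
        for i in range(e.start, e.end + 1):
            post[e.word][i] += e.posterior
    return dict(post)


def line_scores(edges, num_frames):
    """Line-level score per word: maximum frame-level posterior (assumed)."""
    return {w: max(p) for w, p in frame_posteriors(edges, num_frames).items()}


def spot(scores, keyword, threshold):
    """Threshold-based query: is the keyword deemed present in the line?"""
    return scores.get(keyword, 0.0) >= threshold


# Toy word graph over 10 frames with two competing hypotheses:
#   "the cat" (path posterior 0.75) and "the cap" (path posterior 0.25),
# where "the" is segmented differently in each hypothesis.
edges = [
    Edge("the", 0, 3, 0.75),
    Edge("cat", 4, 9, 0.75),
    Edge("the", 0, 4, 0.25),
    Edge("cap", 5, 9, 0.25),
]
scores = line_scores(edges, num_frames=10)
```

In this toy graph the two segmentations of "the" add up on the frames they share, so the score reflects every segmentation in the graph rather than only the best one. Because the scores lie in [0, 1], raising the threshold passed to `spot` favors precision and lowering it favors recall.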
Pages: 497-518
Number of pages: 22
Related papers
50 in total
  • [41] Visual Language Model for Keyword Spotting on Historical Mongolian Document Images
    Wei, Hongxi
    Gao, Guanglai
    2017 29TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2017, : 1737 - 1742
  • [42] A Case Study Of BoVW For Keyword Spotting On Historical Mongolian Document Images
    Guo, Xing
    Wei, Hongxi
    Su, Xiangdong
    2016 9TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2016), 2016, : 374 - 378
  • [43] Word spotting based on a posterior measure of keyword confidence
    Hao, J
    Li, X
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17 (04) : 491 - 497
  • [44] Deep Learning Features for Handwritten Keyword Spotting
    Wicht, Baptiste
    Fischer, Andreas
    Hennebert, Jean
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 3434 - 3439
  • [45] Word Spotting based Retrieval of Urdu Handwritten Documents
    Abidi, Ali
    Jamil, Akhtar
    Siddiqi, Imran
    Khurshid, Khurram
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 331 - 336
  • [46] Improving HMM-Based Keyword Spotting with Character Language Models
    Fischer, Andreas
    Frinken, Volkmar
    Bunke, Horst
    Suen, Ching Y.
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 506 - 510
  • [47] Segmentation-free Word Spotting in Historical Bangla Handwritten Binarized Document
    Das, Sugata
    Mandal, Sekhar
    2017 NINTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2017, : 76 - 81
  • [48] Handwritten Word Spotting Based on A Hybrid Optimal Distance
    Wang, Peng
    Eglin, Veronique
    Largeron, Christine
    Garcia, Christophe
    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 2580 - 2584
  • [49] Segmentation-based Historical Handwritten Word Spotting using Document-Specific Local Features
    Zagoris, Konstantinos
    Pratikakis, Ioannis
    Gatos, Basilis
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 9 - 14
  • [50] Keyword spotting for cursive document retrieval
    Keaton, P
    Greenspan, H
    Goodman, R
    WORKSHOP ON DOCUMENT IMAGE ANALYSIS (DIA'97), PROCEEDINGS: IN COOPERATION WITH CVPR '97, 1997, : 74 - 81