HMM word graph based keyword spotting in handwritten document images

Cited by: 39
Authors
Toselli, Alejandro Hector [1]
Vidal, Enrique [1]
Romero, Veronica [1]
Frinken, Volkmar [2, 3, 4]
Affiliations
[1] Univ Politecn Valencia, Camino Vera S-N, E-46022 Valencia, Spain
[2] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka 812, Japan
[3] Univ Calif Davis, Elect & Comp Engn, Davis, CA 95616 USA
[4] ONU Technol Inc, San Jose, CA USA
Funding
EU Horizon 2020;
Keywords
Keyword spotting; Handwritten text recognition; Word graph; Posterior probability; Confidence score; INTERACTIVE TRANSCRIPTION; HISTORICAL DOCUMENTS; CONFIDENCE MEASURES; SEGMENTATION; RECOGNITION; ALGORITHM; FILLER; MODEL;
DOI
10.1016/j.ins.2016.07.063
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Line-level keyword spotting (KWS) is presented on the basis of frame-level word posterior probabilities. These posteriors are obtained using word graphs derived from the recognition process of a full-fledged handwritten text recognizer based on hidden Markov models and N-gram language models. This approach has several advantages. First, since it uses a holistic, segmentation-free technology, it does not require any kind of word or character segmentation. Second, the use of language models allows the context of each spotted word to be taken into account, thereby considerably increasing KWS accuracy. And third, the proposed KWS scores are based on true posterior probabilities, taking into account all (or most) possible word segmentations of the input image. These scores are properly bounded and normalized. This mathematically clean formulation lends itself to smooth, threshold-based keyword queries which, in turn, permit comfortable trade-offs between search precision and recall. Experiments are carried out on several historic collections of handwritten text images, as well as a well-known data set of modern English handwritten text. According to the empirical results, the proposed approach achieves KWS results comparable to those obtained with the recently introduced "BLSTM neural networks KWS" approach and clearly outperforms the popular, state-of-the-art "Filler HMM" KWS method. Overall, the results clearly support all the above-claimed advantages of the proposed approach. (C) 2016 Elsevier Inc. All rights reserved.
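The abstract's idea of deriving a bounded line-level score from frame-level word posteriors can be illustrated with a minimal sketch. It assumes that each word-graph edge already carries a posterior probability (in practice obtained by a forward-backward pass over the graph, which is not shown here), that the frame-level posterior of a word is the sum of the posteriors of all edges labelled with that word that span the frame, and that the line-level score is the maximum of these values over all frames. The edge layout, the function names, and the toy graph are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical edge layout: (word, first_frame, last_frame, edge_posterior).
# Edge posteriors are assumed to be precomputed from the word graph.
Edge = Tuple[str, int, int, float]

# Toy word graph for a 10-frame line image with three competing hypotheses:
# "the cat" (0.6), "the oat" (0.3), "she cat" (0.1).
EDGES: List[Edge] = [
    ("the", 0, 3, 0.6),
    ("the", 0, 2, 0.3),
    ("she", 0, 3, 0.1),
    ("cat", 4, 9, 0.7),
    ("oat", 3, 9, 0.3),
]

def frame_posteriors(edges: List[Edge], word: str, n_frames: int) -> List[float]:
    """Sum, for every frame, the posteriors of edges labelled `word` that cover it."""
    post = [0.0] * n_frames
    for w, start, end, p in edges:
        if w == word:
            for i in range(start, end + 1):
                post[i] += p
    return post

def line_score(edges: List[Edge], word: str, n_frames: int) -> float:
    """Line-level score: the largest frame-level posterior of `word` (0.0 if absent)."""
    return max(frame_posteriors(edges, word, n_frames), default=0.0)

def spot(edges: List[Edge], word: str, n_frames: int, threshold: float) -> bool:
    """Threshold-based query: report the line as a hit if the score reaches the threshold."""
    return line_score(edges, word, n_frames) >= threshold
```

Because the score stays within [0, 1], a single threshold can be swept to trade precision against recall: in the toy graph, a threshold of 0.5 accepts "cat" and rejects "oat", while lowering it to 0.3 accepts both.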
Pages: 497 - 518
Number of pages: 22
Related papers
50 in total
  • [1] Shape-based Word Spotting in Handwritten Document Images
    Giotis, Angelos P.
    Sfikas, Giorgos
    Nikou, Christophoros
    Gatos, Basilis
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 561 - 565
  • [2] Dynamic handwritten keyword spotting based on the NSHP-HMM
    Choisy, Christophe
    ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007, : 242 - 246
  • [3] A voting-based technique for word spotting in handwritten document images
    Shamik Majumder
    Subhrangshu Ghosh
    Samir Malakar
    Ram Sarkar
    Mita Nasipuri
    Multimedia Tools and Applications, 2021, 80 : 12411 - 12434
  • [4] A voting-based technique for word spotting in handwritten document images
    Majumder, Shamik
    Ghosh, Subhrangshu
    Malakar, Samir
    Sarkar, Ram
    Nasipuri, Mita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (08) : 12411 - 12434
  • [5] Visual keyword based word-spotting in handwritten documents
    Kolcz, A
    Alspector, J
    Augusteijn, M
    Carlson, R
    Popescu, GV
    DOCUMENT RECOGNITION V, 1998, 3305 : 185 - 193
  • [6] Graph Based Keyword Spotting in Handwritten Historical Slavic Documents
    Riesen, Kaspar
    Brodic, Darko
    ERCIM NEWS, 2013, (95): : 37 - 38
  • [7] Keyword spotting in historical handwritten documents based on graph matching
    Stauffer, Michael
    Fischer, Andreas
    Riesen, Kaspar
    PATTERN RECOGNITION, 2018, 81 : 240 - 253
  • [8] Graph-Based Keyword Spotting in Historical Handwritten Documents
    Stauffer, Michael
    Fischer, Andreas
    Riesen, Kaspar
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2016, 2016, 10029 : 564 - 573
  • [9] Query-Based Word Spotting in Handwritten Documents Using HMM
    Bharathi, V. C.
    Veningston, K.
    Rao, P. V. Venkateswara
    DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT-2K19, 2020, 1079 : 31 - 39
  • [10] Ensembles for Graph-based Keyword Spotting in Historical Handwritten Documents
    Stauffer, Michael
    Fischer, Andreas
    Riesen, Kaspar
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 714 - 720