Annotation-Free Word Spotting with Bag-of-Features HMMs

Times cited: 0
Authors
Rothacker, Leonard [1 ]
Wolf, Fabian [1 ]
Fink, Gernot A. [1 ]
Affiliation
[1] Department of Computer Science, TU Dortmund University, Dortmund, 44227, Germany
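Keywords
Image segmentation
DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]
Discipline classification codes
020208; 070103; 0714
Abstract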
The annotation-free word spotting method proposed in this paper makes document images searchable without requiring any labeled training data. Users can therefore explore a document collection directly, without the manual effort of preparing a training dataset. The method operates in the query-by-example scenario: the user selects an exemplary occurrence of the query word, and the entire collection of document images is then searched for regions that are visually similar to the query. The method makes only minimal assumptions about the visual appearance of text. It processes document images as a whole and does not require a given segmentation of the images at word or line level; it is therefore also segmentation-free. Variability in word size is handled by representing the sequential structure of text with a statistical sequence model. Because applying the sequence model is computationally costly, regions are first retrieved according to an approximate similarity computed with an efficient model decoding algorithm, which makes the approach feasible in practice. Re-ranking these regions according to the visual similarity obtained with the sequence model then yields highly accurate word spotting results. The method is evaluated on five benchmark datasets. In the segmentation-free query-by-example scenario without annotated training data, it outperforms all other methods that have been evaluated on any of these five benchmarks. © 2021 The Author(s).
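The two-stage idea described in the abstract (cheap approximate retrieval over the whole, unsegmented page, followed by re-ranking the retrieved regions with a sequence model) can be illustrated with a toy sketch. It is not the authors' implementation and rests on several loud assumptions: a page is flattened into a single strip of frames, each frame being a bag of visual-word counts; stage 1 substitutes a plain bag-of-features cosine similarity for the paper's efficient model decoding algorithm, which the abstract does not detail; stage 2 uses a linear (left-to-right) HMM with multinomial bag-of-features emissions whose parameters are estimated by evenly splitting the query's frames across states rather than by proper training. All names (`spot`, `train_linear_hmm`, `hmm_score`, `scales`, `top_k`, `n_states`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy representation: a frame (one column of the page) is a
# bag of visual words, mapping visual-word id -> occurrence count.
Frame = Dict[int, int]


def bof_histogram(frames: Sequence[Frame]) -> Counter:
    """Sum the frame histograms of a region into one bag-of-features vector."""
    total: Counter = Counter()
    for f in frames:
        total.update(f)
    return total


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_linear_hmm(query: Sequence[Frame], n_states: int, vocab: int,
                     smoothing: float = 0.5) -> List[List[float]]:
    """Per-state log multinomials over visual words. Assumption: the query's
    frames are split evenly across states instead of running Baum-Welch."""
    log_emis = []
    for s in range(n_states):
        lo, hi = s * len(query) // n_states, (s + 1) * len(query) // n_states
        counts = bof_histogram(query[lo:hi])
        total = sum(counts.values()) + smoothing * vocab
        log_emis.append([math.log((counts[w] + smoothing) / total) for w in range(vocab)])
    return log_emis


def log_add(a: float, b: float) -> float:
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def hmm_score(log_emis: List[List[float]], frames: Sequence[Frame],
              p_stay: float = 0.5) -> float:
    """Per-frame forward log-likelihood under a linear HMM. Self-loops let a
    state absorb several frames, so wider or narrower occurrences still match."""
    n = len(log_emis)
    ls, la = math.log(p_stay), math.log(1.0 - p_stay)

    def emit(s: int, f: Frame) -> float:
        # Multinomial emission up to its combinatorial constant: sum of count * log p_s(word).
        return sum(c * log_emis[s][w] for w, c in f.items())

    fwd = [-math.inf] * n
    fwd[0] = emit(0, frames[0])  # paths start in the first state
    for f in frames[1:]:
        fwd = [log_add(fwd[s] + ls, fwd[s - 1] + la if s > 0 else -math.inf) + emit(s, f)
               for s in range(n)]
    return fwd[-1] / len(frames)  # paths must end in the last state


def spot(page: Sequence[Frame], query: Sequence[Frame], vocab: int,
         scales=(0.75, 1.0, 1.5), top_k: int = 5, n_states: int = 2
         ) -> List[Tuple[float, int, int]]:
    """Return (score, x, width) for the re-ranked candidate windows."""
    qh = bof_histogram(query)
    # Stage 1 (stand-in for the paper's efficient decoding): cheap cosine
    # similarity for every window position and a few window widths.
    cands = []
    for scale in scales:
        w = max(1, round(scale * len(query)))
        for x in range(len(page) - w + 1):
            cands.append((cosine(qh, bof_histogram(page[x:x + w])), x, w))
    shortlist = sorted(cands, reverse=True)[:top_k]
    # Stage 2: re-rank only the shortlist with the costlier sequence model.
    hmm = train_linear_hmm(query, n_states, vocab)
    return sorted(((hmm_score(hmm, page[x:x + w]), x, w) for _, x, w in shortlist),
                  reverse=True)


if __name__ == "__main__":
    query = [{0: 3}, {0: 3}, {1: 3}, {1: 3}]  # toy word: visual word 0, then 1
    page = ([{4: 2}] * 5 + [{0: 3}] * 3 + [{1: 3}] * 3    # stretched occurrence, x=5..10
            + [{5: 2}] * 4 + [{1: 3}] * 2 + [{0: 3}] * 2  # halves swapped, x=15..18
            + [{4: 2}] * 4)
    for score, x, w in spot(page, query, vocab=6):
        print(f"x={x:2d} width={w} score={score:.2f}")
```

In this toy page, both the stretched occurrence and the swapped one have the same normalized bag-of-features histogram as the query, so stage 1 cannot tell them apart; only the sequence model in stage 2 ranks the swapped region last, while its self-loops let the wider, stretched occurrence still score well. A practical system would also merge overlapping hits, for example with non-maximum suppression, which this sketch omits.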
Related papers (50 in total)
  • [1] Annotation-Free Word Spotting with Bag-of-Features HMMs
    Rothacker, Leonard
    Wolf, Fabian
    Fink, Gernot A.
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (04)
  • [2] Bag-of-Features HMMs for Segmentation-free Word Spotting in Handwritten Documents
    Rothacker, Leonard
    Rusinol, Marcal
    Fink, Gernot A.
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 1305 - 1309
  • [3] Segmentation-free Query-by-String Word Spotting with Bag-of-Features HMMs
    Rothacker, Leonard
    Fink, Gernot A.
    [J]. 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 661 - 665
  • [4] Improving Handwritten Word Synthesis for Annotation-free Word Spotting
    Wolf, Fabian
    Brandenbusch, Kai
    Fink, Gernot A.
    [J]. 2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 61 - 66
  • [5] Robust Output Modeling in Bag-of-Features HMMs for Handwriting Recognition
    Rothacker, Leonard
    Fink, Gernot A.
    [J]. PROCEEDINGS OF 2016 15TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2016, : 199 - 204
  • [6] Annotation-Free Learning of Deep Representations for Word Spotting Using Synthetic Data and Self Labeling
    Wolf, Fabian
    Fink, Gernot A.
    [J]. DOCUMENT ANALYSIS SYSTEMS, 2020, 12116 : 293 - 308
  • [7] Bag-of-features and beyond
    Schmid, C.
    [J]. PERCEPTION, 2011, 40 : 3 - 3
  • [8] Annotation-Free Keyword Spotting in Historical Vietnamese Manuscripts Using Graph Matching
    Scius-Bertrand, Anna
    Studer, Linda
    Fischer, Andreas
    Bui, Marc
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2022, 2022, 13813 : 22 - 32
  • [9] Lexicon-free handwritten word spotting using character HMMs
    Fischer, Andreas
    Keller, Andreas
    Frinken, Volkmar
    Bunke, Horst
    [J]. PATTERN RECOGNITION LETTERS, 2012, 33 (07) : 934 - 942
  • [10] Packing bag-of-features
    Jegou, Herve
    Douze, Matthijs
    Schmid, Cordelia
    [J]. 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, : 2357 - 2364