Text box proposals for handwritten word spotting from documents

Cited by: 4
Authors
Ghosh, Suman [1]
Valveny, Ernest [1]
Affiliation
[1] Univ Autonoma Barcelona, Dept Ciencies Comp, Comp Vis Ctr, Bellaterra 08193, Barcelona, Spain
Keywords
Word spotting; Segmentation-free; Bounding box proposals; Word attributes; Pyramidal Histogram of Characters; SEGMENTATION; LINE; RECOGNITION; RETRIEVAL;
DOI
10.1007/s10032-018-0300-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this article, we propose a new approach to segmentation-free word spotting that is based on the combination of three different contributions. Firstly, inspired by the success of bounding box proposal algorithms in object recognition, we propose a scheme to generate a set of word-independent text box proposals. For that, we generate a set of atomic bounding boxes based on simple connected component analysis that are combined using a set of spatial constraints in order to generate the final set of text box proposals. Secondly, an attribute representation based on the Pyramidal Histogram of Characters (PHOC) is encoded in an integral image and used to efficiently evaluate text box proposals for retrieval. Thirdly, we also propose an indexing scheme for fast retrieval based on character n-grams. For the generation of the index a similar attribute space based on a Pyramidal Histogram of Character N-grams (PHON) is used. All attribute models are learned using linear SVMs over the Fisher Vector representation of the word images along with the PHOC or PHON labels of the corresponding words. We show the performance of the proposed approach in both tasks of query-by-string and query-by-example in standard single- and multi-writer data sets, reporting state-of-the-art results.
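As an illustration of the PHOC attribute representation named in the abstract, the following is a minimal sketch of how a PHOC vector can be built from a transcription. The abstract does not give the alphabet, the pyramid levels, or the character-to-region assignment rule, so the 36-symbol alphabet, the levels 2 to 5, and the "at least half of the character falls in the region" rule used here are assumptions taken from common PHOC practice, not details confirmed by this record. The function name `phoc` is hypothetical.

```python
from typing import List, Sequence

# Assumed alphabet: lowercase letters and digits (not stated in the abstract).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def phoc(word: str,
         levels: Sequence[int] = (2, 3, 4, 5),
         alphabet: str = ALPHABET) -> List[int]:
    """Build a binary pyramidal histogram of characters for `word`.

    At pyramid level k the word is split into k equal regions. Each region
    gets one bit per alphabet symbol, set when at least half of some
    occurrence of that symbol lies inside the region (assumed rule).
    """
    word = word.lower()
    n = len(word)
    vec: List[int] = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid float error:
            # character i spans [i * level, (i + 1) * level],
            # region r spans    [r * n,     (r + 1) * n].
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                idx = alphabet.find(ch)
                if idx < 0:
                    continue  # symbols outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # The character width is `level` units; require >= 50% overlap.
                if 2 * overlap >= level:
                    bits[idx] = 1
            vec.extend(bits)
    return vec


if __name__ == "__main__":
    v = phoc("word")
    print(len(v), sum(v))
```

With these assumed settings the vector has (2 + 3 + 4 + 5) × 36 = 504 dimensions. In the approach described above, each such dimension would be the target of one linear SVM trained on Fisher Vectors of word images; that learning step is not shown here.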
Pages: 91-108
Page count: 18
Related papers
50 in total
  • [1] Text box proposals for handwritten word spotting from documents
    Suman Ghosh
    Ernest Valveny
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2018, 21 : 91 - 108
  • [2] Word Spotting as a Service for Handwritten Documents
    Amanatiadis, Angelos
    Zagoris, Konstantinos
    Pratikakis, Ioannis
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2021,
  • [3] A Survey on handwritten documents word spotting
    Ahmed R.
    Al-Khatib W.G.
    Mahmoud S.
    [J]. International Journal of Multimedia Information Retrieval, 2017, 6 (1) : 31 - 47
  • [4] An overview on handwritten documents word spotting
    Boualam, Manal
    Khaissidi, Ghizlane
    Mrabti, Mostafa
    Elfakir, Youssef
    [J]. 2019 INTERNATIONAL CONFERENCE ON WIRELESS TECHNOLOGIES, EMBEDDED AND INTELLIGENT SYSTEMS (WITS), 2019,
  • [5] Multilingual Word Spotting in Offline Handwritten Documents
    Wshah, Safwan
    Kumar, Gaurav
    Govindaraju, Venu
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 310 - 313
  • [6] Attribute CNNs for word spotting in handwritten documents
    Sebastian Sudholt
    Gernot A. Fink
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2018, 21 : 199 - 218
  • [7] A segmentation free Word Spotting for handwritten documents
    Ghorbel, Adam
    Ogier, Jean-Marc
    Vincent, Nicole
    [J]. 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 346 - 350
  • [8] Attribute CNNs for word spotting in handwritten documents
    Sudholt, Sebastian
    Fink, Gernot A.
    [J]. INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2018, 21 (03) : 199 - 218
  • [9] Sequential Word Spotting in Historical Handwritten Documents
    Fernandez-Mota, David
    Llados, Josep
    Fornes, Alicia
    Manmatha, R.
    [J]. 2014 11TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2014), 2014, : 101 - 105
  • [10] ON THE INFLUENCE OF WORD REPRESENTATIONS FOR HANDWRITTEN WORD SPOTTING IN HISTORICAL DOCUMENTS
    Llados, Josep
    Rusinol, Marcal
    Fornes, Alicia
    Fernandez, David
    Dutta, Anjan
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2012, 26 (05)