Segmentation-Free Keyword Retrieval in Historical Document Images

Times cited: 7
Authors
Rabaev, Irina [1]
Dinstein, Itshak [2]
El-Sana, Jihad [1]
Kedem, Klara [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Elect & Comp Engn, IL-84105 Beer Sheva, Israel
Keywords
Historical document processing; Keyword retrieval; Segmentation-free; Bag-of-visual-words; Kernelized locality-sensitive hashing
DOI
10.1007/978-3-319-11758-4_40
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a segmentation-free method for retrieving keywords from degraded historical documents. The method works directly on the grayscale image and requires no pre-processing to enhance the document images. Each document image is subdivided into overlapping patches of varying sizes, and each patch is described by a bag-of-visual-words (BoVW) descriptor. For efficient retrieval, the patch descriptors are hashed into several hash tables using a kernelized locality-sensitive hashing (KLSH) scheme. With this scheme, a keyword search only needs to examine the small fraction of patches stored in the matching hash-table entries. Because handwriting varies and the amount of available historical material is limited, we synthesize a small number of additional samples from the given query to improve retrieval. We tested our approach on Hebrew historical document images from the Cairo Genizah collection and obtained impressive results.
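The pipeline outlined in the abstract (overlapping patches of several sizes, a BoVW histogram per patch, and several KLSH tables) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Keypoint positions and their visual-word ids are assumed to be precomputed (for example, local descriptors quantized against a codebook). Each patch gets a plain BoVW histogram with no spatial layout. The histogram-intersection kernel stands in for whatever kernel the paper actually uses. All names and parameters (`patch_windows`, `bovw_descriptor`, `KLSHTable`, `search`, the window sizes, `overlap`, `n_bits`, `t`, and the number of tables) are hypothetical.

The hash bits follow the standard KLSH construction of Kulis and Grauman: each bit is the sign of a centered kernel expansion over a sampled set of descriptors, with weights w = K^{-1/2} e_S for a random subset S. The query-synthesis step is not modeled. `search` simply accepts a list of query descriptors, so any synthesized variants can be searched together with the original query.

```python
# Minimal sketch of multi-size patches + BoVW + KLSH retrieval.
# The kernel, parameters and all names are illustrative assumptions,
# not the method or code of the paper.
from collections import defaultdict

import numpy as np

def patch_windows(height, width, sizes, overlap=0.5):
    """Enumerate overlapping (top, left, h, w) windows for several patch sizes."""
    for ph, pw in sizes:
        step_y = max(1, int(ph * (1 - overlap)))
        step_x = max(1, int(pw * (1 - overlap)))
        for top in range(0, height - ph + 1, step_y):
            for left in range(0, width - pw + 1, step_x):
                yield top, left, ph, pw

def bovw_descriptor(points, words, window, vocab_size):
    """L1-normalized histogram of the visual words whose keypoints lie in a window."""
    # points: (n, 2) (row, col) positions; words: (n,) int ids, assumed precomputed.
    top, left, h, w = window
    inside = ((points[:, 0] >= top) & (points[:, 0] < top + h)
              & (points[:, 1] >= left) & (points[:, 1] < left + w))
    hist = np.bincount(words[inside], minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def intersection_kernel(A, B):
    """Histogram-intersection kernel: K[i, j] = sum_d min(A[i, d], B[j, d])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

class KLSHTable:
    """One kernelized LSH table built from a random sample of p descriptors."""

    def __init__(self, sample, n_bits=8, t=5, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.sample = sample
        p = len(sample)
        K = intersection_kernel(sample, sample)
        self.col_mean, self.all_mean = K.mean(axis=0), K.mean()
        # Center the kernel matrix so the sample has zero mean in feature space.
        H = np.eye(p) - np.full((p, p), 1.0 / p)
        vals, vecs = np.linalg.eigh(H @ K @ H)
        keep = vals > 1e-10
        # Pseudo-inverse square root of the centered kernel matrix.
        k_inv_sqrt = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T
        # Bit b uses weights w = K^{-1/2} e_S for a random subset S of size t.
        self.W = np.empty((p, n_bits))
        for b in range(n_bits):
            e_s = np.zeros(p)
            e_s[rng.choice(p, size=t, replace=False)] = 1.0
            self.W[:, b] = k_inv_sqrt @ e_s
        self.buckets = defaultdict(list)

    def key(self, x):
        """Hash one descriptor to a tuple of n_bits booleans."""
        k = intersection_kernel(x[None, :], self.sample)[0]
        # Center k(x, x_i) consistently with the centered kernel matrix.
        k_c = k - k.mean() - self.col_mean + self.all_mean
        return tuple(bool(v) for v in (k_c @ self.W > 0))

    def add(self, idx, x):
        self.buckets[self.key(x)].append(idx)

def search(tables, descriptors, queries, top_k=5):
    """Collect patches that collide with any query in any table, then re-rank."""
    candidates = set()
    for q in queries:
        for table in tables:
            candidates.update(table.buckets.get(table.key(q), []))
    if not candidates:
        return []
    cand = sorted(candidates)
    # Score each candidate by its best kernel similarity to any query variant.
    sims = intersection_kernel(np.asarray(queries), descriptors[cand]).max(axis=0)
    return [cand[i] for i in np.argsort(-sims)[:top_k]]

# Usage on synthetic keypoints (a stand-in for a real page image).
rng = np.random.default_rng(0)
vocab = 16
points = rng.uniform(0, 200, size=(600, 2))
words = rng.integers(0, vocab, size=600)
windows = list(patch_windows(200, 200, sizes=[(40, 80), (40, 120)]))
descriptors = np.array([bovw_descriptor(points, words, w, vocab) for w in windows])
sample = descriptors[rng.choice(len(descriptors), size=30, replace=False)]
tables = [KLSHTable(sample, n_bits=6, t=5, rng=rng) for _ in range(4)]
for i, d in enumerate(descriptors):
    for table in tables:
        table.add(i, d)
print(search(tables, descriptors, [descriptors[7]]))  # patch 7 itself ranks first
```

Only patches that collide with a query in at least one table are re-ranked, which is where the saving over an exhaustive scan comes from. Adding bits per table shrinks the buckets, and adding tables recovers matches that a single table misses.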
Pages: 369–378
Number of pages: 10