Segmentation-Free Keyword Retrieval in Historical Document Images

Cited by: 7
Authors
Rabaev, Irina [1]
Dinstein, Itshak [2]
El-Sana, Jihad [1]
Kedem, Klara [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Elect & Comp Engn, IL-84105 Beer Sheva, Israel
Keywords
Historical document processing; Keyword retrieval; Segmentation-free; Bag-of-visual-words; Kernelized locality-sensitive hashing
DOI
10.1007/978-3-319-11758-4_40
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a segmentation-free method for retrieving keywords from degraded historical documents. The method works directly on the grayscale representation and requires no pre-processing to enhance the document images. Each document image is subdivided into overlapping patches of varying sizes, and each patch is described by a bag-of-visual-words descriptor. The patch descriptors are hashed into several hash tables using a kernelized locality-sensitive hashing scheme for efficient retrieval. Under this scheme, the search for a keyword is restricted to a small fraction of the patches, namely those in the matching entries of the hash tables. Because handwriting varies and few historical documents are available, we synthesize a small number of samples from the given query to improve the retrieval results. We tested our approach on Hebrew historical document images from the Cairo Genizah collection and obtained impressive results.
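The pipeline described in the abstract (overlapping patches, a bag-of-visual-words descriptor per patch, and hashing for fast lookup) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: local features are raw pixel cells rather than whatever descriptor the paper uses, the codebook is supplied by the caller, and plain random-hyperplane hashing with a single table stands in for kernelized locality-sensitive hashing with several tables. All function names (`sliding_patches`, `cell_features`, `bovw_histogram`, `hash_key`, `build_index`, `candidates`) are hypothetical. Query synthesis is not covered.

```python
import numpy as np

def sliding_patches(height, width, sizes, overlap=0.5):
    """Yield (top, left, h, w) boxes of overlapping patches for each size."""
    for h, w in sizes:
        step_y = max(1, int(h * (1 - overlap)))
        step_x = max(1, int(w * (1 - overlap)))
        for top in range(0, height - h + 1, step_y):
            for left in range(0, width - w + 1, step_x):
                yield top, left, h, w

def cell_features(patch, cell=4):
    """Assumed stand-in for local features: flattened cell x cell pixel blocks."""
    rows, cols = patch.shape[0] // cell, patch.shape[1] // cell
    cropped = patch[:rows * cell, :cols * cell].astype(float)
    blocks = cropped.reshape(rows, cell, cols, cell).transpose(0, 2, 1, 3)
    return blocks.reshape(rows * cols, cell * cell)

def bovw_histogram(local_features, codebook):
    """L1-normalized histogram of nearest visual words."""
    # Squared distance from every local feature to every codeword.
    d2 = ((local_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def hash_key(descriptor, hyperplanes):
    """Sign-of-projection key (random hyperplanes, NOT kernelized LSH)."""
    bits = (hyperplanes @ descriptor) >= 0
    return tuple(int(b) for b in bits)

def build_index(image, sizes, codebook, hyperplanes, overlap=0.5):
    """Hash every patch descriptor into one table: key -> list of boxes."""
    height, width = image.shape
    index = {}
    for top, left, h, w in sliding_patches(height, width, sizes, overlap):
        patch = image[top:top + h, left:left + w]
        desc = bovw_histogram(cell_features(patch), codebook)
        index.setdefault(hash_key(desc, hyperplanes), []).append((top, left, h, w))
    return index

def candidates(query_patch, index, codebook, hyperplanes):
    """Return only the boxes that share the query's hash bucket."""
    desc = bovw_histogram(cell_features(query_patch), codebook)
    return index.get(hash_key(desc, hyperplanes), [])
```

A query then compares its descriptor only against the boxes returned by `candidates`, rather than against every patch in the image, which is the reduction in search space the abstract refers to.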
Pages: 369-378
Number of pages: 10