Two-Stage Hashing for Fast Document Retrieval

Cited by: 0
Authors
Li, Hao [1]
Liu, Wei [2]
Ji, Heng [1]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203; 0835
Abstract
This work achieves sublinear-time Nearest Neighbor Search (NNS) in massive-scale document collections. The primary contribution is a two-stage unsupervised hashing framework that integrates two state-of-the-art hashing algorithms, Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH prunes the neighbor candidates, while ITQ provides an efficient and effective reranking of the candidate pool captured by LSH. Furthermore, the proposed framework capitalizes on both term and topic similarity among documents, leading to precise document retrieval. The experimental results show that the hashing-based document retrieval approach closely approximates the conventional Information Retrieval (IR) method in retrieving semantically similar documents, while achieving a speedup of over one order of magnitude in query time.
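The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes documents are already dense feature vectors (for example term or topic weights), uses a single random-hyperplane LSH table for candidate pruning, and reranks the surviving candidates by Hamming distance between ITQ codes. All function names, the number of hyperplanes, the code length, and the random data are hypothetical choices made for illustration.

```python
import numpy as np

def lsh_keys(X, planes):
    """Random-hyperplane LSH: one sign bit per hyperplane, packed into an integer key."""
    bits = (X @ planes.T) > 0
    weights = 1 << np.arange(planes.shape[0], dtype=np.int64)
    return bits.astype(np.int64) @ weights

def build_index(X, planes):
    """Stage 1 index: map each LSH key to the list of document ids in that bucket."""
    table = {}
    for doc_id, key in enumerate(lsh_keys(X, planes)):
        table.setdefault(int(key), []).append(doc_id)
    return table

def train_itq(X, n_bits, n_iter=50, seed=0):
    """ITQ: PCA projection, then alternate between binarizing and refitting a rotation."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T                      # top principal directions, shape (d, n_bits)
    V = Xc @ W
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal start
    for _ in range(n_iter):
        B = np.sign(V @ R)
        B[B == 0] = 1
        # Orthogonal Procrustes step: rotation minimizing ||B - V R||_F
        U, _, Vh = np.linalg.svd(V.T @ B)
        R = U @ Vh
    return mean, W, R

def itq_codes(X, model):
    """Binary ITQ codes as a boolean array of shape (n, n_bits)."""
    mean, W, R = model
    return ((X - mean) @ W @ R) > 0

def search(q, table, planes, codes, model, top_k=5):
    """Stage 1: fetch the query's LSH bucket. Stage 2: rerank it by ITQ Hamming distance."""
    key = int(lsh_keys(q[None, :], planes)[0])
    cand = np.array(table.get(key, []), dtype=int)
    if cand.size == 0:
        return cand
    q_code = itq_codes(q[None, :], model)[0]
    hamming = np.count_nonzero(codes[cand] != q_code, axis=1)
    return cand[np.argsort(hamming, kind="stable")[:top_k]]

# Illustrative data only: 500 random "documents" with 32 features each.
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 32))
planes = rng.standard_normal((4, 32))      # 4 hyperplanes -> at most 16 buckets
table = build_index(X, planes)
model = train_itq(X, n_bits=16)
codes = itq_codes(X, model)
result = search(X[7], table, planes, codes, model, top_k=5)
print(result)
```

Only the documents in the query's bucket are reranked, so the cost of the second stage depends on the bucket size rather than on the collection size. A practical system would typically use several LSH tables to improve recall; that extension is omitted here.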
Pages: 495-500
Page count: 6