Semi-Supervised Hashing for Large-Scale Search

Cited by: 615
Authors
Wang, Jun [1]
Kumar, Sanjiv [2]
Chang, Shih-Fu [3]
Affiliations
[1] IBM TJ Watson Res Ctr, Business Analyt & Math Sci Dept, Yorktown Hts, NY 10598 USA
[2] Google Res, New York, NY 10011 USA
[3] Columbia Univ, Dept Elect & Comp Engn, New York, NY 10027 USA
Funding
US National Science Foundation;
Keywords
Hashing; nearest neighbor search; binary codes; semi-supervised hashing; pairwise labels; sequential hashing; SCENE;
DOI
10.1109/TPAMI.2012.48
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or inefficient. Moreover, these methods are designed for a given metric similarity. In contrast, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are scarce or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information-theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three semi-supervised hashing methods: orthogonal hashing, nonorthogonal hashing, and sequential hashing. In particular, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
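The abstract does not state the objective in closed form, so the following is only a minimal sketch of how the orthogonal variant might look, under stated assumptions: the empirical-fit term is taken to be a trace over labeled pairs, the regularizer is approximated by the variance of the projected data, and the orthogonality constraint is met by taking the top eigenvectors of the combined matrix. The function names (`ssh_orthogonal`, `encode`), the weight `eta`, and the label-matrix convention (+1 similar, -1 dissimilar, 0 unknown) are illustrative choices, not taken from this record.

```python
import numpy as np


def ssh_orthogonal(X, X_l, S, n_bits, eta=1.0):
    """Learn n_bits orthogonal projections (hypothetical helper).

    X   : (d, n) array of all samples, assumed zero-centered.
    X_l : (d, l) array of the labeled samples.
    S   : (l, l) symmetric pairwise label matrix
          (+1 similar, -1 dissimilar, 0 unknown) -- assumed convention.
    eta : weight of the variance term used here as the regularizer.
    """
    # Supervised term over labeled pairs plus variance term over all data.
    M = X_l @ S @ X_l.T + eta * (X @ X.T)
    M = (M + M.T) / 2.0  # enforce exact symmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_bits]
    return eigvecs[:, top]  # (d, n_bits), orthonormal columns


def encode(W, X):
    """Binary codes: one bit per projection, thresholded at zero."""
    return (W.T @ X > 0).astype(np.uint8)


# Tiny synthetic demo; the data and labels are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 200))
X -= X.mean(axis=1, keepdims=True)  # zero-center the data
X_l = X[:, :10]
S = np.zeros((10, 10))
S[0, 1] = S[1, 0] = 1.0   # samples 0 and 1 labeled similar
S[2, 3] = S[3, 2] = -1.0  # samples 2 and 3 labeled dissimilar

W = ssh_orthogonal(X, X_l, S, n_bits=4)
codes = encode(W, X)  # (4, 200) array of 0/1 bits
```

Search would then rank database items by Hamming distance between `codes` columns. The nonorthogonal and sequential variants named in the abstract are not covered by this sketch.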
Pages: 2393 - 2406
Page count: 14
Related papers
50 results in total
  • [21] Efficient Supervised Discrete Multi-View Hashing for Large-Scale Multimedia Search
    Lu, Xu
    Zhu, Lei
    Li, Jingjing
    Zhang, Huaxiang
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (08) : 2048 - 2060
  • [22] Large-scale image recognition based on parallel kernel supervised and semi-supervised subspace learning
    Fei Wu
    Xiao-Yuan Jing
    Qian Liu
    Song-Song Wu
    Guo-Liang He
    Neural Computing and Applications, 2017, 28 : 483 - 498
  • [23] Large-scale image recognition based on parallel kernel supervised and semi-supervised subspace learning
    Wu, Fei
    Jing, Xiao-Yuan
    Liu, Qian
    Wu, Song-Song
    He, Guo-Liang
    NEURAL COMPUTING & APPLICATIONS, 2017, 28 (03) : 483 - 498
  • [24] Contextual Hashing for Large-Scale Image Search
    Liu, Zhen
    Li, Houqiang
    Zhou, Wengang
    Zhao, Ruizhen
    Tian, Qi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (04) : 1606 - 1614
  • [25] Large-scale image retrieval with supervised sparse hashing
    Xu, Yan
    Shen, Fumin
    Xu, Xing
    Gao, Lianli
    Wang, Yuan
    Tan, Xiao
    NEUROCOMPUTING, 2017, 229 : 45 - 53
  • [26] Supervised Distributed Hashing for Large-Scale Multimedia Retrieval
    Zhai, Deming
    Liu, Xianming
    Ji, Xiangyang
    Zhao, Debin
    Satoh, Shin'ichi
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (03) : 675 - 686
  • [27] Noise-Robust Semi-Supervised Learning by Large-Scale Sparse Coding
    Lu, Zhiwu
    Gao, Xin
    Wang, Liwei
    Wen, Ji-Rong
    Huang, Songfang
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2828 - 2834
  • [28] A survey of large-scale graph-based semi-supervised classification algorithms
    Song Y.
    Zhang J.
    Zhang C.
    International Journal of Cognitive Computing in Engineering, 2022, 3 : 188 - 198
  • [29] LARGE-SCALE SEMI-SUPERVISED LEARNING BY APPROXIMATE LAPLACIAN EIGENMAPS, VLAD AND PYRAMIDS
    Mantziou, Eleni
    Papadopoulos, Symeon
    Kompatsiaris, Yiannis
    2013 14TH INTERNATIONAL WORKSHOP ON IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES (WIAMIS), 2013,
  • [30] Semi-supervised incremental feature extraction algorithm for large-scale data stream
    Tan, Chao
    Ji, Genlin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (06):