Semi-Supervised Hashing for Large-Scale Search

Cited by: 615
Authors
Wang, Jun [1]
Kumar, Sanjiv [2]
Chang, Shih-Fu [3]
Affiliations
[1] IBM TJ Watson Res Ctr, Business Analyt & Math Sci Dept, Yorktown Hts, NY 10598 USA
[2] Google Res, New York, NY 10011 USA
[3] Columbia Univ, Dept Elect & Comp Engn, New York, NY 10027 USA
Funding
US National Science Foundation;
Keywords
Hashing; nearest neighbor search; binary codes; semi-supervised hashing; pairwise labels; sequential hashing; SCENE;
DOI
10.1109/TPAMI.2012.48
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
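The abstract gives no formulas, so the following is only a rough sketch of how a semi-supervised hashing objective of this kind might look in code, not the authors' implementation. It assumes that the orthogonal variant can be reduced to an eigenproblem on a matrix combining a pairwise-label term over the labeled samples with a variance term over all samples. The function names `fit_orthogonal_ssh` and `hash_codes`, the weight `eta`, and the toy data are all hypothetical.

```python
import numpy as np

def fit_orthogonal_ssh(X, X_l, S, n_bits, eta=1.0):
    """Sketch of an orthogonal semi-supervised hashing fit (assumed form).

    X   : (d, n) all samples, zero-centered, one sample per column
    X_l : (d, l) labeled samples (a subset of the columns of X)
    S   : (l, l) pairwise labels, +1 for similar pairs, -1 for dissimilar
    eta : assumed trade-off weight between the label fit and the regularizer
    """
    # Label-fit term plus a variance term standing in for the regularizer.
    M = X_l @ S @ X_l.T + eta * (X @ X.T)
    M = (M + M.T) / 2.0  # guard against floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_bits]
    return eigvecs[:, top]  # (d, n_bits), orthonormal columns

def hash_codes(W, X):
    """One bit per projection: the sign of w_k^T x, stored as 0/1."""
    return (W.T @ X > 0).astype(np.uint8)

# Toy data: two well-separated clusters in 2-D, 50 points each.
rng = np.random.default_rng(0)
center = np.array([[5.0], [0.0]])
A = center + 0.5 * rng.normal(size=(2, 50))
B = -center + 0.5 * rng.normal(size=(2, 50))
X = np.hstack([A, B])
X = X - X.mean(axis=1, keepdims=True)

# Only four samples carry labels; pairs in the same cluster are "similar".
labeled_idx = np.array([0, 1, 50, 51])
labels = np.array([0, 0, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)

W = fit_orthogonal_ssh(X, X[:, labeled_idx], S, n_bits=1, eta=1.0)
codes = hash_codes(W, X)  # (1, 100) array of 0/1 bits
```

In this toy setting the single learned bit separates the two clusters, although which cluster receives 0 and which receives 1 depends on the arbitrary sign of the eigenvector. The sequential variant described in the abstract, where each hash function corrects errors made by earlier ones, is not covered by this sketch.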
Pages: 2393-2406
Number of pages: 14
Related papers
50 in total
  • [31] A Fast Semi-Supervised Clustering Framework for Large-Scale Time Series Data
    He, Guoliang
    Pan, Yanzhou
    Xia, Xuewen
    He, Jinrong
    Peng, Rong
    Xiong, Neal N.
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (07): 4201 - 4216
  • [32] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [33] Semi-Supervised anchor graph ensemble for large-scale hyperspectral image classification
    He, Ziping
    Xia, Kewen
    Hu, Yuhen
    Yin, Zhixian
    Wang, Sijie
    Zhang, Jiangnan
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2022, 43 (05) : 1894 - 1918
  • [34] Semi-Supervised Multi-View Discrete Hashing for Fast Image Search
    Zhang, Chenghao
    Zheng, Wei-Shi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (06) : 2604 - 2617
  • [35] Semi-Supervised Hashing for Scalable Image Retrieval
    Wang, Jun
    Kumar, Sanjiv
    Chang, Shih-Fu
    2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010: 3424 - 3431
  • [36] Semi-Supervised Deep Hashing with a Bipartite Graph
    Yan, Xinyu
    Zhang, Lijun
    Li, Wu-Jun
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 3238 - 3244
  • [37] A BALANCED SEMI-SUPERVISED HASHING METHOD FOR CBIR
    Zhou, Jianhui
    Fu, Haiyan
    Kong, Xiangwei
    2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011,
  • [38] Self-supervised Bernoulli Autoencoders for Semi-supervised Hashing
    Nanculef, Ricardo
    Mena, Francisco
    Macaluso, Antonio
    Lodi, Stefano
    Sartori, Claudio
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2021, 2021, 12702 : 258 - 268
  • [39] Semi-supervised Learning for Large Scale Image Cosegmentation
    Wang, Zhengxiang
    Liu, Rujie
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013: 393 - 400
  • [40] Tree decomposition for large scale semi-supervised classification
    Zhou, Rong
    Wu, Guangchao
    Yang, Xiaowei
    Lv, Haoran
    Journal of Computational Information Systems, 2013, 9 (06): 2451 - 2460