Ranking-Based Supervised Discrete Cross-Modal Hashing

Authors
Li H.-Q. [1]
Wang Y.-X. [1]
Chen Z.-D. [1]
Luo X. [1]
Xu X.-S. [1]
Affiliations
[1] School of Software, Shandong University, Jinan
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Discrete optimization; Learning to hash; Ranking-based hashing; Similarity preserving
DOI
10.11897/SP.J.1016.2021.01620
Abstract
In recent years, the development of information technology has brought an explosion of multimedia data such as images, text, video, and audio. On data of this scale, traditional retrieval methods become inefficient and cannot reach satisfactory accuracy within an acceptable time. The sheer volume of data also causes heavy storage consumption. Hashing has been proposed to address these problems. It first transforms data from their original representations into binary codes, minimizing the Hamming distance between similar data points and maximizing it between dissimilar ones. Pairwise comparisons can then be carried out extremely efficiently in the learned Hamming space using XOR operations. Moreover, representing data with binary codes rather than the original high-dimensional features dramatically reduces the storage cost. Owing to its efficient indexing and fast querying, hashing has received extensive attention in cross-modal retrieval, and many cross-modal hashing methods have been proposed. However, existing cross-modal hashing methods still leave several issues open. (1) Most methods consider only the pairwise similarity between samples and ignore ranking information, and the lack of ranking information may lead to sub-optimal performance. (2) Many hashing methods preserve similarity with a pairwise similarity matrix, which makes the algorithm complexity O(n^2) and prevents them from scaling to large datasets. (3) Most methods relax the discrete constraint in order to solve the discrete optimization problem, which may introduce serious quantization error. To overcome these issues, we propose a new method named Ranking-based Supervised Discrete Cross-modal Hashing (RSDCH). RSDCH consists of a ranking learning step and a hashing learning step.
In the first step, the proposed method learns ranking information from the manifold structure and semantic labels of the data and generates a ranking score matrix. In the second step, RSDCH jointly learns hash codes and hash functions while preserving the learned ranking information. To make the method scalable to large datasets, anchor sampling is used, so that the time complexity is linear in the number of training samples. To learn high-quality hash codes, two effective similarity-preserving strategies are proposed. To avoid large quantization error, an alternating optimization algorithm is designed that solves the binary code learning problem discretely. We conducted comparative experiments on two widely used multi-label datasets, MIRFlickr-25K and NUS-WIDE. To evaluate RSDCH comprehensively, we adopted three evaluation metrics: Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), and the precision-recall curve. The experimental results show that RSDCH is superior to several state-of-the-art methods, including both non-deep and deep cross-modal hashing methods. We also carried out ablation experiments to test the necessity and effectiveness of each module in the RSDCH model. Finally, additional experiments examined the convergence, parameter sensitivity, and training efficiency of the model, and the results further demonstrate that the proposed method is effective. © 2021, Science Press. All rights reserved.
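The pairwise comparison in Hamming space mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes binary codes are packed into plain integers; the function names and the 8-bit example codes are hypothetical and are not taken from the RSDCH paper. It shows only generic Hamming ranking via XOR, not the RSDCH learning procedure.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bits that differ between two binary codes stored as integers."""
    # XOR leaves a 1 in every position where the two codes disagree.
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query: int, database: list[int]) -> list[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Hypothetical 8-bit codes: one query code and three database codes.
query_code = 0b10110010
image_codes = [0b10110011, 0b01001101, 0b10100010]

distances = [hamming_distance(query_code, c) for c in image_codes]
ranking = rank_by_hamming(query_code, image_codes)
```

Because each comparison is a single XOR followed by a bit count, the cost per pair depends only on the code length, not on the dimensionality of the original features.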
Pages: 1620-1635
Page count: 15