HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval

Cited by: 34
Authors
Li, Tao [1 ]
Zhang, Zheng [1 ]
Pei, Lishen [2 ]
Gan, Yan [3 ]
Affiliations
[1] Open Univ Henan, Zhengzhou 450046, Peoples R China
[2] Henan Univ Econ & Law, Zhengzhou 450046, Peoples R China
[3] Chongqing Univ, Chongqing 400044, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Transformers; Binary codes; Task analysis; Training; Image retrieval; Feature extraction; Databases; Binary embedding; image retrieval;
DOI
10.1109/LSP.2022.3157517
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Deep image hashing aims to map an input image to compact binary codes with a deep neural network, enabling efficient image retrieval over large-scale datasets. Due to the explosive growth of modern data, deep hashing has gained growing attention from the research community. Recently, convolutional neural networks such as ResNet have dominated deep hashing. Nevertheless, motivated by recent advances in vision transformers, we propose a pure transformer-based framework, called HashFormer, to tackle the deep hashing task. Specifically, we use a vision transformer (ViT) as our backbone and treat binary codes as the intermediate representations for our surrogate task, i.e., image classification. In addition, we observe that binary codes suitable for classification are sub-optimal for retrieval. To mitigate this problem, we present a novel average precision loss, which enables us to directly optimize retrieval accuracy. To the best of our knowledge, our work is among the pioneering efforts to address deep hashing without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE and ImageNet. The proposed method demonstrates promising results against existing state-of-the-art works, validating the advantages and merits of HashFormer.
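The abstract describes the standard deep-hashing retrieval setting: continuous embeddings from a backbone are binarized, candidates are ranked by Hamming distance, and quality is measured by average precision (the metric HashFormer's novel loss targets). The paper's exact loss formulation is not given here; the following is a minimal NumPy sketch of the evaluation-side pipeline only, with all function names (`binarize`, `hamming_distance`, `average_precision`) being illustrative, not from the paper.

```python
import numpy as np

def binarize(features):
    """Map real-valued embeddings to {-1, +1} codes via sign,
    a common convention in deep hashing (illustrative, not the
    paper's exact quantization scheme)."""
    return np.where(features >= 0, 1, -1)

def hamming_distance(query_code, db_codes):
    """Hamming distance between one query code and a database of codes.
    For {-1, +1} codes of length L: d = (L - dot(q, x)) / 2."""
    L = query_code.shape[0]
    return (L - db_codes @ query_code) // 2

def average_precision(query_code, db_codes, relevant):
    """Rank the database by Hamming distance and compute average
    precision -- the retrieval metric the paper's loss optimizes."""
    order = np.argsort(hamming_distance(query_code, db_codes), kind="stable")
    rel = relevant[order]                                # relevance in ranked order
    hits = np.cumsum(rel)                                # relevant items seen so far
    precision_at_k = hits / (np.arange(len(rel)) + 1)    # precision at each rank
    return float((precision_at_k * rel).sum() / max(rel.sum(), 1))
```

Note that `np.argsort` (and hard ranking in general) is non-differentiable, which is why directly optimizing average precision during training, as the paper proposes, requires a surrogate loss rather than this literal computation.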
Pages: 827-831
Page count: 5
Related Papers
50 records in total
  • [1] Contrastive hashing with vision transformer for image retrieval
    Ren, Xiuxiu
    Zheng, Xiangwei
    Zhou, Huiyu
    Liu, Weilong
    Dong, Xiao
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (12) : 12192 - 12211
  • [2] Deep Supervised Hashing Image Retrieval Method Based on Swin Transformer
    Miao Z.
    Zhao X.
    Li Y.
    Wang J.
    Zhang R.
    [J]. Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences, 2023, 50 (08): : 62 - 71
  • [3] Deep internally connected transformer hashing for image retrieval
    Chao, Zijian
    Cheng, Shuli
    Li, Yongming
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 279
  • [4] Medical image retrieval based on deep hashing
    Yan, Longquan
    Shi, Wei
    [J]. DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC), 2022, : 491 - 491
  • [5] VTHSC-MIR: Vision Transformer Hashing with Supervised Contrastive learning based medical image retrieval
    Kumar, Mehul
    Singh, Rhythumwinder
    Mukherjee, Prerana
    [J]. PATTERN RECOGNITION LETTERS, 2024, 184 : 28 - 36
  • [6] Quadruplet-based deep hashing for image retrieval
    Zhu, Jie
    Chen, Zhipeng
    Zhao, Li
    Wu, Shufang
    [J]. NEUROCOMPUTING, 2019, 366 : 161 - 169
  • [7] Deep Hamming Embedding Based Hashing for Image Retrieval
    Lin J.
    Liu H.
    Zheng Z.
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2020, 33 (06): : 542 - 550
  • [8] Deep Transfer Hashing for Image Retrieval
    Zhai, Hongjia
    Lai, Shenqi
    Jin, Hanyang
    Qian, Xueming
    Mei, Tao
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (02) : 742 - 753
  • [9] Deep Progressive Hashing for Image Retrieval
    Bai, Jiale
    Ni, Bingbing
    Wang, Minsi
    Li, Zefan
    Cheng, Shuo
    Yang, Xiaokang
    Hu, Chuanping
    Gao, Wen
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (12) : 3178 - 3193
  • [10] Hierarchical deep hashing for image retrieval
    Ge Song
    Xiaoyang Tan
    [J]. Frontiers of Computer Science, 2017, 11 : 253 - 265