Multi-Granularity Matching Transformer for Text-Based Person Search

Cited by: 0

Authors:

Bao, Liping [1]

Wei, Longhui [2]

Zhou, Wengang [1]

Liu, Lin [1]

Xie, Lingxi [3]

Li, Houqiang [1]

Tian, Qi [3]

Affiliations:

[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Peoples R China

[2] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Peoples R China

[3] Huawei Cloud, Shenzhen 518129, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Transformers; Feature extraction; Task analysis; Pedestrians; Visualization; Search problems; Training; Text-based person search; transformer; vision-language pre-trained model; REIDENTIFICATION; ALIGNMENT;

DOI:

10.1109/TMM.2023.3321504

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Text-based person search aims to retrieve the most relevant pedestrian images from an image gallery based on textual descriptions. Most existing methods rely on two separate encoders to extract the image and text features, and then elaborately design various schemes to bridge the gap between the image and text modalities. However, the shallow interaction between the two modalities in these methods is still insufficient to eliminate the modality gap. To address this problem, we propose TransTPS, a transformer-based framework that enables deeper interaction between the two modalities through the self-attention mechanism in the transformer, effectively alleviating the modality gap. In addition, because the image modality exhibits small inter-class variance and large intra-class variance, we further develop two techniques to overcome these limitations. Specifically, Cross-modal Multi-Granularity Matching (CMGM) is proposed to address the problem caused by small inter-class variance and to facilitate distinguishing pedestrians with similar appearance. In addition, Contrastive Loss with Weakly Positive pairs (CLWP) is introduced to mitigate the impact of large intra-class variance and to contribute to the retrieval of more target images. Experiments on the CUHK-PEDES and RSTPReID datasets demonstrate that our proposed framework achieves state-of-the-art performance compared to previous methods.


Pages: 4281-4293

Number of pages: 13

50 entries in total

[1] Text-based Person Search via Multi-Granularity Embedding Learning
Wang, Chengji
Luo, Zhiming
Lin, Yaojin
Li, Shaozi
[J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1068 - 1074
[2] Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search
Jing, Ya
Si, Chenyang
Wang, Junbo
Wang, Wei
Wang, Liang
Tan, Tieniu
[J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11189 - 11196
[3] Multi-granularity Separation Network for Text-Based Person Retrieval with Bidirectional Refinement Regularization
Li, Shenshen
Xu, Xing
Shen, Fumin
Yang, Yang
[J]. PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 307 - 315
[4] Text-Based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning
Wu, Xinyi
Ma, Wentao
Guo, Dan
Zhou, Tongqing
Zhao, Shan
Cai, Zhiping
[J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6162 - 6170
[5] Conditional Feature Learning Based Transformer for Text-Based Person Search
Gao, Chenyang
Cai, Guanyu
Jiang, Xinyang
Zheng, Feng
Zhang, Jun
Gong, Yifei
Lin, Fangzhou
Sun, Xing
Bai, Xiang
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6097 - 6108
[6] Attentive multi-granularity perception network for person search
Zhang, Qixian
Wu, Jun
Miao, Duoqian
Zhao, Cairong
Zhang, Qi
[J]. INFORMATION SCIENCES, 2024, 681
[7] Improving Text-based Person Search by Spatial Matching and Adaptive Threshold
Chen, Tianlang
Xu, Chenliang
Luo, Jiebo
[J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1879 - 1887
[8] Text-based Person Search via Attribute-aided Matching
Aggarwal, Surbhi
Babu, R. Venkatesh
Chakraborty, Anirban
[J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 2606 - 2614
[9] Multi-granularity Cross Transformer Network for person re-identification
Li, Yanping
Miao, Duoqian
Zhang, Hongyun
Zhou, Jie
Zhao, Cairong
[J]. PATTERN RECOGNITION, 2024, 150
[10] Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Wu, Hefeng
Chen, Weifeng
Liu, Zhibin
Chen, Tianshui
Chen, Zhiguang
Lin, Liang
[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (08) : 7005 - 7016
