Multi-Granularity Matching Transformer for Text-Based Person Search

Cited by: 0
Authors
Bao, Liping [1]
Wei, Longhui [2]
Zhou, Wengang [1]
Liu, Lin [1]
Xie, Lingxi [3]
Li, Houqiang [1]
Tian, Qi [3]
Affiliations
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Peoples R China
[3] Huawei Cloud, Shenzhen 518129, Peoples R China
Keywords
Transformers; Feature extraction; Task analysis; Pedestrians; Visualization; Search problems; Training; Text-based person search; transformer; vision-language pre-trained model; REIDENTIFICATION; ALIGNMENT;
DOI
10.1109/TMM.2023.3321504
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Text-based person search aims to retrieve the most relevant pedestrian images from an image gallery based on textual descriptions. Most existing methods rely on two separate encoders to extract the image and text features, and then elaborately design various schemes to bridge the gap between the image and text modalities. However, the shallow interaction between the two modalities in these methods is still insufficient to eliminate the modality gap. To address this problem, we propose TransTPS, a transformer-based framework that enables deeper interaction between the two modalities through the self-attention mechanism of the transformer, effectively alleviating the modality gap. In addition, because the image modality exhibits small inter-class variance and large intra-class variance, we further develop two techniques to overcome these limitations. Specifically, Cross-modal Multi-Granularity Matching (CMGM) is proposed to address the problem caused by small inter-class variance and to facilitate distinguishing pedestrians with similar appearance. Besides, Contrastive Loss with Weakly Positive pairs (CLWP) is introduced to mitigate the impact of large intra-class variance and to contribute to the retrieval of more target images. Experiments on the CUHK-PEDES and RSTPReID datasets demonstrate that our proposed framework achieves state-of-the-art performance compared to previous methods.
Pages: 4281-4293
Number of pages: 13
Related papers
50 in total
  • [1] Text-based Person Search via Multi-Granularity Embedding Learning
    Wang, Chengji
    Luo, Zhiming
    Lin, Yaojin
    Li, Shaozi
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1068 - 1074
  • [2] Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search
    Jing, Ya
    Si, Chenyang
    Wang, Junbo
    Wang, Wei
    Wang, Liang
    Tan, Tieniu
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11189 - 11196
  • [3] Multi-granularity Separation Network for Text-Based Person Retrieval with Bidirectional Refinement Regularization
    Li, Shenshen
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    [J]. PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 307 - 315
  • [4] Text-Based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning
    Wu, Xinyi
    Ma, Wentao
    Guo, Dan
    Zhou, Tongqing
    Zhao, Shan
    Cai, Zhiping
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6162 - 6170
  • [5] Conditional Feature Learning Based Transformer for Text-Based Person Search
    Gao, Chenyang
    Cai, Guanyu
    Jiang, Xinyang
    Zheng, Feng
    Zhang, Jun
    Gong, Yifei
    Lin, Fangzhou
    Sun, Xing
    Bai, Xiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6097 - 6108
  • [6] Attentive multi-granularity perception network for person search
    Zhang, Qixian
    Wu, Jun
    Miao, Duoqian
    Zhao, Cairong
    Zhang, Qi
    [J]. INFORMATION SCIENCES, 2024, 681
  • [7] Improving Text-based Person Search by Spatial Matching and Adaptive Threshold
    Chen, Tianlang
    Xu, Chenliang
    Luo, Jiebo
    [J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1879 - 1887
  • [8] Text-based Person Search via Attribute-aided Matching
    Aggarwal, Surbhi
    Babu, R. Venkatesh
    Chakraborty, Anirban
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 2606 - 2614
  • [9] Multi-granularity Cross Transformer Network for person re-identification
    Li, Yanping
    Miao, Duoqian
    Zhang, Hongyun
    Zhou, Jie
    Zhao, Cairong
    [J]. PATTERN RECOGNITION, 2024, 150
  • [10] Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
    Wu, Hefeng
    Chen, Weifeng
    Liu, Zhibin
    Chen, Tianshui
    Chen, Zhiguang
    Lin, Liang
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (08) : 7005 - 7016