An Empirical Study of CLIP for Text-Based Person Search

Citations: 0
Authors
Cao, Min [1 ]
Bai, Yang [1 ]
Zeng, Ziyin [1 ]
Ye, Mang [2 ]
Zhang, Min [3 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[3] Harbin Inst Technol, Shenzhen, Peoples R China
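Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes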
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-based Person Search (TBPS) aims to retrieve person images using natural language descriptions. Recently, Contrastive Language Image Pretraining (CLIP), a universal large cross-modal vision-language pre-training model, has performed remarkably on various cross-modal downstream tasks thanks to its powerful cross-modal semantic learning capacity. TBPS, as a fine-grained cross-modal retrieval task, is also seeing a rise in CLIP-based research. To explore the potential of the vision-language pre-training model for downstream TBPS tasks, this paper makes the first attempt to conduct a comprehensive empirical study of CLIP for TBPS and thus contributes a straightforward, incremental, yet strong TBPS-CLIP baseline to the TBPS community. We revisit critical design considerations under CLIP, including data augmentation and loss function. The model, with the aforementioned designs and practical training tricks, can attain satisfactory performance without any sophisticated modules. We also conduct probing experiments of TBPS-CLIP in model generalization and model compression, demonstrating the effectiveness of TBPS-CLIP from various aspects. This work is expected to provide empirical insights and highlight future CLIP-based TBPS research. The code is available at https://github.com/Flame-Chasers/TBPS-CLIP.
Pages: 465-473
Page count: 9