Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification

Times cited: 0
Authors
Zhao, Zhiwei [1, 2]
Liu, Bin [1, 2]
Lu, Yan [3]
Chu, Qi [1, 2]
Yu, Nenghai [1, 2]
Affiliations
[1] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei, Peoples R China
[2] CAS Key Lab Electromagnet Space Informat, Beijing, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image person re-identification (TI-ReID) aims to retrieve images of a target identity given a textual description. Existing TI-ReID methods focus on aligning the visual and textual modalities through contrastive feature alignment or reconstructive masked language modeling (MLM). However, these methods parameterize image and text instances as deterministic embeddings and do not explicitly consider the inherent uncertainty in pedestrian images and their textual descriptions, which limits how richly image-text relationships can be expressed and how well the two modalities are semantically aligned. To address this problem, we propose a novel method that unifies multi-modal uncertainty modeling and semantic alignment for TI-ReID. Specifically, we model the image and text feature vectors of pedestrians as Gaussian distributions, where the multi-granularity uncertainty of each distribution is estimated by incorporating batch-level and identity-level feature variances for each modality. This multi-modal uncertainty modeling acts as a form of feature augmentation and provides richer image-text semantic relationships. We then present a bi-directional cross-modal circle loss that aligns the probabilistic image and text features more effectively in a self-paced manner. To further promote comprehensive image-text semantic alignment, we design a task that complements MLM by focusing on the cross-modal semantic recovery of a global masked token after cross-modal interaction. Extensive experiments on three TI-ReID datasets demonstrate the effectiveness and superiority of our method over state-of-the-art methods.
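The abstract names two training components without giving their formulas: Gaussian feature augmentation whose variance is pooled at batch and identity granularity, and a bi-directional cross-modal circle loss. The pure-Python sketch that follows illustrates one way these could fit together under stated assumptions; it is not the authors' implementation. The function names, the convex mixing weight `lam` used to combine the two variance levels, and the hyperparameter values `m=0.25` and `gamma=64.0` are hypothetical choices. The loss is the standard pair-wise circle loss of Sun et al. (CVPR 2020), whose adaptive per-pair weights give the "self-paced" behaviour; here it is applied once with images as anchors and once with texts as anchors. The complementary masked-token recovery task is not sketched.

```python
import math
import random
from collections import defaultdict


def dim_variance(vectors):
    """Per-dimension population variance of equal-length feature vectors."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [sum((v[j] - means[j]) ** 2 for v in vectors) / n for j in range(d)]


def uncertainty_augment(feats, ids, rng, lam=0.5):
    """Sample z = mu + eps * sigma for every feature mu of ONE modality.

    sigma^2 mixes the batch-level variance with the variance of the
    sample's own identity. ASSUMPTION: the convex weight `lam` is a
    hypothetical combination rule, not taken from the paper.
    """
    batch_var = dim_variance(feats)
    groups = defaultdict(list)
    for f, pid in zip(feats, ids):
        groups[pid].append(f)
    id_var = {pid: dim_variance(g) for pid, g in groups.items()}
    out = []
    for mu, pid in zip(feats, ids):
        sigma = [math.sqrt(lam * b + (1 - lam) * v) for b, v in zip(batch_var, id_var[pid])]
        out.append([m + rng.gauss(0.0, 1.0) * s for m, s in zip(mu, sigma)])
    return out


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / max(na * nb, 1e-12)


def logsumexp(xs):
    if not xs:
        return -math.inf
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def softplus(x):
    # numerically stable log(1 + exp(x)); softplus(-inf) == 0
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(anchor, cands, anchor_id, cand_ids, m=0.25, gamma=64.0):
    """Pair-wise circle loss (Sun et al., 2020) for one anchor vs. the other modality."""
    lp, ln = [], []
    for c, cid in zip(cands, cand_ids):
        s = cosine(anchor, c)
        if cid == anchor_id:
            # positive pair: weight grows the further s is below its optimum 1 + m
            lp.append(-gamma * max(0.0, 1 + m - s) * (s - (1 - m)))
        else:
            # negative pair: weight grows the further s is above its optimum -m
            ln.append(gamma * max(0.0, s + m) * (s - m))
    return softplus(logsumexp(lp) + logsumexp(ln))


def bidirectional_circle_loss(img, txt, ids, **kw):
    """Average of image->text and text->image circle losses over a batch."""
    i2t = [circle_loss(f, txt, pid, ids, **kw) for f, pid in zip(img, ids)]
    t2i = [circle_loss(f, img, pid, ids, **kw) for f, pid in zip(txt, ids)]
    return (sum(i2t) + sum(t2i)) / (len(i2t) + len(t2i))


if __name__ == "__main__":
    rng = random.Random(0)
    ids = [0, 0, 1, 1]
    img = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    txt = [[0.9, 0.1], [1.0, 0.2], [0.1, 0.9], [0.2, 1.0]]
    aug_img = uncertainty_augment(img, ids, rng)  # each modality is augmented separately
    aug_txt = uncertainty_augment(txt, ids, rng)
    print(bidirectional_circle_loss(aug_img, aug_txt, ids))
```

In this sketch the variances are computed per modality, matching the abstract's "for each modality", so image and text features are augmented in separate calls. Using the un-augmented means at retrieval time is an assumed, natural choice; the abstract does not say how inference is performed.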
Pages: 7534-7542
Number of pages: 9