Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification

Cited by: 0
Authors
Zhao, Zhiwei [1, 2]
Liu, Bin [1, 2]
Lu, Yan [3]
Chu, Qi [1, 2]
Yu, Nenghai [1, 2]
Affiliations
[1] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei, Peoples R China
[2] CAS Key Lab Electromagnet Space Informat, Beijing, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
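Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract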
Text-to-image person re-identification (TI-ReID) aims to retrieve images of a target identity from a given textual description. Existing TI-ReID methods focus on aligning the visual and textual modalities through contrastive feature alignment or reconstructive masked language modeling (MLM). However, these methods parameterize image and text instances as deterministic embeddings and do not explicitly consider the inherent uncertainty in pedestrian images and their textual descriptions, which limits the expression of image-text relationships and semantic alignment. To address this problem, we propose a novel method that unifies multi-modal uncertainty modeling and semantic alignment for TI-ReID. Specifically, we model the image and text feature vectors of pedestrians as Gaussian distributions, where the multi-granularity uncertainty of each distribution is estimated by incorporating batch-level and identity-level feature variances for each modality. The multi-modal uncertainty modeling acts as a feature augmentation and provides richer image-text semantic relationships. We then present a bi-directional cross-modal circle loss to align the probabilistic features between image and text more effectively in a self-paced manner. To further promote more comprehensive image-text semantic alignment, we design a task that complements masked language modeling, focusing on the cross-modality semantic recovery of the global masked token after cross-modal interaction. Extensive experiments on three TI-ReID datasets demonstrate the effectiveness of our method and its superiority over state-of-the-art methods.
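The abstract gives no formulas, so the following is only a rough sketch of what "Gaussian features with batch-level and identity-level variances" might look like in practice. The function name `sample_uncertain_features`, the equal-weight averaging of the two variances, and the `eps` stabilizer are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def sample_uncertain_features(feats, ids, rng, eps=1e-6):
    """Illustrative sketch (not the authors' implementation).

    Treats each feature vector as the mean of a diagonal Gaussian and draws
    one sample from it. The standard deviation mixes a batch-level variance
    with an identity-level variance; the 0.5/0.5 mix is an assumption.
    """
    feats = np.asarray(feats, dtype=np.float64)  # shape (N, D)
    ids = np.asarray(ids)                        # shape (N,)

    # Batch-level variance: one value per feature dimension, shape (D,).
    batch_var = feats.var(axis=0)

    # Identity-level variance: computed over samples sharing an identity,
    # then copied to every sample of that identity, shape (N, D).
    id_var = np.empty_like(feats)
    for pid in np.unique(ids):
        mask = ids == pid
        id_var[mask] = feats[mask].var(axis=0)

    # Assumed combination rule: plain average of the two granularities.
    sigma = np.sqrt(0.5 * (batch_var + id_var) + eps)

    # Reparameterized sample: mean + sigma * standard normal noise.
    noise = rng.standard_normal(feats.shape)
    return feats + sigma * noise, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.standard_normal((6, 4))    # toy batch: 6 samples, 4 dims
    person_ids = np.array([0, 0, 1, 1, 2, 2])    # two samples per identity
    sampled, sigma = sample_uncertain_features(image_feats, person_ids, rng)
    print(sampled.shape, sigma.shape)
```

Under these assumptions the same routine would be applied separately to the image and text features of a batch, so that the sampled vectors act as the feature augmentation the abstract mentions.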
Pages: 7534-7542
Number of pages: 9