DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

Cited by: 70
Authors
Zhu, Aichun [1]
Wang, Zijie [1]
Li, Yifeng [1]
Wan, Xili [1]
Jin, Jing [1]
Wang, Tian [2]
Hu, Fangqiang [1]
Hua, Gang [3]
Affiliations
[1] Nanjing Tech Univ, Nanjing, Peoples R China
[2] Beihang Univ, Beijing, Peoples R China
[3] China Univ Min & Technol, Xuzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
person retrieval; text-based person re-identification; cross-modal retrieval; surroundings-person separation;
DOI
10.1145/3474085.3475369
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many previous methods for text-based person retrieval are devoted to learning a latent common space mapping, with the purpose of extracting modality-invariant features from both the visual and textual modalities. Nevertheless, due to the complexity of high-dimensional data, unconstrained mapping paradigms are not able to properly capture discriminative clues about the corresponding person while dropping the misaligned information. Intuitively, the information contained in visual data can be divided into person information (PI) and surroundings information (SI), which are mutually exclusive. To this end, we propose a novel Deep Surroundings-person Separation Learning (DSSL) model to effectively extract and match person information, and hence achieve superior retrieval accuracy. A surroundings-person separation and fusion mechanism plays the key role in realizing an accurate and effective surroundings-person separation under a mutual exclusion constraint. In order to adequately utilize multimodal and multi-granular information for higher retrieval accuracy, five diverse alignment paradigms are adopted. Extensive experiments are carried out to evaluate the proposed DSSL on CUHK-PEDES, which is currently the only accessible dataset for the text-based person retrieval task. DSSL achieves state-of-the-art performance on CUHK-PEDES. To properly evaluate the proposed DSSL in real scenarios, a Real Scenarios Text-based Person Reidentification (RSTPReid) dataset is constructed to benefit future research on text-based person retrieval, which will be publicly available.
Pages: 209 - 217
Number of pages: 9
Related papers
50 in total
  • [21] LEARNING SEMANTIC-ALIGNED FEATURE REPRESENTATION FOR TEXT-BASED PERSON SEARCH
    Li, Shiping
    Cao, Min
    Zhang, Min
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2724 - 2728
  • [22] An Empirical Study of CLIP for Text-Based Person Search
    Cao, Min
    Bai, Yang
    Zeng, Ziyin
    Ye, Mang
    Zhang, Min
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024, : 465 - 473
  • [23] Improving embedding learning by virtual attribute decoupling for text-based person search
    Chengji Wang
    Zhiming Luo
    Yaojin Lin
    Shaozi Li
    [J]. Neural Computing and Applications, 2022, 34 : 5625 - 5647
  • [24] CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval
    Wang, Zijie
    Zhu, Aichun
    Xue, Jingyi
    Wan, Xili
    Liu, Chao
    Wang, Tian
    Li, Yifeng
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5314 - 5322
  • [25] Improving Text-Based Person Retrieval by Excavating All-Round Information Beyond Color
    Zhu, Aichun
    Wang, Zijie
    Xue, Jingyi
    Wan, Xili
    Jin, Jing
    Wang, Tian
    Snoussi, Hichem
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 15
  • [26] Learning shared features from specific and ambiguous descriptions for text-based person search
    Ke Cheng
    Qikai Geng
    Shucheng Huang
    Juanjuan Tu
    Hu Lu
    [J]. Multimedia Systems, 2024, 30
  • [27] Learning shared features from specific and ambiguous descriptions for text-based person search
    Cheng, Ke
    Geng, Qikai
    Huang, Shucheng
    Tu, Juanjuan
    Lu, Hu
    [J]. MULTIMEDIA SYSTEMS, 2024, 30 (02)
  • [28] Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold
    Wang, Zijie
    Zhu, Aichun
    Xue, Jingyi
    Wan, Xili
    Liu, Chao
    Wang, Tian
    Li, Yifeng
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1984 - 1992
  • [29] Weakly Supervised Text-based Person Re-Identification
    Zhao, Shizhen
    Gao, Changxin
    Shao, Yuanjie
    Zheng, Wei-Shi
    Sang, Nong
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11375 - 11384
  • [30] Hierarchical Gumbel Attention Network for Text-based Person Search
    Zheng, Kecheng
    Liu, Wu
    Liu, Jiawei
    Zha, Zheng-Jun
    Mei, Tao
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3441 - 3449