Joint discriminative representation learning for end-to-end person search

Citations: 17
Authors
Zhang, Pengcheng [1]
Yu, Xiaohan [2,3]
Bai, Xiao [1]
Wang, Chen [1]
Zheng, Jin [1]
Ning, Xin [4]
Affiliations
[1] Beihang Univ, Jiangxi Res Inst, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Macquarie Univ, Sch Comp, Sydney, Australia
[3] Griffith Univ, Inst Integrated & Intelligent Syst, Brisbane, Australia
[4] Chinese Acad Sci, Inst Semicond, Beijing, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Person search; Person re-identification; Part segmentation; Batch sampling; NETWORK;
DOI
10.1016/j.patcog.2023.110053
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Person search simultaneously detects and retrieves a query person from uncropped scene images. Existing methods are either two-step or end-to-end. The former employ two standalone models for the two sub-tasks, while the latter conduct person search with a unified model. Despite encouraging progress, most existing end-to-end methods focus on balancing the model between the detection and retrieval sub-tasks while neglecting to enhance the learned representation for retrieval, which leaves their accuracy below that of two-step approaches. To address this, we propose a novel hierarchical framework that jointly optimizes instance-aware and part-aware embeddings to enable discriminative representation learning. Specifically, we develop a region-of-interest cosegment (ROICoseg) module that captures part-aware information without requiring extra annotations, enabling fine-grained discriminative representation. On top of that, a Contextual Instance Batch Sampling (CIBS) method is introduced that uses contextual information to construct training batches, thus facilitating effective instance-aware representation learning. We further introduce the first cross-door person search dataset (CDPS), in which a target person is retrieved from outdoor cameras using an indoor-captured image, or vice versa. Extensive experiments show that our proposed model achieves competitive performance on CUHK-SYSU and outperforms state-of-the-art end-to-end methods on the more challenging PRW and CDPS.
Pages: 11