Video-Based Person Re-Identification by an End-To-End Learning Architecture with Hybrid Deep Appearance-Temporal Feature

Times Cited: 5
Authors
Sun, Rui [1 ]
Huang, Qiheng [1 ]
Xia, Miaomiao [1 ]
Zhang, Jun [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Feicui Rd 420, Hefei 230000, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
person re-identification; end-to-end architecture; appearance-temporal features; Siamese network; pivotal frames;
DOI
10.3390/s18113669
CLC Classification Number
O65 [Analytical Chemistry];
Discipline Classification Codes
070302; 081704;
Abstract
Video-based person re-identification is an important task that faces the challenges of lighting variation, low-resolution images, background clutter, occlusion, and similarity of human appearance in multi-camera visual sensor networks. In this paper, we propose a video-based person re-identification method, an end-to-end learning architecture with hybrid deep appearance-temporal features, which learns the appearance features of pivotal frames, the temporal features, and an independent distance metric for each feature type. The architecture consists of a two-stream deep feature structure and two Siamese networks. In the first stream, we propose the Two-branch Appearance Feature (TAF) sub-structure to obtain the appearance information of persons, and use one of the two Siamese networks to learn the similarity of the appearance features of a pair of persons. To exploit temporal information, we design the second stream, consisting of the Optical flow Temporal Feature (OTF) sub-structure and the other Siamese network, to learn a person's temporal features and the distances between pairwise features. In addition, we select the pivotal frames of each video as inputs to the Inception-V3 network in the Two-branch Appearance Feature sub-structure, and employ a salience-learning fusion layer to fuse the learned global and local appearance features. Extensive experiments on the PRID2011, iLIDS-VID, and Motion Analysis and Re-identification Set (MARS) datasets show that the proposed architecture reaches 79%, 59%, and 72% Rank-1 accuracy, respectively, and compares favorably with state-of-the-art algorithms. It also improves the feature representation ability for persons.
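To make the two-stream design above concrete, below is a minimal PyTorch-style sketch of an appearance stream (an Inception-V3 global branch and a small local branch over pivotal frames, fused by a learned salience weighting) and an optical-flow temporal stream, each trained with its own Siamese contrastive distance. The class names, layer sizes, and the local-branch design are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torchvision.models as models


class AppearanceStream(nn.Module):
    # Two-branch Appearance Feature (TAF) sketch: a global Inception-V3 branch and
    # a small local branch, fused by a learned salience weighting (assumed design).
    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = models.inception_v3(weights=None, aux_logits=False)
        backbone.fc = nn.Identity()          # keep the 2048-d pooled features
        self.global_branch = backbone        # expects 299x299 RGB pivotal frames
        self.local_branch = nn.Sequential(   # lightweight local branch (illustrative)
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj_g = nn.Linear(2048, feat_dim)
        self.proj_l = nn.Linear(64, feat_dim)
        self.salience = nn.Linear(2 * feat_dim, 2)   # salience-learning fusion weights

    def forward(self, frames):               # frames: (B, 3, 299, 299) pivotal frames
        g = self.proj_g(self.global_branch(frames))
        l = self.proj_l(self.local_branch(frames))
        w = torch.softmax(self.salience(torch.cat([g, l], dim=1)), dim=1)
        return w[:, :1] * g + w[:, 1:] * l   # fused global/local appearance feature


class TemporalStream(nn.Module):
    # Optical flow Temporal Feature (OTF) sketch: a CNN over stacked x/y flow fields.
    def __init__(self, num_flow_frames=10, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * num_flow_frames, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, flow):                 # flow: (B, 2*T, H, W) stacked flow maps
        return self.net(flow)


def siamese_contrastive_loss(f1, f2, same_id, margin=2.0):
    # One Siamese metric per stream: pull same-identity pairs together,
    # push different-identity pairs apart by at least `margin`.
    d = torch.norm(f1 - f2, dim=1)
    return torch.mean(same_id * d.pow(2)
                      + (1.0 - same_id) * torch.clamp(margin - d, min=0).pow(2))

In a full pipeline, this loss would be computed separately for the appearance and temporal features, so that each stream learns its own independent distance metric; the two learned distances can then be combined when ranking gallery sequences against a query, in line with the architecture described in the abstract.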
Pages: 21
Related Papers
50 records in total
  • [31] TEMPORAL REGULARIZED SPATIAL ATTENTION FOR VIDEO-BASED PERSON RE-IDENTIFICATION
    Wang, Xueying
    Zhao, Xu
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 2249 - 2253
  • [32] Spatial and Temporal Mutual Promotion for Video-Based Person Re-Identification
    Liu, Yiheng
    Yuan, Zhenxun
    Zhou, Wengang
    Li, Houqiang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8786 - 8793
  • [33] Video-based Person Re-identification with Spatial and Temporal Memory Networks
    Eom, Chanho
    Lee, Geon
    Lee, Junghyup
    Ham, Bumsub
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 12016 - 12025
  • [34] End-to-End Detection and Re-identification Integrated Net for Person Search
    He, Zhenwei
    Zhang, Lei
    COMPUTER VISION - ACCV 2018, PT II, 2019, 11362 : 349 - 364
  • [35] An End-to-End Foreground-Aware Network for Person Re-Identification
    Liu, Yiheng
    Zhou, Wengang
    Liu, Jianzhuang
    Qi, Guo-Jun
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2060 - 2071
  • [36] Weakly supervised end-to-end domain adaptation for person re-identification
    Zhang, Lei
    Li, Haisheng
    Liu, Ruijun
    Wang, Xiaochuan
    Wu, Xiaoqun
    COMPUTERS & ELECTRICAL ENGINEERING, 2024, 113
  • [37] Spatial-Temporal Attention-Aware Learning for Video-Based Person Re-Identification
    Chen, Guangyi
    Lu, Jiwen
    Yang, Ming
    Zhou, Jie
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (09) : 4192 - 4205
  • [38] Simultaneous visual-appearance-level and spatial-temporal-level dictionary learning for video-based person re-identification
    Zhu, Xiaoke
    Jing, Xiao-Yuan
    Ma, Fei
    Cheng, Li
    Ren, Yilin
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (11): 7303 - 7315
  • [40] Video-based person re-identification with scene and person attributes
    Gong, Xun
    Luo, Bin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 8117 - 8128