Spatial-temporal graph-guided global attention network for video-based person re-identification

Cited by: 0
Authors
Xiaobao Li
Wen Wang
Qingyong Li
Jiang Zhang
Affiliations
[1] Jiangsu Normal University,School of Computer Science and Technology
[2] Beijing Jiaotong University,Beijing Key Lab of Traffic Data Analysis and Mining
[3] China Academy of Aerospace Aerodynamics
Keywords
Person Re-identification; Global attention learning; Graph; Spatial-temporal;
DOI: not available
Abstract
Global attention learning has been extensively applied in video-based person re-identification owing to its superiority in capturing contextual correlations. However, existing global attention learning methods usually adopt conventional neural networks to model non-Euclidean contextual correlations, resulting in limited representation ability. Inspired by the graph-structured nature of these contextual correlations, we propose a spatial-temporal graph-guided global attention network (STG³A) for video-based person re-identification. STG³A comprises two graph-guided attention modules that capture the spatial contexts within a frame and the temporal contexts across all frames of a sequence for global attention learning. Furthermore, the graphs from both modules are encoded as graph representations, which are combined with weighted representations to adequately grasp spatial-temporal contextual information for video feature learning. To reduce the effect of noisy graph nodes and learn robust graph representations, a graph node attention mechanism is developed to trade off the importance of each graph node, yielding noise-tolerant graph models. Finally, we design a graph-guided fusion scheme that integrates the representations output by the two attention modules into a more compact video feature.
Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate the superior performance of STG³A.
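The graph node attention described in the abstract can be illustrated with a minimal sketch: each node of a spatial or temporal context graph receives a learned importance score, and the scores are normalized into attention weights so that noisy nodes contribute less to the pooled graph representation. This is a simplified NumPy illustration, not the paper's implementation; the function name `graph_node_attention` and the single scoring vector `w` are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_node_attention(node_feats, w):
    """Weight each graph node by a learned score so that noisy nodes
    contribute less to the pooled graph representation.

    node_feats: (N, D) array of node features
    w:          (D,) scoring vector (stands in for learned parameters)
    returns:    (N,) attention weights and (D,) pooled graph representation
    """
    scores = node_feats @ w                 # (N,) raw importance per node
    alpha = softmax(scores)                 # (N,) weights, sum to 1
    graph_rep = (alpha[:, None] * node_feats).sum(axis=0)  # weighted pooling
    return alpha, graph_rep

# toy example: 4 graph nodes with 3-dimensional features
rng = np.random.default_rng(0)
nodes = rng.normal(size=(4, 3))
w = rng.normal(size=3)
alpha, g = graph_node_attention(nodes, w)
print(alpha, g)  # weights sum to 1 (up to floating point)
```

In the full model this weighting would be applied to both the spatial and the temporal graph before the graph-guided fusion step combines their representations.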