Spatio-temporal Prompting Network for Robust Video Feature Extraction

Cited by: 0
Authors
Sun, Guanxiong [1 ,2 ]
Wang, Chi [1 ]
Zhang, Zhaoyu [1 ]
Deng, Jiankang [2 ,3 ]
Zafeiriou, Stefanos [3 ]
Hua, Yang [1 ]
Affiliations
[1] Queens Univ Belfast, Belfast, Antrim, Northern Ireland
[2] Huawei UKRD, Cambridge, England
[3] Imperial Coll London, London, England
DOI
10.1109/ICCV51070.2023.01250
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Frame quality deterioration is one of the main challenges in the field of video understanding. To compensate for the information loss caused by deteriorated frames, recent approaches exploit transformer-based integration modules to obtain spatio-temporal information. However, these integration modules are heavy and complex. Furthermore, each integration module is specifically tailored for its target task, making it difficult to generalise to multiple tasks. In this paper, we present a neat and unified framework, called Spatio-Temporal Prompting Network (STPN). It can efficiently extract robust and accurate video features by dynamically adjusting the input features in the backbone network. Specifically, STPN predicts several video prompts containing spatio-temporal information of neighbour frames. Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction. Moreover, STPN is easy to generalise to various video tasks because it does not contain task-specific modules. Without bells and whistles, STPN achieves state-of-the-art performance on three widely used datasets for different video understanding tasks, i.e., ImageNetVID for video object detection, YouTubeVIS for video instance segmentation, and GOT-10k for visual object tracking. Code is available at https://github.com/guanxiongsun/STPN
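The prepending step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `predict_prompts` and `prepend_prompts` are hypothetical, and the prompt predictor here is a simple averaging placeholder standing in for whatever learned module STPN actually uses. Only the final step, placing the prompts in front of the current frame's patch embeddings, follows the abstract directly.

```python
from typing import List

Token = List[float]  # one embedding vector


def mean_token(tokens: List[Token]) -> Token:
    """Element-wise mean of a non-empty list of equal-length tokens."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def predict_prompts(neighbour_frames: List[List[Token]], num_prompts: int) -> List[Token]:
    """Placeholder prompt predictor (an assumption, not the STPN module).

    Pools the tokens of all neighbour frames, splits them into
    `num_prompts` equal chunks, and averages each chunk into one prompt.
    Leftover tokens that do not fill a chunk are ignored.
    """
    pooled = [tok for frame in neighbour_frames for tok in frame]
    if num_prompts < 1 or len(pooled) < num_prompts:
        raise ValueError("need at least one neighbour token per prompt")
    chunk = len(pooled) // num_prompts
    return [mean_token(pooled[i * chunk:(i + 1) * chunk]) for i in range(num_prompts)]


def prepend_prompts(prompts: List[Token], patch_embeddings: List[Token]) -> List[Token]:
    """Build the updated backbone input: prompts first, then the frame's patches."""
    return prompts + patch_embeddings


# Toy data: 3 patches of dimension 2 for the current frame,
# and two neighbour frames with 2 patches each.
current = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = [
    [[2.0, 2.0], [4.0, 4.0]],
    [[6.0, 6.0], [8.0, 8.0]],
]

prompts = predict_prompts(neighbours, num_prompts=2)
tokens = prepend_prompts(prompts, current)
print(len(tokens))  # 2 prompts + 3 patches
```

Because the prompts only extend the input token sequence, a backbone that accepts variable-length token sequences would need no task-specific module to consume them, which matches the generality argument made in the abstract.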
Pages: 13541-13551
Page count: 11
Related papers
50 in total
  • [11] Spatio-temporal feature points detection and extraction based on convolutional neural network
    Yang, Chaoyu
    Liu, Qian
    Liang, Yincheng
    [J]. PROCEEDINGS OF THE 2015 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER ENGINEERING AND ELECTRONICS (ICECEE 2015), 2015, 24 : 400 - 403
  • [12] Multi-Attention Spatio-Temporal Feature Extraction Network for Image Deraining
    Feng, Shangyu
    Yin, Dejun
    Sun, Shijun
    [J]. 2023 8th International Conference on Intelligent Computing and Signal Processing, ICSP 2023, 2023, : 1804 - 1809
  • [13] Unsupervised Video Hashing by Exploiting Spatio-Temporal Feature
    Ma, Chao
    Gu, Yun
    Liu, Wei
    Yang, Jie
    He, Xiangjian
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III, 2016, 9949 : 511 - 518
  • [14] Spatio-Temporal Transformer Network for Video Restoration
    Kim, Tae Hyun
    Sajjadi, Mehdi S. M.
    Hirsch, Michael
    Schoelkopf, Bernhard
    [J]. COMPUTER VISION - ECCV 2018, PT III, 2018, 11207 : 111 - 127
  • [15] Spatio-temporal feature extraction in sensory electroneurographic signals
    Silveira, C.
    Khushaba, R. N.
    Brunton, E.
    Nazarpour, K.
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2022, 380 (2228):
  • [16] Video Caption Extraction Using Spatio-Temporal Slices
    Chen, Liang-Hua
    Su, Chih-Wen
    [J]. INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2018, 18 (02)
  • [17] The research of video matching algorithm based on spatio-temporal feature
    Jia, Ke-Bin
    Deng, Zhi-Pin
    Zhuang, Xin-Yue
    [J]. 2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 1, PROCEEDINGS, 2007, : 165 - 168
  • [18] Learning Feature Semantic Matching for Spatio-Temporal Video Grounding
    Zhang, Tong
    Fang, Hao
    Zhang, Hao
    Gao, Jialin
    Lu, Xiankai
    Nie, Xiushan
    Yin, Yilong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9268 - 9279
  • [19] A Video Retrieval Algorithm Based on Spatio-temporal Feature Curves
    Chen, Xiuxin
    Jia, Kebin
    Zhuang, Xinyue
    [J]. 2008 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2008, : 287 - 290
  • [20] Spatio-Temporal feature based VLAD for efficient Video retrieval
    Reddy, Mopuri K.
    Arora, Sahil
    Babu, R. Venkatesh
    [J]. 2013 FOURTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2013,