Spatio-temporal Prompting Network for Robust Video Feature Extraction

Cited by: 0
Authors
Sun, Guanxiong [1 ,2 ]
Wang, Chi [1 ]
Zhang, Zhaoyu [1 ]
Deng, Jiankang [2 ,3 ]
Zafeiriou, Stefanos [3 ]
Hua, Yang [1 ]
Affiliations
[1] Queens Univ Belfast, Belfast, Antrim, Northern Ireland
[2] Huawei UKRD, Cambridge, England
[3] Imperial Coll London, London, England
DOI
10.1109/ICCV51070.2023.01250
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Frame quality deterioration is one of the main challenges in video understanding. To compensate for the information loss caused by deteriorated frames, recent approaches exploit transformer-based integration modules to aggregate spatio-temporal information. However, these integration modules are heavy and complex. Furthermore, each integration module is specifically tailored for its target task, making it difficult to generalise to multiple tasks. In this paper, we present a neat and unified framework, called the Spatio-Temporal Prompting Network (STPN), which efficiently extracts robust and accurate video features by dynamically adjusting the input features of the backbone network. Specifically, STPN predicts several video prompts containing spatio-temporal information from neighbouring frames. These video prompts are then prepended to the patch embeddings of the current frame as the updated input for video feature extraction. Moreover, STPN generalises easily to various video tasks because it contains no task-specific modules. Without bells and whistles, STPN achieves state-of-the-art performance on three widely used datasets covering different video understanding tasks: ImageNet VID for video object detection, YouTube-VIS for video instance segmentation, and GOT-10k for visual object tracking. Code is available at https://github.com/guanxiongsun/STPN.
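The prompt-prepending idea described in the abstract can be sketched in a few lines. The snippet below is a hypothetical, heavily simplified illustration, not the paper's implementation: the function name `prepend_video_prompts`, the mean-pooled temporal context, and the single linear map `w_prompt` standing in for the learned prompt predictor are all assumptions made for illustration.

```python
import numpy as np

def prepend_video_prompts(cur_patches, neighbour_feats, w_prompt, num_prompts):
    """Toy sketch of STPN-style prompting (hypothetical simplification).

    cur_patches:     (N, D) patch embeddings of the current frame
    neighbour_feats: (T, D) per-frame features of T neighbouring frames
    w_prompt:        (num_prompts * D, D) toy linear prompt predictor
    Returns (num_prompts + N, D): prompts prepended to the patch sequence.
    """
    # Pool temporal context from the neighbouring frames (mean pooling is an
    # assumption here; the paper uses a learned prediction network).
    ctx = neighbour_feats.mean(axis=0)                     # (D,)
    prompts = (w_prompt @ ctx).reshape(num_prompts, -1)    # (P, D)
    # Prepend the predicted prompts, like extra tokens, to the current
    # frame's patch embeddings to form the updated backbone input.
    return np.concatenate([prompts, cur_patches], axis=0)

# Toy shapes: 196 patches, 768-dim embeddings, 2 neighbour frames, 4 prompts
rng = np.random.default_rng(0)
x = prepend_video_prompts(
    rng.standard_normal((196, 768)),
    rng.standard_normal((2, 768)),
    rng.standard_normal((4 * 768, 768)),
    num_prompts=4,
)
print(x.shape)  # (200, 768)
```

Because the prompts are ordinary tokens in the input sequence, a standard transformer backbone can consume them with no architectural change, which is what makes the scheme task-agnostic.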
Pages: 13541-13551
Page count: 11