Spatio-temporal Prompting Network for Robust Video Feature Extraction

Cited by: 0
Authors
Sun, Guanxiong [1 ,2 ]
Wang, Chi [1 ]
Zhang, Zhaoyu [1 ]
Deng, Jiankang [2 ,3 ]
Zafeiriou, Stefanos [3 ]
Hua, Yang [1 ]
Affiliations
[1] Queens Univ Belfast, Belfast, Antrim, Northern Ireland
[2] Huawei UKRD, Cambridge, England
[3] Imperial Coll London, London, England
DOI
10.1109/ICCV51070.2023.01250
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Frame quality deterioration is one of the main challenges in the field of video understanding. To compensate for the information loss caused by deteriorated frames, recent approaches exploit transformer-based integration modules to obtain spatio-temporal information. However, these integration modules are heavy and complex. Furthermore, each integration module is specifically tailored for its target task, making it difficult to generalise to multiple tasks. In this paper, we present a neat and unified framework, called Spatio-Temporal Prompting Network (STPN). It can efficiently extract robust and accurate video features by dynamically adjusting the input features in the backbone network. Specifically, STPN predicts several video prompts containing spatio-temporal information of neighbour frames. Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction. Moreover, STPN is easy to generalise to various video tasks because it does not contain task-specific modules. Without bells and whistles, STPN achieves state-of-the-art performance on three widely used datasets for different video understanding tasks, i.e., ImageNetVID for video object detection, YouTubeVIS for video instance segmentation, and GOT-10k for visual object tracking. Code is available at https://github.com/guanxiongsun/STPN
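The prepending step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `predict_prompts` and `prepend_prompts`, the mean-pooling predictor, the mixing matrix `weight`, and the sizes T, N, D, K are all assumptions chosen only to make the token shapes concrete. The actual STPN prompt predictor is not specified in this record.

```python
import numpy as np


def predict_prompts(neighbour_embeddings, weight):
    """Hypothetical stand-in for a prompt predictor (NOT the STPN module).

    neighbour_embeddings: (T, N, D) patch embeddings of T neighbour frames.
    weight: (K, T) mixing weights, one row per prompt.
    Returns K prompts of dimension D, shape (K, D).
    """
    # Summarise each neighbour frame by averaging over its N patches -> (T, D).
    frame_summary = neighbour_embeddings.mean(axis=1)
    # Mix the T frame summaries into K prompts -> (K, D).
    return weight @ frame_summary


def prepend_prompts(prompts, patch_embeddings):
    """Prepend (K, D) prompts to (N, D) patch embeddings -> (K + N, D)."""
    return np.concatenate([prompts, patch_embeddings], axis=0)


# Assumed sizes: 4 neighbour frames, 196 patches, 768-dim tokens, 3 prompts.
T, N, D, K = 4, 196, 768, 3
rng = np.random.default_rng(0)
neighbours = rng.normal(size=(T, N, D))
current = rng.normal(size=(N, D))
weight = np.full((K, T), 1.0 / T)  # uniform mixing, purely illustrative

prompts = predict_prompts(neighbours, weight)
tokens = prepend_prompts(prompts, current)
print(tokens.shape)  # (199, 768)
```

In this sketch the backbone would receive `tokens` instead of `current`; the patch embeddings of the current frame are left unchanged and only the sequence length grows by K.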
Pages: 13541-13551
Page count: 11