SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3
Authors
Zhang, Hongcheng [1]
Zhao, Xu [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
video understanding; video action detection; spatio-temporal action detection; anchor-free detector
DOI
10.1109/ICASSP43922.2022.9746817
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recognizing action patterns and detecting action instances are both vital for the spatio-temporal action detection task, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features needed for localization and those needed for classification, and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion feature from a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, Movement and Size Regression is proposed for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detection results, which can be further linked into video-level tubes with a matching strategy. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP metrics on the UCF-24 and JHMDB datasets.
Pages: 2180-2184
Page count: 5
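
The abstract mentions class-agnostic center localization and Gaussian kernels for encoding training samples, but it does not give the formulation. The sketch below shows one common convention from anchor-free detectors: each ground-truth instance center is rendered as a 2D Gaussian peak on a single heatmap channel shared by all classes. Whether the paper uses this exact form is an assumption; the function names, the fixed `sigma`, and the element-wise maximum used to merge overlapping instances are all hypothetical choices made for illustration.

```python
import math


def gaussian_value(x, y, cx, cy, sigma):
    """Unnormalized 2D Gaussian centered at (cx, cy), evaluated at (x, y)."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def class_agnostic_heatmap(height, width, centers, sigma=1.0):
    """Encode instance centers as Gaussian peaks on one shared channel.

    Assumption: a single channel is used for every action class
    (class-agnostic), and overlapping Gaussians are merged with an
    element-wise maximum so each center keeps a peak value of 1.0.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                value = gaussian_value(x, y, cx, cy, sigma)
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


# Two hypothetical instance centers on a 5 x 5 feature map.
example = class_agnostic_heatmap(5, 5, centers=[(0, 0), (4, 4)], sigma=1.0)
```

In this sketch the heatmap carries no class label, so an action category would have to be predicted by a separate branch, which is consistent with the separation of localization and classification that the abstract describes.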