SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3

Authors:

Zhang, Hongcheng [1]

Zhao, Xu [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

video understanding; video action detection; spatio-temporal action detection; anchor-free detector

DOI:

10.1109/ICASSP43922.2022.9746817

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Recognizing action patterns and detecting action instances are both vital for the spatio-temporal action detection task, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features needed for localization and those needed for classification, and they use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion features of a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, we propose Movement and Size Regression, which estimates movement and spatial extent and uses Gaussian kernels to encode training samples. These three modules work together to generate tubelet detections, which can be further linked into video-level tubes with a matching strategy. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP on the UCF-24 and JHMDB datasets.
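The abstract mentions using Gaussian kernels to encode training samples but gives no details. As a rough illustration only, the sketch below shows one common way such an encoding is done in anchor-free detectors: each ground-truth instance center becomes a 2D Gaussian bump on a class-agnostic heatmap. The function names, the fixed `sigma`, and the element-wise maximum used for merging are all assumptions made for this example, not the authors' implementation.

```python
import math


def gaussian_center_heatmap(height, width, center, sigma):
    """Return a height x width grid that peaks at 1.0 on `center` (row, col).

    Hypothetical helper: the kernel shape and sigma are assumptions.
    """
    cy, cx = center
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def merge_heatmaps(maps):
    """Combine per-instance maps by element-wise maximum (an assumed choice)
    so that every instance center keeps a peak value of 1.0."""
    return [[max(values) for values in zip(*rows)] for rows in zip(*maps)]


# Two hypothetical action-instance centers on an 8 x 8 feature map.
heatmap = merge_heatmaps([
    gaussian_center_heatmap(8, 8, (2, 3), sigma=1.0),
    gaussian_center_heatmap(8, 8, (5, 6), sigma=1.0),
])
```

Because the target map carries no class channel, a single heatmap of this kind can supervise center localization for all action categories at once, which is consistent with the class-agnostic localization described above.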

Citation:

Pages: 2180-2184

Page count: 5

