SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Cited by: 3

Authors:

Zhang, Hongcheng [1]

Zhao, Xu [1]

Affiliations:

[1] Shanghai Jiao Tong University, Department of Automation, Shanghai, People's Republic of China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

video understanding; video action detection; spatio-temporal action detection; anchor-free detector;

DOI:

10.1109/ICASSP43922.2022.9746817

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

Recognizing action patterns and detecting action instances are vital for the spatio-temporal action detection task, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features needed for localization and those needed for classification, and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates the local motion feature from a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, Movement and Size Regression is proposed for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detections, which can be further linked into video-level tubes with a matching strategy. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP on the UCF-24 and JHMDB datasets.


Pages: 2180-2184

Page count: 5


