Exploiting spatio-temporal knowledge for video action recognition

Cited by: 3

Authors:

Zhang, Huigang [1]

Wang, Liuan [1]

Sun, Jun [1]

Affiliations:

[1] Fujitsu R&D Ctr, Beijing 100022, Peoples R China

Source:

IET COMPUTER VISION | 2023 / Vol. 17 / Issue 02

Keywords:

action recognition; commonsense knowledge; GCN; STKM;

DOI:

10.1049/cvi2.12154

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Action recognition has been a popular area of computer vision research in recent years. The goal of this task is to recognise human actions in video frames. Most existing methods depend on visual features and their relationships inside the videos. The extracted features only represent the visual information of the current video itself and cannot represent the general knowledge of particular actions beyond the video. Thus, there are some deviations in these features, and the recognition performance still requires improvement. In this study, we present a novel spatio-temporal knowledge module (STKM) to endow the current methods with commonsense knowledge. To this end, we first collect hybrid external knowledge from universal fields, which contains both visual and semantic information. Then graph convolution networks (GCNs) are used to represent and aggregate this knowledge. The GCNs involve (i) a spatial graph to capture spatial relations and (ii) a temporal graph to capture serial occurrence relations among actions. By integrating knowledge and visual features, we can get better recognition results. Experiments on the AVA, UCF101-24 and JHMDB datasets show the robustness and generalisation ability of STKM. The results report a new state-of-the-art 32.0 mAP on AVA v2.1. On the UCF101-24 and JHMDB datasets, our method also improves by 1.5 AP and 2.6 AP, respectively, over the baseline method.
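
The abstract describes aggregating knowledge with two graphs and then combining the result with visual features. The following is only a minimal sketch of that general idea, not the authors' STKM implementation: the row-normalised graph convolution, the toy adjacency matrices, the feature sizes, the mean pooling and the fusion by concatenation are all assumptions made for illustration.

```python
import numpy as np

def row_normalise(adj):
    """Add self-loops and normalise each row to sum to 1 (an assumed, simple choice)."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(row_normalise(adj) @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
num_actions, know_dim, hid_dim, vis_dim = 4, 8, 6, 5  # hypothetical sizes

# Hypothetical knowledge embeddings, one row per action class.
knowledge = rng.normal(size=(num_actions, know_dim))

# Hypothetical spatial graph: undirected relations between actions.
spatial_adj = np.array([[0, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 1, 0, 1],
                        [0, 0, 1, 0]], dtype=float)

# Hypothetical temporal graph: directed edge i -> j means j tends to follow i.
temporal_adj = np.array([[0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, 1],
                         [0, 0, 0, 0]], dtype=float)

w_spatial = rng.normal(size=(know_dim, hid_dim))
w_temporal = rng.normal(size=(know_dim, hid_dim))

# Aggregate knowledge separately over each graph.
spatial_out = gcn_layer(spatial_adj, knowledge, w_spatial)
temporal_out = gcn_layer(temporal_adj, knowledge, w_temporal)

# Assumed fusion: pool the knowledge nodes and concatenate with a visual feature.
knowledge_vec = np.concatenate([spatial_out, temporal_out], axis=1).mean(axis=0)
visual_feat = rng.normal(size=vis_dim)  # stand-in for a backbone feature
fused = np.concatenate([visual_feat, knowledge_vec])

print(fused.shape)  # (17,)
```

In this sketch the fused vector would be passed to a classifier; how the paper actually combines the two sources is not stated in the abstract.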


Pages: 222-230

Page count: 9

50 entries in total

[1] Spatio-temporal Video Autoencoder for Human Action Recognition
Sousa e Santos, Anderson Carlos
Pedrini, Helio
[J]. PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2019, : 114 - 123
[2] Interpretable Spatio-temporal Attention for Video Action Recognition
Meng, Lili
Zhao, Bo
Chang, Bo
Huang, Gao
Sun, Wei
Tung, Frederich
Sigal, Leonid
[J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
[3] Human Action Recognition Based on a Spatio-Temporal Video Autoencoder
Sousa e Santos, Anderson Carlos
Pedrini, Helio
[J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (11)
[4] Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Wasim, Syed Talal
Khattak, Muhammad Uzair
Naseer, Muzammal
Khan, Salman
Shah, Mubarak
Khan, Fahad Shahbaz
[J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13732 - 13743
[5] Histogram of Fuzzy Local Spatio-Temporal Descriptors for Video Action Recognition
Zuo, Zheming
Yang, Longzhi
Liu, Yonghuai
Chao, Fei
Song, Ran
Qu, Yanpeng
[J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (06) : 4059 - 4067
[6] Video Action Recognition Based on Spatio-temporal Feature Pyramid Module
Gong, Suming
Chen, Ying
[J]. 2020 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2020), 2020, : 338 - 341
[7] Learning spatio-temporal features for action recognition from the side of the video
Pei, Lishen
Ye, Mao
Zhao, Xuezhuan
Xiang, Tao
Li, Tao
[J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (01) : 199 - 206
[8] VIDEO ACTION RECOGNITION WITH SPATIO-TEMPORAL GRAPH EMBEDDING AND SPLINE MODELING
Yuan, Yin
Zheng, Haomian
Li, Zhu
Zhang, David
[J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 2422 - 2425
[9] Learning spatio-temporal features for action recognition from the side of the video
Lishen Pei
Mao Ye
Xuezhuan Zhao
Tao Xiang
Tao Li
[J]. Signal, Image and Video Processing, 2016, 10 : 199 - 206
[10] Human Action Recognition in Video by Fusion of Structural and Spatio-temporal Features
Borzeshi, Ehsan Zare
Concha, Oscar Perez
Piccardi, Massimo
[J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2012, 7626 : 474 - 482
