Mutually Reinforced Spatio-Temporal Convolutional Tube for Human Action Recognition

Cited by: 0

Authors:

Wu, Haoze [1]

Liu, Jiawei [1]

Zha, Zheng-Jun [1]

Chen, Zhenzhong [2]

Sun, Xiaoyan [3]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Brain Inspired Intelligence Technol, Beijing, Peoples R China

[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China

[3] Microsoft Res Asia, Intelligent Multimedia Grp, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or incur high computational cost when extracting spatio-temporal features. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both by exploiting the interaction between spatial and temporal information, and selectively emphasizes informative spatial appearance and temporal motion, while reducing structural complexity. Moreover, we design three types of MRSTs that differ in the order in which spatial and temporal information is enhanced; each contains a spatio-temporal decomposition unit, a mutually reinforced unit, and a spatio-temporal fusion unit. An end-to-end deep network, MRST-Net, built on the MRSTs, is also proposed to better explore spatio-temporal information in human actions. Extensive experiments show that MRST-Net yields the best performance compared with state-of-the-art approaches.
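
The complexity argument in the abstract can be illustrated with a rough weight count. The sketch below compares one full 3D convolution with a spatial convolution followed by a temporal convolution. The kernel sizes (3 x 3 x 3), the channel widths (64) and the helper names conv3d_params and decomposed_params are assumptions chosen for illustration; the abstract does not give the actual MRST layer configuration, and the mutually reinforced and fusion units are not modeled here.

```python
def conv3d_params(c_in, c_out, t, k):
    """Weights in one full t x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * t * k * k


def decomposed_params(c_in, c_mid, c_out, t, k):
    """Weights in a 1 x k x k spatial conv followed by a t x 1 x 1 temporal conv.

    Hypothetical decomposition for illustration only, not the paper's exact unit.
    """
    spatial = c_in * c_mid * k * k    # 1 x k x k kernel per (in, mid) channel pair
    temporal = c_mid * c_out * t      # t x 1 x 1 kernel per (mid, out) channel pair
    return spatial + temporal


# Assumed sizes: 64 channels throughout, 3 x 3 x 3 receptive field.
full = conv3d_params(64, 64, t=3, k=3)
split = decomposed_params(64, 64, 64, t=3, k=3)
print(full, split, full / split)  # prints: 110592 49152 2.25
```

Under these assumed sizes the decomposed pair needs 2.25 times fewer weights than the full 3D kernel, which is the kind of saving a spatial/temporal decomposition can offer before any interaction modules are added back.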


Pages: 968-974

Page count: 7
