Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos

Cited by: 3

Authors:

Agethen, Sebastian [1]

Hsu, Winston H. [1]

Affiliations:

[1] Natl Taiwan Univ, Taipei 10617, Taiwan

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / Issue 03

Keywords:

Kernel; Videos; Task analysis; Convolution; Feature extraction; YouTube; Mathematical model; Computational and artificial intelligence; neural networks; feedforward neural networks; recurrent neural networks; ACTION RECOGNITION; FUSION;

DOI:

10.1109/TMM.2019.2932564

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed, in which input-to-hidden and hidden-to-hidden transitions are modeled through convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose a new enhancement to convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which improves upon the aforementioned concern. In addition, we propose an attention-based mechanism that is specifically designed for our multi-kernel extension. We evaluated our proposed extensions in a supervised classification setting on the UCF-101 and Sports-1M datasets, with the findings showing that our enhancements improve accuracy. We also undertook qualitative analysis to reveal the characteristics of our system and the convolutional LSTM baseline.
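The multi-kernel idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record does not give the cell equations, so the design here (parallel convolutions with kernel sizes 1, 3 and 5, concatenated on the channel axis and mixed by a 1x1 convolution, then standard LSTM gating) is an assumption, and every name (`conv2d_same`, `multi_kernel_transition`, `conv_lstm_step`, `init_params`) is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, k, k), k odd.
    Zero padding keeps the spatial size, so the output is (C_out, H, W)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def multi_kernel_transition(x, kernels, w_mix):
    """Assumed multi-kernel transition: one convolution per kernel size,
    channel-wise concatenation, then a 1x1 convolution to mix the branches."""
    feats = np.concatenate([conv2d_same(x, w) for w in kernels], axis=0)
    return conv2d_same(feats, w_mix)

def conv_lstm_step(x, h, c, params):
    """One time step; gates use the usual LSTM update (no peephole terms)."""
    z = (multi_kernel_transition(x, params["x_kernels"], params["x_mix"])
         + multi_kernel_transition(h, params["h_kernels"], params["h_mix"])
         + params["bias"][:, None, None])
    i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output, candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def init_params(c_in, c_hid, sizes=(1, 3, 5), branch=4, seed=0):
    """Random weights for illustration only; sizes and branch width are arbitrary."""
    rng = np.random.default_rng(seed)

    def kernels(channels):
        return [rng.normal(0, 0.1, (branch, channels, k, k)) for k in sizes]

    def mix():
        return rng.normal(0, 0.1, (4 * c_hid, branch * len(sizes), 1, 1))

    return {"x_kernels": kernels(c_in), "x_mix": mix(),
            "h_kernels": kernels(c_hid), "h_mix": mix(),
            "bias": np.zeros(4 * c_hid)}

# Usage: run five random 16x16 RGB "frames" through the cell.
params = init_params(c_in=3, c_hid=8)
h = np.zeros((8, 16, 16))
c = np.zeros((8, 16, 16))
for frame in np.random.default_rng(1).normal(size=(5, 3, 16, 16)):
    h, c = conv_lstm_step(frame, h, c, params)
print(h.shape)  # (8, 16, 16)
```

A single-kernel convolutional LSTM corresponds to passing one kernel size in `sizes`; the sketch omits the attention mechanism, whose details the abstract does not describe.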

Pages: 819-829

Page count: 11
