Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos

Cited by: 3
Authors
Agethen, Sebastian [1]
Hsu, Winston H. [1]
Affiliation
[1] Natl Taiwan Univ, Taipei 10617, Taiwan
Keywords
Kernel; Videos; Task analysis; Convolution; Feature extraction; YouTube; Mathematical model; Computational and artificial intelligence; neural networks; feedforward neural networks; recurrent neural networks; ACTION RECOGNITION; FUSION;
DOI
10.1109/TMM.2019.2932564
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed in which the input-to-hidden and hidden-to-hidden transitions are each modeled by convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose an enhancement to convolutional LSTM networks that accommodates multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which mitigates that trade-off. In addition, we propose an attention-based mechanism designed specifically for our multi-kernel extension. We evaluated the proposed extensions in a supervised classification setting on the UCF-101 and Sports-1M datasets and found that they improve accuracy. We also conducted a qualitative analysis to reveal the characteristics of our system and of the convolutional LSTM baseline.
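The abstract's core idea, giving each ConvLSTM gate several convolution kernels instead of one, can be sketched in plain Python. This is a minimal, single-channel illustration under stated assumptions, not the authors' implementation: the gates follow the standard ConvLSTM equations without peephole connections, the kernel sizes (1, 3, 5) and all helper names (`conv2d_same`, `gate_preactivation`, `make_params`, and so on) are hypothetical, and the kernel branches are merged with a learned softmax weighting per gate. That weighting is only a simple stand-in, because the abstract does not specify how the authors' attention mechanism combines kernels.

```python
import math
import random
from typing import Dict, List, Tuple

Grid = List[List[float]]  # single-channel H x W feature map
# Hypothetical per-gate parameters: (input kernels, hidden kernels, branch logits, bias)
GateParams = Tuple[List[Grid], List[Grid], List[float], float]


def zeros(h: int, w: int) -> Grid:
    return [[0.0] * w for _ in range(h)]


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """2D cross-correlation with zero padding; an odd square kernel keeps H x W."""
    h, w, r = len(x), len(x[0]), len(k) // 2
    out = zeros(h, w)
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(len(k)):
                for b in range(len(k)):
                    ii, jj = i + a - r, j + b - r
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[a][b] * x[ii][jj]
            out[i][j] = s
    return out


def zip_map(f, *grids: Grid) -> Grid:
    """Apply f elementwise across one or more grids of identical shape."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*grids)]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]


def gate_preactivation(x: Grid, h: Grid, p: GateParams) -> Grid:
    """bias + sum_k alpha_k * (W_k * x + U_k * h), with alpha = softmax(logits)."""
    wx, wh, logits, bias = p
    total = [[bias] * len(x[0]) for _ in x]
    for alpha, kx, kh in zip(softmax(logits), wx, wh):
        branch = zip_map(lambda a, b: a + b, conv2d_same(x, kx), conv2d_same(h, kh))
        total = zip_map(lambda t, v: t + alpha * v, total, branch)
    return total


def convlstm_step(x: Grid, h: Grid, c: Grid,
                  params: Dict[str, GateParams]) -> Tuple[Grid, Grid]:
    """One multi-kernel ConvLSTM step (no peepholes); returns (h_t, c_t)."""
    i = zip_map(sigmoid, gate_preactivation(x, h, params["i"]))
    f = zip_map(sigmoid, gate_preactivation(x, h, params["f"]))
    g = zip_map(math.tanh, gate_preactivation(x, h, params["g"]))
    o = zip_map(sigmoid, gate_preactivation(x, h, params["o"]))
    c_new = zip_map(lambda fv, cv, iv, gv: fv * cv + iv * gv, f, c, i, g)
    h_new = zip_map(lambda ov, cv: ov * math.tanh(cv), o, c_new)
    return h_new, c_new


def make_params(sizes: List[int], rng: random.Random,
                scale: float = 0.1) -> Dict[str, GateParams]:
    """Hypothetical initialiser: one input and one hidden kernel per odd size, per gate."""
    def kernel(n: int) -> Grid:
        return [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]
    return {gate: ([kernel(n) for n in sizes], [kernel(n) for n in sizes],
                   [0.0] * len(sizes), 0.0) for gate in "ifgo"}


# Usage: run four random 6x6 "frames" through the cell.
rng = random.Random(0)
H, W = 6, 6
params = make_params([1, 3, 5], rng)
h, c = zeros(H, W), zeros(H, W)
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(4)]
for frame in frames:
    h, c = convlstm_step(frame, h, c, params)
print(len(h), len(h[0]))  # prints "6 6"
```

Because every kernel is odd-sized with zero "same" padding, all branches produce maps of the same spatial size and can be summed directly. A larger kernel widens the receptive field per step at a higher cost, which is the effectiveness/efficiency trade-off that a single-kernel ConvLSTM has to settle once; mixing several sizes lets the weighting choose among them.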
Pages: 819-829
Page count: 11