Attention Based End-to-End Network for Short Video Classification

Authors
Zhu, Hui [1]
Zou, Chao [2]
Wang, Zhenyu [2]
Xu, Kai [2]
Huang, Zihao [2]
Affiliations
[1] Guangdong Mech & Elect Polytech, Sch Econ & Trade, Guangzhou, Peoples R China
[2] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
Keywords
Short Video Classification; Deep Learning; Convolutional Neural Network; Self-Attention Mechanism
DOI
10.1109/MSN57253.2022.00084
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
It has been shown that three-dimensional (3D) convolutional kernels can effectively capture local features within the spatiotemporal extent of videos, leading to impressive results for various models on video-related tasks. With the introduction of the Transformer and the rise of the self-attention mechanism, self-attention models have recently been applied more and more to video representation learning. However, each type of model has its own limitation: convolutional models are restricted to local perception, while self-attention models are limited by the self-attention operation itself. Inspired by the global context network (GCNet), we combine the advantages of 3D convolution and the self-attention mechanism in a novel operator called the GC-Conv block. The block performs both local feature extraction and global context modeling and merges them through channel-level concatenation, similar to the dense connectivity pattern in DenseNet, while remaining lightweight. We apply this block at multiple layers of our proposed end-to-end network for short video classification, and we capture temporal dependencies with dilated convolutions and a bidirectional GRU to obtain better representations. Our model outperforms state-of-the-art convolutional and self-attention models on three human action recognition datasets with considerably fewer parameters, demonstrating its effectiveness.
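The abstract does not give the exact layer configuration of the GC-Conv block, so the following is only a rough, dependency-free illustration under one plausible reading, not the authors' implementation. It assumes that a 3x3x3 convolution produces new feature channels, that a GCNet-style global context unit (softmax attention pooling over all spatiotemporal positions, then a bottleneck transform with layer normalization and ReLU) adds one context value per channel to those features, and that the result is concatenated with the block input along the channel axis, DenseNet-style. The names `conv3d`, `global_context`, `gc_conv_block`, `init_params`, `growth`, and `bottleneck` are hypothetical, and the weights are random.

```python
import itertools
import math
import random

# Hypothetical layout: a feature map is a list of C channels, each a T x H x W nested list.


def conv3d(x, weight, bias):
    """3x3x3 convolution, stride 1, zero padding 1, so T, H, W are preserved.

    weight has shape [C_out][C_in][3][3][3]; bias has length C_out.
    """
    C, T, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out = []
    for w_o, b_o in zip(weight, bias):
        vol = [[[b_o] * W for _ in range(H)] for _ in range(T)]
        for t, h, w in itertools.product(range(T), range(H), range(W)):
            for c, dt, dh, dw in itertools.product(range(C), range(3), range(3), range(3)):
                tt, hh, ww = t + dt - 1, h + dh - 1, w + dw - 1
                if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                    vol[t][h][w] += w_o[c][dt][dh][dw] * x[c][tt][hh][ww]
        out.append(vol)
    return out


def global_context(x, w_key, w1, w2, eps=1e-5):
    """GCNet-style context: attention pooling over all T*H*W positions,
    then a bottleneck transform W2 * ReLU(LayerNorm(W1 * context))."""
    feats = [[v for plane in ch for row in plane for v in row] for ch in x]
    n = len(feats[0])
    # One logit per position (a 1x1x1 conv to a single channel), softmax over positions.
    logits = [sum(k * f[j] for k, f in zip(w_key, feats)) for j in range(n)]
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted sum over positions gives one context value per channel.
    context = [sum(a * v for a, v in zip(attn, f)) for f in feats]
    # Bottleneck transform; this LayerNorm has no learnable scale or shift.
    hidden = [sum(w * c for w, c in zip(row, context)) for row in w1]
    mu = sum(hidden) / len(hidden)
    var = sum((h - mu) ** 2 for h in hidden) / len(hidden)
    hidden = [max(0.0, (h - mu) / math.sqrt(var + eps)) for h in hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def gc_conv_block(x, params):
    """Assumed GC-Conv layout: local 3D conv, GCNet fusion, DenseNet-style concat."""
    local = conv3d(x, params["conv_w"], params["conv_b"])
    delta = global_context(local, params["w_key"], params["w1"], params["w2"])
    # GCNet fusion: broadcast-add each channel's context value to every position.
    fused = [[[[v + d for v in row] for row in plane] for plane in vol]
             for vol, d in zip(local, delta)]
    # Channel-level concatenation: keep the input channels and append the new ones.
    return x + fused


def init_params(c_in, growth, bottleneck, seed=0):
    """Random weights for illustration only; growth = number of new channels."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    return {
        "conv_w": [[[[[u() for _ in range(3)] for _ in range(3)] for _ in range(3)]
                    for _ in range(c_in)] for _ in range(growth)],
        "conv_b": [0.0] * growth,
        "w_key": [u() for _ in range(growth)],
        "w1": [[u() for _ in range(growth)] for _ in range(bottleneck)],
        "w2": [[u() for _ in range(bottleneck)] for _ in range(growth)],
    }


if __name__ == "__main__":
    rng = random.Random(1)
    C, T, H, W = 3, 4, 5, 5
    x = [[[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
         for _ in range(C)]
    y = gc_conv_block(x, init_params(C, growth=4, bottleneck=2))
    print(len(y), len(y[0]), len(y[0][0]), len(y[0][0][0]))  # → 7 4 5 5
```

Because the new channels are concatenated rather than added, each such block widens the feature map by `growth` channels; keeping `growth` small is the usual DenseNet way of staying lightweight. The temporal part of the network (dilated convolutions and a bidirectional GRU) is not covered by this sketch.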
Pages: 490-494
Number of pages: 5