MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning

Authors
Li, Xuelong [1]
Zhao, Bin [2, 3]
Lu, Xiaoqiang [1]
Affiliations
[1] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Xian 710119, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[3] Northwestern Polytech Univ, Ctr OPTical IMagery Anal & Learning (OPTIMAL), Xian 710072, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual information is essential to video captioning. However, a video contains a lot of content that is uncorrelated with the caption, which may interfere with generating a correct one. For this reason, we attempt to exploit the visual features that are most correlated with the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is proposed, in which the MAM encodes the visual features and the RNN works as the decoder that generates the video caption. During generation, the proposed approach adaptively attends both to the salient regions within each frame and to the frames most correlated with the caption. Experimental results on two benchmark datasets, MSVD and Charades, show the excellent performance of the proposed approach.
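The two attention levels described in the abstract (regions within a frame, then frames within the video) can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the bilinear scoring function, the weight matrices `W_r` and `W_f`, the function name `two_level_attention`, and all tensor shapes are assumptions chosen only to show the idea of attending at two levels, conditioned on the decoder's hidden state.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def two_level_attention(regions, hidden, W_r, W_f):
    """Hypothetical two-level soft attention (illustrative only).

    regions: (T, R, D) features for R regions in each of T frames
    hidden:  (H,) current decoder hidden state
    W_r, W_f: (D, H) assumed bilinear scoring matrices
    """
    # Region level: score every region of every frame against the decoder state.
    region_scores = regions @ W_r @ hidden            # (T, R)
    region_weights = softmax(region_scores, axis=1)   # each frame's weights sum to 1
    frame_feats = (region_weights[..., None] * regions).sum(axis=1)  # (T, D)

    # Frame level: score each attended frame feature against the decoder state.
    frame_scores = frame_feats @ W_f @ hidden         # (T,)
    frame_weights = softmax(frame_scores, axis=0)     # weights over frames sum to 1
    context = frame_weights @ frame_feats             # (D,) context for the decoder
    return context, region_weights, frame_weights


# Toy usage with random inputs; the sizes are arbitrary.
rng = np.random.default_rng(0)
T, R, D, H = 4, 6, 8, 5
context, region_weights, frame_weights = two_level_attention(
    rng.normal(size=(T, R, D)),
    rng.normal(size=H),
    rng.normal(size=(D, H)),
    rng.normal(size=(D, H)),
)
```

In a full captioning model, a context vector of this kind would be recomputed at every decoding step and fed to the RNN together with the previously generated word, so the attended regions and frames can change from word to word.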
Pages: 2208 - 2214
Number of pages: 7
Related papers
50 in total
  • [1] CAM-RNN: Co-Attention Model Based RNN for Video Captioning
    Zhao, Bin
    Li, Xuelong
    Lu, Xiaoqiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (11) : 5552 - 5565
  • [2] Detecting Personal Medication Intake in Twitter via Domain Attention-Based RNN with Multi-Level Features
    Xiong, Shufeng
    Batra, Vishwash
    Liu, Liangliang
    Xi, Lei
    Sun, Changxia
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [3] Multi-level video captioning method based on semantic space
    Yao, Xiao
    Zeng, Yuanlin
    Gu, Min
    Yuan, Ruxi
    Li, Jie
    Ge, Junyi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (28) : 72113 - 72130
  • [4] Attention-based RNN with question-aware loss and multi-level copying mechanism for natural answer generation
    Zhao, Fen
    Shao, Huishuang
    Li, Shuo
    Wang, Yintong
    Yu, Yan
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (05) : 7249 - 7264
  • [5] Auxiliary Classifier based Residual RNN for Image Captioning
    Cayli, Ozkan
    Kilic, Volkan
    Onan, Aytug
    Wang, Wenwu
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1126 - 1130
  • [6] MMA-RNN: A multi-level multi-task attention-based recurrent neural network for discrimination and localization of atrial fibrillation
    Sun, Yifan
    Shen, Jingyan
    Jiang, Yunfan
    Huang, Zhaohui
    Hao, Minsheng
    Zhang, Xuegong
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 89
  • [7] Attention Based RNN Model for Document Image Quality Assessment
    Li, Pengchao
    Peng, Liangrui
    Cai, Junyang
    Ding, Xiaoqing
    Ge, Shuangkui
    [J]. 2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 819 - 825
  • [8] RNN-T BASED OPEN-VOCABULARY KEYWORD SPOTTING IN MANDARIN WITH MULTI-LEVEL DETECTION
    Liu, Zuozhen
    Li, Ta
    Zhang, Pengyuan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5649 - 5653
  • [9] MRCap: Multi-modal and Multi-level Relationship-based Dense Video Captioning
    Chen, Wei
    Niu, Jianwei
    Liu, Xuefeng
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2615 - 2620
  • [10] Video quality enhancement based on visual attention model and multi-level exposure correction
    Lin, Guo-Shiang
    Ji, Xian-Wei
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 : 9903 - 9925