Video Captioning by Adversarial LSTM

Cited by: 146
Authors
Yang, Yang [1 ,2 ]
Zhou, Jie [1 ,2 ]
Ai, Jiangbo [1 ,2 ]
Bin, Yi [1 ,2 ]
Hanjalic, Alan
Shen, Heng Tao [1 ,2 ,3 ]
Ji, Yanli [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
[3] Delft Univ Technol, Intelligent Syst Dept, Multimedia Comp Grp, NL-2628 CD Delft, Netherlands
Funding
National Natural Science Foundation of China;
Keywords
Video captioning; adversarial training; LSTM;
DOI
10.1109/TIP.2018.2855422
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel approach to video captioning based on adversarial learning and long short-term memory (LSTM). With this approach, we aim to compensate for a deficiency of LSTM-based video captioning methods: while they can effectively handle the temporal nature of video data when generating captions, they typically suffer from exponential error accumulation. Specifically, we adopt a standard generative adversarial network (GAN) architecture, characterized by an interplay of two competing processes: a "generator" that produces textual sentences given the visual content of a video and a "discriminator" that controls the accuracy of the generated sentences. The discriminator acts as an "adversary" toward the generator, and through its controlling mechanism it helps the generator become more accurate. For the generator module, we adopt an existing video captioning approach based on an LSTM network. For the discriminator, we propose a novel realization specifically tuned to the video captioning problem, taking both the sentences and the video features as input. This leads to our proposed LSTM-GAN system architecture, which we show experimentally to significantly outperform existing methods on standard public datasets.
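The generator/discriminator interplay described in the abstract can be sketched with the standard GAN objectives. This is a minimal illustrative sketch, not the paper's actual implementation: the function names are hypothetical, and the paper's exact losses for sentence generation may differ. Here `d_real` stands for the discriminator's score on a (video, human caption) pair and `d_fake` for its score on a (video, generated caption) pair, both probabilities in (0, 1).

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Discriminator objective: reward high scores on real (video, caption)
    pairs and low scores on generated ones: -log D(real) - log(1 - D(fake))."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_loss(d_fake: float) -> float:
    """Non-saturating generator objective: the generator is trained to raise
    the discriminator's score on its own captions: -log D(fake)."""
    return -math.log(d_fake)
```

As the generator's captions improve, the discriminator can no longer tell real from generated pairs, `d_fake` drifts toward 0.5, and the generator loss approaches log 2; this pressure from the "adversary" is what drives the generator toward more accurate sentences.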
Pages: 5600 - 5611
Page count: 12
Related Papers
50 results total
  • [1] Adversarial Video Captioning
    Adari, Suman K.
    Garcia, Washington
    Butler, Kevin
    2019 49TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOPS (DSN-W), 2019, : 24 - 27
  • [2] Residual attention-based LSTM for video captioning
    Li, Xiangpeng
    Zhou, Zhilong
    Chen, Lijiang
    Gao, Lianli
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02): : 621 - 636
  • [3] Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
    Song, Jingkuan
    Gao, Lianli
    Guo, Zhao
    Liu, Wu
    Zhang, Dongxiang
    Shen, Heng Tao
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2737 - 2743
  • [4] Learning Multimodal Attention LSTM Networks for Video Captioning
    Xu, Jun
    Yao, Ting
    Zhang, Yongdong
    Mei, Tao
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 537 - 545
  • [5] Video Captioning With Attention-Based LSTM and Semantic Consistency
    Gao, Lianli
    Guo, Zhao
    Zhang, Hanwang
    Xu, Xing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (09) : 2045 - 2055
  • [6] Attention-based Densely Connected LSTM for Video Captioning
    Zhu, Yongqing
    Jiang, Shuqiang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 802 - 810
  • [7] Unsupervised Video Summarization with Adversarial LSTM Networks
    Mahasseni, Behrooz
    Lam, Michael
    Todorovic, Sinisa
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2982 - 2991
  • [8] Video captioning using Semantically Contextual Generative Adversarial Network
    Munusamy, Hemalatha
    Sekhar, C. Chandra
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 221
  • [9] Adversarial Reinforcement Learning With Object-Scene Relational Graph for Video Captioning
    Hua, Xia
    Wang, Xinqing
    Rui, Ting
    Shao, Faming
    Wang, Dong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2004 - 2016