Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Cited by: 15
Authors
Yin, Chengxiang [1]
Tang, Jian [1, 2]
Xu, Zhiyuan [1]
Wang, Yanzhi [3]
Affiliations
[1] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA
[2] DiDi AI Labs, Beijing 100193, Peoples R China
[3] Northeastern Univ, Dept Elect & Engn, Boston, MA 02115 USA
Keywords
Task analysis; Knowledge discovery; Computational modeling; Recurrent neural networks; Data models; Semantics; Deep learning; differentiable neural computer (DNC); memory augmented neural network; recurrent neural network (RNN); video question answering (VideoQA)
DOI
10.1109/TNNLS.2019.2938015
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video question answering (VideoQA) is a very important but challenging multimedia task, which automatically analyzes questions and videos and generates accurate answers. However, research on VideoQA is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA, which features a new method for encoding videos and questions, and memory augmentation using the emerging differentiable neural computer (DNC). Specifically, we encode textual (questions) information before visual (videos) information, which leads to better visual-textual representations. Moreover, we leverage DNC (with an external memory) for storing and retrieving useful information in questions and videos, and modeling the long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments using the VTW data set and MSVD-QA data set, which are both widely used large-scale video data sets for language-level understanding. The experimental results have well validated the proposed model and showed that it outperforms the state-of-the-art in terms of various accuracy-related metrics.
Pages: 3159 - 3167
Number of pages: 9
Related papers
50 in total
  • [41] A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering
    Guo, Zhicheng
    Zhao, Jiaxuan
    Jiao, Licheng
    Liu, Xu
    Liu, Fang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 38 - 49
  • [42] Hierarchical Representation Network With Auxiliary Tasks for Video Captioning and Video Question Answering
    Gao, Lianli
    Lei, Yu
    Zeng, Pengpeng
    Song, Jingkuan
    Wang, Meng
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 202 - 215
  • [43] Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
    Fan, Chenyou
    Zhang, Xiaofan
    Zhang, Shu
    Wang, Wensheng
    Zhang, Chi
    Huang, Heng
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019 : 1999 - 2007
  • [44] Affective question answering on video
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Gou, Jianping
    NEUROCOMPUTING, 2019, 363 : 125 - 139
  • [45] Progressive Attention Memory Network for Movie Story Question Answering
    Kim, Junyeong
    Ma, Minuk
    Kim, Kyungsu
    Kim, Sungjin
    Yoo, Chang D.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019 : 8329 - 8338
  • [46] LINEARLY AUGMENTED DEEP NEURAL NETWORK
    Ghahremani, Pegah
    Droppo, Jasha
    Seltzer, Michael L.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016 : 5085 - 5089
  • [47] Video Graph Transformer for Video Question Answering
    Xiao, Junbin
    Zhou, Pan
    Chua, Tat-Seng
    Yan, Shuicheng
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 39 - 58
  • [48] Video Reference: A Video Question Answering Engine
    Gao, Lei
    Li, Guangda
    Zheng, Yan-Tao
    Hong, Richang
    Chua, Tat-Seng
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2010, 5916 : 799 - +
  • [49] Intelligent Question Answering System based on Artificial Neural Network
    Ansari, Ahlam
    Maknojia, Moonish
    Shaikh, Altamash
    PROCEEDINGS OF 2ND IEEE INTERNATIONAL CONFERENCE ON ENGINEERING & TECHNOLOGY ICETECH-2016, 2016 : 758 - 763
  • [50] Deep Modular Bilinear Attention Network for Visual Question Answering
    Yan, Feng
    Silamu, Wushouer
    Li, Yanbing
    SENSORS, 2022, 22 (03)