Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Cited by: 15
Authors
Yin, Chengxiang [1]
Tang, Jian [1,2]
Xu, Zhiyuan [1]
Wang, Yanzhi [3]
Affiliations
[1] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA
[2] DiDi AI Labs, Beijing 100193, Peoples R China
[3] Northeastern Univ, Dept Elect & Engn, Boston, MA 02115 USA
Keywords
Task analysis; Knowledge discovery; Computational modeling; Recurrent neural networks; Data models; Semantics; Deep learning; differentiable neural computer (DNC); memory augmented neural network; recurrent neural network (RNN); video question answering (VideoQA)
DOI
10.1109/TNNLS.2019.2938015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video question answering (VideoQA) is a very important but challenging multimedia task, which automatically analyzes questions and videos and generates accurate answers. However, research on VideoQA is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA, which features a new method for encoding videos and questions, and memory augmentation using the emerging differentiable neural computer (DNC). Specifically, we encode textual (questions) information before visual (videos) information, which leads to better visual-textual representations. Moreover, we leverage DNC (with an external memory) for storing and retrieving useful information in questions and videos, and modeling the long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments using the VTW data set and MSVD-QA data set, which are both widely used large-scale video data sets for language-level understanding. The experimental results have well validated the proposed model and showed that it outperforms the state-of-the-art in terms of various accuracy-related metrics.
Pages: 3159-3167
Page count: 9