Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Cited by: 15
Authors
Yin, Chengxiang [1]
Tang, Jian [1, 2]
Xu, Zhiyuan [1]
Wang, Yanzhi [3]
Affiliations
[1] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA
[2] DiDi AI Labs, Beijing 100193, Peoples R China
[3] Northeastern Univ, Dept Elect & Engn, Boston, MA 02115 USA
Keywords
Task analysis; Knowledge discovery; Computational modeling; Recurrent neural networks; Data models; Semantics; Deep learning; differentiable neural computer (DNC); memory augmented neural network; recurrent neural network (RNN); video question answering (VideoQA)
DOI
10.1109/TNNLS.2019.2938015
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video question answering (VideoQA) is an important but challenging multimedia task, which automatically analyzes questions and videos and generates accurate answers. However, research on VideoQA is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA, which features a new method for encoding videos and questions, and memory augmentation using the emerging differentiable neural computer (DNC). Specifically, we encode textual (question) information before visual (video) information, which leads to better visual-textual representations. Moreover, we leverage the DNC (with an external memory) for storing and retrieving useful information in questions and videos, and for modeling the long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments using the VTW data set and the MSVD-QA data set, which are both widely used large-scale video data sets for language-level understanding. The experimental results validate the proposed model and show that it outperforms the state of the art in terms of various accuracy-related metrics.
Pages: 3159-3167
Page count: 9
Related papers
50 in total
  • [31] CATEGORY DRIVEN DEEP RECURRENT NEURAL NETWORK FOR VIDEO SUMMARIZATION
    Song, Xinhui
    Chen, Ke
    Lei, Jie
    Sun, Li
    Wang, Zhiyuan
    Xie, Lei
    Song, Mingli
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2016,
  • [32] PerAnSel: A Novel Deep Neural Network-Based System for Persian Question Answering
    Mozafari, Jamshid
    Kazemi, Arefeh
    Moradi, Parham
    Nematbakhsh, Mohammad Ali
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [33] Optimal Deep Neural Network-Based Model for Answering Visual Medical Question
    Gasmi, Karim
    Ben Ltaifa, Ibtihel
    Lejeune, Gael
    Alshammari, Hamoud
    Ben Ammar, Lassaad
    Mahmood, Mahmood A.
    CYBERNETICS AND SYSTEMS, 2022, 53 (05) : 403 - 424
  • [34] Multimodal Dual Attention Memory for Video Story Question Answering
    Kim, Kyung-Min
    Choi, Seong-Ho
    Kim, Jin-Hwa
    Zhang, Byoung-Tak
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 698 - 713
  • [35] An Augmented Reality Question Answering System Based on Ensemble Neural Networks
    Chen, Chi-Hua
    Wu, Chen-Ling
    Lo, Chi-Chun
    Hwang, Feng-Jang
    IEEE ACCESS, 2017, 5 : 17425 - 17435
  • [36] Enhancing Recurrent Neural Networks with Positional Attention for Question Answering
    Chen, Qin
    Hu, Qinmin
    Huang, Jimmy Xiangji
    He, Liang
    An, Weijie
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 993 - 996
  • [37] Memory-Augmented Neural Networks on FPGA for Real-Time and Energy-Efficient Question Answering
    Park, Seongsik
    Jang, Jaehee
    Kim, Seijoon
    Na, Byunggook
    Yoon, Sungroh
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2021, 29 (01) : 162 - 175
  • [38] Lightweight recurrent cross-modal encoder for video question answering
    Immanuel, Steve Andreas
    Jeong, Cheol
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [39] Question-Aware Tube-Switch Network for Video Question Answering
    Yang, Tianhao
    Zha, Zheng-Jun
    Xie, Hongtao
    Wang, Meng
    Zhang, Hanwang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1184 - 1192
  • [40] Applying a Convolutional Neural Network to Legal Question Answering
    Kim, Mi-Young
    Xu, Ying
    Goebel, Randy
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2017, 10091 : 282 - 294