Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Cited by: 15
Authors
Yin, Chengxiang [1]
Tang, Jian [1, 2]
Xu, Zhiyuan [1]
Wang, Yanzhi [3]
Affiliations
[1] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA
[2] DiDi AI Labs, Beijing 100193, Peoples R China
[3] Northeastern Univ, Dept Elect & Engn, Boston, MA 02115 USA
Keywords
Task analysis; Knowledge discovery; Computational modeling; Recurrent neural networks; Data models; Semantics; Deep learning; differentiable neural computer (DNC); memory augmented neural network; recurrent neural network (RNN); video question answering (VideoQA);
DOI
10.1109/TNNLS.2019.2938015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video question answering (VideoQA) is a very important but challenging multimedia task, which automatically analyzes questions and videos and generates accurate answers. However, research on VideoQA is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA, which features a new method for encoding videos and questions, and memory augmentation using the emerging differentiable neural computer (DNC). Specifically, we encode textual (questions) information before visual (videos) information, which leads to better visual-textual representations. Moreover, we leverage DNC (with an external memory) for storing and retrieving useful information in questions and videos, and modeling the long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments using the VTW data set and MSVD-QA data set, which are both widely used large-scale video data sets for language-level understanding. The experimental results have well validated the proposed model and showed that it outperforms the state-of-the-art in terms of various accuracy-related metrics.
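The abstract mentions retrieving information from a DNC-style external memory. As a rough illustration only, the sketch below shows a generic content-based read of the kind used in memory-augmented networks: each memory row is scored by cosine similarity to a query key, the scores are passed through a softmax, and the read vector is the weighted sum of rows. It is not taken from the paper; the names `cosine`, `content_read`, `memory`, `key`, and `beta` are hypothetical, and the toy memory contents are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_read(memory, key, beta=1.0):
    """Content-based read: softmax over beta-scaled cosine scores, then a weighted sum.

    memory: list of rows (each a list of floats), assumed non-empty and rectangular.
    key:    query vector with the same width as a memory row.
    beta:   sharpness of the softmax (hypothetical default).
    """
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
    return weights, read


# Toy memory with three slots of width 2 (illustrative values only).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, read = content_read(memory, key=[1.0, 0.0], beta=5.0)
```

In this toy call the first slot matches the key exactly, so it receives the largest read weight; a full DNC additionally has write heads, usage tracking, and temporal links, none of which are shown here.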
Pages: 3159-3167
Page count: 9
Related papers
50 in total
  • [1] Frame Augmented Alternating Attention Network for Video Question Answering
    Zhang, Wenqiao
    Tang, Siliang
    Cao, Yanpeng
    Pu, Shiliang
    Wu, Fei
    Zhuang, Yueting
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (04) : 1032 - 1041
  • [2] Video Question Answering Using a Forget Memory Network
    Ge, Yuanyuan
    Xu, Youjiang
    Han, Yahong
    COMPUTER VISION, PT I, 2017, 771 : 404 - 415
  • [3] Hierarchical Recurrent Contextual Attention Network for Video Question Answering
    Zhou, Fei
    Han, Yahong
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 280 - 290
  • [4] Recurrent unit augmented memory network for video summarisation
    Su, Min
    Ma, Ran
    Zhang, Bing
    Li, Kai
    IET COMPUTER VISION, 2023, 17 (06) : 710 - 721
  • [5] Video Question Answering via Attribute-Augmented Attention Network Learning
    Ye, Yunan
    Zhao, Zhou
    Li, Yimeng
    Chen, Long
    Xiao, Jun
    Zhuang, Yueting
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 829 - 832
  • [6] Deep neural network approach for Arabic community question answering
    Almiman, Ali
    Osman, Nada
    Torki, Marwan
    ALEXANDRIA ENGINEERING JOURNAL, 2020, 59 (06) : 4427 - 4434
  • [7] A Deep Neural Network Framework for English Hindi Question Answering
    Gupta, Deepak
    Ekbal, Asif
    Bhattacharyya, Pushpak
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (02)
  • [8] Deep Attention Neural Tensor Network for Visual Question Answering
    Bai, Yalong
    Fu, Jianlong
    Zhao, Tiejun
    Mei, Tao
    COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 : 21 - 37
  • [9] A Question Routing Technique Using Deep Neural Network for Communities of Question Answering
    Azzam, Amr
    Tazi, Neamat
    Hossny, Ahmad
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT I, 2017, 10177 : 35 - 49
  • [10] Recurrent Memory Reasoning Network for Expert Finding in Community Question Answering
    Fu, Jinlan
    Li, Yi
    Zhang, Qi
    Wu, Qinzhuo
    Ma, Renfeng
    Huang, Xuanjing
    Jiang, Yu-Gang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 187 - 195