Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Cited by: 15

Authors:

Yin, Chengxiang [1]

Tang, Jian [1,2]

Xu, Zhiyuan [1]

Wang, Yanzhi [3]

Affiliations:

[1] Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244 USA

[2] DiDi AI Labs, Beijing 100193, Peoples R China

[3] Northeastern Univ, Dept Elect & Engn, Boston, MA 02115 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2020 / Vol. 31 / No. 09

Keywords:

Task analysis; Knowledge discovery; Computational modeling; Recurrent neural networks; Data models; Semantics; Deep learning; differentiable neural computer (DNC); memory augmented neural network; recurrent neural network (RNN); video question answering (VideoQA);

DOI:

10.1109/TNNLS.2019.2938015

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video question answering (VideoQA) is an important but challenging multimedia task, which automatically analyzes questions and videos and generates accurate answers. However, research on VideoQA is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA, which features a new method for encoding videos and questions, and memory augmentation using the emerging differentiable neural computer (DNC). Specifically, we encode textual (question) information before visual (video) information, which leads to better visual-textual representations. Moreover, we leverage DNC (with an external memory) for storing and retrieving useful information in questions and videos, and for modeling the long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments using the VTW data set and the MSVD-QA data set, which are both widely used large-scale video data sets for language-level understanding. The experimental results validate the proposed model and show that it outperforms the state-of-the-art in terms of various accuracy-related metrics.
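As a rough illustration of what "retrieving useful information" from an external memory can look like, the sketch below implements content-based addressing, the read mechanism commonly described for DNC-style memories: each memory row is scored by cosine similarity to a query key, the scores are sharpened by a strength factor and normalized with a softmax, and the read vector is the weighted sum of the rows. This is not the authors' MA-DRNN implementation. The function names (`cosine_similarity`, `softmax`, `content_read`), the toy memory, and the `strength` value are all assumptions made for illustration, and a full DNC also includes write heads, usage tracking, and temporal links that are omitted here.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def content_read(memory, key, strength=1.0):
    """Hypothetical content-based read: returns (read weights, read vector).

    memory   -- list of rows, each a list of floats (illustrative external memory)
    key      -- query vector with the same width as a memory row
    strength -- positive factor that sharpens the weighting (assumed parameter)
    """
    scores = [strength * cosine_similarity(row, key) for row in memory]
    weights = softmax(scores)
    width = len(memory[0])
    read_vector = [
        sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)
    ]
    return weights, read_vector


# Toy usage: three memory rows, and a key that matches the first row most closely.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, read_vector = content_read(memory, key=[1.0, 0.0], strength=10.0)
```

In this toy setup the largest read weight falls on the first row, because it is the row most similar to the key. In a model of the kind the abstract describes, the key would presumably be produced by the recurrent controller from the encoded question and video features rather than supplied by hand.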


Pages: 3159 - 3167

Page count: 9

