Multi-interaction Network with Object Relation for Video Question Answering

Cited by: 50
Authors
Jin, Weike [1]
Zhao, Zhou [1]
Gu, Mao [1]
Yu, Jun [2]
Xiao, Jun [1]
Zhuang, Yueting [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province
Keywords
video question answering; multi-interaction; object relation
DOI
10.1145/3343031.3351065
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Video question answering is an important task for testing a machine's ability to understand video. Existing methods normally focus on combining recurrent and convolutional neural networks to capture the spatial and temporal information of the video. Recently, some work has also shown that using an attention mechanism can achieve better performance. In this paper, we propose a new model called Multi-interaction network for video question answering. There are two types of interactions in our model. The first type is the multi-modal interaction between the visual and textual information. The second type is the multi-level interaction inside the multi-modal interaction. Specifically, instead of using the original self-attention, we propose a new attention mechanism called multi-interaction, which can capture both element-wise and segment-wise sequence interactions simultaneously. In addition to the normal frame-level interaction, we also take object relations into consideration in order to obtain more fine-grained information, such as motions and other potential relations among these objects. We evaluate our method on TGIF-QA and two other video QA datasets. The qualitative and quantitative experimental results show the effectiveness of our model, which achieves new state-of-the-art performance.
Pages: 1193-1201
Page count: 9
Related papers
50 in total
  • [31] Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network
    Zhao, Zhou
    Jiang, Xinghua
    Cai, Deng
    Xiao, Jun
    He, Xiaofei
    Pu, Shiliang
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 3690 - 3696
  • [32] Frame Augmented Alternating Attention Network for Video Question Answering
    Zhang, Wenqiao
    Tang, Siliang
    Cao, Yanpeng
    Pu, Shiliang
    Wu, Fei
    Zhuang, Yueting
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (04) : 1032 - 1041
  • [33] A Universal Quaternion Hypergraph Network for Multimodal Video Question Answering
    Guo, Zhicheng
    Zhao, Jiaxuan
    Jiao, Licheng
    Liu, Xu
    Liu, Fang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 38 - 49
  • [34] Hierarchical Recurrent Contextual Attention Network for Video Question Answering
    Zhou, Fei
    Han, Yahong
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 280 - 290
  • [35] Question-aware memory network for multi-hop question answering in human-robot interaction
    Li, Xinmeng
    Alazab, Mamoun
    Li, Qian
    Yu, Keping
    Yin, Quanjun
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (02) : 851 - 861
  • [36] Hierarchical Representation Network With Auxiliary Tasks for Video Captioning and Video Question Answering
    Gao, Lianli
    Lei, Yu
    Zeng, Pengpeng
    Song, Jingkuan
    Wang, Meng
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 202 - 215
  • [37] Explore Multi-Step Reasoning in Video Question Answering
    Han, Yahong
    PROCEEDINGS OF THE 1ST WORKSHOP AND CHALLENGE ON COMPREHENSIVE VIDEO UNDERSTANDING IN THE WILD (COVIEW'18), 2018, : 5 - 5
  • [38] Explore Multi-Step Reasoning in Video Question Answering
    Song, Xiaomeng
    Shi, Yucheng
    Chen, Xin
    Han, Yahong
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 239 - 247
  • [39] Relation-aware Hierarchical Attention Framework for Video Question Answering
    Li, Fangtao
    Liu, Zihe
    Bai, Ting
    Yan, Chenghao
    Cao, Chenyu
    Wu, Bin
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 164 - 172
  • [40] MINE: A Method of Multi-Interaction Heterogeneous Information Network Embedding
    Zhu, Dongjie
    Sun, Yundong
    Li, Xiaofang
    Du, Haiwen
    Qu, Rongning
    Yu, Pingping
    Piao, Xuefeng
    Higgs, Russell
    Cao, Ning
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03) : 1343 - 1356