Multi-interaction Network with Object Relation for Video Question Answering

Cited by: 50
Authors
Jin, Weike [1]
Zhao, Zhou [1]
Gu, Mao [1]
Yu, Jun [2]
Xiao, Jun [1]
Zhuang, Yueting [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
video question answering; multi-interaction; object relation;
DOI
10.1145/3343031.3351065
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Video question answering is an important task for testing a machine's ability to understand video. Existing methods normally focus on combining recurrent and convolutional neural networks to capture the spatial and temporal information of the video. Recently, some work has also shown that using an attention mechanism can achieve better performance. In this paper, we propose a new model called Multi-interaction network for video question answering. There are two types of interactions in our model. The first type is the multi-modal interaction between the visual and textual information. The second type is the multi-level interaction inside the multi-modal interaction. Specifically, instead of using the original self-attention, we propose a new attention mechanism called multi-interaction, which can capture both element-wise and segment-wise sequence interactions simultaneously. In addition to the normal frame-level interaction, we also take object relations into consideration in order to obtain more fine-grained information, such as motions and other potential relations among these objects. We evaluate our method on TGIF-QA and two other video QA datasets. The qualitative and quantitative experimental results show the effectiveness of our model, which achieves new state-of-the-art performance.
Pages: 1193 - 1201
Number of pages: 9
Related papers
50 in total
  • [21] Progressive Graph Attention Network for Video Question Answering
    Peng, Liang
    Yang, Shuangji
    Bin, Yi
    Wang, Guoqing
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2871 - 2879
  • [22] Video Question Answering Using a Forget Memory Network
    Ge, Yuanyuan
    Xu, Youjiang
    Han, Yahong
    COMPUTER VISION, PT I, 2017, 771 : 404 - 415
  • [23] Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
    Le, Thao Minh
    Le, Vuong
    Venkatesh, Svetha
    Tran, Truyen
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (11) : 3027 - 3050
  • [24] Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
    Thao Minh Le
    Vuong Le
    Svetha Venkatesh
    Truyen Tran
    International Journal of Computer Vision, 2021, 129 : 3027 - 3050
  • [25] Knowledge Graph Relation Path Network for Multi-Hop Intelligent Question Answering
    Zhang Y.-M.
    Ji Q.
    Xu X.-S.
    Cheng Z.-B.
    Xiao G.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2023, 51 (11) : 3092 - 3099
  • [26] Question-aware memory network for multi-hop question answering in human–robot interaction
    Xinmeng Li
    Mamoun Alazab
    Qian Li
    Keping Yu
    Quanjun Yin
    Complex & Intelligent Systems, 2022, 8 : 851 - 861
  • [27] Video Question Answering With Prior Knowledge and Object-Sensitive Learning
    Zeng, Pengpeng
    Zhang, Haonan
    Gao, Lianli
    Song, Jingkuan
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5936 - 5948
  • [28] Learning Question-Guided Video Representation for Multi-Turn Video Question Answering
    Chao, Guan-Lin
    Rastogi, Abhinav
    Yavuz, Semih
    Hakkani-Tur, Dilek
    Chen, Jindong
    Lane, Ian
    20TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2019), 2019, : 215 - 225
  • [29] Question-Aware Tube-Switch Network for Video Question Answering
    Yang, Tianhao
    Zha, Zheng-Jun
    Xie, Hongtao
    Wang, Meng
    Zhang, Hanwang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1184 - 1192
  • [30] MINE: A method of multi-interaction heterogeneous information network embedding
    Zhu D.
    Sun Y.
    Li X.
    Du H.
    Qu R.
    Yu P.
    Piao X.
    Higgs R.
    Cao N.
    Yu, Pingping (yppflx@hotmail.com), 2020, Tech Science Press (63): : 1343 - 1356