Multi-Granularity Interaction and Integration Network for Video Question Answering

Cited by: 7
Authors
Wang, Yuanyuan [1]
Liu, Meng [2]
Wu, Jianlong [3]
Nie, Liqiang [3]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
[2] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Question answering (information retrieval); Object oriented modeling; Video question answering; multi-granularity interaction modeling; long-tailed answers;
DOI
10.1109/TCSVT.2023.3278492
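Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code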
0808 ; 0809 ;
Abstract
Video question answering, which aims to answer a natural language question about a given video, has gained popularity in the last few years. Although significant improvements have been achieved, it still faces two challenges: sufficient comprehension of video content and long-tailed answers. To this end, we propose a multi-granularity interaction and integration network for video question answering. It jointly explores multi-level intra-granularity and inter-granularity relations to enhance the comprehension of videos. Specifically, we first build a word-enhanced visual representation module to achieve cross-modal alignment. We then introduce a multi-granularity interaction module to explore the intra-granularity and inter-granularity relationships. Finally, a question-guided interaction module is developed to select question-related visual representations for answer prediction. In addition, we employ the seesaw loss for open-ended tasks to alleviate the effect of the long-tailed word distribution. Both quantitative and qualitative results on the TGIF-QA, MSRVTT-QA, and MSVD-QA datasets demonstrate the superiority of our model over several state-of-the-art approaches.
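The abstract does not say how the question-guided interaction module selects question-related visual representations. As a rough illustration only, the sketch below assumes scaled dot-product attention with a single question vector as the query; the function name `question_guided_pooling` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def question_guided_pooling(question_vec, visual_feats):
    """Attention-pool visual features with a question vector as the query.

    Assumption: scaled dot-product attention stands in for the paper's
    question-guided interaction module, whose details the abstract omits.
    """
    d = len(question_vec)
    # Relevance score between the question and each visual feature.
    scores = [
        sum(q * v for q, v in zip(question_vec, feat)) / math.sqrt(d)
        for feat in visual_feats
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum: features that align with the question dominate.
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, visual_feats))
        for i in range(d)
    ]
    return weights, pooled


# Toy example: three clip-level features and one question vector.
question = [1.0, 0.0]
clips = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
weights, pooled = question_guided_pooling(question, clips)
```

Here the first clip, which is most aligned with the question vector, receives the largest attention weight, and the pooled vector is what a downstream answer classifier would consume under this assumption.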
Pages: 7684-7695
Page count: 12
Related papers
50 in total
  • [1] Hierarchical synchronization with structured multi-granularity interaction for video question answering
    Qi, Shanshan
    Yang, Luxi
    Li, Chunguo
    NEUROCOMPUTING, 2024, 582
  • [2] Multi-Granularity Cross-Attention Network for Visual Question Answering
    Wang, Yue
    Gao, Wei
    Cheng, Xinzhou
    Wang, Xin
    Zhao, Huiying
    Xie, Zhipu
    Xu, Lexi
    2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 2098 - 2103
  • [3] Multi-Granularity Relational Attention Network for Audio-Visual Question Answering
    Li, Linjun
    Jin, Tao
    Lin, Wang
    Jiang, Hao
    Pan, Wenwen
    Wang, Jian
    Xiao, Shuwen
    Xia, Yan
    Jiang, Weihao
    Zhao, Zhou
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7080 - 7094
  • [4] Multi-granularity Temporal Question Answering over Knowledge Graphs
    Chen, Ziyang
    Liao, Jinzhi
    Zhao, Xiang
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 11378 - 11392
  • [5] Multi-granularity Hierarchical Feature Extraction for Question-Answering Understanding
    Xingguo Qin
    Ya Zhou
    Guimin Huang
    Maolin Li
    Jun Li
    Cognitive Computation, 2023, 15 : 121 - 131
  • [6] Multi-granularity Hierarchical Feature Extraction for Question-Answering Understanding
    Qin, Xingguo
    Zhou, Ya
    Huang, Guimin
    Li, Maolin
    Li, Jun
    COGNITIVE COMPUTATION, 2023, 15 (01) : 121 - 131
  • [7] M2FNet: Multi-granularity Feature Fusion Network for Medical Visual Question Answering
    Wang, He
    Pan, Haiwei
    Zhang, Kejia
    He, Shuning
    Chen, Chunling
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 141 - 154
  • [8] Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering
    Wang, Wei
    Yan, Ming
    Wu, Chen
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 1705 - 1714
  • [9] Multi-interaction Network with Object Relation for Video Question Answering
    Jin, Weike
    Zhao, Zhou
    Gu, Mao
    Yu, Jun
    Xiao, Jun
    Zhuang, Yueting
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1193 - 1201
  • [10] Chinese Knowledge Base Question Answering by Attention-Based Multi-Granularity Model
    Shen, Cun
    Huang, Tinglei
    Liang, Xiao
    Li, Feng
    Fu, Kun
    INFORMATION, 2018, 9 (04)