Video question answering via traffic knowledge database and question classification

Authors
Sun, Xiaoyong [1]
Dai, Yu [1]
Wang, Yuchen [1]
Ma, Weifeng [1]
Lin, Xuefen [1]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Zhejiang, Peoples R China
Keywords
Video question answering; Knowledge; Transformer; Question classification; WEB;
DOI
10.1007/s00530-023-01240-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Video question answering (VideoQA) is the task of answering questions about videos. The main idea is to understand the content of the video and combine it with the relevant semantic context to answer various types of questions. Existing methods typically analyze the spatiotemporal correlations of the entire video to answer questions. However, for some simple questions, the answer depends on only a specific frame of the video, and analyzing the entire video needlessly increases the learning cost. For some complex questions, the information contained in the video is limited, and these methods are not sufficient to answer such questions fully. Therefore, we propose a VideoQA model based on question classification and a traffic knowledge database. The model starts from the perspective of the question: it classifies questions into general scene questions and causal questions and processes the two types with different methods. For general scene questions, we first extract the key frames of the video, which converts the problem into a simpler image question-answering task, and then process it with top-down and bottom-up attention mechanisms. For causal questions, we design a lightweight traffic knowledge database that provides relevant traffic knowledge not originally present in VideoQA datasets to support the model's reasoning. We then use a question and knowledge-guided aggregation graph attention network to process causal questions. The experimental results show that, while greatly reducing resource costs, our model performs better on the TrafficQA dataset than models that use millions of external data samples for pretraining.
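The question-first routing described in the abstract can be illustrated with a minimal sketch. The abstract does not say how questions are classified or how the knowledge database is stored, so everything here is an assumption made for illustration: the cue-word classifier, the `TRAFFIC_KNOWLEDGE` entries, and the names `classify_question`, `retrieve_knowledge`, and `route` are hypothetical stand-ins, not the authors' method.

```python
from typing import Dict, List

# Hypothetical cue words; the paper's actual question classifier is not
# described in the abstract, so this keyword rule is only a stand-in.
CAUSAL_CUES = ("why", "cause", "reason", "what if", "could", "would", "prevent")

# Hypothetical, tiny stand-in for a traffic knowledge database:
# a keyword mapped to one piece of traffic knowledge.
TRAFFIC_KNOWLEDGE: Dict[str, str] = {
    "red light": "Vehicles must stop at a red light.",
    "wet road": "Wet roads lengthen braking distance.",
}


def classify_question(question: str) -> str:
    """Label a question as 'causal' or 'general' using simple cue words."""
    q = question.lower()
    return "causal" if any(cue in q for cue in CAUSAL_CUES) else "general"


def retrieve_knowledge(question: str) -> List[str]:
    """Return every knowledge entry whose keyword appears in the question."""
    q = question.lower()
    return [fact for key, fact in TRAFFIC_KNOWLEDGE.items() if key in q]


def route(question: str) -> dict:
    """Pick a processing branch based on the question type.

    General scene questions go to a key-frame (image QA) branch and need no
    external knowledge; causal questions go to a knowledge-guided branch and
    carry the retrieved knowledge with them.
    """
    kind = classify_question(question)
    if kind == "general":
        return {"type": kind, "branch": "key_frame_attention", "knowledge": []}
    return {
        "type": kind,
        "branch": "knowledge_graph_attention",
        "knowledge": retrieve_knowledge(question),
    }


if __name__ == "__main__":
    print(route("What color is the car?"))
    print(route("Why did the car run the red light?"))
```

The point of the sketch is only the control flow: the question type is decided first, so the cheaper key-frame branch handles simple questions and external knowledge is fetched only when a causal question calls for it.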
Pages: 14
Journal
Multimedia Systems, 2024, 30