Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding

Cited by: 0
Authors
Xu, Yuanxing [1]
Wei, Yuting [1]
Wu, Bin [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep video understanding; Multimodal analysis; Relation discrimination; Question answering
DOI
10.1145/3581783.3612871
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The surge in video and social media content underscores the need for a deeper understanding of multimedia data. Most mature video understanding techniques perform well on short videos and content that requires only shallow understanding, but poorly on long-form videos that require deep understanding and reasoning. The Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics to address the problem of holistically analyzing long videos and extracting useful knowledge to answer different types of queries. This paper introduces a query-aware method for long video localization and relation discrimination that leverages an image-language pretrained model. The model selects frames pertinent to each query, obviating the need for a complete movie-level knowledge graph. Our approach achieved first and fourth positions for two groups of movie-level queries. Extensive experiments and the final rankings demonstrate its effectiveness and robustness.
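The general idea of query-aware frame selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model or the scoring rule, so the sketch assumes that a query and each frame have already been embedded into a shared vector space by some image-language model, and it simply ranks frames by cosine similarity. The function names, the `top_k` parameter, and the toy embeddings are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_query_relevant_frames(
    query_embedding: Sequence[float],
    frame_embeddings: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[int]:
    """Return indices of the top_k frames most similar to the query, in temporal order.

    Assumption: the embeddings come from a shared image-language space;
    the ranking rule (plain cosine similarity) is illustrative only.
    """
    scored = [
        (cosine_similarity(query_embedding, frame), index)
        for index, frame in enumerate(frame_embeddings)
    ]
    # Highest score first; ties go to the earlier frame.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return sorted(index for _, index in ranked[:top_k])


# Toy, hand-made embeddings standing in for real model outputs.
query = [1.0, 0.0, 0.0]
frames = [
    [0.0, 1.0, 0.0],  # unrelated to the query
    [0.9, 0.1, 0.0],  # closely aligned with the query
    [0.0, 0.0, 1.0],  # unrelated to the query
    [0.7, 0.7, 0.0],  # partially aligned with the query
]
selected = select_query_relevant_frames(query, frames, top_k=2)
print(selected)
```

Selecting only a few frames per query is what lets such a pipeline avoid processing the whole movie; how the paper scores and localizes frames in practice is described in the full text, not in this record.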
Pages: 9591-9595
Number of pages: 5