Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding

Cited by: 0

Authors:

Xu, Yuanxing [1]

Wei, Yuting [1]

Wu, Bin [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Deep video understanding; Multimodal analysis; Relation discrimination; Question answering;

DOI:

10.1145/3581783.3612871

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The surge in video and social media content underscores the need for a deeper understanding of multimedia data. Most mature video understanding techniques perform well with short formats and content that requires only shallow understanding, but perform poorly with long-format videos that require deep understanding and reasoning. The Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics to address the problem of holistically analyzing long videos and extracting useful knowledge to answer different types of queries. This paper introduces a query-aware method for long video localization and relation discrimination that leverages an image-language pretrained model. The model selects frames pertinent to each query, obviating the need for a complete movie-level knowledge graph. Our approach achieved first and fourth positions for two groups of movie-level queries. Extensive experiments and the final rankings demonstrate its effectiveness and robustness.


Pages: 9591-9595

Page count: 5

