Multimodal Video Retrieval and Multimodal Language Modelling

Times cited: 0
Authors
Wang, Hui [1 ]
Kittler, Josef [2 ]
Gales, Mark [3 ]
Cooper, Rob [4 ]
Mulvenna, Maurice [5 ]
Ng, Wing [6 ]
Hua, Yang [1 ]
Gault, Richard [1 ]
Haider, Abbas [1 ]
Wu, Guanfeng [7 ]
Affiliations
[1] Queens Univ Belfast, Belfast, Northern Ireland
[2] Univ Surrey, London, England
[3] Univ Cambridge, Cambridge, England
[4] BBC, London, England
[5] Univ Ulster, Belfast, Northern Ireland
[6] South China Univ Technol, Guangzhou, Peoples R China
[7] Southwest Jiaotong Univ, Chengdu, Peoples R China
Keywords
Information Retrieval; Deep Learning; Large Language Models; Multimodal data retrieval; Multimodal data understanding and interaction;
DOI
10.1145/3652583.3660001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
As video content continues to proliferate and many video archives lack suitable metadata, video retrieval, particularly example-based search, has become increasingly crucial. Existing metadata often fails to meet the needs of specific types of search, especially when videos contain elements from different modalities, such as visual and audio. Consequently, it is essential to develop video retrieval methods that can handle multimodal content. In designing our novel video retrieval framework, Multi-modal Video Search by Examples (MVSE), we focused on accuracy (precision and recall), efficiency (retrieval time in seconds), interactivity, and extensibility; its key components include advanced data processing and a user-friendly interface aimed at improving search effectiveness and user experience. With the advent of Large Language Models (LLMs), the interaction between multimodal data, including images and audio, has been transformed, marking a significant leap toward the broader goal of artificial general intelligence. This workshop aims to bring together experts from diverse domains to explore novel ways of performing multimodal data search, understanding, and interaction.
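The abstract frames MVSE around search by example across visual and audio modalities, with accuracy measured by precision and recall. The following is a minimal sketch under stated assumptions, not the MVSE implementation: it assumes each video has already been encoded into hypothetical per-modality embedding vectors (`visual`, `audio`), ranks archive items against a query example by a weighted late fusion of cosine similarities (the weight `w_visual` is a hypothetical parameter), and then scores the top-k results with precision and recall. The clip IDs and vectors are toy data.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search_by_example(query, archive, w_visual=0.5, top_k=3):
    """Return the IDs of the top_k archive videos most similar to the query example.

    Assumption: each item holds hypothetical 'visual' and 'audio' embeddings,
    combined by weighted late fusion of per-modality cosine similarity.
    This is an illustrative choice, not MVSE's actual method.
    """
    scored = []
    for video_id, emb in archive.items():
        score = (w_visual * cosine(query["visual"], emb["visual"])
                 + (1 - w_visual) * cosine(query["audio"], emb["audio"]))
        scored.append((score, video_id))
    scored.sort(reverse=True)  # highest fused similarity first
    return [video_id for _, video_id in scored[:top_k]]

def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|; recall = hits / |relevant|."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy archive with hypothetical 2-D embeddings per modality.
archive = {
    "clip_a": {"visual": [1.0, 0.0], "audio": [1.0, 0.0]},
    "clip_b": {"visual": [0.9, 0.1], "audio": [0.0, 1.0]},
    "clip_c": {"visual": [0.0, 1.0], "audio": [0.0, 1.0]},
    "clip_d": {"visual": [0.8, 0.2], "audio": [0.9, 0.1]},
}
query = {"visual": [1.0, 0.0], "audio": [1.0, 0.0]}

ranked = search_by_example(query, archive, top_k=2)
print(ranked)                                          # ['clip_a', 'clip_d']
print(precision_recall(ranked, ["clip_a", "clip_b"]))  # (0.5, 0.5)
```

Here clip_b looks similar but sounds different, so fusing both modalities pushes it below clip_d. Late fusion keeps each modality's encoder independent, so adding a modality adds one weighted term, but this weighting scheme is an assumption; early fusion or a learned joint embedding space are equally plausible designs.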
Pages: 1345-1355
Page count: 11
Related papers (50 in total)
  • [1] MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding
    Anand, Vishal
    Ramesh, Raksha
    Jin, Boshen
    Wang, Ziyin
    Lei, Xiaoxiao
    Lin, Ching-Yung
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4868 - 4872
  • [2] Multimodal search for effective video retrieval
    Natsev, Apostol
    IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2006, 4071 : 525 - 528
  • [3] Multimodal video retrieval with CLIP: a user study
    Alpay, Tayfun
    Magg, Sven
    Broze, Philipp
    Speck, Daniel
    INFORMATION RETRIEVAL JOURNAL, 2023, 26 (1-2)
  • [5] Multimodal Video Retrieval with the 2017 IMOTION System
    Rossetto, Luca
    Giangreco, Ivan
    Tanase, Claudiu
    Schuldt, Heiko
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 457 - 460
  • [6] MDMMT: Multidomain Multimodal Transformer for Video Retrieval
    Dzabraev, Maksim
    Kalashnikov, Maksim
    Komkov, Stepan
    Petiushko, Aleksandr
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3349 - 3358
  • [7] Video browsing and retrieval based on multimodal integration
    Zhu, YY
    Zhou, DG
    IEEE/WIC INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2003, : 650 - 653
  • [8] AdaCLIP: Towards Pragmatic Multimodal Video Retrieval
    Hu, Zhiming
    Ye, Angela Ning
    Khorasgani, Salar Hosseini
    Mohomed, Iqbal
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5623 - 5633
  • [9] An efficient access method for multimodal video retrieval
    Sperandio, Ricardo C.
    Patrocinio, Zenilton K. G., Jr.
    de Paula, Hugo B.
    Guimaraes, Silvio J. F.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (04) : 1357 - 1375