Survey on Video-Text Cross-Modal Retrieval

Cited by: 0
Authors
Chen, Lei [1]
Xi, Yimeng [1]
Liu, Libo [1]
Affiliations
[1] School of Information Engineering, Ningxia University, Yinchuan 750021, China
Keywords
Benchmarking; Information retrieval; Lime; Modal analysis
DOI
Not available
Abstract
Modalities define the specific forms in which data exist. The swift expansion of various modal data types has brought multimodal learning into the limelight. As a crucial subset of this field, cross-modal retrieval has achieved noteworthy advancements, particularly in integrating images and text. However, videos, as opposed to images, encapsulate a richer array of modal data and offer a more extensive spectrum of information. This richness aligns well with the growing user demand for comprehensive and adaptable information retrieval solutions. Consequently, video-text cross-modal retrieval has recently emerged as a burgeoning area of research. To thoroughly comprehend video-text cross-modal retrieval and its state-of-the-art developments, a methodical review and summarization of the existing representative methods is conducted. Initially, the focus is on analyzing current deep learning-based unidirectional and bidirectional video-text cross-modal retrieval methods. This analysis includes an in-depth exploration of seminal works within each category, highlighting their strengths and weaknesses. Subsequently, the discussion shifts to an experimental viewpoint, introducing benchmark datasets and evaluation metrics specific to video-text cross-modal retrieval. The performance of several standard methods on benchmark datasets is compared. Finally, the application prospects and future research challenges of video-text cross-modal retrieval are discussed. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
Pages: 1 / 20
Related papers
50 in total
  • [1] Multilevel Semantic Interaction Alignment for Video-Text Cross-Modal Retrieval
    Chen, Lei
    Deng, Zhen
    Liu, Libo
    Yin, Shibai
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6559 - 6575
  • [2] Cross-Modal Video Retrieval Model Based on Video-Text Dual Alignment
    Che, Zhanbin
    Guo, Huaili
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (02) : 303 - 311
  • [3] Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval
    Jin, Weike
    Zhao, Zhou
    Zhang, Pengcheng
    Zhu, Jieming
    He, Xiuqiang
    Zhuang, Yueting
    [J]. SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1114 - 1124
  • [4] CMMT: Cross-Modal Meta-Transformer for Video-Text Retrieval
    Gao, Yizhao
    Lu, Zhiwu
    [J]. PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 76 - 84
  • [5] Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
    Mithun, Niluthpol Chowdhury
    Li, Juncheng
    Metze, Florian
    Roy-Chowdhury, Amit K.
    [J]. ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 19 - 27
  • [6] Fine-Grained Cross-Modal Contrast Learning for Video-Text Retrieval
    Liu, Hui
    Lv, Gang
    Gu, Yanhong
    Nian, Fudong
    [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14866 : 298 - 310
  • [7] Multi-Feature Graph Attention Network for Cross-Modal Video-Text Retrieval
    Hao, Xiaoshuai
    Zhou, Yucan
    Wu, Dayan
    Zhang, Wanqian
    Li, Bo
    Wang, Weiping
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 135 - 143
  • [8] Multi-Level Cross-Modal Semantic Alignment Network for Video-Text Retrieval
    Nian, Fudong
    Ding, Ling
    Hu, Yuxia
    Gu, Yanhong
    [J]. MATHEMATICS, 2022, 10 (18)
  • [9] CLIP4Hashing: Unsupervised Deep Hashing for Cross-Modal Video-Text Retrieval
    Zhuo, Yaoxin
    Li, Yikang
    Hsiao, Jenhao
    Ho, Chiuman
    Li, Baoxin
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 158 - 166
  • [10] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
    Jin, Peng
    Huang, Jinfa
    Xiong, Pengfei
    Tian, Shangxuan
    Liu, Chang
    Ji, Xiangyang
    Yuan, Li
    Chen, Jie
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2472 - 2482