Survey on Video-Text Cross-Modal Retrieval

Cited by: 0

Authors:

Chen, Lei [1]

Xi, Yimeng [1]

Liu, Libo [1]

Affiliation:

[1] School of Information Engineering, Ningxia University, Yinchuan, 750021, China

Source:

Computer Engineering and Applications | 2024 / Vol. 60 / No. 04

Keywords:

Benchmarking - Information retrieval - Lime - Modal analysis

DOI:

Not available

Abstract:

Modalities define the specific forms in which data exist. The rapid growth of multimodal data has brought multimodal learning into the limelight. As a crucial subset of this field, cross-modal retrieval has made notable progress, particularly in integrating images and text. Videos, however, carry more modalities than images and convey a wider range of information, which fits the growing user demand for comprehensive and flexible information retrieval. Consequently, video-text cross-modal retrieval has recently emerged as an active research area. To give a thorough account of video-text cross-modal retrieval and its state of the art, this survey systematically reviews and summarizes representative existing methods. First, it analyzes current deep learning-based unidirectional and bidirectional video-text cross-modal retrieval methods, examining seminal works in each category in depth and highlighting their strengths and weaknesses. Next, from an experimental viewpoint, it introduces the benchmark datasets and evaluation metrics used for video-text cross-modal retrieval and compares the performance of several standard methods on those datasets. Finally, it discusses the application prospects and future research challenges of video-text cross-modal retrieval. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
