Text-guided distillation learning to diversify video embeddings for text-video retrieval

被引:0
|
作者
Lee, Sangmin [1 ]
Kim, Hyung-Il [2 ]
Ro, Yong Man [3 ]
机构
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Elect & Telecommun Res Inst, Visual Intelligence Res Sect, Daejeon 34129, South Korea
[3] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon 34141, South Korea
关键词
text-video retrieval; Diverse video embedding; Text-guided distillation learning; Text-agnostic; One-to-many correspondence;
D O I
10.1016/j.patcog.2024.110754
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Conventional text-video retrieval methods typically match a video with a text on a one-to-one manner. However, a single video can contain diverse semantics, and text descriptions can vary significantly. Therefore, such methods fail to match a video with multiple texts simultaneously. In this paper, we propose a novel approach to tackle this one-to-many correspondence problem in text-video retrieval. We devise diverse temporal aggregation and a multi-key memory to address temporal and semantic diversity, consequently constructing multiple video embedding paths from a single video. Additionally, we introduce text-guided distillation learning that enables each video path to acquire meaningful distinct competencies in representing varied semantics. Our video embedding approach is text-agnostic, allowing the prepared video embeddings to be used continuously for any new text query. Experiments show our method outperforms existing methods on four datasets. We further validate the effectiveness of our designs with ablation studies and analyses on diverse video embeddings.
引用
收藏
页数:10
相关论文
共 50 条
  • [41] Text-guided Graph Temporal Modeling for few-shot video classification
    Deng, Fuqin
    Zhong, Jiaming
    Li, Nannan
    Fu, Lanhui
    Jiang, Bingchun
    Yi, Ningbo
    Qi, Feng
    Xin, He
    Lam, Tin Lun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [42] Text-Guided Knowledge Transfer for Remote Sensing Image-Text Retrieval
    Liu, An-An
    Yang, Bo
    Li, Wenhui
    Song, Dan
    Sun, Zhengya
    Ren, Tongwei
    Wei, Zhiqiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [43] Text-Guided Object Detector for Multi-modal Video Question Answering
    Shen, Ruoyue
    Inoue, Nakamasa
    Shinoda, Koichi
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1032 - 1042
  • [44] CelebV-Text: A Large-Scale Facial Text-Video Dataset
    Yu, Jianhui
    Zhu, Hao
    Jiang, Liming
    Loy, Chen Change
    Cai, Weidong
    Wu, Wayne
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14805 - 14814
  • [45] Fine-grained Cross-modal Alignment Network for Text-Video Retrieval
    Han, Ning
    Chen, Jingjing
    Xiao, Guangyi
    Zhang, Hao
    Zeng, Yawen
    Chen, Hao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3826 - 3834
  • [46] Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
    Jiang, Chen
    Liu, Hong
    Yu, Xuzheng
    Wang, Qing
    Cheng, Yuan
    Xu, Jia
    Liu, Zhongyi
    Guo, Qingpei
    Chu, Wei
    Yang, Ming
    Qi, Yuan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4626 - 4636
  • [47] Deep learning for video-text retrieval: a review
    Cunjuan Zhu
    Qi Jia
    Wei Chen
    Yanming Guo
    Yu Liu
    International Journal of Multimedia Information Retrieval, 2023, 12
  • [48] Deep learning for video-text retrieval: a review
    Zhu, Cunjuan
    Jia, Qi
    Chen, Wei
    Guo, Yanming
    Liu, Yu
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (01)
  • [49] Face database generation based on text-video correlation
    Zeng, Dan
    Bao, Yixin
    Liu, Ke
    Zhao, Fan
    Tian, Qi
    NEUROCOMPUTING, 2016, 207 : 240 - 249
  • [50] Using Multimodal Contrastive Knowledge Distillation for Video-Text Retrieval
    Ma, Wentao
    Chen, Qingchao
    Zhou, Tongqing
    Zhao, Shan
    Cai, Zhiping
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5486 - 5497