Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

被引:0
|
作者
Wang, Yimu [1 ]
Shi, Peng [1 ]
机构
[1] Univ Waterloo, Waterloo, ON, Canada
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
While recent progress in video-text retrieval has been advanced by the exploration of better representation learning, in this paper, we present a novel multi-grained sparse learning framework, S3MA, to learn an aligned sparse space shared between the video and the text for video-text retrieval. The shared sparse space is initialized with a finite number of sparse concepts, each of which refers to a number of words. With the text data at hand, we learn and update the shared sparse space in a supervised manner using the proposed similarity and alignment losses. Moreover, to enable multi-grained alignment, we incorporate frame representations for better modeling the video modality and calculating fine-grained and coarse-grained similarities. Benefiting from the learned shared sparse space and multi-grained similarities, extensive experiments on several video-text retrieval benchmarks demonstrate the superiority of S3MA over existing methods. Our code is available at link.
引用
收藏
页码:633 / 649
页数:17
相关论文
共 50 条
  • [21] Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval
    Shi, Yaya
    Liu, Haowei
    Xu, Haiyang
    Ma, Zongyang
    Ye, Qinghao
    Hu, Anwen
    Yan, Ming
    Zhang, Ji
    Huang, Fei
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    Zha, Zheng-Jun
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4460 - 4470
  • [22] A NOVEL CONVOLUTIONAL ARCHITECTURE FOR VIDEO-TEXT RETRIEVAL
    Li, Zheng
    Guo, Caili
    Yang, Bo
    Feng, Zerun
    Zhang, Hao
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [23] Progressive Semantic Matching for Video-Text Retrieval
    Liu, Hongying
    Luo, Ruyi
    Shang, Fanhua
    Niu, Mantang
    Liu, Yuanyuan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5083 - 5091
  • [24] A Framework for Video-Text Retrieval with Noisy Supervision
    Vaseqi, Zahra
    Fan, Pengnan
    Clark, James
    Levine, Martin
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 373 - 383
  • [25] Visual Consensus Modeling for Video-Text Retrieval
    Cao, Shuqiang
    Wang, Bairui
    Zhang, Wei
    Ma, Lin
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 167 - 175
  • [26] Text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders
    Dai, Zuozhuo
    Cheng, Kaihui
    Shao, Fangtao
    Dong, Zilong
    Zhu, Siyu
    PATTERN RECOGNITION, 2025, 159
  • [27] Query-Adaptive Late Fusion for Hierarchical Fine-Grained Video-Text Retrieval
    Ma, Wentao
    Chen, Qingchao
    Liu, Fang
    Zhou, Tongqing
    Cai, Zhiping
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (05) : 7150 - 7161
  • [28] CLIP Based Multi-Event Representation Generation for Video-Text Retrieval
    Tu R.
    Mao X.
    Kong W.
    Cai C.
    Zhao W.
    Wang H.
    Huang H.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (09): : 2169 - 2179
  • [29] Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval
    Wang, Wei
    Gao, Junyu
    Yang, Xiaoshan
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2386 - 2397
  • [30] Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval
    Jin, Weike
    Zhao, Zhou
    Zhang, Pengcheng
    Zhu, Jieming
    He, Xiuqiang
    Zhuang, Yueting
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1114 - 1124