Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval

Times cited: 5
Authors
Wu, Xiaoyu [1]
Wang, Tiantian [1]
Wang, Shengjin [2]
Affiliations
[1] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cross-modal learning; text-video retrieval; semantic correlation; multi-task learning
DOI
10.3390/electronics9122125
CLC Number
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Text-video retrieval faces a major challenge: the semantic gap between information in different modalities. Some existing methods project text and video into the same subspace to measure their similarity. However, these methods do not impose a semantic consistency constraint when associating the semantic encodings of the two modalities, and the resulting association is poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of video and text are extracted with multiple deep learning networks so that the information of both modalities is fully encoded. Then, in the common feature space into which both modalities are mapped, we build a multi-task learning framework with two tasks on text-video features: semantic similarity measurement and semantic consistency classification. The semantic consistency classification task constrains the learning of the semantic association task, so multi-task learning guides a better feature mapping for the two modalities and optimizes the construction of the unified feature subspace. Finally, experiments on the Microsoft Video Description dataset (MSVD) and MSR-Video to Text (MSR-VTT) show that the proposed algorithm outperforms existing methods, demonstrating that it improves cross-modal retrieval performance.
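The abstract does not specify the loss functions, so the following is only a rough sketch of how a similarity task and a consistency classification task might be trained jointly in a shared embedding space. It assumes a bidirectional hinge-based triplet ranking loss for the similarity task and a binary matched/unmatched classifier for the consistency task; the function names, the margin of 0.2, the weight `lam`, and the element-wise-product classifier are all illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, 1e-12)  # guard against zero-length vectors


def ranking_loss(text: List[Vector], video: List[Vector], margin: float = 0.2) -> float:
    """Assumed similarity task: bidirectional hinge-based triplet ranking loss.

    text[i] and video[i] form a matched pair; every other index in the batch
    serves as a negative. Violations are summed in both retrieval directions.
    """
    n = len(text)
    loss = 0.0
    for i in range(n):
        pos = cosine(text[i], video[i])
        for j in range(n):
            if j == i:
                continue
            # text-to-video: matched video should outscore video j by `margin`
            loss += max(0.0, margin - pos + cosine(text[i], video[j]))
            # video-to-text: matched text should outscore text j by `margin`
            loss += max(0.0, margin - pos + cosine(text[j], video[i]))
    return loss / n


def bce_with_logits(z: float, y: float) -> float:
    """Numerically stable binary cross-entropy computed from a raw logit z."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def consistency_loss(text: List[Vector], video: List[Vector],
                     w: Vector, b: float) -> float:
    """Assumed consistency task: a hypothetical linear classifier (w, b) labels
    every text-video pair in the batch as consistent (i == j) or not."""
    n = len(text)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Pair feature: element-wise product of the two embeddings (assumption)
            feat = [x * y for x, y in zip(text[i], video[j])]
            logit = sum(wk * fk for wk, fk in zip(w, feat)) + b
            total += bce_with_logits(logit, 1.0 if i == j else 0.0)
    return total / (n * n)


def multitask_loss(text: List[Vector], video: List[Vector], w: Vector, b: float,
                   lam: float = 0.5, margin: float = 0.2) -> float:
    """Joint objective: ranking loss plus lam-weighted consistency loss."""
    return ranking_loss(text, video, margin) + lam * consistency_loss(text, video, w, b)


if __name__ == "__main__":
    t = [[1.0, 0.0], [0.0, 1.0]]  # toy text embeddings
    v = [[1.0, 0.0], [0.0, 1.0]]  # toy video embeddings, matched by index
    print(multitask_loss(t, v, w=[0.0, 0.0], b=0.0))
```

Scoring a pair through the element-wise product keeps the classifier small and makes it read from the same shared space that the ranking loss shapes, which is one plausible way for the classification task to constrain the mapping; the paper may use a different classifier head, loss weighting, or negative sampling.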
Pages: 1-17
Page count: 16