Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval

Cited by: 5
Authors
Wu, Xiaoyu [1 ]
Wang, Tiantian [1 ]
Wang, Shengjin [2 ]
Affiliations
[1] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cross-modal learning; text-video retrieval; semantic correlation; multi-task learning;
DOI
10.3390/electronics9122125
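Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract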
Text-video retrieval faces a major challenge in the semantic gap between modalities. Some existing methods map text and video into a shared subspace to measure their similarity. However, these methods do not impose a semantic consistency constraint when associating the semantic encodings of the two modalities, so the resulting association is poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of the video and the text are extracted with multiple deep learning networks, so that the information in both modalities is fully encoded. Then, in the common feature space into which both modalities are mapped, we propose a multi-task learning framework that combines semantic similarity measurement with semantic consistency classification based on text-video features. The semantic consistency classification task constrains the learning of the semantic association task, so multi-task learning guides a better feature mapping of the two modalities and optimizes the construction of the unified feature subspace. Finally, experimental results on the Microsoft Video Description dataset (MSVD) and MSR-Video to Text (MSR-VTT) are better than those of existing methods, which shows that our algorithm can improve cross-modal retrieval performance.
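The multi-task objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss functions, so the bidirectional max-margin ranking loss, the softmax cross-entropy classifier applied to both modalities, and the names `ranking_loss`, `cross_entropy`, `multi_task_loss`, `margin`, and `weight` are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(text_vecs, video_vecs, margin=0.2):
    """Assumed similarity task: bidirectional max-margin ranking loss.

    text_vecs[i] and video_vecs[i] form a matching pair in the common space;
    every other combination is treated as a negative.
    """
    n = len(text_vecs)
    sim = [[cosine(t, v) for v in video_vecs] for t in text_vecs]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Text i should score higher with video i than with video j.
            total += max(0.0, margin + sim[i][j] - sim[i][i])
            # Video i should score higher with text i than with text j.
            total += max(0.0, margin + sim[j][i] - sim[i][i])
    return total / n


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def multi_task_loss(text_vecs, video_vecs, text_logits, video_logits,
                    labels, margin=0.2, weight=1.0):
    """Assumed combination: ranking loss plus weighted classification loss.

    Both modalities of a pair are classified against the same semantic label,
    which acts as the consistency constraint on the shared space.
    """
    n = len(labels)
    cls = sum(
        cross_entropy(text_logits[i], labels[i])
        + cross_entropy(video_logits[i], labels[i])
        for i in range(n)
    ) / n
    return ranking_loss(text_vecs, video_vecs, margin) + weight * cls
```

In this sketch the classification term penalizes embeddings whose text and video sides disagree on the semantic label, while the ranking term pulls matching pairs together; `weight` (hypothetical) balances the two tasks.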
Pages: 1 / 17
Number of pages: 16
Related papers
50 in total
  • [41] Momentum Cross-Modal Contrastive Learning for Video Moment Retrieval
    Han, De
    Cheng, Xing
    Guo, Nan
    Ye, Xiaochun
    Rainer, Benjamin
    Priller, Peter
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5977 - 5994
  • [42] Multi-task clustering ELM for VIS-NIR cross-modal feature learning
    Jin, Yi
    Li, Jie
    Lang, Congyan
    Ruan, Qiuqi
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2017, 28 (03) : 905 - 920
  • [43] Multi-task clustering ELM for VIS-NIR cross-modal feature learning
    Yi Jin
    Jie Li
    Congyan Lang
    Qiuqi Ruan
    Multidimensional Systems and Signal Processing, 2017, 28 : 905 - 920
  • [44] Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
    Xu, Xing
    Song, Jingkuan
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Huang, Zi
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 46 - 54
  • [45] Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval
    Xie, De
    Deng, Cheng
    Li, Chao
    Liu, Xianglong
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 3626 - 3637
  • [46] Self-Supervised Correlation Learning for Cross-Modal Retrieval
    Liu, Yaxin
    Wu, Jianlong
    Qu, Leigang
    Gan, Tian
    Yin, Jianhua
    Nie, Liqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
  • [47] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [48] Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation
    Hua, Yan
    Wang, Shuhui
    Liu, Siyuan
    Cai, Anni
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (06) : 1201 - 1216
  • [49] Survey on Video-Text Cross-Modal Retrieval
    Chen, Lei
    Xi, Yimeng
    Liu, Libo
    Computer Engineering and Applications, 2024, 60 (04) : 1 - 20
  • [50] An Adversarial Learning and Canonical Correlation Analysis Based Cross-Modal Retrieval Model
    Thi-Hong Vuong
    Thanh-Huyen Pham
    Tri-Thanh Nguyen
    Quang-Thuy Ha
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT I, 2019, 11431 : 153 - 164