Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval

Cited by: 5

Authors:

Wu, Xiaoyu [1]

Wang, Tiantian [1]

Wang, Shengjin [2]

Affiliations:

[1] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China

[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China

Source:

ELECTRONICS | 2020, Vol. 9, Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

cross-modal learning; text-video retrieval; semantic correlation; multi-task learning;

DOI:

10.3390/electronics9122125

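Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract: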

Text-video retrieval faces a major challenge in the semantic gap between cross-modal information. Some existing methods map text and video into a common subspace to measure their similarity. However, such methods do not impose a semantic consistency constraint when associating the semantic encodings of the two modalities, so the resulting association is poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of the video or text are extracted with multiple deep learning networks, so that the information in both modalities is fully encoded. Then, in the common feature space into which both modalities are mapped, we propose a multi-task learning framework that combines semantic similarity measurement with semantic consistency classification based on text-video features. The semantic consistency classification task constrains the learning of the semantic association task. Multi-task learning thus guides a better feature mapping of the two modalities and improves the construction of the unified feature subspace. Finally, experimental results on the Microsoft Video Description dataset (MSVD) and MSR-Video to Text (MSR-VTT) are better than those of existing methods, which shows that our algorithm can improve the performance of cross-modal retrieval.
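
The two-task objective described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the similarity task is written as a bidirectional max-margin ranking loss over cosine similarity, the consistency task as a softmax cross-entropy over shared category labels, and the function names, the `margin`, and the `weight` parameter are hypothetical. The paper's actual loss functions and networks may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(text_emb, video_emb, margin=0.2):
    """Assumed similarity task: bidirectional max-margin ranking loss.

    Row i of text_emb is taken to match row i of video_emb; every other
    row in the batch serves as a negative example.
    """
    n = len(text_emb)
    sim = [[cosine(t, v) for v in video_emb] for t in text_emb]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Text i should score higher with video i than with video j.
            total += max(0.0, margin - sim[i][i] + sim[i][j])
            # Video i should score higher with text i than with text j.
            total += max(0.0, margin - sim[i][i] + sim[j][i])
    return total / n


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[label]


def multitask_loss(text_emb, video_emb, text_logits, video_logits,
                   labels, margin=0.2, weight=1.0):
    """Ranking loss plus a weighted classification loss on both modalities.

    The classification term is the assumed stand-in for the semantic
    consistency task: both modalities must predict the same label.
    """
    rank = ranking_loss(text_emb, video_emb, margin)
    cls_terms = [cross_entropy(lg, y) for lg, y in zip(text_logits, labels)]
    cls_terms += [cross_entropy(lg, y) for lg, y in zip(video_logits, labels)]
    cls = sum(cls_terms) / len(cls_terms)
    return rank + weight * cls
```

In this sketch, `weight` trades off the two tasks: at `weight=0` only the similarity ranking is optimized, while larger values let the classification term constrain where the embeddings of each modality land in the common space.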


Pages: 1 / 17

Page count: 16
