Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval

Times cited: 0
Authors
Qin, Xue-Yang [1]
Li, Li-Shuang [1]
Tang, Jing-Yao [1]
Hao, Fei [2]
Ge, Mei-Ling [3]
Pang, Guang-Yao [4]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
[3] Weifang Univ, Sch Comp Engn, Weifang 261061, Peoples R China
[4] Wuzhou Univ, Guangxi Coll & Univ Key Lab Intelligent Ind Softwa, Wuzhou 543002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
image-text retrieval; cross-modal retrieval; multi-task learning; graph convolutional network
DOI
10.1007/s11390-024-4125-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Image-text retrieval aims to capture the semantic correspondence between images and texts, and it serves as a fundamental component of multi-modal recommendation, search systems, and online shopping. Mainstream methods focus primarily on modeling the association within image-text pairs and overlook the benefits that multi-task learning can bring to image-text retrieval. To this end, we propose a multi-task visual semantic embedding network (MVSEN) for image-text retrieval. Specifically, we design two auxiliary tasks, text-text matching and multi-label classification, that act as semantic constraints during training and improve the generalization and robustness of the visual semantic embedding. In addition, we present an intra- and inter-modality interaction scheme that learns discriminative visual and textual feature representations by facilitating information flow within and between modalities. We then apply multi-layer graph convolutional networks in a cascaded manner to infer the correlation of image-text pairs. Experimental results show that MVSEN outperforms state-of-the-art methods on two publicly available datasets, Flickr30K and MS-COCO, with rSum improvements of 8.2% and 3.0%, respectively.
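The record does not define rSum. In the image-text retrieval literature it is conventionally the sum of Recall@1, Recall@5 and Recall@10 over both retrieval directions (image-to-text and text-to-image), so its maximum is 600. The minimal sketch below assumes that convention; the function names `recall_at_k` and `rsum` and the toy similarity matrix are hypothetical illustrations, not code from MVSEN. Ground truth is given as sets so that an image can have several matching captions (Flickr30K and MS-COCO pair each image with five captions).

```python
from typing import Sequence, Set


def recall_at_k(sim: Sequence[Sequence[float]], gt: Sequence[Set[int]], k: int) -> float:
    """Recall@K in percent: share of queries whose top-k candidates contain a true match.

    sim[q][c] is the similarity between query q and candidate c;
    gt[q] is the set of candidate indices that correctly match query q.
    """
    hits = 0
    for q, row in enumerate(sim):
        # Rank candidate indices by descending similarity and keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if gt[q] & set(top_k):
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim_i2t: Sequence[Sequence[float]],
         gt_i2t: Sequence[Set[int]],
         gt_t2i: Sequence[Set[int]],
         ks: Sequence[int] = (1, 5, 10)) -> float:
    """Assumed rSum convention: R@1 + R@5 + R@10 for image-to-text plus the same for text-to-image."""
    # Text-to-image similarities are the transpose of the image-to-text matrix.
    sim_t2i = [list(col) for col in zip(*sim_i2t)]
    i2t = sum(recall_at_k(sim_i2t, gt_i2t, k) for k in ks)
    t2i = sum(recall_at_k(sim_t2i, gt_t2i, k) for k in ks)
    return i2t + t2i


if __name__ == "__main__":
    # Hypothetical toy data: 3 images, 3 captions, caption j describes image j.
    sim = [[0.9, 0.1, 0.0],
           [0.8, 0.2, 0.0],   # image 1 ranks caption 0 above its own caption 1
           [0.0, 0.1, 0.9]]
    gt = [{0}, {1}, {2}]
    print(round(rsum(sim, gt, gt), 2))  # 566.67
```

In the toy run, only image-to-text Recall@1 drops (to 66.67), so rSum falls from 600 to about 566.67. Because rSum adds six percentages, an improvement in rSum can come from any mix of directions and cut-offs.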
Pages: 811-826
Number of pages: 16
Related papers (50 in total)
  • [1] MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval
    Feng, Duoduo
    He, Xiangteng
    Peng, Yuxin
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (05)
  • [2] Image-Text Embedding Learning via Visual and Textual Semantic Reasoning
    Li, Kunpeng
    Zhang, Yulun
    Li, Kai
    Li, Yuanyuan
    Fu, Yun
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 641 - 656
  • [3] Direction-Oriented Visual-Semantic Embedding Model for Remote Sensing Image-Text Retrieval
    Ma, Qing
    Pan, Jiancheng
    Bai, Cong
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [4] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [5] EENet: embedding enhancement network for compositional image-text retrieval using generated text
    Hur, Chan
    Park, Hyeyoung
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (16) : 49689 - 49705
  • [6] Semantic Completion and Filtration for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuan-Ya
    Jin, Ran
    Lv, Bo
    Wang, Rui
    Liu, Anan
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [7] Visual Semantic Reasoning for Image-Text Matching
    Li, Kunpeng
    Zhang, Yulun
    Li, Kai
    Li, Yuanyuan
    Fu, Yun
    [C]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 4653 - 4661
  • [8] Multi-task Network Embedding
    Xu, Linchuan
    Wei, Xiaokai
    Cao, Jiannong
    Yu, Philip S.
    [C]. 2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2017 : 571 - 580
  • [9] Multi-task network embedding
    Xu, Linchuan
    Wei, Xiaokai
    Cao, Jiannong
    Yu, Philip S.
    [J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2019, 8 : 183 - 198