Image-text bidirectional learning network based cross-modal retrieval

Cited by: 11
Authors
Li, Zhuoyi [1, 2]
Lu, Huibin [1, 2]
Fu, Hao [1, 2]
Gu, Guanghua [1, 2]
Affiliations
[1] Yanshan University, School of Information Science and Engineering, Qinhuangdao, People's Republic of China
[2] Hebei Key Laboratory of Information Transmission and Signal Processing, Qinhuangdao, Hebei, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; bidirectional learning network; common representation space; discriminant consistency loss; bidirectional crisscross loss; representation
DOI
10.1016/j.neucom.2022.02.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval has attracted significant attention in the cross-media retrieval community. A key challenge is to eliminate the heterogeneity gap between different modalities. Many existing cross-modal retrieval approaches jointly construct a common subspace, but they do not sufficiently consider the mutual influence between modalities throughout the training process. In this paper, we propose a novel cross-modal retrieval method based on an image-text Bidirectional Learning Network (BLN). The method constructs a common representation space in which the similarity of heterogeneous data is measured directly. More specifically, a multi-layer supervision network is proposed to learn the cross-modal relevance of the generated representations. Moreover, a bidirectional crisscross loss function is proposed to preserve modality invariance in the common representation space through a bidirectional learning strategy. The discriminant consistency loss and the bidirectional crisscross loss are integrated into a single objective function that aims to minimize the intra-class distance and maximize the inter-class distance. Comprehensive experiments on four widely used databases show that the proposed method is effective and outperforms existing cross-modal retrieval methods. (c) 2022 Elsevier B.V. All rights reserved.
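The abstract does not give the formulas behind the common representation space or the two losses, so the following is only a minimal sketch under stated assumptions, not the BLN implementation. It assumes that (1) each modality is mapped into the common space by a plain linear projection, (2) similarity in that space is cosine similarity, and (3) a label-aware, softmax-based contrastive loss (an InfoNCE-style technique swapped in for illustration, not the paper's bidirectional crisscross loss) is computed image-to-text and text-to-image and then averaged. All names (`project`, `cosine`, `directional_loss`, `bidirectional_loss`, `temperature`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear map of one modality's features into the common space.

    Each row of `weights` yields one coordinate of the common representation,
    so image and text features of different sizes land in the same space.
    """
    return [float(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two common-space vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def directional_loss(queries, targets, q_labels, t_labels, temperature=0.1):
    """Label-aware softmax loss in one direction (queries -> targets).

    For each query, the softmax runs over all targets of the other modality and
    the loss is -log of the probability mass on targets sharing the query's
    label: same-class pairs are pulled together, different-class pairs apart.
    """
    losses = []
    for q, ql in zip(queries, q_labels):
        logits = [cosine(q, t) / temperature for t in targets]
        positives = [logit for logit, tl in zip(logits, t_labels) if tl == ql]
        if not positives:  # no same-class target: skip this query
            continue
        losses.append(logsumexp(logits) - logsumexp(positives))
    return sum(losses) / len(losses) if losses else 0.0


def bidirectional_loss(img, txt, img_labels, txt_labels, temperature=0.1):
    """Average of the image->text and text->image directional losses."""
    i2t = directional_loss(img, txt, img_labels, txt_labels, temperature)
    t2i = directional_loss(txt, img, txt_labels, img_labels, temperature)
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    # Toy data: 4-d image features and 3-d text features (hypothetical values).
    w_img = [[1, 0, 0, 0], [0, 1, 0, 0]]
    w_txt = [[0, 0, 1], [1, 0, 0]]
    img = [project(x, w_img) for x in ([1, 0, 0, 0], [0, 1, 0, 0])]
    txt = [project(x, w_txt) for x in ([0, 0, 1], [1, 0, 0])]
    print(bidirectional_loss(img, txt, ["cat", "dog"], ["cat", "dog"]))
    print(bidirectional_loss(img, txt, ["cat", "dog"], ["dog", "cat"]))
```

Because each softmax normalizes over the targets of a single modality, the image-to-text and text-to-image terms are generally not equal, which is why this sketch averages both. With the toy data in the `__main__` block, consistent labels give a loss near zero, while swapping the text labels raises it to roughly 10 at `temperature=0.1`.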
Pages: 148-159
Number of pages: 12