Image-text bidirectional learning network based cross-modal retrieval

被引:11
|
作者
Li, Zhuoyi [1 ,2 ]
Lu, Huibin [1 ,2 ]
Fu, Hao [1 ,2 ]
Gu, Guanghua [1 ,2 ]
机构
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Hebei Key Lab Informat Transmiss & Signal Proc, Qinhuangdao, Hebei, Peoples R China
基金
中国国家自然科学基金;
关键词
Cross-modal retrieval; bidirectional learning network; common representation space; discriminant consistency loss; bidirectional crisscross loss; REPRESENTATION;
D O I
10.1016/j.neucom.2022.02.007
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The problem of cross-modal retrieval has attracted significant attention in the cross-media retrieval community. One key challenge of cross-modal retrieval is to eliminate the heterogeneous gap between different patterns. The existing numerous cross-modal retrieval approaches tend to jointly construct a common subspace, while these methods fail to consider mutual influence between modalities sufficiently during the whole training process. In this paper, we propose a novel image-text Bidirectional Learning Network (BLN) based cross-modal retrieval method. The method constructs a common representation space and directly measures the similarity of heterogeneous data. More specifically, a multi-layer supervision network is proposed to learn the cross-modal relevance of the generated representations. Moreover, a bidirectional crisscross loss function is proposed to preserve the modal invariance with the bidirectional learning strategy in the common representation space. The loss functions of discriminant consistency and the bidirectional crisscross loss are integrated into an objective function which aims to minimize the intra-class distance and maximize the inter-class distance. Comprehensive experimental results on four widely-used databases show that the proposed method is effective and superior to the existing cross-modal retrieval methods. (c) 2022 Elsevier B.V. All rights reserved.
引用
收藏
页码:148 / 159
页数:12
相关论文
共 50 条
  • [41] Adaptive Cross-Modal Embeddings for Image-Text Alignment
    Wehrmann, Pinatas
    Kolling, Camila
    Barros, Rodrigo C.
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12313 - 12320
  • [42] Unsupervised deep hashing with multiple similarity preservation for cross-modal image-text retrieval
    Xiong, Siyu
    Pan, Lili
    Ma, Xueqiang
    Hu, Qinghua
    Beckman, Eric
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (10) : 4423 - 4434
  • [43] Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
    Wang, Sijin
    Wang, Ruiping
    Yao, Ziwei
    Shan, Shiguang
    Chen, Xilin
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1497 - 1506
  • [44] A TRANSFORMER-BASED CROSS-MODAL IMAGE-TEXT RETRIEVAL METHOD USING FEATURE DECOUPLING AND RECONSTRUCTION
    Zhang, Huan
    Sun, Yingzhi
    Liao, Yu
    Xu, SiYuan
    Yang, Rui
    Wang, Shuang
    Hou, Biao
    Jiao, Licheng
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1796 - 1799
  • [45] Cross-Modal Image-Text Matching via Coupled Projection Learning Hashing
    Zhao, Huan
    Wang, Haoqian
    Zha, Xupeng
    Wang, Song
    [J]. 2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 367 - 376
  • [46] TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval
    Li, Qiqi
    Ma, Longfei
    Jiang, Zheng
    Li, Mingyong
    Jin, Bo
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): : 3713 - 3728
  • [47] Simulation of cross-modal image-text retrieval algorithm under convolutional neural network structure and hash method
    Yang, XianBen
    Zhang, Wei
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (05): : 7106 - 7132
  • [48] Cross-Modal Information Interaction Reasoning Network for Image and Text Retrieval
    Wei, Yuqi
    Li, Ning
    [J]. Computer Engineering and Applications, 2023, 59 (16) : 115 - 124
  • [49] MULTI-SCALE INTERACTIVE TRANSFORMER FOR REMOTE SENSING CROSS-MODAL IMAGE-TEXT RETRIEVAL
    Wang, Yijing
    Ma, Jingjing
    Li, Mingteng
    Tang, Xu
    Han, Xiao
    Jiao, Licheng
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 839 - 842
  • [50] Cross-Modal Attention With Semantic Consistence for Image-Text Matching
    Xu, Xing
    Wang, Tan
    Yang, Yang
    Zuo, Lin
    Shen, Fumin
    Shen, Heng Tao
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5412 - 5425