Bidirectional image-sentence retrieval by local and global deep matching

Cited by: 25
Authors
Ma, Lin [1]
Jiang, Wenhao [1]
Jie, Zequn [1]
Wang, Xu [2]
Affiliations
[1] Tencent AI Lab, Shenzhen 518060, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
Keywords
Bidirectional image-sentence retrieval; Multimodal matching; Image embedding; Sentence embedding;
DOI
10.1016/j.neucom.2018.11.089
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel local and global deep matching model to tackle bidirectional image-sentence retrieval. Our proposed matching model can simultaneously exploit the image representation, sentence representation, as well as their complicated matching relationships from both local and global perspectives. For images, two different convolutional neural networks (CNNs) are leveraged to encode the local and global contents, with selective attentions to the image sub-regions and the whole image. For sentences, a CNN based sentence model and Fisher vector are employed to capture the global and local semantic meanings, respectively. Relying on the local and global representations of the image and sentence, the proposed deep matching model learns the complicated image-sentence matching relationships from local and global perspectives by integrating cross-modality correlations with intra-modality similarities. Extensive experimental results demonstrate that the proposed local and global matching model outperforms the state-of-the-art bidirectional retrieval approaches on the Flickr8K, Flickr30K, and MSCOCO datasets. Moreover, the image and sentence representations exploited in local and global levels are demonstrated to play synergic and complementary roles for bidirectional image-sentence retrieval. (C) 2019 Elsevier B.V. All rights reserved.
Pages: 36-44
Number of pages: 9
Related papers
50 in total
  • [1] Deep Convolutional Neural Network for Bidirectional Image-Sentence Mapping
    Yu, Tianyuan
    Bai, Liang
    Guo, Jinlin
    Yang, Zheng
    Xie, Yuxiang
    [J]. MULTIMEDIA MODELING, MMM 2017, PT II, 2017, 10133 : 136 - 147
  • [2] Deep Top-k Ranking for Image-Sentence Matching
    Zhang, Lingling
    Luo, Minnan
    Liu, Jun
    Chang, Xiaojun
    Yang, Yi
    Hauptmann, Alexander G.
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (03) : 775 - 785
  • [3] Dynamic Pruning of Regions for Image-Sentence Matching
    Wu, Jie
    Liu, Weifeng
    Wang, Leiquan
    Shen, Xiuxuan
    Wei, Yiwei
    Wu, Chunlei
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
  • [4] Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval
    Wang, Shuhui
    Chen, Yangyu
    Zhuo, Junbao
    Huang, Qingming
    Tian, Qi
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1398 - 1406
  • [5] Saliency-Guided Attention Network for Image-Sentence Matching
    Ji, Zhong
    Wang, Haoran
    Han, Jungong
    Pang, Yanwei
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5753 - 5762
  • [6] Comprehensive Framework of Early and Late Fusion for Image-Sentence Retrieval
    Wang, Yifan
    Xu, Xing
    Yu, Wei
    Xu, Ruicong
    Cao, Zuo
    Shen, Heng Tao
    [J]. IEEE MULTIMEDIA, 2022, 29 (03) : 38 - 47
  • [7] Multi-Attention Fusion and Fine-Grained Alignment for Bidirectional Image-Sentence Retrieval in Remote Sensing
    Qimin Cheng
    Yuzhuo Zhou
    Haiyan Huang
    Zhongyuan Wang
    [J]. IEEE/CAA Journal of Automatica Sinica, 2022, 9 (08) : 1532 - 1535
  • [8] Multi-Attention Fusion and Fine-Grained Alignment for Bidirectional Image-Sentence Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Huang, Haiyan
    Wang, Zhongyuan
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (08) : 1532 - 1535
  • [9] Modality-Invariant Image-Text Embedding for Image-Sentence Matching
    Liu, Ruoyu
    Zhao, Yao
    Wei, Shikui
    Zheng, Liang
    Yang, Yi
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [10] Cross-Modal Hybrid Feature Fusion for Image-Sentence Matching
    Xu, Xing
    Wang, Yifan
    He, Yixuan
    Yang, Yang
    Hanjalic, Alan
    Shen, Heng Tao
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (04)