HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval

Cited by: 10
Authors
Guo, Jie [1]
Wang, Meiting [1]
Zhou, Yan [1]
Song, Bin [1]
Chi, Yuhao [1]
Fan, Wei [2]
Chang, Jianglong [2]
Affiliations
[1] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[2] Guangdong OPPO Mobile Telecommun Corp Ltd, Dong Guan 523860, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image-text retrieval; feature aggregation; graph convolution network; hierarchical alignment; ATTENTION;
DOI
10.1109/TMM.2023.3248160
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Image-text retrieval (ITR) is a challenging task in multimodal information processing because of the semantic gap between modalities. In recent years, researchers have made great progress in accurately aligning images and text. However, existing works mainly focus on fine-grained alignment between image regions and sentence fragments and ignore the guidance that contextual background information can provide. Integrating local fine-grained information with global contextual background information can provide more semantic clues for retrieval. In this paper, we propose a novel Hierarchical Graph Alignment Network (HGAN) for image-text retrieval. First, to capture comprehensive multimodal features, we construct feature graphs for the image and text modalities respectively. Then, a multi-granularity shared space is established with a designed Multi-granularity Feature Aggregation and Rearrangement (MFAR) module, which enhances the semantic correspondence between local and global information and yields more accurate feature representations for both modalities. Finally, the resulting image and text features are further refined through three-level similarity functions to achieve hierarchical alignment. To validate the proposed model, we perform extensive experiments on the MS-COCO and Flickr30K datasets. Experimental results show that the proposed HGAN outperforms state-of-the-art methods on both datasets, which demonstrates the effectiveness and superiority of our model.
Pages: 9189-9202
Page count: 14
Related papers
50 in total
  • [21] Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval
    Seo, Sanghyun
    Kim, Juntae
    [J]. PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 350 - 353
  • [22] Automatic image-text alignment for large-scale web image indexing and retrieval
    Zhou, Ning
    Fan, Jianping
    [J]. PATTERN RECOGNITION, 2015, 48 (01) : 205 - 219
  • [23] HADA: A Graph-Based Amalgamation Framework in Image-Text Retrieval
    Nguyen, Manh-Duy
    Nguyen, Binh T.
    Gurrin, Cathal
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I, 2023, 13980 : 717 - 731
  • [24] Feature Interaction Based Graph Convolutional Networks for Image-Text Retrieval
    Hu, Yongli
    Gao, Feili
    Sun, Yanfeng
    Gao, Junbin
    Yin, Baocai
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 217 - 229
  • [25] TFUN: Trilinear Fusion Network for Ternary Image-Text Retrieval
    Xu, Xing
    Sun, Jialiang
    Cao, Zuo
    Zhang, Yin
    Zhu, Xiaofeng
    Shen, Heng Tao
    [J]. INFORMATION FUSION, 2023, 91 : 327 - 337
  • [26] An automatic image-text alignment method for large-scale web image retrieval
    Baopeng Zhang
    Yanyun Qu
    Jinye Peng
    Jianping Fan
    [J]. Multimedia Tools and Applications, 2017, 76 : 21401 - 21421
  • [27] A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching
    Shang, Heng
    Zhao, Guoshuai
    Shi, Jing
    Qian, Xueming
    [J]. IEEE INTELLIGENT SYSTEMS, 2023, 38 (03) : 41 - 50
  • [28] An automatic image-text alignment method for large-scale web image retrieval
    Zhang, Baopeng
    Qu, Yanyun
    Peng, Jinye
    Fan, Jianping
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (20) : 21401 - 21421
  • [29] Context-Aware Attention Network for Image-Text Retrieval
    Zhang, Qi
    Lei, Zhen
    Zhang, Zhaoxiang
    Li, Stan Z.
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3533 - 3542
  • [30] Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image-Text Retrieval
    Yang, Rui
    Wang, Shuang
    Han, Yingping
    Li, Yuanheng
    Zhao, Dong
    Quan, Dou
    Guo, Yanhe
    Jiao, Licheng
    Yang, Zhi
    [J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62