HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval

Cited by: 10
Authors
Guo, Jie [1]
Wang, Meiting [1]
Zhou, Yan [1]
Song, Bin [1]
Chi, Yuhao [1]
Fan, Wei [2]
Chang, Jianglong [2]
Affiliations
[1] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[2] Guangdong OPPO Mobile Telecommun Corp Ltd, Dong Guan 523860, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image-text retrieval; feature aggregation; graph convolution network; hierarchical alignment; ATTENTION;
DOI
10.1109/TMM.2023.3248160
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image-text retrieval (ITR) is a challenging task in multimodal information processing because of the semantic gap between modalities. In recent years, researchers have made considerable progress toward accurate alignment between images and text. However, existing works mainly focus on fine-grained alignment between image regions and sentence fragments, ignoring the guidance that global context can provide. Integrating local fine-grained information with global context information can supply more semantic clues for retrieval. In this paper, we propose a novel Hierarchical Graph Alignment Network (HGAN) for image-text retrieval. First, to capture comprehensive multimodal features, we construct feature graphs for the image and text modalities respectively. Then, a multi-granularity shared space is established with a Multi-granularity Feature Aggregation and Rearrangement (MFAR) module, which strengthens the semantic correspondence between local and global information and yields more accurate feature representations for both modalities. Finally, the resulting image and text features are further refined through three-level similarity functions to achieve hierarchical alignment. To validate the proposed model, we perform extensive experiments on the MS-COCO and Flickr30K datasets. Experimental results show that HGAN outperforms state-of-the-art methods on both datasets, demonstrating the effectiveness of the model.
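The idea of scoring an image-text pair at several granularities can be illustrated with a small sketch. The abstract does not specify what the three similarity levels are or how they are combined, so everything below is an assumption made for illustration: the three levels (local-to-local, global-to-global, local-to-global), mean pooling as the global feature, cosine similarity, the weighted average, and all function names are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(vectors):
    """Average a list of vectors into one global vector (assumed pooling choice)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def local_similarity(regions, words):
    """Assumed local level: match each word to its best region, then average."""
    return sum(max(cosine(w, r) for r in regions) for w in words) / len(words)


def hierarchical_similarity(regions, words, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three assumed similarity levels for one image-text pair."""
    g_img, g_txt = mean_pool(regions), mean_pool(words)
    s_local = local_similarity(regions, words)   # local-to-local
    s_global = cosine(g_img, g_txt)              # global-to-global
    # Assumed cross level: local fragments against the other modality's global vector.
    s_cross = 0.5 * (
        sum(cosine(r, g_txt) for r in regions) / len(regions)
        + sum(cosine(w, g_img) for w in words) / len(words)
    )
    w1, w2, w3 = weights
    return (w1 * s_local + w2 * s_global + w3 * s_cross) / (w1 + w2 + w3)
```

Here `regions` and `words` are lists of feature vectors for one image and one sentence. In a retrieval setting, a score like this would be computed for every candidate pair and the candidates ranked by it; the paper's actual similarity functions may differ substantially from this sketch.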
Pages: 9189-9202
Page count: 14