Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cited by: 29
Authors
Yu, Jing [1 ,2 ]
Lu, Yuhang [1 ,2 ]
Qin, Zengchang [3 ]
Zhang, Weifeng [4 ,5 ]
Liu, Yanbing [1 ]
Tan, Jianlong [1 ]
Guo, Li [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[4] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
DOI
10.1007/978-3-030-00776-8_21
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Cross-modal information retrieval aims to find heterogeneous data of various modalities given a query of one modality. The main challenge is to map different modalities into a common semantic space, in which distances between concepts from different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses off-the-shelf Convolutional Neural Networks (CNNs) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are employed to build deep learning models. Besides word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. One path utilizes a Graph Convolutional Network (GCN) for text modeling based on graph representations. The other path uses a neural network with layers of nonlinearities for image modeling based on off-the-shelf features. The model is trained with a pairwise similarity loss function that maximizes the similarity of relevant text-image pairs and minimizes the similarity of irrelevant pairs. Experimental results show that the proposed model significantly outperforms state-of-the-art methods, with a 17% accuracy improvement in the best case.
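The two core ingredients of the abstract — a graph convolution over a word-similarity graph and a pairwise similarity loss on text-image pairs — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the similarity threshold (0.2), the margin (0.5), the random "word2vec" embeddings, and all dimensions are assumptions made for the example.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: H = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

def pairwise_similarity_loss(t, v, relevant, margin=0.5):
    """Cosine-similarity loss: pull relevant text-image pairs together,
    push irrelevant pairs below the margin."""
    sim = float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v) + 1e-9))
    return (1.0 - sim) if relevant else max(0.0, sim - margin)

# Toy text graph: nodes are words, edges connect words whose (hypothetical)
# word2vec embeddings are similar enough.
rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 16))            # stand-in word2vec vectors
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
A = (emb @ emb.T > 0.2).astype(float)         # similarity-thresholded edges
np.fill_diagonal(A, 0.0)

W = rng.standard_normal((16, 8))
H = gcn_layer(A, emb, W)                      # per-word features, shape (5, 8)
text_vec = H.mean(axis=0)                     # pooled text representation
image_vec = rng.standard_normal(8)            # stand-in image-path output
loss = pairwise_similarity_loss(text_vec, image_vec, relevant=True)
```

In the paper's dual-path setup, the text vector would come from stacked GCN layers and the image vector from a small fully connected network over off-the-shelf CNN features; here both are reduced to single toy vectors to keep the loss computation visible.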
Pages: 223-234 (12 pages)
Related Papers
50 records in total
  • [31] Heterogeneous memory enhanced graph reasoning network for cross-modal retrieval
    Ji, Zhong
    Chen, Kexin
    He, Yuqing
    Pang, Yanwei
    Li, Xuelong
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (07)
  • [33] Semantic-embedding Guided Graph Network for cross-modal retrieval
    Yuan, Mengru
    Zhang, Huaxiang
    Liu, Dongmei
    Wang, Lin
    Liu, Li
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 93
  • [34] Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network
    Liang, Bin
    Lou, Chenwei
    Li, Xiang
    Yang, Min
    Gui, Lin
    He, Yulan
    Pei, Wenjie
    Xu, Ruifeng
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1767 - 1777
  • [35] Cross-Modal Retrieval with Improved Graph Convolution
    Zhang, Hongtu
    Hua, Chunjian
    Jiang, Yi
    Yu, Jianfeng
    Chen, Ying
    Computer Engineering and Applications, 2024, 60 (11) : 95 - 104
  • [36] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300
  • [37] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [38] Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network
    Yu, Hongfeng
    Yao, Fanglong
    Lu, Wanxuan
    Liu, Nayu
    Li, Peiguang
    You, Hongjian
    Sun, Xian
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 812 - 824
  • [39] MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval
    Wang, Zhiming
    Dong, Zhihua
    Yang, Xiaoyu
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 261 - 273
  • [40] Image-text bidirectional learning network based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Gu, Guanghua
    NEUROCOMPUTING, 2022, 483 : 148 - 159