Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cited: 29
Authors
Yu, Jing [1 ,2 ]
Lu, Yuhang [1 ,2 ]
Qin, Zengchang [3 ]
Zhang, Weifeng [4 ,5 ]
Liu, Yanbing [1 ]
Tan, Jianlong [1 ]
Guo, Li [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[4] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
DOI
10.1007/978-3-030-00776-8_21
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Cross-modal information retrieval aims to find heterogeneous data of various modalities from a given query of one modality. The main challenge is to map different modalities into a common semantic space, in which distances between concepts of different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses an off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are employed to build deep learning models. Besides word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. One path utilizes a Graph Convolutional Network (GCN) for text modeling based on graph representations. The other path uses a neural network with layers of nonlinearities for image modeling based on off-the-shelf features. The model is trained with a pairwise similarity loss function to maximize the similarity of relevant text-image pairs and minimize the similarity of irrelevant pairs. Experimental results show that the proposed model significantly outperforms state-of-the-art methods, with a 17% accuracy improvement in the best case.
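The abstract's three ingredients (a word graph built from word2vec similarity, GCN propagation over that graph, and a pairwise similarity loss) can be sketched as follows. This is a minimal NumPy illustration only; the vector dimensions, similarity threshold, and margin are assumptions for the sketch, not the paper's actual settings.

```python
import numpy as np

# Toy word vectors standing in for word2vec embeddings (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))  # 5 "words", 8-dim features

# 1. Build the text graph: connect word pairs whose cosine similarity
#    exceeds a threshold (threshold value is an assumption).
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
A = (Xn @ Xn.T > 0.2).astype(float)
np.fill_diagonal(A, 1.0)  # add self-loops

# 2. One GCN propagation step: H = ReLU(D^{-1/2} A D^{-1/2} X W)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * np.outer(d_inv_sqrt, d_inv_sqrt)  # symmetric normalization
W = rng.standard_normal((8, 4))               # learnable weights (random here)
H = np.maximum(A_hat @ X @ W, 0.0)            # ReLU nonlinearity

# 3. Pairwise similarity loss on (text, image) embeddings: pull relevant
#    pairs together, push irrelevant pairs below a margin.
def pairwise_loss(t, v, relevant, margin=0.5):
    sim = float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))
    return 1.0 - sim if relevant else max(0.0, sim - margin)
```

A relevant pair with identical embeddings incurs zero loss, while an irrelevant pair is only penalized once its cosine similarity exceeds the margin.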
Pages: 223 - 234
Page count: 12
Related Papers
50 entries in total
  • [21] A Graph Model for Cross-modal Retrieval
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 1090 - 1097
  • [22] Iterative graph attention memory network for cross-modal retrieval
    Dong, Xinfeng
    Zhang, Huaxiang
    Dong, Xiao
    Lu, Xu
    KNOWLEDGE-BASED SYSTEMS, 2021, 226
  • [23] Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval
    Wu, Hongchang
    Guan, Ziyu
    Zhi, Tao
    Zhao, Wei
    Xu, Cai
    Han, Hong
    Yang, Yaming
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 265 - 272
  • [24] SEMI-SUPERVISED GRAPH CONVOLUTIONAL HASHING NETWORK FOR LARGE-SCALE CROSS-MODAL RETRIEVAL
    Shen, Zhanjian
    Zhai, Deming
    Liu, Xianming
    Jiang, Junjun
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2366 - 2370
  • [25] Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval
    Zhang, Peng-Fei
    Li, Yang
    Huang, Zi
    Xu, Xin-Shun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 466 - 479
  • [26] Cross-modal information balance-aware reasoning network for image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Hao, Fei
    Pang, Guangyao
    Wang, Zehao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 120
  • [27] Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval
    Qian, Shengsheng
    Xue, Dizhan
    Fang, Quan
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 24 : 3520 - 3532
  • [28] Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval
    Jin, Weike
    Zhao, Zhou
    Zhang, Pengcheng
    Zhu, Jieming
    He, Xiuqiang
    Zhuang, Yueting
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1114 - 1124
  • [29] Information Aggregation Semantic Adversarial Network for Cross-Modal Retrieval
    Wang, Hongfei
    Feng, Aimin
    Liu, Xuejun
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [30] Heterogeneous memory enhanced graph reasoning network for cross-modal retrieval
    Ji, Zhong
    Chen, Kexin
    He, Yuqing
    Pang, Yanwei
    Li, Xuelong
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, (07): 157 - 169