Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cited by: 29
Authors:
Yu, Jing [1,2]
Lu, Yuhang [1,2]
Qin, Zengchang [3]
Zhang, Weifeng [4,5]
Liu, Yanbing [1]
Tan, Jianlong [1]
Guo, Li [1]
Affiliations:
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[4] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
Keywords:
DOI: 10.1007/978-3-030-00776-8_21
CLC Classification: TP [Automation Technology, Computer Technology]
Subject Classification: 0812
Abstract:
Cross-modal information retrieval aims to find heterogeneous data of various modalities given a query of one modality. The main challenge is to map different modalities into a common semantic space in which the distances between concepts from different modalities can be well modeled. For cross-modal retrieval between images and texts, existing work mostly uses off-the-shelf Convolutional Neural Networks (CNNs) for image feature extraction, while texts are represented by deep models built on word-level features such as bag-of-words or word2vec. Besides word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network is proposed for coupled feature learning in cross-modal information retrieval: one path applies a Graph Convolutional Network (GCN) to the graph representation of the text, while the other path applies a neural network with several nonlinear layers to off-the-shelf image features. The model is trained with a pairwise similarity loss that maximizes the similarity of relevant text-image pairs and minimizes the similarity of irrelevant pairs. Experimental results show that the proposed model significantly outperforms state-of-the-art methods, with a 17% improvement in accuracy in the best case.
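To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-path model of this kind: a two-layer GCN over a word-similarity graph on the text path, fully connected layers over precomputed CNN image features on the image path, and a pairwise cosine-similarity loss. The layer sizes, the identity adjacency used in the demo, and the hinge form of the loss are illustrative assumptions, not details taken from the paper.

# Minimal sketch of a dual-path text-graph / image model (assumed configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        return F.relu(a_hat @ self.linear(h))


class DualPathModel(nn.Module):
    def __init__(self, word_dim=300, img_dim=4096, common_dim=1024):
        super().__init__()
        # Text path: two GCN layers over the word-similarity graph,
        # pooled into a single text embedding.
        self.gcn1 = GCNLayer(word_dim, 512)
        self.gcn2 = GCNLayer(512, common_dim)
        # Image path: fully connected layers over off-the-shelf CNN features.
        self.img_fc = nn.Sequential(
            nn.Linear(img_dim, 2048), nn.ReLU(),
            nn.Linear(2048, common_dim),
        )

    def forward(self, word_feats, a_hat, img_feats):
        h = self.gcn2(self.gcn1(word_feats, a_hat), a_hat)
        text_vec = h.mean(dim=0)      # pool node embeddings into a text vector
        img_vec = self.img_fc(img_feats)
        return text_vec, img_vec


def pairwise_similarity_loss(text_vec, img_vec, relevant, margin=0.2):
    """Push cosine similarity up for relevant pairs and below a margin for
    irrelevant ones; a common hinge-style instantiation of the pairwise loss
    described in the abstract (the exact form used in the paper may differ)."""
    sim = F.cosine_similarity(text_vec, img_vec, dim=-1)
    return (1 - sim) if relevant else F.relu(sim - margin)


if __name__ == "__main__":
    n_words = 20
    words = torch.randn(n_words, 300)   # stand-in for word2vec node features
    adj = torch.eye(n_words)            # stand-in for the normalized adjacency
    img = torch.randn(4096)             # stand-in for an off-the-shelf CNN feature
    model = DualPathModel()
    t, v = model(words, adj, img)
    print(pairwise_similarity_loss(t, v, relevant=True).item())

In practice the adjacency would be built from word2vec similarities between the words of each text and normalized before being passed to the GCN layers; the identity matrix above is only a placeholder to keep the example runnable.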
Pages: 223-234 (12 pages)
Related papers (showing 10 of 50):
[1] Xu, Ruiqing; Li, Chao; Yan, Junchi; Deng, Cheng; Liu, Xianglong. Graph Convolutional Network Hashing for Cross-Modal Retrieval. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 982-988.
[2] Dong, Xinfeng; Liu, Li; Zhu, Lei; Nie, Liqiang; Zhang, Huaxiang. Adversarial Graph Convolutional Network for Cross-Modal Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1634-1645.
[3] Bai, Cong; Zeng, Chao; Ma, Qing; Zhang, Jinglin. Graph Convolutional Network Discrete Hashing for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(4): 4756-4767.
[4] Cheng, Yuhao; Zhu, Xiaoguang; Qian, Jiuchao; Wen, Fei; Liu, Peilin. Cross-modal Graph Matching Network for Image-text Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications, 2022, 18(4).
[5] Yang, Xianben; Zhang, Wei. RETRACTED: Graph Convolutional Networks for Cross-Modal Information Retrieval (Retracted Article). Wireless Communications & Mobile Computing, 2022, 2022.
[6] Wei, Yuqi; Li, Ning. Cross-Modal Information Interaction Reasoning Network for Image and Text Retrieval. Computer Engineering and Applications, 2023, 59(16): 115-124.
[7] Zhang, Lei; Chen, Leiting; Ou, Weihua; Zhou, Chuan. Semi-supervised constrained graph convolutional network for cross-modal retrieval. Computers & Electrical Engineering, 2022, 101.
[8] Qin, Xueyang; Li, Lishuang; Pang, Guangyao; Hao, Fei. Heterogeneous Graph Fusion Network for cross-modal image-text retrieval. Expert Systems with Applications, 2024, 249.
[9] Zhang, Youcai; Gu, Xiaodong. Graph Embedding Learning for Cross-Modal Information Retrieval. Neural Information Processing (ICONIP 2017), Part III, 2017, 10636: 594-601.
[10] Meng, Hui; Zhang, Huaxiang; Liu, Li; Liu, Dongmei; Lu, Xu; Guo, Xinru. Joint-Modal Graph Convolutional Hashing for unsupervised cross-modal retrieval. Neurocomputing, 2024, 595.