Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cited by: 29
Authors
Yu, Jing [1 ,2 ]
Lu, Yuhang [1 ,2 ]
Qin, Zengchang [3 ]
Zhang, Weifeng [4 ,5 ]
Liu, Yanbing [1 ]
Tan, Jianlong [1 ]
Guo, Li [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[4] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
DOI
10.1007/978-3-030-00776-8_21
Chinese Library Classification (CLC): TP [automation technology, computer technology]
Discipline code: 0812
Abstract
Cross-modal information retrieval aims to find heterogeneous data of various modalities given a query of one modality. The main challenge is to map different modalities into a common semantic space in which the distance between concepts from different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses off-the-shelf Convolutional Neural Networks (CNNs) for image feature extraction, while texts are represented by deep models built on word-level features such as bag-of-words or word2vec. Beyond word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. One path utilizes a Graph Convolutional Network (GCN) to model texts from their graph representations. The other path uses a neural network with layers of nonlinearities to model images from off-the-shelf features. The model is trained with a pairwise similarity loss function that maximizes the similarity of relevant text-image pairs and minimizes the similarity of irrelevant pairs. Experimental results show that the proposed model outperforms state-of-the-art methods significantly, with a 17% accuracy improvement in the best case.
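The two ingredients the abstract describes — graph-convolutional propagation over a word graph and a pairwise similarity loss on text-image pairs — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy adjacency matrix, random features, and the helper names `gcn_layer` and `pairwise_similarity_loss` are all assumptions; it uses the common GCN propagation rule H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: add self-loops, symmetrically
    normalize the adjacency, then apply weights and ReLU."""
    A_hat = A + np.eye(A.shape[0])           # A + I (self-loops)
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)          # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

def pairwise_similarity_loss(t, v, relevant, margin=0.5):
    """Pull cosine similarity up for relevant text-image embedding
    pairs; push it below a margin for irrelevant ones (illustrative
    form, not the paper's exact loss)."""
    sim = t @ v / (np.linalg.norm(t) * np.linalg.norm(v))
    return 1.0 - sim if relevant else max(0.0, sim - margin)

# Toy 3-word graph; in the paper, edges come from word2vec similarity.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.default_rng(0).normal(size=(3, 4))   # word features
W = np.random.default_rng(1).normal(size=(4, 2))   # learnable weights
H1 = gcn_layer(A, H, W)
print(H1.shape)  # (3, 2)
```

In the full model, pooling the GCN output gives the text embedding for one path, the image MLP gives the other, and the pairwise loss ties the two paths together during training.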
Pages: 223-234 (12 pages)
Related Papers (50 total)
  • [41] Chen, Xueying; Zhang, Rong; Zhan, Yibing. Graph Pattern Loss Based Diversified Attention Network for Cross-Modal Retrieval. 2020 IEEE International Conference on Image Processing (ICIP), 2020: 2391-2395
  • [42] Zhang, Lei; Chen, Leiting; Zhou, Chuan; Li, Xin; Yang, Fan; Yi, Zhang. Weighted Graph-Structured Semantics Constraint Network for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 2024, 26: 1551-1564
  • [43] Chen, Yudong; Wang, Sen; Lu, Jianglin; Chen, Zhi; Zhang, Zheng; Huang, Zi. Local Graph Convolutional Networks for Cross-Modal Hashing. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 1921-1928
  • [44] Chen, Lei; Xi, Yimeng; Liu, Libo. Survey on Video-Text Cross-Modal Retrieval. Computer Engineering and Applications, 2024, 60(04): 1-20
  • [45] Alikhani, Malihe; Han, Fangda; Ravi, Hareesh; Kapadia, Mubbasir; Pavlovic, Vladimir; Stone, Matthew. Cross-Modal Coherence for Text-to-Image Retrieval. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022), 2022: 10427-10435
  • [46] Yang, XianBen; Zhang, Wei. Simulation of Cross-Modal Image-Text Retrieval Algorithm Under Convolutional Neural Network Structure and Hash Method. Journal of Supercomputing, 2022, 78(05): 7106-7132
  • [47] Wang, Yun; Zhang, Tong; Zhang, Xueya; Cui, Zhen; Huang, Yuge; Shen, Pengcheng; Li, Shaoxin; Yang, Jian. Wasserstein Coupled Graph Learning for Cross-Modal Retrieval. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 1793-1802
  • [48] Zhang, Xiang; Dong, Guohua; Du, Yimo; Wu, Chengkun; Luo, Zhigang; Yang, Canqun. Collaborative Subspace Graph Hashing for Cross-Modal Retrieval. ICMR '18: Proceedings of the 2018 ACM International Conference on Multimedia Retrieval, 2018: 213-221
  • [49] Xu, Gongwen; Li, Xiaomei; Shi, Lin; Zhang, Zhijun; Zhai, Aidong. Combination Subspace Graph Learning for Cross-Modal Retrieval. Alexandria Engineering Journal, 2020, 59(03): 1333-1343
  • [50] Song, Ge; Wang, Dong; Tan, Xiaoyang. Deep Memory Network for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 2019, 21(05): 1261-1275