Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cited: 29
Authors
Yu, Jing [1 ,2 ]
Lu, Yuhang [1 ,2 ]
Qin, Zengchang [3 ]
Zhang, Weifeng [4 ,5 ]
Liu, Yanbing [1 ]
Tan, Jianlong [1 ]
Guo, Li [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[4] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
DOI
10.1007/978-3-030-00776-8_21
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Cross-modal information retrieval aims to find heterogeneous data of various modalities from a given query of one modality. The main challenge is to map different modalities into a common semantic space, in which distances between concepts of different modalities can be well modeled. For cross-modal information retrieval between images and texts, existing work mostly uses an off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are employed to build deep learning models. Besides word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. One path utilizes a Graph Convolutional Network (GCN) for text modeling based on graph representations. The other path uses a neural network with layers of nonlinearities for image modeling based on off-the-shelf features. The model is trained with a pairwise similarity loss function to maximize the similarity of relevant text-image pairs and minimize the similarity of irrelevant pairs. Experimental results show that the proposed model significantly outperforms state-of-the-art methods, with a 17% accuracy improvement in the best case.
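The abstract's three ingredients (a word graph built from word2vec similarity, GCN propagation over that graph, and a pairwise similarity loss) can be sketched as follows. This is a minimal NumPy illustration only; the vector dimensions, similarity threshold, and margin are assumptions for the sketch, not the paper's actual settings.

```python
import numpy as np

# Toy word vectors standing in for word2vec embeddings (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))  # 5 "words", 8-dim features

# 1. Build the text graph: connect word pairs whose cosine similarity
#    exceeds a threshold (threshold value is an assumption).
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
A = (Xn @ Xn.T > 0.2).astype(float)
np.fill_diagonal(A, 1.0)  # add self-loops

# 2. One GCN propagation step: H = ReLU(D^{-1/2} A D^{-1/2} X W)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * np.outer(d_inv_sqrt, d_inv_sqrt)  # symmetric normalization
W = rng.standard_normal((8, 4))               # learnable weights (random here)
H = np.maximum(A_hat @ X @ W, 0.0)            # ReLU nonlinearity

# 3. Pairwise similarity loss on (text, image) embeddings: pull relevant
#    pairs together, push irrelevant pairs below a margin.
def pairwise_loss(t, v, relevant, margin=0.5):
    sim = float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))
    return 1.0 - sim if relevant else max(0.0, sim - margin)
```

A relevant pair with identical embeddings incurs zero loss, while an irrelevant pair is only penalized once its cosine similarity exceeds the margin.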
Pages: 223 - 234
Page count: 12
Related Papers
50 entries in total
  • [21] A Graph Model for Cross-modal Retrieval
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 1090 - 1097
  • [22] Iterative graph attention memory network for cross-modal retrieval
    Dong, Xinfeng
    Zhang, Huaxiang
    Dong, Xiao
    Lu, Xu
    KNOWLEDGE-BASED SYSTEMS, 2021, 226
  • [23] Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval
    Wu, Hongchang
    Guan, Ziyu
    Zhi, Tao
    Zhao, Wei
    Xu, Cai
    Han, Hong
    Yang, Yaming
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 265 - 272
  • [24] SEMI-SUPERVISED GRAPH CONVOLUTIONAL HASHING NETWORK FOR LARGE-SCALE CROSS-MODAL RETRIEVAL
    Shen, Zhanjian
    Zhai, Deming
    Liu, Xianming
    Jiang, Junjun
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2366 - 2370
  • [25] Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval
    Zhang, Peng-Fei
    Li, Yang
    Huang, Zi
    Xu, Xin-Shun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 466 - 479
  • [26] Cross-modal information balance-aware reasoning network for image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Hao, Fei
    Pang, Guangyao
    Wang, Zehao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 120
  • [27] Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval
    Qian, Shengsheng
    Xue, Dizhan
    Fang, Quan
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 24 : 3520 - 3532
  • [28] Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval
    Jin, Weike
    Zhao, Zhou
    Zhang, Pengcheng
    Zhu, Jieming
    He, Xiuqiang
    Zhuang, Yueting
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1114 - 1124
  • [29] Information Aggregation Semantic Adversarial Network for Cross-Modal Retrieval
    Wang, Hongfei
    Feng, Aimin
    Liu, Xuejun
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [30] Heterogeneous memory enhanced graph reasoning network for cross-modal retrieval
    Ji, Zhong
    Chen, Kexin
    He, Yuqing
    Pang, Yanwei
    Li, Xuelong
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, (07): 157 - 169