Semantic Modeling of Textual Relationships in Cross-modal Retrieval

Cited: 3
Authors
Yu, Jing [1 ]
Yang, Chenghao [2 ]
Qin, Zengchang [2 ]
Yang, Zhuoqian [2 ]
Hu, Yue [1 ]
Shi, Zhiguo [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Beihang Univ, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[3] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing, Peoples R China
Keywords
Textual relationships; Relationship integration; Cross-modal retrieval; Knowledge graph; Graph Convolutional Network
DOI
10.1007/978-3-030-29551-6_3
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Feature modeling of different modalities is a fundamental problem in cross-modal information retrieval. Existing models typically project texts and images into a common embedding space, in which semantically similar items lie closer together. Semantic modeling of textual relationships is notoriously difficult. In this paper, we propose an approach that models texts as a featured graph by integrating multi-view textual relationships, including semantic relationships, statistical co-occurrence, and prior relationships from a knowledge base. A dual-path neural network is adopted to jointly learn multi-modal representations and a cross-modal similarity measure: a Graph Convolutional Network (GCN) generates relation-aware text representations, while a Convolutional Neural Network (CNN) with non-linearities produces image representations. The cross-modal similarity measure is learned by distance metric learning. Experimental results show that, by leveraging the rich relational semantics in texts, our model outperforms the state-of-the-art models by 3.4% and 6.3% in accuracy on two benchmark datasets, respectively.
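The abstract describes the architecture only at a high level. The following minimal PyTorch sketch, not the authors' released code, illustrates the dual-path design it outlines: a GCN path over a featured text graph, an image path over pre-extracted CNN features, and a margin-based triplet loss as a simple form of distance metric learning. All class names, layer sizes, the mean-pooling step, the single fused adjacency matrix, and the margin value are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    # One graph convolution: H' = ReLU(A_hat @ H @ W).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        # a_hat: normalized adjacency of the featured text graph; assumed
        # here to be a single matrix fusing the semantic, co-occurrence,
        # and knowledge-base relationships described in the abstract.
        return F.relu(self.linear(a_hat @ h))

class DualPathModel(nn.Module):
    def __init__(self, word_dim=300, img_dim=2048, joint_dim=512):
        super().__init__()
        # Text path: two GCN layers, mean-pooled, then projected.
        self.gcn1 = GCNLayer(word_dim, 1024)
        self.gcn2 = GCNLayer(1024, joint_dim)
        self.text_proj = nn.Linear(joint_dim, joint_dim)
        # Image path: an MLP over pre-extracted CNN features, standing in
        # for the "CNN with non-linearities" of the abstract.
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.ReLU(),
            nn.Linear(1024, joint_dim))

    def encode_text(self, word_feats, a_hat):
        # word_feats: (num_nodes, word_dim); a_hat: (num_nodes, num_nodes).
        h = self.gcn2(self.gcn1(word_feats, a_hat), a_hat)
        return F.normalize(self.text_proj(h.mean(dim=0)), dim=-1)

    def encode_image(self, img_feats):
        return F.normalize(self.img_proj(img_feats), dim=-1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Distance metric learning: matched pairs are pulled together and
    # mismatched pairs pushed at least `margin` apart (cosine distance
    # of L2-normalized embeddings).
    pos = 1.0 - (anchor * positive).sum(dim=-1)
    neg = 1.0 - (anchor * negative).sum(dim=-1)
    return F.relu(pos - neg + margin).mean()

As a usage sketch, a matched text-image pair plus a mismatched image would be encoded with encode_text and encode_image and fed to triplet_loss, so that training pulls matched pairs together in the joint space and pushes mismatched pairs at least the margin apart, matching the abstract's account of jointly learning representations and a similarity measure.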
Pages: 24-32
Number of pages: 9
Related Papers
50 records in total
  • [41] Cross-modal image sentiment analysis via deep correlation of textual semantic
    Zhang, Ke
    Zhu, Yunwen
    Zhang, Wenjun
    Zhu, Yonghua
    KNOWLEDGE-BASED SYSTEMS, 2021, 216
  • [42] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017 : 154 - 162
  • [43] A language-guided cross-modal semantic fusion retrieval method
    Zhu, Ligu
    Zhou, Fei
    Wang, Suping
    Shi, Lei
    Kou, Feifei
    Li, Zeyu
    Zhou, Pengpeng
    SIGNAL PROCESSING, 2025, 234
  • [45] Semantic-enhanced discriminative embedding learning for cross-modal retrieval
    Pan, Hao
    Huang, Jun
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (03) : 369 - 382
  • [46] Discriminative Latent Semantic Regression for Cross-Modal Hashing of Multimedia Retrieval
    Wan, Jianwu
    Wang, Yi
2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018
  • [47] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
    Li, Zhengxin
    Zhao, Wenzhe
    Du, Xuanyi
    Zhou, Guangyao
    Zhang, Songlin
    REMOTE SENSING, 2024, 16 (01)
  • [48] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [49] Semantic Preservation and Hash Fusion Network for Unsupervised Cross-Modal Retrieval
    Shu, Xinsheng
    Li, Mingyong
    WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 146 - 161
  • [50] Towards learning a semantic-consistent subspace for cross-modal retrieval
    Xu, Meixiang
    Zhu, Zhenfeng
    Zhao, Yao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (01) : 389 - 412