Semantic Modeling of Textual Relationships in Cross-modal Retrieval

Cited: 3
Authors
Yu, Jing [1 ]
Yang, Chenghao [2 ]
Qin, Zengchang [2 ]
Yang, Zhuoqian [2 ]
Hu, Yue [1 ]
Shi, Zhiguo [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Beihang Univ, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[3] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing, Peoples R China
Keywords
Textual relationships; Relationship integration; Cross-modal retrieval; Knowledge graph; Graph Convolutional Network
DOI
10.1007/978-3-030-29551-6_3
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Feature modeling of different modalities is a fundamental problem in cross-modal information retrieval. Existing models typically project texts and images into a common embedding space, in which semantically similar items lie closer together. Semantic modeling of textual relationships is notoriously difficult. In this paper, we propose an approach that models texts as a featured graph by integrating multi-view textual relationships, including semantic relationships, statistical co-occurrence, and prior relationships from a knowledge base. A dual-path neural network is adopted to jointly learn multi-modal representations and a cross-modal similarity measure: a Graph Convolutional Network (GCN) generates relation-aware text representations, while a Convolutional Neural Network (CNN) with non-linearities produces image representations. The cross-modal similarity measure is learned by distance metric learning. Experimental results show that, by leveraging the rich relational semantics in texts, our model outperforms the state-of-the-art models by 3.4% and 6.3% in accuracy on two benchmark datasets, respectively.
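The abstract describes the architecture only at a high level. The following minimal PyTorch sketch, not the authors' released code, illustrates the dual-path design it outlines: a GCN path over a featured text graph, an image path over pre-extracted CNN features, and a margin-based triplet loss as a simple form of distance metric learning. All class names, layer sizes, the mean-pooling step, the single fused adjacency matrix, and the margin value are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    # One graph convolution: H' = ReLU(A_hat @ H @ W).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        # a_hat: normalized adjacency of the featured text graph; assumed
        # here to be a single matrix fusing the semantic, co-occurrence,
        # and knowledge-base relationships described in the abstract.
        return F.relu(self.linear(a_hat @ h))

class DualPathModel(nn.Module):
    def __init__(self, word_dim=300, img_dim=2048, joint_dim=512):
        super().__init__()
        # Text path: two GCN layers, mean-pooled, then projected.
        self.gcn1 = GCNLayer(word_dim, 1024)
        self.gcn2 = GCNLayer(1024, joint_dim)
        self.text_proj = nn.Linear(joint_dim, joint_dim)
        # Image path: an MLP over pre-extracted CNN features, standing in
        # for the "CNN with non-linearities" of the abstract.
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.ReLU(),
            nn.Linear(1024, joint_dim))

    def encode_text(self, word_feats, a_hat):
        # word_feats: (num_nodes, word_dim); a_hat: (num_nodes, num_nodes).
        h = self.gcn2(self.gcn1(word_feats, a_hat), a_hat)
        return F.normalize(self.text_proj(h.mean(dim=0)), dim=-1)

    def encode_image(self, img_feats):
        return F.normalize(self.img_proj(img_feats), dim=-1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Distance metric learning: matched pairs are pulled together and
    # mismatched pairs pushed at least `margin` apart (cosine distance
    # of L2-normalized embeddings).
    pos = 1.0 - (anchor * positive).sum(dim=-1)
    neg = 1.0 - (anchor * negative).sum(dim=-1)
    return F.relu(pos - neg + margin).mean()

As a usage sketch, a matched text-image pair plus a mismatched image would be encoded with encode_text and encode_image and fed to triplet_loss, so that training pulls matched pairs together in the joint space and pushes mismatched pairs at least the margin apart, matching the abstract's account of jointly learning representations and a similarity measure.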
Pages: 24-32
Number of pages: 9
Related Papers
50 records in total
  • [41] Cross-modal image sentiment analysis via deep correlation of textual semantic
    Zhang, Ke
    Zhu, Yunwen
    Zhang, Wenjun
    Zhu, Yonghua
    KNOWLEDGE-BASED SYSTEMS, 2021, 216
  • [42] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017 : 154 - 162
  • [43] A language-guided cross-modal semantic fusion retrieval method
    Zhu, Ligu
    Zhou, Fei
    Wang, Suping
    Shi, Lei
    Kou, Feifei
    Li, Zeyu
    Zhou, Pengpeng
    SIGNAL PROCESSING, 2025, 234
  • [45] Semantic-enhanced discriminative embedding learning for cross-modal retrieval
    Pan, Hao
    Huang, Jun
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (03) : 369 - 382
  • [46] Discriminative Latent Semantic Regression for Cross-Modal Hashing of Multimedia Retrieval
    Wan, Jianwu
    Wang, Yi
2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018
  • [47] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
    Li, Zhengxin
    Zhao, Wenzhe
    Du, Xuanyi
    Zhou, Guangyao
    Zhang, Songlin
    REMOTE SENSING, 2024, 16 (01)
  • [48] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [49] Semantic Preservation and Hash Fusion Network for Unsupervised Cross-Modal Retrieval
    Shu, Xinsheng
    Li, Mingyong
    WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 146 - 161
  • [50] Towards learning a semantic-consistent subspace for cross-modal retrieval
    Xu, Meixiang
    Zhu, Zhenfeng
    Zhao, Yao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (01) : 389 - 412