CAESAR: concept augmentation based semantic representation for cross-modal retrieval

Cited by: 0
Authors
Lei Zhu
Jiayu Song
Xiangxiang Wei
Hao Yu
Jun Long
Affiliations
[1] Central South University, School of Computer Science and Engineering
[2] Central South University, Big Data and Knowledge Engineering Institute
Keywords
Cross-modal retrieval; Deep learning; Multi-modal representation learning; Concept augmentation
DOI
Not available
Abstract
With the increasing amount of multimedia data, cross-modal retrieval has attracted growing attention in the multimedia and computer vision communities. To bridge the semantic gap between multi-modal data and improve retrieval performance, we propose an effective concept augmentation based method, named CAESAR, an end-to-end framework comprising cross-modal correlation learning and concept augmentation based semantic mapping learning. To enhance representation and correlation learning, we develop a novel multi-modal CNN-based CCA model that captures high-level semantic information during cross-modal feature learning and then captures the maximal nonlinear correlation between modalities. In addition, to learn the semantic relationships between multi-modal samples, we propose a concept learning model named CaeNet, realized with word2vec and LDA, to capture the close relations between texts and abstract concepts. Reinforced by this abstract concept information, cross-modal semantic mappings are learned with a semantic alignment strategy. We conduct comprehensive experiments on four benchmark multimedia datasets, and the results show that our method performs well for cross-modal retrieval.
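To give a concrete feel for the two training signals the abstract describes, the sketch below pairs a DCCA-style correlation objective with a simple concept-alignment term. It is a minimal illustration under stated assumptions, not the authors' implementation: the encoder architecture, all dimensions, and the concept vectors (standing in for the word2vec/LDA features CaeNet would produce offline) are hypothetical.

```python
# Minimal sketch (PyTorch) of the two losses described in the abstract:
# (1) a DCCA-style term that maximizes canonical correlation between image
#     and text embeddings, and (2) a concept-alignment term pulling both
#     modalities toward shared abstract-concept vectors. Module names,
#     dimensions, and the concept features are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps pre-extracted modality features into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def cca_correlation(h1: torch.Tensor, h2: torch.Tensor,
                    eps: float = 1e-3) -> torch.Tensor:
    """Sum of canonical correlations between two batches of embeddings;
    a DCCA-style objective maximizes this (minimizes its negative)."""
    n = h1.size(0)
    h1 = h1 - h1.mean(0, keepdim=True)
    h2 = h2 - h2.mean(0, keepdim=True)
    s11 = h1.T @ h1 / (n - 1) + eps * torch.eye(h1.size(1), device=h1.device)
    s22 = h2.T @ h2 / (n - 1) + eps * torch.eye(h2.size(1), device=h2.device)
    s12 = h1.T @ h2 / (n - 1)
    # Whiten with inverse Cholesky factors; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    l1_inv = torch.linalg.inv(torch.linalg.cholesky(s11))
    l2_inv = torch.linalg.inv(torch.linalg.cholesky(s22))
    t = l1_inv @ s12 @ l2_inv.T
    return torch.linalg.svdvals(t).sum()

# Toy forward/backward pass: 128 image/text pairs with hypothetical feature
# sizes (4096-d CNN features, 300-d text features, 64-d concept vectors).
img_enc, txt_enc = ModalityEncoder(4096), ModalityEncoder(300)
to_concept = nn.Linear(32, 64)                # shared projection to concepts
imgs, txts = torch.randn(128, 4096), torch.randn(128, 300)
concepts = torch.randn(128, 64)               # stand-in for CaeNet's output

h_img, h_txt = img_enc(imgs), txt_enc(txts)
loss = (-cca_correlation(h_img, h_txt)              # maximize correlation
        + F.mse_loss(to_concept(h_img), concepts)   # align image -> concept
        + F.mse_loss(to_concept(h_txt), concepts))  # align text  -> concept
loss.backward()
```

In CAESAR itself, the correlation term is built on multi-modal CNN features and the concept vectors come from CaeNet's word2vec/LDA pipeline; here both are replaced by random tensors purely to keep the sketch self-contained and runnable.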
Pages: 34213-34243 (30 pages)
Related Papers (50 in total)
  • [21] Representation separation adversarial networks for cross-modal retrieval
    Deng, Jiaxin
    Ou, Weihua
    Gou, Jianping
    Song, Heping
    Wang, Anzhi
    Xu, Xing
    [J]. WIRELESS NETWORKS, 2024, 30 (05) : 3469 - 3481
  • [22] Cross-modal hashing retrieval with compatible triplet representation
    Hao, Zhifeng
    Jin, Yaochu
    Yan, Xueming
    Wang, Chuyue
    Yang, Shangshang
    Ge, Hong
    [J]. NEUROCOMPUTING, 2024, 602
  • [24] Variational Deep Representation Learning for Cross-Modal Retrieval
    Yang, Chen
    Deng, Zongyong
    Li, Tianyu
    Liu, Hao
    Liu, Libo
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 498 - 510
  • [25] Semantic consistent adversarial cross-modal retrieval exploiting semantic similarity
    Ou, Weihua
    Xuan, Ruisheng
    Gou, Jianping
    Zhou, Quan
    Cao, Yongfeng
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (21-22) : 14733 - 14750
  • [26] Abstraction and Association: Cross-Modal Retrieval Based on Consistency between Semantic Structures
    Zheng, Qibin
    Ren, Xiaoguang
    Liu, Yi
    Qin, Wei
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [27] Cross-modal video retrieval algorithm based on multi-semantic clues
    Ding L.
    Li Y.
    Yu C.
    Liu Y.
    Wang X.
    Qi S.
    [J]. Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2021, 47 (03): : 596 - 604
  • [28] Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval
    Zhang, Li
    Wu, Xiangqian
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 7154 - 7164
  • [29] Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning
    Lu, Zhu
    Fang, Deng
    Kun, Liu
    Tingting, He
    Yuanyuan, Liu
    [J]. Data Analysis and Knowledge Discovery, 2021, 5 (12) : 110 - 122
  • [30] Multi-attention based semantic deep hashing for cross-modal retrieval
    Zhu, Liping
    Tian, Gangyi
    Wang, Bingyao
    Wang, Wenjie
    Zhang, Di
    Li, Chengyang
    [J]. APPLIED INTELLIGENCE, 2021, 51 (08) : 5927 - 5939