CAESAR: concept augmentation based semantic representation for cross-modal retrieval

Cited by: 0
Authors:
Lei Zhu
Jiayu Song
Xiangxiang Wei
Hao Yu
Jun Long
Affiliations:
[1] Central South University,School of Computer Science and Engineering
[2] Central South University,Big Data and Knowledge Engineering Institute
Source: Multimedia Tools and Applications, 2022, 81(24)
Keywords: Cross-modal retrieval; Deep learning; Multi-modal representation learning; Concept augmentation
DOI: not available
Abstract
With the growing volume of multimedia data, cross-modal retrieval has attracted increasing attention in the multimedia and computer vision communities. To bridge the semantic gap between multi-modal data and improve retrieval performance, we propose an effective concept augmentation based method, named CAESAR, an end-to-end framework comprising cross-modal correlation learning and concept augmentation based semantic mapping learning. To enhance representation and correlation learning, a novel multi-modal CNN-based CCA model is developed, which captures high-level semantic information during cross-modal feature learning and then maximizes nonlinear correlation. In addition, to learn the semantic relationships between multi-modal samples, a concept learning model named CaeNet is proposed, realized with word2vec and LDA to capture the closer relations between texts and abstract concepts. Reinforced by the abstract concept information, cross-modal semantic mappings are learned with a semantic alignment strategy. We conduct comprehensive experiments on four benchmark multimedia datasets; the results show that our method achieves strong cross-modal retrieval performance.
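The CCA objective the abstract refers to can be illustrated with a minimal linear sketch: the total canonical correlation between two views is the sum of singular values of the whitened cross-covariance matrix. This is a generic, simplified illustration (plain linear CCA in NumPy with a small regularizer); the paper's actual model learns nonlinear correlation on top of multi-modal CNN features, which this sketch does not reproduce, and the function name and `reg` parameter are my own.

```python
import numpy as np

def linear_cca_correlation(X, Y, reg=1e-4):
    """Total canonical correlation between views X (n x dx) and Y (n x dy).

    Illustrative linear CCA only; deep CCA maximizes this same quantity
    over the outputs of two learned (nonlinear) networks.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # center each view
    Yc = Y - Y.mean(axis=0)
    # Regularized within-view covariances and cross-view covariance
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whitening transforms Sxx^{-1/2} and Syy^{-1/2} via eigendecomposition
    Ex, Vx = np.linalg.eigh(Sxx)
    Ey, Vy = np.linalg.eigh(Syy)
    Sxx_inv_sqrt = Vx @ np.diag(Ex ** -0.5) @ Vx.T
    Syy_inv_sqrt = Vy @ np.diag(Ey ** -0.5) @ Vy.T
    # Canonical correlations are the singular values of the whitened matrix
    T = Sxx_inv_sqrt @ Sxy @ Syy_inv_sqrt
    corrs = np.linalg.svd(T, compute_uv=False)
    return float(corrs.sum())
```

In a deep-CCA setting, `X` and `Y` would be the image-branch and text-branch embeddings of a training batch, and the negative of this value would serve as the correlation loss to be minimized.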
Pages: 34213-34243 (30 pages)
Related papers (50 items)
  • [1] CAESAR: concept augmentation based semantic representation for cross-modal retrieval
    Zhu, Lei
    Song, Jiayu
    Wei, Xiangxiang
    Yu, Hao
    Long, Jun
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34213 - 34243
  • [2] Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval
    Zhu, Lei
    Song, Jiayu
    Zhu, Xiaofeng
    Zhang, Chengyuan
    Zhang, Shichao
    Yuan, Xinpan
    [J]. IEEE MULTIMEDIA, 2020, 27 (04) : 79 - 90
  • [3] Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph
    Qi, Mengshi
    Wang, Yunhong
    Li, Annan
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 744 - 752
  • [4] Deep Multigraph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval
    Zhu, Lei
    Zhang, Chengyuan
    Song, Jiayu
    Zhang, Shichao
    Tian, Chunwei
    Zhu, Xinghui
    [J]. IEEE MULTIMEDIA, 2022, 29 (03) : 17 - 26
  • [5] Cross-Modal Retrieval Based on Semantic Filtering and Adaptive Pooling
    Qiao, Nan
    Mao, Junyi
    Xie, Hao
    Wang, Zhiguo
    Yin, Guangqiang
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 296 - 310
  • [6] Deep Semantic Mapping for Cross-Modal Retrieval
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    [J]. 2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 234 - 241
  • [7] Analyzing semantic correlation for cross-modal retrieval
    Xie, Liang
    Pan, Peng
    Lu, Yansheng
    [J]. MULTIMEDIA SYSTEMS, 2015, 21 (06) : 525 - 539
  • [8] Semantic consistency hashing for cross-modal retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    [J]. NEUROCOMPUTING, 2016, 193 : 250 - 259
  • [9] Cross-Modal Retrieval Augmentation for Multi-Modal Classification
    Gur, Shir
    Neverova, Natalia
    Stauffer, Chris
    Lim, Ser-Nam
    Kiela, Douwe
    Reiter, Austin
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 111 - 123