CAESAR: concept augmentation based semantic representation for cross-modal retrieval

Cited by: 0
Authors:
Lei Zhu
Jiayu Song
Xiangxiang Wei
Hao Yu
Jun Long
Affiliations:
[1] Central South University,School of Computer Science and Engineering
[2] Central South University,Big Data and Knowledge Engineering Institute
Source: Multimedia Tools and Applications, 2022, 81(24)
Keywords: Cross-modal retrieval; Deep learning; Multi-modal representation learning; Concept augmentation
DOI: not available
Abstract
With the growing volume of multimedia data, cross-modal retrieval has attracted increasing attention in the multimedia and computer vision communities. To bridge the semantic gap between multi-modal data and improve retrieval performance, we propose an effective concept augmentation based method, named CAESAR, an end-to-end framework comprising cross-modal correlation learning and concept augmentation based semantic mapping learning. To enhance representation and correlation learning, a novel multi-modal CNN-based CCA model is developed, which captures high-level semantic information during cross-modal feature learning and then maximizes nonlinear correlation. In addition, to learn the semantic relationships between multi-modal samples, a concept learning model named CaeNet is proposed, realized with word2vec and LDA to capture the closer relations between texts and abstract concepts. Reinforced by the abstract concept information, cross-modal semantic mappings are learned with a semantic alignment strategy. We conduct comprehensive experiments on four benchmark multimedia datasets; the results show that our method achieves strong cross-modal retrieval performance.
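The CCA objective the abstract refers to can be illustrated with a minimal linear sketch: the total canonical correlation between two views is the sum of singular values of the whitened cross-covariance matrix. This is a generic, simplified illustration (plain linear CCA in NumPy with a small regularizer); the paper's actual model learns nonlinear correlation on top of multi-modal CNN features, which this sketch does not reproduce, and the function name and `reg` parameter are my own.

```python
import numpy as np

def linear_cca_correlation(X, Y, reg=1e-4):
    """Total canonical correlation between views X (n x dx) and Y (n x dy).

    Illustrative linear CCA only; deep CCA maximizes this same quantity
    over the outputs of two learned (nonlinear) networks.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # center each view
    Yc = Y - Y.mean(axis=0)
    # Regularized within-view covariances and cross-view covariance
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whitening transforms Sxx^{-1/2} and Syy^{-1/2} via eigendecomposition
    Ex, Vx = np.linalg.eigh(Sxx)
    Ey, Vy = np.linalg.eigh(Syy)
    Sxx_inv_sqrt = Vx @ np.diag(Ex ** -0.5) @ Vx.T
    Syy_inv_sqrt = Vy @ np.diag(Ey ** -0.5) @ Vy.T
    # Canonical correlations are the singular values of the whitened matrix
    T = Sxx_inv_sqrt @ Sxy @ Syy_inv_sqrt
    corrs = np.linalg.svd(T, compute_uv=False)
    return float(corrs.sum())
```

In a deep-CCA setting, `X` and `Y` would be the image-branch and text-branch embeddings of a training batch, and the negative of this value would serve as the correlation loss to be minimized.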
Pages: 34213-34243 (30 pages)
Related papers (50 items)
  • [1] CAESAR: concept augmentation based semantic representation for cross-modal retrieval
    Zhu, Lei
    Song, Jiayu
    Wei, Xiangxiang
    Yu, Hao
    Long, Jun
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34213 - 34243
  • [2] Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval
    Zhu, Lei
    Song, Jiayu
    Zhu, Xiaofeng
    Zhang, Chengyuan
    Zhang, Shichao
    Yuan, Xinpan
    [J]. IEEE MULTIMEDIA, 2020, 27 (04) : 79 - 90
  • [3] Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph
    Qi, Mengshi
    Wang, Yunhong
    Li, Annan
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 744 - 752
  • [4] Deep Multigraph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval
    Zhu, Lei
    Zhang, Chengyuan
    Song, Jiayu
    Zhang, Shichao
    Tian, Chunwei
    Zhu, Xinghui
    [J]. IEEE MULTIMEDIA, 2022, 29 (03) : 17 - 26
  • [5] Cross-Modal Retrieval Based on Semantic Filtering and Adaptive Pooling
    Qiao, Nan
    Mao, Junyi
    Xie, Hao
    Wang, Zhiguo
    Yin, Guangqiang
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 296 - 310
  • [6] Deep Semantic Mapping for Cross-Modal Retrieval
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    [J]. 2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 234 - 241
  • [7] Analyzing semantic correlation for cross-modal retrieval
    Xie, Liang
    Pan, Peng
    Lu, Yansheng
    [J]. MULTIMEDIA SYSTEMS, 2015, 21 (06) : 525 - 539
  • [8] Semantic consistency hashing for cross-modal retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    [J]. NEUROCOMPUTING, 2016, 193 : 250 - 259
  • [9] Cross-Modal Retrieval Augmentation for Multi-Modal Classification
    Gur, Shir
    Neverova, Natalia
    Stauffer, Chris
    Lim, Ser-Nam
    Kiela, Douwe
    Reiter, Austin
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 111 - 123