Cross-modal retrieval has become an active research topic for retrieval across multimedia data such as images and text. Most existing methods based on deep neural networks (DNNs) adopt a two-stage learning framework: the first stage learns a separate representation for each modality, and the second stage learns the cross-modal common representation. However, existing methods have three limitations: 1) In the first learning stage, they model only intramodality correlation and ignore intermodality correlation, which provides rich complementary context. 2) In the second learning stage, they adopt only shallow networks with single-loss regularization and ignore the intrinsic relevance between intramodality and intermodality correlation. 3) They consider only original instances and ignore the complementary fine-grained clues provided by their patches. To address these problems, this paper proposes a cross-modal correlation learning (CCL) approach with multigrained fusion by a hierarchical network, with the following contributions: 1) In the first learning stage, CCL exploits multilevel association with joint optimization to simultaneously preserve the complementary context from intramodality and intermodality correlation. 2) In the second learning stage, a multitask learning strategy is designed to adaptively balance the intramodality semantic category constraints and the intermodality pairwise similarity constraints. 3) CCL adopts multigrained modeling, which fuses coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Compared with 13 state-of-the-art methods on 6 widely used cross-modal datasets, the experimental results show that our CCL approach achieves the best performance.
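
To make the second-stage multitask idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a PyTorch setup with illustrative feature dimensions, and combines an intramodality semantic category (classification) loss with an intermodality pairwise similarity loss under learnable balancing weights. The class name, dimensions, and the uncertainty-style weighting are assumptions introduced only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CommonRepresentationHead(nn.Module):
    """Illustrative second-stage head: maps modality-specific features into a
    shared space and jointly applies an intramodality semantic category loss
    and an intermodality pairwise similarity loss."""

    def __init__(self, img_dim=4096, txt_dim=300, common_dim=512, num_classes=10):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, common_dim)
        self.txt_proj = nn.Linear(txt_dim, common_dim)
        # Shared classifier enforces the semantic category constraint in both modalities.
        self.classifier = nn.Linear(common_dim, num_classes)
        # Learnable log-variances for adaptive task balancing
        # (an assumed weighting scheme, not necessarily CCL's exact strategy).
        self.log_var_cls = nn.Parameter(torch.zeros(1))
        self.log_var_sim = nn.Parameter(torch.zeros(1))

    def forward(self, img_feat, txt_feat):
        # Project both modalities into the common space and L2-normalize.
        img_common = F.normalize(self.img_proj(img_feat), dim=1)
        txt_common = F.normalize(self.txt_proj(txt_feat), dim=1)
        return img_common, txt_common

    def multitask_loss(self, img_common, txt_common, labels):
        # Intramodality semantic category constraint: each modality's common
        # representation should predict the correct semantic category.
        cls_loss = 0.5 * (F.cross_entropy(self.classifier(img_common), labels)
                          + F.cross_entropy(self.classifier(txt_common), labels))
        # Intermodality pairwise similarity constraint: matched image/text
        # pairs should be close (high cosine similarity) in the common space.
        sim_loss = (1.0 - F.cosine_similarity(img_common, txt_common, dim=1)).mean()
        # Adaptively balance the two constraints via learnable weights.
        loss = (torch.exp(-self.log_var_cls) * cls_loss + self.log_var_cls
                + torch.exp(-self.log_var_sim) * sim_loss + self.log_var_sim)
        return loss.squeeze()


# Usage with random placeholder features standing in for first-stage outputs.
head = CommonRepresentationHead()
img_feat = torch.randn(8, 4096)           # e.g., image features from stage one
txt_feat = torch.randn(8, 300)            # e.g., text features from stage one
labels = torch.randint(0, 10, (8,))       # semantic category labels
img_c, txt_c = head(img_feat, txt_feat)
loss = head.multitask_loss(img_c, txt_c, labels)
loss.backward()
```

In this sketch the balance between the two constraints is learned jointly with the network parameters, which is one plausible way to realize the "adaptively balance" behavior described above; the paper's actual balancing mechanism may differ.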