CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network

Cited by: 186
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Huang, Xin [1 ]
Yuan, Yuxin [1 ]
Institutions
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Cross-modal retrieval; fine-grained correlation; joint optimization; multi-task learning; REPRESENTATION; MODEL;
DOI
10.1109/TMM.2017.2742704
Chinese Library Classification
TP [Automation Technology & Computer Technology];
Discipline Code
0812;
Abstract
Cross-modal retrieval has become a highlighted research topic for retrieval across multimedia data such as image and text. A two-stage learning framework is widely adopted by most existing methods based on deep neural network (DNN): The first learning stage is to generate separate representation for each modality, and the second learning stage is to get the cross-modal common representation. However, the existing methods have three limitations: 1) In the first learning stage, they only model intramodality correlation, but ignore intermodality correlation with rich complementary context. 2) In the second learning stage, they only adopt shallow networks with single-loss regularization, but ignore the intrinsic relevance of intramodality and intermodality correlation. 3) Only original instances are considered, while the complementary fine-grained clues provided by their patches are ignored. To address these problems, this paper proposes a cross-modal correlation learning (CCL) approach with multigrained fusion by hierarchical network, and the contributions are as follows: 1) In the first learning stage, CCL exploits multilevel association with joint optimization to preserve the complementary context from intramodality and intermodality correlation simultaneously. 2) In the second learning stage, a multitask learning strategy is designed to adaptively balance the intramodality semantic category constraints and intermodality pairwise similarity constraints. 3) CCL adopts multigrained modeling, which fuses coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Compared with 13 state-of-the-art methods on 6 widely used cross-modal datasets, the experimental results show that our CCL approach achieves the best performance.
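The multitask strategy in contribution 2) can be illustrated with a minimal sketch: a weighted combination of a per-modality semantic classification loss (intramodality constraint) and a contrastive pairwise loss on the common representation (intermodality constraint). This is an illustrative toy implementation, not the paper's actual formulation; the function names, the contrastive form, the margin, and the fixed weight `alpha` are all assumptions (the paper balances the two terms adaptively during training).

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for one sample: the intramodality
    # semantic category constraint on a single modality's branch.
    e = np.exp(logits - logits.max())
    p = e / e.sum()
    return -np.log(p[label])

def pairwise_loss(img_feat, txt_feat, similar, margin=1.0):
    # Contrastive intermodality constraint on the common representation:
    # pull matched image/text pairs together, push mismatched pairs
    # apart up to a margin (margin value is an assumption).
    d = np.linalg.norm(img_feat - txt_feat)
    return d ** 2 if similar else max(0.0, margin - d) ** 2

def multitask_loss(img_logits, txt_logits, label,
                   img_feat, txt_feat, similar=True, alpha=0.5):
    # Weighted sum of the two constraint types; here alpha is fixed,
    # whereas CCL adapts the balance as part of multi-task learning.
    intra = cross_entropy(img_logits, label) + cross_entropy(txt_logits, label)
    inter = pairwise_loss(img_feat, txt_feat, similar)
    return alpha * intra + (1.0 - alpha) * inter
```

For a perfectly aligned pair the intermodality term vanishes and only the classification terms remain, which is the behavior the joint optimization is meant to exploit.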
Pages: 405 - 420
Number of pages: 16
Related Papers
50 records total
  • [31] Self-Supervised Correlation Learning for Cross-Modal Retrieval
    Liu, Yaxin
    Wu, Jianlong
    Qu, Leigang
    Gan, Tian
    Yin, Jianhua
    Nie, Liqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
  • [32] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [33] Show and Tell in the Loop: Cross-Modal Circular Correlation Learning
    Peng, Yuxin
    Qi, Jinwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (06) : 1538 - 1550
  • [34] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [35] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255
  • [37] Infant cross-modal learning
    Chow, Hiu Mei
    Tsui, Angeline Sin-Mei
    Ma, Yuen Ki
    Yat, Mei Ying
    Tseng, Chia-huei
    I-PERCEPTION, 2014, 5 (04): : 463 - 463
  • [38] Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
    Xu, Xing
    Song, Jingkuan
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Huang, Zi
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 46 - 54
  • [39] Semantic Preservation and Hash Fusion Network for Unsupervised Cross-Modal Retrieval
    Shu, Xinsheng
    Li, Mingyong
    WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 146 - 161
  • [40] PCFN: Progressive Cross-Modal Fusion Network for Human Pose Transfer
    Yu, Wei
    Li, Yanping
    Wang, Rui
    Cao, Wenming
    Xiang, Wei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (07) : 3369 - 3382