Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval

被引:20
|
作者
Qian, Shengsheng [1 ,2 ]
Xue, Dizhan [1 ,2 ]
Fang, Quan [1 ,2 ]
Xu, Changsheng [1 ,2 ,3 ]
机构
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518055, Guangdong, Peoples R China
基金
中国国家自然科学基金;
关键词
Correlation; Semantics; Task analysis; Adaptation models; Adaptive systems; Birds; Oceans; Cross-modal retrieval; Deep learning; Graph convolutional networks;
D O I
10.1109/TMM.2021.3101642
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The cross-modal retrieval task has raised continuous attention in recent years with the increasing scale of multi-modal data, which has broad application prospects including multimedia data management and intelligent search engine. Most existing methods mainly project data of different modalities into a common representation space where label information is often exploited to distinguish samples from different semantic categories. However, they typically treat each label as an independent individual and ignore the underlying semantic structure of labels. In this paper, we propose an end-to-end adaptive label-aware graph convolutional network (ALGCN) by designing both the instance representation learning branch and the label representation learning branch, which can obtain modality-invariant and discriminative representations for cross-modal retrieval. Firstly, we construct an instance representation learning branch to transform instances of different modalities into a common representation space. Secondly, we adopt Graph Convolutional Network (GCN) to learn inter-dependent classifiers in the label representation learning branch. In addition, a novel adaptive correlation matrix is proposed to efficiently explore and preserve the semantic structure of labels in a data-driven manner. Together with a robust self-supervision loss for GCN, the GCN model can be supervised to learn an effective and robust correlation matrix for feature propagation. Comprehensive experimental results on three benchmark datasets, NUS-WIDE, MIRFlickr and MS-COCO, demonstrate the superiority of ALGCN, compared with the state-of-the-art methods in cross-modal retrieval.
引用
下载
收藏
页码:3520 / 3532
页数:13
相关论文
共 50 条
  • [41] Adaptive Adversarial Learning based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Wang, Zhongrui
    Gu, Guanghun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [42] Data-Aware Proxy Hashing for Cross-modal Retrieval
    Tu, Rong-Cheng
    Mao, Xian-Ling
    Ji, Wenjin
    Wei, Wei
    Huang, Heyan
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 686 - 696
  • [43] StacMR: Scene-Text Aware Cross-Modal Retrieval
    Mafla, Andres
    Rezende, Rafael S.
    Gomez, Lluis
    Larlus, Diane
    Karatzas, Dimosthenis
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2219 - 2229
  • [44] Semi-Supervised Cross-Modal Retrieval With Label Prediction
    Mandal, Devraj
    Rao, Pramod
    Biswas, Soma
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (09) : 2345 - 2353
  • [45] Deep Label Feature Fusion Hashing for Cross-Modal Retrieval
    Ren, Dongxiao
    Xu, Weihua
    Wang, Zhonghua
    Sun, Qinxiu
    IEEE ACCESS, 2022, 10 : 100276 - 100285
  • [46] Cross-modal Retrieval by Real Label Partial Least Squares
    He, Jianfeng
    Ma, Bingpeng
    Wang, Shuhui
    Liu, Yugui
    Huang, Qingming
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 227 - 231
  • [47] Modality-Fused Graph Network for Cross-Modal Retrieval
    Wu, Fei
    LI, Shuaishuai
    Peng, Guangchuan
    Ma, Yongheng
    Jing, Xiao-Yuan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (05) : 1094 - 1097
  • [48] Iterative graph attention memory network for cross-modal retrieval
    Dong, Xinfeng
    Zhang, Huaxiang
    Dong, Xiao
    Lu, Xu
    KNOWLEDGE-BASED SYSTEMS, 2021, 226
  • [49] Exploring Graph-Structured Semantics for Cross-Modal Retrieval
    Zhang, Lei
    Chen, Leiting
    Zhou, Chuan
    Yang, Fan
    Li, Xin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4277 - 4286
  • [50] SEMI-SUPERVISED GRAPH CONVOLUTIONAL HASHING NETWORK FOR LARGE-SCALE CROSS-MODAL RETRIEVAL
    Shen, Zhanjian
    Zhai, Deming
    Liu, Xianming
    Jiang, Junjun
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2366 - 2370