Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval

Cited by: 20
Authors
Qian, Shengsheng [1 ,2 ]
Xue, Dizhan [1 ,2 ]
Fang, Quan [1 ,2 ]
Xu, Changsheng [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518055, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Correlation; Semantics; Task analysis; Adaptation models; Adaptive systems; Birds; Oceans; Cross-modal retrieval; Deep learning; Graph convolutional networks;
DOI
10.1109/TMM.2021.3101642
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
The cross-modal retrieval task has attracted continuous attention in recent years with the increasing scale of multi-modal data, and it has broad application prospects including multimedia data management and intelligent search engines. Most existing methods project data of different modalities into a common representation space, where label information is often exploited to distinguish samples from different semantic categories. However, they typically treat each label as an independent individual and ignore the underlying semantic structure of labels. In this paper, we propose an end-to-end adaptive label-aware graph convolutional network (ALGCN) by designing both an instance representation learning branch and a label representation learning branch, which can obtain modality-invariant and discriminative representations for cross-modal retrieval. Firstly, we construct an instance representation learning branch to transform instances of different modalities into a common representation space. Secondly, we adopt a Graph Convolutional Network (GCN) to learn inter-dependent classifiers in the label representation learning branch. In addition, a novel adaptive correlation matrix is proposed to efficiently explore and preserve the semantic structure of labels in a data-driven manner. Together with a robust self-supervision loss for the GCN, the model can be supervised to learn an effective and robust correlation matrix for feature propagation. Comprehensive experimental results on three benchmark datasets, NUS-WIDE, MIRFlickr and MS-COCO, demonstrate the superiority of ALGCN compared with state-of-the-art methods in cross-modal retrieval.
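The label branch described above follows the standard GCN recipe: label embeddings are propagated over a normalized correlation (adjacency) matrix and passed through a linear map to produce one classifier vector per label. The sketch below illustrates only that generic propagation step, not the paper's actual ALGCN architecture; the matrix sizes, the random correlation matrix, and the single-layer setup are illustrative assumptions (in ALGCN the correlation matrix is learned adaptively during training rather than fixed).

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in Kipf-style GCNs."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(H, A_norm, W):
    """One graph-convolution step: propagate label features over the
    correlation matrix, then apply a linear map and ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
num_labels, in_dim, out_dim = 4, 8, 3

# A correlation matrix would normally be seeded from label co-occurrence
# statistics; here it is random and symmetrized purely for illustration.
A = rng.random((num_labels, num_labels))
A = (A + A.T) / 2.0

H = rng.standard_normal((num_labels, in_dim))  # initial label embeddings
W = rng.standard_normal((in_dim, out_dim))     # learnable layer weights

classifiers = gcn_layer(H, normalize_adjacency(A), W)
print(classifiers.shape)  # one out_dim-dimensional classifier per label
```

At retrieval time, such per-label classifiers are typically applied to the common-space instance representations, so labels with correlated neighbors end up with smoothed, inter-dependent classifiers rather than independent ones.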
Pages: 3520-3532
Page count: 13
Related Papers
50 records in total
  • [31] Wasserstein Coupled Graph Learning for Cross-Modal Retrieval
    Wang, Yun
    Zhang, Tong
    Zhang, Xueya
    Cui, Zhen
    Huang, Yuge
    Shen, Pengcheng
    Li, Shaoxin
    Yang, Jian
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1793 - 1802
  • [32] Collaborative Subspace Graph Hashing for Cross-modal Retrieval
    Zhang, Xiang
    Dong, Guohua
    Du, Yimo
    Wu, Chengkun
    Luo, Zhigang
    Yang, Canqun
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 213 - 221
  • [33] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [34] Combination subspace graph learning for cross-modal retrieval
    Xu, Gongwen
    Li, Xiaomei
    Shi, Lin
    Zhang, Zhijun
    Zhai, Aidong
    ALEXANDRIA ENGINEERING JOURNAL, 2020, 59 (03) : 1333 - 1343
  • [35] Deep Semantic-Aware Proxy Hashing for Multi-Label Cross-Modal Retrieval
    Huo, Yadong
    Qin, Qibing
    Dai, Jiangyan
    Wang, Lei
    Zhang, Wenfeng
    Huang, Lei
    Wang, Chengduan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 576 - 589
  • [36] Graph Convolutional Network Semantic Enhancement Hashing for Self-supervised Cross-Modal Retrieval
    Hu, Jinyu
    Li, Mingyong
    Zhang, Jiayan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT IV, 2023, 14257 : 410 - 422
  • [37] Adaptive Graph Attention Hashing for Unsupervised Cross-Modal Retrieval via Multimodal Transformers
    Li, Yewen
    Ge, Mingyuan
    Ji, Yucheng
    Li, Mingyong
    WEB AND BIG DATA, PT III, APWEB-WAIM 2023, 2024, 14333 : 1 - 15
  • [38] Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval
    Li, Chuang
    Fei, Lunke
    Kang, Peipei
    Liang, Jiahao
    Fang, Xiaozhao
    Teng, Shaohua
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 459 - 472
  • [39] Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
    Wang, Sijin
    Wang, Ruiping
    Yao, Ziwei
    Shan, Shiguang
    Chen, Xilin
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1497 - 1506
  • [40] Candidate Label-aware Similarity Graph for Partial Label Data
    Xie, Tian
    Chen, Hongchang
    Gao, Chao
    Li, Shaomei
    Huang, Ruiyang
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 884 - 889