Transformer-based Dual Relation Graph for Multi-label Image Recognition

Cited by: 46
Authors
Zhao, Jiawei [1]
Yan, Ke [2]
Zhao, Yifan [1]
Guo, Xiaowei [2]
Huang, Feiyue [2]
Li, Jia [1, 3]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China
[2] Tencent Youtu Lab, Shanghai, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICCV48922.2021.00023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Recognizing multiple objects in a single image simultaneously remains a challenging task, complicated by varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent research efforts mainly resort to statistical label co-occurrences and linguistic word embeddings to enhance the unclear semantics. Unlike these approaches, in this paper we propose a novel Transformer-based Dual Relation learning framework, constructing complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two effective relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.
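For context, the statistical label co-occurrence that the abstract contrasts against is commonly estimated as a conditional probability matrix computed from training annotations. The following is a minimal sketch of that baseline idea only, not the method proposed in the paper. It assumes multi-hot label vectors; the function name `cooccurrence_probabilities`, the three class names, and the toy annotations are hypothetical and chosen purely for illustration.

```python
from typing import List


def cooccurrence_probabilities(labels: List[List[int]]) -> List[List[float]]:
    """Estimate P(label j present | label i present) from multi-hot annotations."""
    num_classes = len(labels[0])
    counts = [[0] * num_classes for _ in range(num_classes)]
    for row in labels:
        present = [k for k, value in enumerate(row) if value]
        for i in present:
            for j in present:
                # counts[i][i] accumulates how often label i appears at all.
                counts[i][j] += 1

    probs = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        if counts[i][i] == 0:
            continue  # label never observed: leave its row at zero
        for j in range(num_classes):
            probs[i][j] = counts[i][j] / counts[i][i]
    return probs


# Hypothetical toy annotations over three classes: person, bicycle, dog.
toy_labels = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
probs = cooccurrence_probabilities(toy_labels)
```

In this toy data the matrix is asymmetric: every image with a bicycle also contains a person, so `probs[1][0]` is 1.0, while only two of the three person images contain a bicycle. A matrix of this kind is fixed once the training labels are counted, whereas the abstract describes a semantic graph that is modelled dynamically.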
Pages: 163-172
Number of pages: 10
Related papers
50 in total
  • [1] STMG: Swin transformer for multi-label image recognition with graph convolution network
    Wang, Yangtao
    Xie, Yanzhao
    Fan, Lisheng
    Hu, Guangxing
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (12): : 10051 - 10063
  • [3] TRANSFORMER-BASED MULTI-MODAL LEARNING FOR MULTI-LABEL REMOTE SENSING IMAGE CLASSIFICATION
    Hoffmann, David Sebastian
    Clasen, Kai Norman
    Demir, Begum
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4891 - 4894
  • [4] Multi-Label Image Recognition with Graph Convolutional Networks
    Chen, Zhao-Min
    Wei, Xiu-Shen
    Wang, Peng
    Guo, Yanwen
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 5172 - 5181
  • [5] Graph Attention Transformer Network for Multi-label Image Classification
    Yuan, Jin
    Chen, Shikai
    Zhang, Yao
    Shi, Zhongchao
    Geng, Xin
    Fan, Jianping
    Rui, Yong
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [6] Modular Graph Transformer Networks for Multi-Label Image Classification
    Nguyen, Hoang D.
    Vu, Xuan-Son
    Le, Duc-Trong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9092 - 9100
  • [7] Mining Semantic Information With Dual Relation Graph Network for Multi-Label Image Classification
    Zhou, Wei
    Jiang, Weitao
    Chen, Dihu
    Hu, Haifeng
    Su, Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1143 - 1157
  • [8] Learning label correlations for multi-label image recognition with graph networks
    Li, Qing
    Peng, Xiaojiang
    Qiao, Yu
    Peng, Qiang
    PATTERN RECOGNITION LETTERS, 2020, 138 : 378 - 384
  • [9] DATran: Dual Attention Transformer for Multi-Label Image Classification
    Zhou, Wei
    Zheng, Zhijie
    Su, Tao
    Hu, Haifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 342 - 356
  • [10] Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection
    Ju, Xincheng
    Zhang, Dong
    Li, Junhui
    Zhou, Guodong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 512 - 520