Transformer-based Dual Relation Graph for Multi-label Image Recognition

Cited by: 46

Authors:

Zhao, Jiawei [1]

Yan, Ke [2]

Zhao, Yifan [1]

Guo, Xiaowei [2]

Huang, Feiyue [2]

Li, Jia [1,3]

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China

[2] Tencent Youtu Lab, Shanghai, Peoples R China

[3] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

10.1109/ICCV48922.2021.00023

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recognizing multiple objects in a single image remains a challenging task that involves several difficulties, such as varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent research efforts mainly resort to statistical label co-occurrences and linguistic word embeddings to enhance the unclear semantics. Unlike these approaches, in this paper we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two effective relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.


Pages: 163-172

Page count: 10
