Transformer-based Dual Relation Graph for Multi-label Image Recognition

Cited by: 46
Authors
Zhao, Jiawei [1]
Yan, Ke [2]
Zhao, Yifan [1]
Guo, Xiaowei [2]
Huang, Feiyue [2]
Li, Jia [1, 3]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, SCSE, Beijing, Peoples R China
[2] Tencent Youtu Lab, Shanghai, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV48922.2021.00023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing multiple objects in a single image simultaneously remains a challenging task, with difficulties that include varied object scales, inconsistent appearances, and confusing inter-class relationships. Recent research mainly relies on statistical label co-occurrences and linguistic word embeddings to enhance unclear semantics. In contrast, we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.
Pages: 163-172
Number of pages: 10
Related papers
50 in total
  • [21] Semantic-Aware Graph Matching Mechanism for Multi-Label Image Recognition
    Wu Y.
    Feng S.
    Wang Y.
    IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33 (11) : 6788 - 6803
  • [22] Capsule Graph Neural Network for Multi-Label Image Recognition (Student Abstract)
    Zheng, Xiangping
    Liang, Xun
    Wu, Bo
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 13117 - 13118
  • [23] Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition
    Chen, Tianshui
    Xu, Muxin
    Hui, Xiaolu
    Wu, Hefeng
    Lin, Liang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 522 - 531
  • [24] A Graph-Based Transformer Neural Network for Multi-Label ADR Prediction
    Yadav, Monika
    Ahlawat, Prachi
    Singh, Vijendra
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024,
  • [25] Image emotion multi-label classification based on multi-graph learning
    Wang, Meixia
    Zhao, Yuhai
    Wang, Yejiang
    Xu, Tongze
    Sun, Yiming
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [26] Movie tag prediction: An extreme multi-label multi-modal transformer-based solution with explanation
    Guarascio, Massimo
    Minici, Marco
    Pisani, Francesco Sergio
    De Francesco, Erika
    Lambardi, Pasquale
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (04) : 1021 - 1043
  • [27] A multi-scale semantic attention representation for multi-label image recognition with graph networks
    Liang, Jun
    Xu, Feiteng
    Yu, Songsen
    Neurocomputing, 2022, 491 : 14 - 23
  • [28] Multi-modality multi-label ocular abnormalities detection with transformer-based semantic dictionary learning
    Siswadi, Anneke Annassia Putri
    Bricq, Stephanie
    Meriaudeau, Fabrice
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, 62 (11) : 3433 - 3444
  • [30] A Unified Modular Framework with Deep Graph Convolutional Networks for Multi-label Image Recognition
    Lin, Qifan
    Chen, Zhaoliang
    Wang, Shiping
    Guo, Wenzhong
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 54 - 65