STMG: Swin transformer for multi-label image recognition with graph convolution network

Authors
Yangtao Wang
Yanzhao Xie
Lisheng Fan
Guangxing Hu
Affiliations
[1] Guangzhou University, School of Computer Science and Cyber Engineering
[2] Huazhong University of Science and Technology
Source
Neural Computing and Applications | 2022 / Vol. 34
Keywords
Swin transformer; Graph convolution network; Multi-label image recognition
DOI
Not available
Abstract
Vision Transformer (ViT) has achieved promising results in single-label image classification compared with conventional neural network-based models. Nevertheless, few ViT-related studies have explored label dependencies in multi-label image recognition. To this end, we propose STMG, which combines a transformer with a graph convolution network (GCN) to extract image features and learn label dependencies for multi-label image recognition. STMG consists of an image representation learning module and a label co-occurrence embedding module. First, in the image representation learning module, we adopt Swin transformer instead of ViT to generate the feature of each input image, which avoids computing the similarity between every pair of patches. Second, in the label co-occurrence embedding module, we design a two-layer GCN that adaptively captures the label dependencies and outputs the label co-occurrence embeddings. Finally, STMG fuses the image feature with the label co-occurrence embeddings to produce the classification results, and is trained with the commonly used multi-label classification loss function and an L2-norm loss function. We conduct extensive experiments on two multi-label image datasets, MS-COCO and FLICKR25K. Experimental results demonstrate that STMG achieves better convergence efficiency and classification results than state-of-the-art multi-label image recognition methods. Our code is open-sourced and publicly available on GitHub: https://github.com/lzHZWZ/STMG.
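The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that lives in the linked repository), and it rests on several assumptions the abstract does not confirm: the image feature is a random stand-in for a Swin transformer output, the label graph is a fixed toy co-occurrence matrix rather than an adaptively learned one, fusion is a dot product between label embeddings and the image feature, and the L2-norm term is applied to the label embeddings. All names (`normalize_adjacency`, `two_layer_gcn`, `l2_weight`, the dimensions) are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by standard GCNs."""
    adj_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(axis=1))
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def two_layer_gcn(adj_norm, label_inputs, w1, w2):
    """Two graph-convolution layers with a ReLU in between."""
    hidden = np.maximum(adj_norm @ label_inputs @ w1, 0.0)
    return adj_norm @ hidden @ w2                  # (num_labels, feat_dim)

def multilabel_bce(logits, targets):
    """Numerically stable binary cross-entropy averaged over labels."""
    per_label = (np.maximum(logits, 0.0) - logits * targets
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(per_label.mean())

rng = np.random.default_rng(0)
num_labels, label_dim, hidden_dim, feat_dim = 5, 8, 16, 32  # assumed toy sizes

# Toy symmetric label co-occurrence graph (assumption, not from the paper).
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 0, 1, 0],
                [1, 0, 0, 0, 1],
                [0, 1, 0, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)

label_inputs = rng.normal(size=(num_labels, label_dim))   # initial label vectors
w1 = rng.normal(scale=0.1, size=(label_dim, hidden_dim))
w2 = rng.normal(scale=0.1, size=(hidden_dim, feat_dim))
image_feature = rng.normal(size=feat_dim)   # stand-in for a Swin transformer feature
targets = np.array([1, 0, 1, 0, 0], dtype=float)          # ground-truth label set

adj_norm = normalize_adjacency(adj)
label_embeddings = two_layer_gcn(adj_norm, label_inputs, w1, w2)
logits = label_embeddings @ image_feature   # assumed fusion: one score per label

l2_weight = 1e-3                            # hypothetical weighting of the L2 term
loss = multilabel_bce(logits, targets) + l2_weight * float(np.sum(label_embeddings ** 2))
print(logits.shape, loss)
```

In a real training setup the weights `w1` and `w2` and the backbone would be optimized jointly by gradient descent; the sketch only shows one forward pass and the combined loss value.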
Pages: 10051-10063
Page count: 12