STMG: Swin transformer for multi-label image recognition with graph convolution network

Authors: Yangtao Wang, Yanzhao Xie, Lisheng Fan, Guangxing Hu
Affiliations:
[1] Guangzhou University, School of Computer Science and Cyber Engineering
[2] Huazhong University of Science and Technology
Keywords: Swin transformer; Graph convolution network; Multi-label image recognition
DOI: not available
Abstract
Vision Transformer (ViT) has achieved promising single-label image classification results compared with conventional neural network-based models. Nevertheless, few ViT-related studies have explored label dependencies in multi-label image recognition. To address this gap, we propose STMG, which combines a transformer and a graph convolution network (GCN) to extract image features and learn label dependencies for multi-label image recognition. STMG consists of an image representation learning module and a label co-occurrence embedding module. First, in the image representation learning module, we adopt the Swin transformer instead of ViT to generate the feature of each input image; because Swin computes self-attention within local windows, it avoids computing the similarity between every pair of patches. Second, in the label co-occurrence embedding module, we design a two-layer GCN that adaptively captures label dependencies and outputs label co-occurrence embeddings. Finally, STMG fuses the image feature with the label co-occurrence embeddings to produce the classification results, and is trained with the commonly used multi-label classification loss together with an L2-norm loss. We conduct extensive experiments on two multi-label image datasets, MS-COCO and FLICKR25K. Experimental results demonstrate that STMG outperforms state-of-the-art multi-label image recognition methods in both convergence efficiency and classification performance. Our code is open-sourced and publicly available on GitHub: https://github.com/lzHZWZ/STMG.
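The efficiency argument for choosing Swin over ViT can be made concrete by counting query-key similarity scores. What follows is a minimal sketch that assumes Swin-T defaults: a 224×224 input, 4×4 patches giving a 56×56 token grid, and 7×7 windows. The abstract does not state STMG's actual backbone configuration. The function names are hypothetical. The sketch counts only the QK^T entries per attention head and leaves out the shifted-window masking used in alternating Swin blocks.

```python
def attention_pairs_global(h: int, w: int) -> int:
    """Query-key similarity scores per head for ViT-style global self-attention."""
    n = h * w  # number of patch tokens
    return n * n  # every token attends to every token


def attention_pairs_windowed(h: int, w: int, m: int) -> int:
    """Query-key scores when attention is restricted to non-overlapping m x m windows."""
    if h % m or w % m:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (h // m) * (w // m)
    return num_windows * (m * m) ** 2  # each window is a small global attention


def window_partition(h: int, w: int, m: int) -> list[list[tuple[int, int]]]:
    """Split an h x w token grid into m x m windows of (row, col) coordinates."""
    return [
        [(r, c) for r in range(wr, wr + m) for c in range(wc, wc + m)]
        for wr in range(0, h, m)
        for wc in range(0, w, m)
    ]


if __name__ == "__main__":
    h = w = 56  # 224 / 4 tokens per side (assumed Swin-T stage-1 grid)
    m = 7       # assumed window size
    g = attention_pairs_global(h, w)
    l = attention_pairs_windowed(h, w, m)
    print(g, l, g // l)  # 9834496 153664 64
```

Global attention grows quadratically with the token count hw. Windowed attention grows only as hw·M², so the saving factor on this grid is hw/M² = 3136/49 = 64.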
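The label co-occurrence module and the fusion step can be sketched in plain Python. This is not the code from the STMG repository. The abstract does not specify how the adjacency matrix is built, what the GCN inputs are, or how the features are fused. The sketch therefore borrows conventions popularized by ML-GCN as assumptions: conditional co-occurrence probabilities binarized at a threshold, LeakyReLU between the two layers, and dot-product scoring against the image feature. The threshold tau = 0.4, the toy word vectors and weights, and all function names are hypothetical. The L2-norm loss term is omitted because the abstract does not define it.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def leaky_relu(m: Matrix, slope: float = 0.2) -> Matrix:
    return [[v if v > 0 else slope * v for v in row] for row in m]


def label_adjacency(label_sets: list[set[int]], num_classes: int, tau: float = 0.4) -> Matrix:
    """Binarize P(j | i) >= tau, add self-loops, then row-normalize (assumed scheme)."""
    counts = [0] * num_classes
    co = [[0] * num_classes for _ in range(num_classes)]
    for labels in label_sets:
        for i in labels:
            counts[i] += 1
            for j in labels:
                if i != j:
                    co[i][j] += 1
    adj = [[1.0 if i == j or (counts[i] and co[i][j] / counts[i] >= tau) else 0.0
            for j in range(num_classes)] for i in range(num_classes)]
    return [[v / sum(row) for v in row] for row in adj]  # each row sums to 1


def gcn_label_embeddings(adj: Matrix, word_vecs: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Two GCN layers: H1 = LeakyReLU(A X W1), Z = A H1 W2, one row per label."""
    h1 = leaky_relu(matmul(adj, matmul(word_vecs, w1)))
    return matmul(adj, matmul(h1, w2))


def label_scores(embeddings: Matrix, image_feature: list[float]) -> list[float]:
    """Fuse by dot product: score_c = z_c . x."""
    return [sum(z * x for z, x in zip(row, image_feature)) for row in embeddings]


def multilabel_bce(scores: list[float], targets: list[int]) -> float:
    """Mean binary cross-entropy on logits, using a numerically stable softplus."""
    def softplus(s: float) -> float:
        return max(s, 0.0) + math.log1p(math.exp(-abs(s)))
    return sum(softplus(s) - y * s for s, y in zip(scores, targets)) / len(scores)


if __name__ == "__main__":
    train_labels = [{0, 1}, {0, 1}, {0}, {2}, {1, 2}]  # toy annotations, 3 classes
    adj = label_adjacency(train_labels, num_classes=3)
    word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # C x d0 toy label inputs
    w1 = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.2]]          # d0 x d1
    w2 = [[0.3, 0.1], [-0.2, 0.5], [0.4, 0.2]]         # d1 x D
    z = gcn_label_embeddings(adj, word_vecs, w1, w2)   # C x D label embeddings
    scores = label_scores(z, image_feature=[0.8, -0.1])
    print(adj)  # [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
    print(scores, multilabel_bce(scores, [1, 1, 0]))
```

In the toy data, classes 0 and 1 always co-occur above the threshold, so their adjacency rows are identical. As a result they receive identical embeddings and identical scores. This shows how the GCN ties the predictions of correlated labels together.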
Pages: 10051–10063 (13 pages)
Published in: Neural Computing & Applications, 2022, 34(12)