STMG: Swin transformer for multi-label image recognition with graph convolution network

Cited by: 0
Authors
Yangtao Wang
Yanzhao Xie
Lisheng Fan
Guangxing Hu
Affiliations
[1] Guangzhou University, School of Computer Science and Cyber Engineering
[2] Huazhong University of Science and Technology
Source
Neural Computing and Applications | 2022 / Vol. 34
Keywords
Swin transformer; Graph convolution network; Multi-label image recognition
DOI
Not available
CLC number
Subject classification number
Abstract
Vision Transformer (ViT) has achieved promising single-label image classification results compared to conventional neural network-based models. Nevertheless, few ViT-related studies have explored label dependencies in multi-label image recognition. To this end, we propose STMG, which combines a transformer and a graph convolution network (GCN) to extract image features and learn label dependencies for multi-label image recognition. STMG consists of an image representation learning module and a label co-occurrence embedding module. First, in the image representation learning module, we adopt Swin transformer instead of ViT to generate the feature of each input image; Swin restricts self-attention to local windows and thus avoids computing the similarity between every pair of patches. Second, in the label co-occurrence embedding module, we design a two-layer GCN that adaptively captures label dependencies and outputs label co-occurrence embeddings. Finally, STMG fuses the image feature with the label co-occurrence embeddings to produce the classification results, trained with the commonly used multi-label classification loss and an L2-norm loss. We conduct extensive experiments on two multi-label image datasets, MS-COCO and FLICKR25K. Experimental results demonstrate that STMG outperforms state-of-the-art multi-label image recognition methods in both convergence efficiency and classification results. Our code is open-sourced and publicly available on GitHub: https://github.com/lzHZWZ/STMG.
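The abstract describes the pipeline only at a high level, so the following is a minimal plain-Python sketch under stated assumptions, not the authors' implementation (their code is at the GitHub link in the abstract). It first counts attention scores to illustrate why window attention is cheaper than ViT's global attention: a 224×224 input with 4×4 patches gives 3136 tokens, and Swin-T's first stage uses 7×7 windows. It then builds label co-occurrence embeddings with a two-layer GCN and fuses them with an image feature. Assumptions not stated in the abstract: the adjacency is a symmetrically normalized co-occurrence-count matrix with self-loops, ReLU sits between the two GCN layers, fusion is a per-label dot product (as in ML-GCN), the multi-label loss is binary cross-entropy on logits, and the "L2-norm loss" is a penalty on the label embeddings weighted by a hypothetical `lam`. All helper names and toy sizes are illustrative.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def attention_pairs(num_tokens: int, window: Optional[int] = None) -> int:
    """Count query-key similarity scores in one self-attention layer.

    Global (ViT-style) attention scores every pair of tokens: N * N.
    Window (Swin-style) attention scores only pairs inside each
    window of window * window tokens: N * window**2.
    """
    if window is None:
        return num_tokens * num_tokens
    return num_tokens * window * window


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def normalized_adjacency(cooccur: Matrix) -> Matrix:
    """ASSUMPTION: Kipf-Welling normalization D^-1/2 (A + I) D^-1/2 of label co-occurrence counts."""
    n = len(cooccur)
    a = [[cooccur[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]  # >= 1 thanks to the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix, relu: bool) -> Matrix:
    """One graph convolution: propagate label features over a_hat, then project by w."""
    out = matmul(matmul(a_hat, h), w)
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def label_embeddings(a_hat: Matrix, word_vecs: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Two-layer GCN: C x d0 label input vectors -> C x D label co-occurrence embeddings."""
    hidden = gcn_layer(a_hat, word_vecs, w1, relu=True)
    return gcn_layer(a_hat, hidden, w2, relu=False)


def fuse(image_feature: List[float], embeddings: Matrix) -> List[float]:
    """ASSUMPTION: one logit per label = dot(image feature, label embedding)."""
    return [sum(x * e for x, e in zip(image_feature, emb)) for emb in embeddings]


def bce_with_logits(z: float, y: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit z with target y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def total_loss(logits: List[float], targets: List[float], embeddings: Matrix, lam: float = 1e-4) -> float:
    """Mean per-label BCE plus a HYPOTHETICAL L2 penalty (weight lam) on the label embeddings."""
    cls = sum(bce_with_logits(z, y) for z, y in zip(logits, targets)) / len(logits)
    l2 = sum(v * v for row in embeddings for v in row)
    return cls + lam * l2


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    # 224x224 input, 4x4 patches -> 56 * 56 = 3136 tokens; 7x7 windows as in Swin-T stage 1.
    print(attention_pairs(3136), attention_pairs(3136, window=7))

    rng = random.Random(0)
    num_labels, d0, d1, dim = 3, 4, 8, 6  # toy sizes, not the paper's
    cooccur = [[0, 5, 1], [5, 0, 0], [1, 0, 0]]  # toy label co-occurrence counts
    a_hat = normalized_adjacency(cooccur)
    emb = label_embeddings(a_hat, random_matrix(num_labels, d0, rng),
                           random_matrix(d0, d1, rng), random_matrix(d1, dim, rng))
    image_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a Swin output
    logits = fuse(image_feature, emb)
    print(len(logits), total_loss(logits, [1.0, 1.0, 0.0], emb))
```

Running the script first prints `9834496 153664`, i.e. window attention scores about 64 times fewer patch pairs at this resolution, then the number of logits and the toy loss. With real data, `image_feature` would come from the Swin backbone and the GCN inputs from label word vectors; both are assumptions here rather than details confirmed by the abstract.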
Pages: 10051-10063
Number of pages: 12
Related papers (50 in total)
  • [21] A Graph-Based Transformer Neural Network for Multi-Label ADR Prediction
    Yadav, Monika
    Ahlawat, Prachi
    Singh, Vijendra
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024
  • [22] Multi-label graph node classification with label attentive neighborhood convolution
    Zhou, Cangqi
    Chen, Hui
    Zhang, Jing
    Li, Qianmu
    Hu, Dianming
    Sheng, Victor S.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 180
  • [23] Multi-label guided graph attention network for education image retrieval
    Nguyen, Van Thanh
    Nguyen, Huu Quynh
    Tran, Anh Dat
    Dao, Thi Thuy Quynh
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
  • [24] Tran-GCN: Multi-label Pattern Image Retrieval via Transformer Driven Graph Convolutional Network
    Li, Ying
    Guan, Chunming
    Cai, Rui
    Ye, Erwan
    Ding, Yuxiang
    Gao, Jiaquan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6301 - 6310
  • [25] Graph attention mechanism with global contextual information for multi-label image recognition
    Ban, Xiaoxiao
    Li, Peihua
    Wang, Qilong
    Zhou, Shoujun
    Guo, Shijie
    Wang, Yuanquan
    JOURNAL OF ELECTRONIC IMAGING, 2021, 30 (06)
  • [26] Semantic-Aware Graph Matching Mechanism for Multi-Label Image Recognition
    Wu, Yanan
    Feng, Songhe
    Wang, Yang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6788 - 6803
  • [27] Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition
    Chen, Tianshui
    Xu, Muxin
    Hui, Xiaolu
    Wu, Hefeng
    Lin, Liang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 522 - 531
  • [28] Graph convolutional network for multi-label VHR remote sensing scene recognition
    Khan, Nagma
    Chaudhuri, Ushasi
    Banerjee, Biplab
    Chaudhuri, Subhasis
    NEUROCOMPUTING, 2019, 357 : 36 - 46
  • [29] GKGNet: Group K-Nearest Neighbor Based Graph Convolutional Network for Multi-label Image Recognition
    Yao, Ruijie
    Jin, Sheng
    Xu, Lumin
    Zeng, Wang
    Liu, Wentao
    Qian, Chen
    Luo, Ping
    Wu, Ji
    COMPUTER VISION-ECCV 2024, PT XVIII, 2025, 15076 : 91 - 107
  • [30] M-GCN: Brain-inspired memory graph convolutional network for multi-label image recognition
    Yao, Xiao
    Xu, Feiyang
    Gu, Min
    Wang, Peipei
    NEURAL COMPUTING AND APPLICATIONS, 2022, 34 : 6489 - 6502