A semantic guidance-based fusion network for multi-label image classification

Times cited: 0
Authors
Wang, Jiuhang [1,2]
Tang, Hongying [1]
Luo, Shanshan [1]
Yang, Liqi [1,2]
Liu, Shusheng [1,2]
Hong, Aoping [1,2]
Li, Baoqing [1]
Affiliations
[1] Shanghai Institute Microsyst & Informat Technol, Sci & Technol Microsyst Lab, 1455 Pingcheng Rd, Shanghai 201800, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun, 1 Yanqihu East Rd, Beijing 100049, Peoples R China
Keywords
Image spatial correlation; Label semantic correlation; Layered semantic guidance fusion; Multi-label image classification
DOI
10.1016/j.patrec.2024.08.020
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multi-label image classification (MLIC), the fundamental task of assigning multiple labels to each image, has seen notable progress in recent years. Because objects in the physical world often appear together, modeling the correlations between them is crucial for improving classification accuracy. This involves two kinds of correlation: spatial correlation among image features and semantic correlation among labels. However, existing methods struggle to establish these correlations because of complex spatial layouts and label semantic relationships. Moreover, to fuse image feature correlation with label semantic correlation, existing methods typically learn a semantic representation only in the final CNN layer, even though different CNN layers capture features at different scales and have distinct discriminative abilities. To address these issues, in this paper we introduce the Semantic Guidance-Based Fusion Network (SGFN) for MLIC. To model spatial image feature correlation, we use the TResNet architecture as the backbone network and employ the Feature Aggregation Module to capture global spatial correlation. For label semantic correlation, we model both local and global semantic correlation. We further enrich the model's features by learning semantic representations across multiple convolutional layers. Our method outperforms current state-of-the-art techniques on the PASCAL VOC (2007, 2012) and MS-COCO datasets.
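The abstract describes SGFN only at a high level. As a rough, hypothetical illustration of the general idea of label-semantic guidance fused across several layers, the sketch below uses each label's embedding as an attention query over the spatial positions of a feature map, scores each label with an independent sigmoid, and averages the per-label probabilities over layers. This is not the authors' implementation. The dot-product attention, the per-label classifiers shared across layers, plain averaging as the fusion rule, the 0.5 threshold, and all function names are assumptions. Features from every layer are also assumed to be already projected to a common dimension. TResNet and the Feature Aggregation Module are not modeled.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def label_guided_pool(features: List[Vector], label_emb: Vector) -> Vector:
    """Attention-pool the spatial positions of one feature map, with weights
    computed from a single label embedding (hypothetical guidance rule)."""
    weights = softmax([dot(label_emb, f) for f in features])
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


def layer_scores(features: List[Vector], label_embs: List[Vector],
                 classifiers: List[Vector]) -> List[float]:
    """Per-label probabilities from one layer; each label gets its own sigmoid."""
    return [sigmoid(dot(w, label_guided_pool(features, e)))
            for e, w in zip(label_embs, classifiers)]


def fused_scores(layers: List[List[Vector]], label_embs: List[Vector],
                 classifiers: List[Vector]) -> List[float]:
    """Average each label's probability over all layers (assumed fusion rule).
    Every layer is assumed to share the same feature dimension."""
    per_layer = [layer_scores(f, label_embs, classifiers) for f in layers]
    return [sum(col) / len(col) for col in zip(*per_layer)]


def predict_labels(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of all labels whose fused probability reaches the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]
```

With toy inputs (for example, two labels and two feature maps of three positions each), `predict_labels(fused_scores(layers, label_embs, classifiers))` returns the indices of every label whose fused probability reaches the threshold. Scoring each label with an independent sigmoid, rather than a softmax over labels, is what allows several labels to be active for the same image.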
Pages: 254-261
Page count: 8