Feature learning network with transformer for multi-label image classification

被引:20
|
作者
Zhou, Wei [1 ]
Dou, Peng [1 ]
Su, Tao [1 ]
Hu, Haifeng [1 ]
Zheng, Zhijie [1 ]
机构
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
关键词
Multi-label classification; Transformer; Multi-scale features; Spatial attention; Salient features; Feature suppression; ATTENTION;
D O I
10.1016/j.patcog.2022.109203
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The purpose of multi-label image classification task is to accurately assign a set of labels to the objects in images. Although promising results have been achieved, most of the existing methods cannot effectively learn multi-scale features, so it is difficult to identify small-scale objects from images. Besides, current attention-based methods tend to learn the most salient feature regions in images, but fail to excavate various potential useful features concealed by the most salient feature, thus limiting the further improve-ment of model performance. To address above issues, we propose a novel Feature Learning network based on Transformer to learn salient features and excavate potential useful features (FL-Tran). Specifically, in order to solve the problem that current methods are difficult to identify small-scale objects, we first present a novel multi-scale fusion module (MSFM) to align high-level features and low-level features to learn multi-scale features. Additionally, a spatial attention module (SAM) utilizing transformer encoder is introduced to capture salient object features in images to enhance the model performance. Furthermore, we devise a feature enhancement and suppression module (FESM) with the aim of excavating potential useful features concealed by the most salient features. By suppressing the most salient features obtained in current SAM layer, and then forcing subsequent SAM layer to excavate potential salient features in fea-ture maps, FL-Tran model can learn various useful features more comprehensively. Extensive experiments on MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets demonstrate that our proposed FL-Tran model outperforms current state-of-the-art methods.(c) 2022 Elsevier Ltd. All rights reserved.
引用
收藏
页数:16
相关论文
共 50 条
  • [1] Multi-Label Image Classification by Feature Attention Network
    Yan, Zheng
    Liu, Weiwei
    Wen, Shiping
    Yang, Yin
    [J]. IEEE ACCESS, 2019, 7 : 98005 - 98013
  • [2] Graph Attention Transformer Network for Multi-label Image Classification
    Yuan, Jin
    Chen, Shikai
    Zhang, Yao
    Shi, Zhongchao
    Geng, Xin
    Fan, Jianping
    Rui, Yong
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [3] Label correlation guided discriminative label feature learning for multi-label chest image classification
    Zhang, Kai
    Liang, Wei
    Cao, Peng
    Liu, Xiaoli
    Yang, Jinzhu
    Zaiane, Osmar
    [J]. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 245
  • [4] MULTIMODAL LEARNING FOR MULTI-LABEL IMAGE CLASSIFICATION
    Pang, Yanwei
    Ma, Zhao
    Yuan, Yuan
    Li, Xuelong
    Wang, Kongqiao
    [J]. 2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011, : 1797 - 1800
  • [5] Causal multi-label learning for image classification
    Tian, Yingjie
    Bai, Kunlong
    Yu, Xiaotong
    Zhu, Siyu
    [J]. NEURAL NETWORKS, 2023, 167 : 626 - 637
  • [6] Multi-label Active Learning for Image Classification
    Wu, Jian
    Sheng, Victor S.
    Zhang, Jing
    Zhao, Pengpeng
    Cui, Zhiming
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 5227 - 5231
  • [7] Two-Stream Transformer for Multi-Label Image Classification
    Zhu, Xuelin
    Cao, Jiuxin
    Ge, Jiawei
    Liu, Weijia
    Liu, Bo
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3598 - 3607
  • [8] DATran: Dual Attention Transformer for Multi-Label Image Classification
    Zhou, Wei
    Zheng, Zhijie
    Su, Tao
    Hu, Haifeng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 342 - 356
  • [9] Modular Graph Transformer Networks for Multi-Label Image Classification
    Nguyen, Hoang D.
    Vu, Xuan-Son
    Le, Duc-Trong
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9092 - 9100
  • [10] Multi-Label Active Learning with Label Correlation for Image Classification
    Ye, Chen
    Wu, Jian
    Sheng, Victor S.
    Zhao, Pengpeng
    Cui, Zhiming
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 3437 - 3441