Feature learning network with transformer for multi-label image classification

Cited by: 20

Authors:

Zhou, Wei [1]

Dou, Peng [1]

Su, Tao [1]

Hu, Haifeng [1]

Zheng, Zhijie [1]

Affiliations:

[1] Sun Yat-sen University, School of Electronics and Information Technology, Guangzhou, People's Republic of China

Source:

PATTERN RECOGNITION | 2023 / Vol. 136

Keywords:

Multi-label classification; Transformer; Multi-scale features; Spatial attention; Salient features; Feature suppression; ATTENTION;

DOI:

10.1016/j.patcog.2022.109203

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The purpose of the multi-label image classification task is to accurately assign a set of labels to the objects in images. Although promising results have been achieved, most existing methods cannot effectively learn multi-scale features, which makes it difficult to identify small-scale objects in images. In addition, current attention-based methods tend to learn the most salient feature regions in images but fail to excavate the various potentially useful features concealed by the most salient feature, thus limiting further improvement of model performance. To address the above issues, we propose a novel Feature Learning network based on Transformer (FL-Tran) to learn salient features and excavate potentially useful features. Specifically, to solve the problem that current methods have difficulty identifying small-scale objects, we first present a novel multi-scale fusion module (MSFM) that aligns high-level and low-level features to learn multi-scale features. Additionally, a spatial attention module (SAM) utilizing a transformer encoder is introduced to capture salient object features in images and enhance model performance. Furthermore, we devise a feature enhancement and suppression module (FESM) with the aim of excavating potentially useful features concealed by the most salient features. By suppressing the most salient features obtained in the current SAM layer, and then forcing the subsequent SAM layer to excavate potential salient features in the feature maps, the FL-Tran model can learn various useful features more comprehensively. Extensive experiments on the MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets demonstrate that our proposed FL-Tran model outperforms current state-of-the-art methods. (c) 2022 Elsevier Ltd. All rights reserved.
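The abstract does not say how FESM performs suppression, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy saliency score (a softmax over each position's L2 norm, standing in for the attention a SAM layer would produce) and a hypothetical `suppress_most_salient` helper that scales the top-scoring positions by a factor. The names `saliency_scores`, `suppress_most_salient`, `top_k`, and `factor` are illustrative and do not come from the paper.

```python
import math
from typing import List


def saliency_scores(features: List[List[float]]) -> List[float]:
    """Toy saliency: softmax over the L2 norm of each position's feature vector.

    This is an assumed stand-in for the attention weights of a SAM layer.
    """
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in features]
    peak = max(norms)  # subtract the max for numerical stability
    exps = [math.exp(n - peak) for n in norms]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_most_salient(
    features: List[List[float]],
    saliency: List[float],
    top_k: int = 1,
    factor: float = 0.0,
) -> List[List[float]]:
    """Return a copy of `features` with the top_k most salient positions scaled by `factor`."""
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    suppressed = set(ranked[:top_k])
    return [
        [v * factor for v in vec] if i in suppressed else list(vec)
        for i, vec in enumerate(features)
    ]


def most_salient_index(saliency: List[float]) -> int:
    """Index of the position with the highest saliency score."""
    return max(range(len(saliency)), key=lambda i: saliency[i])


# Three spatial positions, each with a 2-dimensional feature vector.
features = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]

# Stage 1: the first attention pass focuses on the dominant position.
first = most_salient_index(saliency_scores(features))

# Stage 2: suppress that position so the next pass must look elsewhere.
remaining = suppress_most_salient(features, saliency_scores(features), top_k=1)
second = most_salient_index(saliency_scores(remaining))
```

In this toy setting the second pass selects a different position from the first, which mirrors the stated goal of forcing a subsequent layer to attend to features that were previously hidden by the most salient one.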


Pages: 16

50 items in total

[1] Multi-Label Image Classification by Feature Attention Network
Yan, Zheng
Liu, Weiwei
Wen, Shiping
Yang, Yin
[J]. IEEE ACCESS, 2019, 7 : 98005 - 98013
[2] Graph Attention Transformer Network for Multi-label Image Classification
Yuan, Jin
Chen, Shikai
Zhang, Yao
Shi, Zhongchao
Geng, Xin
Fan, Jianping
Rui, Yong
[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
[3] Label correlation guided discriminative label feature learning for multi-label chest image classification
Zhang, Kai
Liang, Wei
Cao, Peng
Liu, Xiaoli
Yang, Jinzhu
Zaiane, Osmar
[J]. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 245
[4] MULTIMODAL LEARNING FOR MULTI-LABEL IMAGE CLASSIFICATION
Pang, Yanwei
Ma, Zhao
Yuan, Yuan
Li, Xuelong
Wang, Kongqiao
[J]. 2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011, : 1797 - 1800
[5] Causal multi-label learning for image classification
Tian, Yingjie
Bai, Kunlong
Yu, Xiaotong
Zhu, Siyu
[J]. NEURAL NETWORKS, 2023, 167 : 626 - 637
[6] Multi-label Active Learning for Image Classification
Wu, Jian
Sheng, Victor S.
Zhang, Jing
Zhao, Pengpeng
Cui, Zhiming
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 5227 - 5231
[7] Two-Stream Transformer for Multi-Label Image Classification
Zhu, Xuelin
Cao, Jiuxin
Ge, Jiawei
Liu, Weijia
Liu, Bo
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3598 - 3607
[8] DATran: Dual Attention Transformer for Multi-Label Image Classification
Zhou, Wei
Zheng, Zhijie
Su, Tao
Hu, Haifeng
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 342 - 356
[9] Modular Graph Transformer Networks for Multi-Label Image Classification
Nguyen, Hoang D.
Vu, Xuan-Son
Le, Duc-Trong
[J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9092 - 9100
[10] Multi-Label Active Learning with Label Correlation for Image Classification
Ye, Chen
Wu, Jian
Sheng, Victor S.
Zhao, Pengpeng
Cui, Zhiming
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 3437 - 3441
