Multi-level information fusion Transformer with background filter for fine-grained image recognition

Cited by: 0

Authors:

Yu, Ying [1,2]

Wang, Jinghui [2]

Pedrycz, Witold [3]

Miao, Duoqian [4]

Qian, Jin [2]

Affiliations:

[1] East China Jiaotong Univ, State Key Lab Performance Monitoring & Protecting, Nanchang 330013, Jiangxi, Peoples R China

[2] East China Jiaotong Univ, Sch Software, Nanchang 330013, Jiangxi, Peoples R China

[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2G7, Canada

[4] Tongji Univ, Sch Elect & Informat Engn, Shanghai, Peoples R China

Source:

APPLIED INTELLIGENCE | 2024 / Vol. 54 / Issue 17-18

Funding:

National Natural Science Foundation of China;

Keywords:

Fine-grained image recognition; Vision Transformer; Multi-level information; Information fusion;

DOI:

10.1007/s10489-024-05584-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Compared to traditional image recognition, Fine-Grained Image Recognition (FGIR) faces significant challenges due to the subtle distinctions among different categories and the notable variances within the same category. Furthermore, the complexity of backgrounds and the extraction of discriminative features limited to small local regions further exacerbate the difficulty. Recently, several studies have demonstrated the effectiveness of the Vision Transformer (ViT) in FGIR. However, these investigations have frequently overlooked critical information embedded within class tokens across different layers, while also neglecting the subtle local details hidden within patch tokens. To address these issues and enhance FGIR performance, we introduce a novel ViT-based network architecture MIFBF. The proposed model builds upon ViT by incorporating three modules: Complementary Class Tokens Combination module (CCTC), Patches Information Integration module (PII), and Attention Cropping Module (ACM). The CCTC module integrates multi-layer class tokens to capture complementary information, thereby enhancing the model's representational capacity. The PII module delves into the rich local details encoded in patch tokens to improve classification accuracy. The ACM module generates regions of interest based on ViT's self-attention weights and effectively filters background noise, thereby directing the model's attention to the most relevant image areas. Experiments conducted on three different datasets validate the effectiveness of the proposed model, yielding state-of-the-art results and highlighting its superiority in FGIR tasks.


Pages: 8108-8119

Page count: 12

