Improving Fine-Grained Image Classification With Multimodal Information

Cited by: 2

Authors:

Xu, Jie [1]

Zhang, Xiaoqian [1]

Zhao, Changming [2]

Geng, Zili [1]

Feng, Yuren [1]

Miao, Ke [1]

Li, Yunji [2]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China

[2] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Image classification; Visualization; Data mining; Birds; Spatiotemporal phenomena; Fuses; Multimodal information; fine-grained image classification; multi-temporal feature fusion; self-attention; dynamic MLP; NETWORK;

DOI:

10.1109/TMM.2023.3291819

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Fine-grained image datasets have small inter-class differences and large intra-class differences, which makes fine-grained image classification difficult. Traditional fine-grained image classification methods focus only on the visual features of images; this limitation can be overcome by augmenting them with multimodal information. This paper proposes an improved fine-grained image classification method that uses multimodal information and comprises four stages: multimodal data preprocessing, multimodal feature extraction, multi-temporal feature fusion, and decision correction. The proposed preprocessing method addresses three problems of multimodal data (scattered distribution, difficult processing, and uneven contribution to prediction) through normalization, phrase packing, and weighted concatenation. For multimodal feature extraction, the proposed SAMLP (Self-Attention MLP) module combines self-attention with an MLP to capture the internal correlations of multimodal information. The proposed multi-temporal feature fusion is divided into early and late feature fusion: the former adds multimodal information markers to the original image, and the latter uses a multi-cascade dynamic MLP structure to fuse visual and multimodal features. To address the limitations of feature fusion, a decision strategy is proposed that revises the predictions made from the fused features according to the predictions made from the multimodal features. Ablation experiments on the INAT18-1K and INAT21-1K datasets show that our method is effective in improving classification with multimodal information. Experiments on the large INAT2021_mini dataset show that the complete method achieves higher accuracy than the state-of-the-art method with negligible loss of efficiency.
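The abstract names normalization and weighted concatenation as preprocessing steps but gives no formulas. The following is a minimal sketch of what such a step might look like, not the authors' implementation. It assumes the multimodal data are latitude, longitude, and day of year (metadata of the kind that accompanies iNaturalist-style images); the function names, the cyclic date encoding, and the weights `w_loc` and `w_date` are all hypothetical choices made for illustration.

```python
import math


def normalize_location(lat_deg, lon_deg):
    """Scale latitude and longitude (in degrees) to the range [-1, 1]."""
    return [lat_deg / 90.0, lon_deg / 180.0]


def encode_day_of_year(day, days_in_year=365):
    """Map a day index onto the unit circle so the year wraps around smoothly.

    This cyclic encoding is an assumption, not taken from the paper.
    """
    angle = 2.0 * math.pi * (day % days_in_year) / days_in_year
    return [math.sin(angle), math.cos(angle)]


def weighted_concat(groups, weights):
    """Scale each feature group by its weight, then concatenate the groups."""
    if len(groups) != len(weights):
        raise ValueError("need exactly one weight per feature group")
    merged = []
    for group, weight in zip(groups, weights):
        merged.extend(weight * value for value in group)
    return merged


def preprocess(lat_deg, lon_deg, day, w_loc=1.0, w_date=0.5):
    """Build one metadata vector; the default weights are hypothetical."""
    location = normalize_location(lat_deg, lon_deg)
    date = encode_day_of_year(day)
    return weighted_concat([location, date], [w_loc, w_date])


if __name__ == "__main__":
    # Example: an observation at 45 N, 90 W on the first day of the year.
    print(preprocess(45.0, -90.0, 0))
```

Scaling each group before concatenation is one simple way to keep a single modality from dominating the downstream feature extractor; in practice the weights could be tuned on a validation set.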


Pages: 2082-2095

Page count: 14
