Improving Fine-Grained Image Classification With Multimodal Information

Times cited: 1
Authors
Xu, Jie [1]
Zhang, Xiaoqian [1]
Zhao, Changming [2]
Geng, Zili [1]
Feng, Yuren [1]
Miao, Ke [1]
Li, Yunji [2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Image classification; Visualization; Data mining; Birds; Spatiotemporal phenomena; Fuses; Multimodal information; fine-grained image classification; multi-temporal feature fusion; self-attention; dynamic MLP; NETWORK;
DOI
10.1109/TMM.2023.3291819
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fine-grained image datasets have small inter-class differences and large intra-class differences, which makes fine-grained image classification difficult. Traditional fine-grained image classification methods focus only on the visual features of images; this limitation can be overcome by improving these methods with multimodal information. This paper proposes an improved fine-grained image classification method with multimodal information that consists of multimodal data preprocessing, multimodal feature extraction, multi-temporal feature fusion, and decision correction. The proposed preprocessing method uses normalization, phrase packing, and weighted concatenation to address three problems of multimodal data: its scattered distribution, its difficulty to process, and its uneven contribution to prediction. For multimodal feature extraction, the proposed SAMLP (Self-Attention MLP) module combines self-attention with an MLP to capture the internal correlations of the multimodal information. The proposed multi-temporal feature fusion is divided into early and late feature fusion: early fusion adds multimodal information markers to the original image, while late fusion uses a multi-cascade dynamic MLP structure to fuse the visual and multimodal features. To compensate for the limitations of feature fusion, a decision strategy is proposed that revises the predictions made from the fused features according to the predictions made from the multimodal features. Ablation experiments on the INAT18-1K and INAT21-1K datasets show that the method effectively improves classification by using multimodal information. Experiments on the large INAT2021_mini dataset show that the complete method achieves higher accuracy than the state-of-the-art method with negligible loss of efficiency.
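The abstract names two steps that are easy to illustrate in isolation: normalizing and weight-concatenating multimodal metadata, and correcting the fused prediction with the metadata-only prediction. The sketch below is a minimal, hypothetical illustration of those two ideas, not the authors' implementation. It assumes the metadata is latitude, longitude, and day of year; the scaling ranges, the sin/cos date encoding, the per-group weights, and the thresholding rule in `correct_decision` (suppressing classes whose metadata-only probability falls below `min_meta_prob`) are all assumptions, and every function name is illustrative.

```python
import math
from typing import List, Sequence


def normalize_metadata(lat: float, lon: float, day_of_year: int) -> List[float]:
    """Hypothetical normalization: map raw geo-temporal metadata into [-1, 1].

    Latitude and longitude are scaled linearly (assumed ranges +/-90 and +/-180);
    the date is placed on the unit circle so 31 Dec and 1 Jan end up close.
    """
    angle = 2.0 * math.pi * (day_of_year - 1) / 365.0
    return [lat / 90.0, lon / 180.0, math.sin(angle), math.cos(angle)]


def weighted_concat(groups: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Concatenate feature groups after scaling each group by its own weight.

    The weights (assumed, not from the paper) let some modalities contribute
    more or less to the downstream prediction.
    """
    if len(groups) != len(weights):
        raise ValueError("need exactly one weight per feature group")
    out: List[float] = []
    for group, w in zip(groups, weights):
        out.extend(w * x for x in group)
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def correct_decision(fused_logits: Sequence[float],
                     meta_logits: Sequence[float],
                     min_meta_prob: float = 0.01) -> int:
    """Hypothetical decision correction rule.

    Classes that the metadata-only branch considers implausible
    (probability < min_meta_prob) are removed from the fused prediction.
    If every class is removed, the uncorrected fused prediction is kept.
    """
    p_fused = softmax(fused_logits)
    p_meta = softmax(meta_logits)
    masked = [pf if pm >= min_meta_prob else 0.0
              for pf, pm in zip(p_fused, p_meta)]
    if sum(masked) == 0.0:
        return argmax(p_fused)
    return argmax(masked)


if __name__ == "__main__":
    print(normalize_metadata(45.0, -90.0, 1))  # [0.5, -0.5, 0.0, 1.0]
    print(weighted_concat([[1.0, 2.0], [3.0]], [0.5, 2.0]))  # [0.5, 1.0, 6.0]
    # The fused branch prefers class 0, but the metadata branch finds it implausible.
    print(correct_decision([3.0, 2.9, 0.0], [-10.0, 2.0, 2.0]))  # 1
```

A hard threshold is used here only because it makes the effect of the correction easy to see. A soft alternative, such as multiplying the two probability vectors, would be an equally plausible reading of "revising predictions according to multimodal predictions".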
Pages: 2082-2095
Number of pages: 14