Fine-Grained Image Classification Algorithm Using Multi-Scale Feature Fusion and Re-Attention Mechanism

Cited by: 1

Authors:

He K. ^{[1]}

Feng X. ^{[1]}

Gao S. ^{[1]}

Ma X. ^{[1]}

Affiliations:

[1] School of Electrical and Information Engineering, Tianjin University, Tianjin

Source:

Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology | 2020, Vol. 53, No. 10

Funding:

National Natural Science Foundation of China

Keywords:

Fine-grained image classification; Multi-scale feature fusion; Re-attention mechanism; ResNet50

DOI:

10.11784/tdxbz201910029

Abstract:

Fine-grained image classification aims to precisely classify images into subclasses within a given category. Because subclasses share similar features, objects appear in different poses, and backgrounds interfere, the task is a common and difficult problem in computer vision and pattern recognition and has important research value. The key issue in fine-grained image classification is how to extract precise features from the discriminative regions of an image. Existing neural-network-based algorithms are still insufficient at extracting fine features. Accordingly, this study proposes a fine-grained image classification algorithm using a multi-scale re-attention mechanism. Considering that high-level features carry rich semantic information and low-level features carry rich texture information, the attention mechanism is embedded at different scales to obtain rich feature information. In addition, each input feature map is processed with both channel and spatial attention, which can be regarded as re-attention over the feature matrix. Finally, the attention results are combined with the original input feature maps in residual form, and the attention results on the feature maps of different scales are concatenated and fed into the fully connected layer, which helps extract salient features accurately. Accuracy rates of 86.16%, 92.26%, and 93.40% are obtained on the international public fine-grained datasets CUB-200-2011, FGVC Aircraft, and Stanford Cars, respectively. Compared with ResNet50, the accuracy rate is increased by 1.66%, 1.46%, and 1.10%, respectively. These results are clearly higher than those of existing classical algorithms and human performance, which demonstrates the effectiveness of the proposed algorithm. © 2020, Editorial Board of Journal of Tianjin University (Science and Technology). All rights reserved.
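
The pipeline described in the abstract (channel attention, then spatial attention, a residual connection, and concatenation across scales) can be illustrated with a minimal NumPy sketch. This record contains only the abstract, so everything specific here is an assumption rather than the authors' design: the function names, the squeeze-and-excitation style channel gate, the parameter-free spatial gate, the channel-then-spatial ordering, the global average pooling before concatenation, and the toy feature-map shapes are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W) with an assumed two-layer gate."""
    squeezed = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # ReLU bottleneck -> (C // r,)
    weights = sigmoid(w2 @ hidden)             # per-channel weight in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Reweight locations using channel-wise mean and max (assumed form)."""
    score = feat.mean(axis=0) + feat.max(axis=0)   # (H, W)
    return feat * sigmoid(score)[None, :, :]

def re_attention(feat, w1, w2):
    """Channel attention followed by spatial attention, added back residually."""
    attended = spatial_attention(channel_attention(feat, w1, w2))
    return feat + attended

def fuse_scales(feats, params):
    """Apply re-attention per scale, pool each map, and concatenate the vectors."""
    pooled = [re_attention(f, w1, w2).mean(axis=(1, 2))
              for f, (w1, w2) in zip(feats, params)]
    return np.concatenate(pooled)

# Toy feature maps standing in for three backbone stages (hypothetical shapes).
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8), (32, 4, 4)]
feats = [rng.standard_normal(s) for s in shapes]
params = [(rng.standard_normal((c // 4, c)), rng.standard_normal((c, c // 4)))
          for c, _, _ in shapes]

fused = fuse_scales(feats, params)
print(fused.shape)  # (56,)
```

The fused vector would then be passed to a fully connected classification layer. In this sketch the residual form means each output value equals the input scaled by a factor between 1 and 2, so attention can emphasize features but never suppresses the original signal entirely.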

Pages: 1077-1085

Page count: 8

