Multilayer feature descriptors fusion CNN models for fine-grained visual recognition

Cited by: 5
Authors
Hou, Yong [1]
Luo, Hangzai [1]
Zhao, Wanqing [1]
Zhang, Xiang [1]
Wang, Jun [1]
Peng, Jinye [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710127, Shaanxi, Peoples R China
Keywords
convolutional neural network; deep learning; dimensionality reduction; fine-grained image classification; multilayer feature descriptors
DOI
10.1002/cav.1897
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Fine-grained image classification is a challenging topic in computer vision. General models based on first-order local features cannot achieve acceptable performance because such features are inefficient at capturing fine-grained differences. The bilinear convolutional neural network (CNN) model shows that a second-order statistical feature captures fine-grained differences more efficiently than a first-order local feature. However, that framework extracts a second-order feature descriptor from only a single convolutional layer. The potentially effective classification features of other convolutional layers are ignored, resulting in a loss of recognition accuracy. In this paper, a multilayer feature descriptors fusion CNN model is proposed. It fully considers the second-order feature descriptors and the first-order local feature descriptor generated by different layers. Experimental verification was carried out on the fine-grained classification benchmark data sets CUB-200-2011, Stanford Cars, and FGVC-Aircraft. Compared with the bilinear CNN model, the proposed method improves accuracy by 0.8%, 1.1%, and 5.5%, respectively. Compared with the compact bilinear pooling model, accuracy increases by 0.64%, 1.63%, and 1.45%, respectively. In addition, the proposed model uses multiple 1×1 convolution kernels to reduce dimensionality. The experimental results show that the multilayer low-dimensional second-order feature descriptors fusion model achieves recognition accuracy comparable to that of the original model.
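The ideas named in the abstract (second-order bilinear descriptors, 1×1 convolution for dimensionality reduction, and fusion across layers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the signed square-root and L2 normalization steps (borrowed from the original bilinear CNN formulation), the choice of the last layer for the first-order descriptor, and fusion by simple concatenation are all assumptions made for illustration.

```python
import numpy as np

def reduce_channels(feat, weight):
    """Apply a 1x1 convolution as a per-location linear map.

    feat:   (C, H, W) feature map from one convolutional layer
    weight: (D, C) matrix holding D kernels of size 1x1 (hypothetical weights)
    """
    return np.einsum("dc,chw->dhw", weight, feat)

def bilinear_pool(feat):
    """Second-order descriptor: averaged outer product of local features."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # each column is one local feature
    gram = x @ x.T / (h * w)                 # (C, C) second-order statistics
    v = gram.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))      # signed square root (assumed step)
    return v / (np.linalg.norm(v) + 1e-12)   # L2 normalization (assumed step)

def first_order_pool(feat):
    """First-order descriptor: global average over spatial locations."""
    v = feat.mean(axis=(1, 2))
    return v / (np.linalg.norm(v) + 1e-12)

def fuse_descriptors(layer_feats, weights):
    """Concatenate reduced second-order descriptors from every layer with a
    first-order descriptor from the last layer (assumed fusion scheme)."""
    second = [bilinear_pool(reduce_channels(f, w))
              for f, w in zip(layer_feats, weights)]
    return np.concatenate(second + [first_order_pool(layer_feats[-1])])
```

With D reduced channels per layer, each second-order descriptor has D² entries instead of C², which is why the 1×1 reduction matters: for C = 512 the unreduced descriptor would have 262,144 dimensions per layer.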
Pages: 9
Related papers
50 in total
  • [1] Bilinear CNN Models for Fine-grained Visual Recognition
    Lin, Tsung-Yu
    RoyChowdhury, Aruni
    Maji, Subhransu
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1449 - 1457
  • [2] Kernelized Bilinear CNN Models for Fine-Grained Visual Recognition
    Ge, Shu-Yu
    Gao, Zi-Lin
    Zhang, Bing-Bing
    Li, Pei-Hua
    [J]. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2019, 47 (10): : 2134 - 2141
  • [3] Multi-Scale Feature Fusion of Covariance Pooling Networks for Fine-Grained Visual Recognition
    Qian, Lulu
    Yu, Tan
    Yang, Jianyu
    [J]. SENSORS, 2023, 23 (08)
  • [4] Convolutionally Enhanced Feature Fusion Visual Transformer for Fine-Grained Visual Classification
    Huang, Min
    Zhu, Saixing
    Wang, Zehua
    Qu, Shuanghong
    [J]. 2024 16TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, ICMLC 2024, 2024, : 447 - 452
  • [5] Multilayer feature fusion with parallel convolutional block for fine-grained image classification
    Wang, Lei
    He, Kai
    Feng, Xu
    Ma, Xitao
    [J]. APPLIED INTELLIGENCE, 2022, 52 (03) : 2872 - 2883
  • [6] Multilayer feature fusion with parallel convolutional block for fine-grained image classification
    Lei Wang
    Kai He
    Xu Feng
    Xitao Ma
    [J]. Applied Intelligence, 2022, 52 : 2872 - 2883
  • [7] Hierarchical Joint CNN-Based Models for Fine-Grained Cars Recognition
    Liu, Maolin
    Yu, Chengyue
    Ling, Hefei
    Lei, Jie
    [J]. CLOUD COMPUTING AND SECURITY, ICCCS 2016, PT II, 2016, 10040 : 337 - 347
  • [8] Transformer-based descriptors with fine-grained region supervisions for visual place recognition
    Wang, Yuwei
    Qiu, Yuanying
    Cheng, Peitao
    Zhang, Junyu
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 280
  • [9] AMLNet: Attention Multibranch Loss CNN Models for Fine-Grained Vehicle Recognition
    Lu, Hongchun
    Han, Min
    Wang, Chaoqing
    Cheng, Junlong
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (01) : 375 - 384
  • [10] Fine-Grained Visual Categorization: A Spatial-Frequency Feature Fusion Perspective
    Wang, Min
    Zhao, Peng
    Lu, Xin
    Min, Fan
    Wang, Xizhao
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 2798 - 2812