Bilinear CNN Models for Fine-grained Visual Recognition

Cited by: 1415
Authors
Lin, Tsung-Yu [1]
RoyChowdhury, Aruni [1]
Maji, Subhransu [1]
Affiliation
[1] Univ Massachusetts, Amherst, MA 01003 USA
Keywords
DOI
10.1109/ICCV.2015.170
CLC number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using an outer product at each location of the image and pooled to obtain an image descriptor. This architecture can model local pairwise feature interactions in a translationally invariant manner, which is particularly useful for fine-grained categorization. It also generalizes various orderless texture descriptors such as the Fisher vector, VLAD and O2P. We present experiments with bilinear models where the feature extractors are based on convolutional neural networks. The bilinear form simplifies gradient computation and allows end-to-end training of both networks using image labels only. Using networks initialized from the ImageNet dataset followed by domain-specific fine-tuning, we obtain 84.1% accuracy on the CUB-200-2011 dataset, requiring only category labels at training time. We present experiments and visualizations that analyze the effects of fine-tuning and the choice of two networks on the speed and accuracy of the models. Results show that the architecture compares favorably to the existing state of the art on a number of fine-grained datasets while being substantially simpler and easier to train. Moreover, our most accurate model is fairly efficient, running at 8 frames/sec on an NVIDIA Tesla K40 GPU. The source code for the complete system will be made available at http://vis-www.cs.umass.edu/bcnn
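The pooling step described in the abstract (an outer product of the two extractors' outputs at each location, then pooling over locations) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function name `bilinear_pool`, the use of sum pooling, and the toy feature values are assumptions made for illustration, and the two feature lists stand in for the outputs of the two CNN extractors.

```python
from itertools import product


def bilinear_pool(feats_a, feats_b):
    """Sum-pool the outer product of two per-location feature lists.

    feats_a: list of L vectors, each of length M (stand-in for extractor A)
    feats_b: list of L vectors, each of length N (stand-in for extractor B)
    Returns a flat descriptor of length M * N.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both extractors must cover the same locations")
    m, n = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (m * n)
    for fa, fb in zip(feats_a, feats_b):
        # Outer product at one location, accumulated into the pooled sum.
        for i, j in product(range(m), range(n)):
            pooled[i * n + j] += fa[i] * fb[j]
    return pooled


# Toy inputs (hypothetical values): 3 locations, M = 2 and N = 3 channels.
feats_a = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
feats_b = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

descriptor = bilinear_pool(feats_a, feats_b)

# Reordering the locations (same permutation for both extractors) leaves
# the descriptor unchanged, which is the "orderless" property.
perm = [2, 0, 1]
shuffled = bilinear_pool([feats_a[k] for k in perm],
                         [feats_b[k] for k in perm])
```

Because the descriptor is a sum over locations, it does not depend on where in the image a pair of features co-occurs, only on the fact that it does; the descriptor length grows as M * N.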
Pages: 1449-1457
Page count: 9
Related papers
50 results
  • [1] Kernelized Bilinear CNN Models for Fine-Grained Visual Recognition
    Ge, Shu-Yu
    Gao, Zi-Lin
    Zhang, Bing-Bing
    Li, Pei-Hua
    [J]. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2019, 47 (10): 2134-2141
  • [2] Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
    Yu, Chaojian
    Zhao, Xinyi
    Zheng, Qi
    Zhang, Peng
    You, Xinge
    [J]. COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220: 595-610
  • [3] Multilayer feature descriptors fusion CNN models for fine-grained visual recognition
    Hou, Yong
    Luo, Hangzai
    Zhao, Wanqing
    Zhang, Xiang
    Wang, Jun
    Peng, Jinye
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2019, 30 (3-4)
  • [4] Fine-grained Mushroom Phenotype Recognition Based on Transfer Learning and Bilinear CNN
    Yuan, Peisen
    Shen, Chengji
    Xu, Huanliang
    [J]. Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2021, 52 (07): 151-158
  • [5] Bilinear Convolutional Neural Networks for Fine-Grained Visual Recognition
    Lin, Tsung-Yu
    RoyChowdhury, Aruni
    Maji, Subhransu
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (06): 1309-1322
  • [6] Text-Embedded Bilinear Model for Fine-Grained Visual Recognition
    Sun, Liang
    Guan, Xiang
    Yang, Yang
    Zhang, Lei
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 211-219
  • [7] Semantic Bilinear Pooling for Fine-Grained Recognition
    Li, Xinjie
    Yang, Chun
    Chen, Song-Lu
    Zhu, Chao
    Yin, Xu-Cheng
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 3660-3666
  • [8] Semantic bilinear pooling for fine-grained recognition
    School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
    [J]. Proc. Int. Conf. Pattern Recognit., (3660-3666):
  • [9] Fine-Grained Intoxicated Gait Classification Using a Bilinear CNN
    Li, Ruojun
    Agu, Emmanuel
    Sarwar, Atifa
    Grimone, Kristin
    Herman, Debra
    Abrantes, Ana M.
    Stein, Michael D.
    [J]. IEEE SENSORS JOURNAL, 2023, 23 (23): 29733-29748
  • [10] Squeezed Bilinear Pooling for Fine-Grained Visual Categorization
    Liao, Qiyu
    Wang, Dadong
    Holewa, Hamish
    Xu, Min
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 728-732