A coarse-to-fine capsule network for fine-grained image categorization

Cited by: 7
Authors
Lin, Zhongqi [1, 2]
Jia, Jingdun [2]
Huang, Feng [3]
Gao, Wanlin [1, 2]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr & Rural Affairs, Key Lab Agr Informatizat Standardizat, Beijing 100083, Peoples R China
[3] China Agr Univ, Coll Sci, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Capsule network (CapsNet); Fine-grained image classification; Coarse-to-fine attention; Increasingly specialized perception; MODEL;
DOI
10.1016/j.neucom.2021.05.032
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-grained image categorization is challenging because the subordinate categories within an entry-level category can only be distinguished by subtle differences. This necessitates alternately localizing key (most discriminative) regions and extracting domain-specific features. Existing methods predominantly perform these steps independently, ignoring that representation learning and foreground localization can reinforce each other iteratively. Building on the state-of-the-art performance of capsule encoding for abstract semantic representation, we formalize our pipeline as a coarse-to-fine capsule network (CTF-CapsNet). It consists of customized expert CapsNets arranged at each perception scale and region proposal networks (RPNs) between adjacent scales. Their mutually motivated self-optimization achieves increasingly specialized cross-utilization of object-level and component-level descriptions. The RPN zooms in on the most distinctive regions, using the information learned by the preceding expert CapsNet as a reference, while the finer-scale model takes as input an amplified attended patch from the previous scale. Overall, CTF-CapsNet is driven by three focal margin losses between label predictions and ground truth, and three regeneration losses between the original input images/feature maps and the reconstructed images. Experiments demonstrate that, without any prior knowledge or strong supervision (e.g., bounding-box/part annotations), CTF-CapsNet delivers competitive categorization performance among state-of-the-art methods: testing accuracy reaches 89.57%, 88.63%, 90.51%, and 91.53% on our hand-crafted rice growth image set and three public benchmarks (CUB Birds, Stanford Dogs, and Stanford Cars), respectively. (c) 2021 Elsevier B.V. All rights reserved.
Pages: 200-219
Page count: 20