Learning Deep Representations of Fine-Grained Visual Descriptions

Cited by: 529
Authors
Reed, Scott [1 ]
Akata, Zeynep [2 ]
Lee, Honglak [1 ]
Schiele, Bernt [2 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Max Planck Inst Informat, Saarbrucken, Germany
Funding
National Science Foundation (USA)
DOI
10.1109/CVPR.2016.13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations the current best complement to visual features are attributes: manually-encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations: (1) finer-grained recognition requires commensurately more attributes, and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch; i.e. without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and category-specific content of images. Natural language provides a flexible and compact way of encoding only the salient visual aspects for distinguishing categories. By training on raw text, our model can do inference on raw text as well, providing humans a familiar mode both for annotation and retrieval. Our model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state-of-the-art for zero-shot classification on the Caltech-UCSD Birds 200-2011 dataset.
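The joint-embedding formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product compatibility score, the per-class averaging of description embeddings, and all function names below are assumptions made for illustration, and the learned image and text encoders are replaced by small hand-written vectors.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products act as cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def class_prototypes(text_emb, text_labels, num_classes):
    """Average the description embeddings of each class (an assumed pooling choice)."""
    protos = np.stack(
        [text_emb[text_labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(protos)

def zero_shot_classify(image_emb, protos):
    """Assign each image to the class whose text prototype scores highest."""
    scores = l2_normalize(image_emb) @ protos.T  # shape: (num_images, num_classes)
    return scores.argmax(axis=1)

def retrieve_images(query_emb, image_emb, k=2):
    """Return indices of the k images most compatible with one text query."""
    scores = l2_normalize(image_emb) @ l2_normalize(query_emb)
    return np.argsort(-scores)[:k]

# Toy embeddings standing in for encoder outputs (hypothetical values).
text_emb = np.array([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]])
text_labels = np.array([0, 0, 1, 1])
image_emb = np.array([[0.8, 0.2], [0.1, 0.9], [0.7, 0.1]])

protos = class_prototypes(text_emb, text_labels, num_classes=2)
predictions = zero_shot_classify(image_emb, protos)
top_images = retrieve_images(np.array([0.0, 1.0]), image_emb, k=2)
```

In this sketch, unseen classes need only text embeddings, with no images, to take part in classification, which is what makes the setup zero-shot. The same similarity scores also serve text-based image retrieval.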
Pages: 49-58
Page count: 10