Learning Deep Representations of Fine-Grained Visual Descriptions

Cited by: 529
Authors
Reed, Scott [1 ]
Akata, Zeynep [2 ]
Lee, Honglak [1 ]
Schiele, Bernt [2 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Max Planck Inst Informat, Saarbrucken, Germany
Funding
US National Science Foundation
DOI
10.1109/CVPR.2016.13
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations the current best complement to visual features are attributes: manually-encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations: (1) finer-grained recognition requires commensurately more attributes, and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch; i.e. without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and category-specific content of images. Natural language provides a flexible and compact way of encoding only the salient visual aspects for distinguishing categories. By training on raw text, our model can do inference on raw text as well, providing humans a familiar mode both for annotation and retrieval. Our model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state-of-the-art for zero-shot classification on the Caltech-UCSD Birds 200-2011 dataset.
Pages: 49 - 58
Page count: 10
Related papers
50 in total
  • [31] A deep learning based fine-grained classification algorithm for grading of visual impairment in cataract patients
    JIANG Jiewei
    ZHANG Yi
    XIE He
    YANG Jingshi
    GONG Jiamin
    LI Zhongwen
    Optoelectronics Letters, 2024, 20 (01) : 48 - 57
  • [32] To Know and To Learn About the Integration of Knowledge Representation and Deep Learning for Fine-Grained Visual Categorization
    Setti, Francesco
    PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISIGRAPP 2018), VOL 5: VISAPP, 2018, : 387 - 392
  • [33] Learning Mutually Exclusive Part Representations for Fine-Grained Image Classification
    Wang, Chuanming
    Fu, Huiyuan
    Ma, Huadong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3113 - 3124
  • [34] Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations
    Zeng, Sihang
    Yuan, Zheng
    Yu, Sheng
    PROCEEDINGS OF THE 21ST WORKSHOP ON BIOMEDICAL LANGUAGE PROCESSING (BIONLP 2022), 2022, : 91 - 96
  • [35] Plenty is Plague: Fine-Grained Learning for Visual Question Answering
    Zhou, Yiyi
    Ji, Rongrong
    Sun, Xiaoshuai
    Su, Jinsong
    Meng, Deyu
    Gao, Yue
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 697 - 709
  • [36] Phenological visual rhythms: Compact representations for fine-grained plant species identification
    Almeida, Jurandy
    dos Santos, Jefersson A.
    Alberton, Bruna
    Morellato, Leonor Patricia C.
    Torres, Ricardo da S.
    PATTERN RECOGNITION LETTERS, 2016, 81 : 90 - 100
  • [37] Fine-Grained Classification of Hyperspectral Imagery Based on Deep Learning
    Chen, Yushi
    Huang, Lingbo
    Zhu, Lin
    Yokoya, Naoto
    Jia, Xiuping
    REMOTE SENSING, 2019, 11 (22)
  • [38] VenueNet: Fine-Grained Venue Discovery by Deep Correlation Learning
    Yu, Yi
    Tang, Suhua
    Aizawa, Kiyoharu
    Aizawa, Akiko
    2017 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2017, : 288 - 291
  • [39] An Interactive Deep Learning Method For Fine-grained Image Classification
    Luo, Liumin
    Wang, Mingxia
    Liu, Xiaoqing
    JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2025, 28 (04) : 701 - 708
  • [40] A model for fine-grained vehicle classification based on deep learning
    Yu, Shaoyong
    Wu, Yun
    Li, Wei
    Song, Zhijun
    Zeng, Wenhua
    NEUROCOMPUTING, 2017, 257 : 97 - 103