Learning Deep Representations of Fine-Grained Visual Descriptions

Cited by: 529
Authors
Reed, Scott [1]
Akata, Zeynep [2]
Lee, Honglak [1]
Schiele, Bernt [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Max Planck Inst Informat, Saarbrucken, Germany
Funding
US National Science Foundation
DOI
10.1109/CVPR.2016.13
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information. In these formulations the current best complement to visual features are attributes: manually-encoded vectors describing shared characteristics among categories. Despite good performance, attributes have limitations: (1) finer-grained recognition requires commensurately more attributes, and (2) attributes do not provide a natural language interface. We propose to overcome these limitations by training neural language models from scratch; i.e. without pre-training and only consuming words and characters. Our proposed models train end-to-end to align with the fine-grained and category-specific content of images. Natural language provides a flexible and compact way of encoding only the salient visual aspects for distinguishing categories. By training on raw text, our model can do inference on raw text as well, providing humans a familiar mode both for annotation and retrieval. Our model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state-of-the-art for zero-shot classification on the Caltech-UCSD Birds 200-2011 dataset.
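The joint-embedding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes that trained image and text encoders have already produced fixed-length embeddings, that compatibility is scored with an inner product, and that each class is represented by the mean of its description embeddings. The function names and toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as the image-text compatibility score (an assumption)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average several description embeddings into a single class prototype."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def zero_shot_classify(image_emb: Vector,
                       class_text_embs: Dict[str, List[Vector]]) -> str:
    """Return the class whose text prototype scores highest against the image.

    The classes are described only by text embeddings, so no labelled
    images of these classes are needed at prediction time.
    """
    prototypes = {name: mean_vector(embs) for name, embs in class_text_embs.items()}
    return max(prototypes, key=lambda name: dot(image_emb, prototypes[name]))


def retrieve_images(query_emb: Vector,
                    image_embs: Dict[str, Vector],
                    top_k: int = 1) -> List[str]:
    """Rank image ids by compatibility with a text query embedding."""
    ranked = sorted(image_embs,
                    key=lambda image_id: dot(query_emb, image_embs[image_id]),
                    reverse=True)
    return ranked[:top_k]


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
unseen_classes = {
    "cardinal": [[1.0, 0.0], [0.8, 0.2]],
    "blue_jay": [[0.0, 1.0], [0.2, 0.8]],
}
predicted = zero_shot_classify([0.9, 0.1], unseen_classes)
```

The same scoring function serves both tasks named in the abstract: classification ranks class descriptions for a given image, while retrieval ranks images for a given description.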
Pages: 49-58
Page count: 10