Fine-grained knowledge about manipulable objects is well-predicted by contrastive language image pre-training

被引:0
|
作者
Walbrin, Jon [1 ,2 ]
Sossounov, Nikita [1 ,2 ]
Mahdiani, Morteza [3 ]
Vaz, Igor [1 ,2 ]
Almeida, Jorge [3 ]
机构
[1] Univ Coimbra, Fac Psychol & Educ Sci, Proact Lab, Coimbra, Portugal
[2] Univ Coimbra, Fac Psychol & Educ Sci, CINEICC, Coimbra, Portugal
[3] Mila Quebec AI Res Inst, Montreal, PQ, Canada
基金
欧洲研究理事会;
关键词
OCCIPITOTEMPORAL CORTEX; CATEGORY-SELECTIVITY; BRAIN; REPRESENTATIONS; RECOGNITION;
D O I
10.1016/j.isci.2024.110297
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Object recognition is an important ability that relies on distinguishing between similar objects (e.g., deciding which utensil(s) to use at different stages of meal preparation). Recent work describes the finegrained organization of knowledge about manipulable objects via the study of the constituent dimensions that are most relevant to human behavior, for example, vision, manipulation, and function -based properties. A logical extension of this work concerns whether or not these dimensions are uniquely human, or can be approximated by deep learning. Here, we show that behavioral dimensions are generally well -predicted by CLIP-ViT - a multimodal network trained on a large and diverse set of image -text pairs. Moreover, this model outperforms comparison networks pre -trained on smaller, image -only datasets. These results demonstrate the impressive capacity of CLIP-ViT to approximate fine-grained object knowledge. We discuss the possible sources of this benefit relative to other models (e.g., multimodal vs. image -only pretraining, dataset size, architecture).
引用
收藏
页数:17
相关论文
共 30 条
  • [1] CLIP-FG:SELECTING DISCRIMINATIVE IMAGE PATCHES BY CONTRASTIVE LANGUAGE-IMAGE PRE-TRAINING FOR FINE-GRAINED IMAGE CLASSIFICATION
    Yuan, Min
    Lv, Ningning
    Xie, Yufei
    Lu, Fuxiang
    Zhan, Kun
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 560 - 564
  • [2] CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification
    Conde, Marcos, V
    Turgutlu, Kerem
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3951 - 3955
  • [3] Contrastive Language-Image Pre-Training with Knowledge Graphs
    Pan, Xuran
    Ye, Tianzhu
    Han, Dongchen
    Song, Shiji
    Huang, Gao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [4] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei
    Liu, Kang
    Wang, Yequan
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [5] FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training
    Han, Yunpeng
    Zhang, Lisai
    Chen, Qingcai
    Chen, Zhijian
    Li, Zhonghua
    Yang, Jianxin
    Cao, Zhao
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15028 - 15038
  • [6] ARCHICLIP Enhanced Contrastive Language-Image Pre-training Model With Architectural Prior Knowledge
    Xia, Shengtao
    Cheng, Yiming
    Tian, Runjia
    [J]. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE OF THE ASSOCIATION FOR COMPUTER-AIDED ARCHITECTURAL DESIGN RESEARCH IN ASIA, CAADRIA 2024, VOL 1, 2024, : 69 - 78
  • [7] UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
    Lee, Janghyeon
    Kim, Jongsuk
    Shon, Hyounguk
    Kim, Bumsoo
    Kim, Seung Hwan
    Lee, Honglak
    Kim, Junmo
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [8] Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-training
    Chen, Xiaofei
    He, Yuting
    Xue, Cheng
    Ge, Rongjun
    Li, Shuo
    Yang, Guanyu
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT I, 2023, 14220 : 405 - 415
  • [9] Non-Contrastive Learning Meets Language-Image Pre-Training
    Zhou, Jinghao
    Dong, Li
    Gan, Zhe
    Wang, Lijuan
    Wei, Furu
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11028 - 11038
  • [10] iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition
    Wei, Yixuan
    Cao, Yue
    Zhang, Zheng
    Peng, Houwen
    Yao, Zhuliang
    Xie, Zhenda
    Hue, Han
    Guo, Baining
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2776 - 2786