Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval

Times Cited: 1
Authors
Ma, Teng [1 ]
Organisciak, Daniel [2 ]
Ma, Wenbao [1 ]
Long, Yang [3 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Humanities & Social Sci, Xian 710049, Peoples R China
[2] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
[3] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
Keywords
large visual language models; zero-shot instance retrieval; cognition alignment;
DOI
10.3390/electronics13091660
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The pursuit of Artificial Intelligence (AI) that emulates human cognitive processes is a cornerstone of ethical AI development, ensuring that emerging technologies can seamlessly integrate into societal frameworks requiring nuanced understanding and decision-making. Zero-Shot Instance Retrieval (ZSIR) stands at the forefront of this endeavour, potentially providing a robust platform for AI systems, particularly large visual language models, to demonstrate and refine cognition-aligned learning without the need for direct experience. In this paper, we critically evaluate current cognition alignment methodologies within traditional zero-shot learning paradigms using visual attributes and word embeddings generated by large AI models. We propose a unified similarity function that quantifies the cognitive alignment level, bridging the gap between AI processes and human-like understanding. Through extensive experimentation, our findings illustrate that this similarity function can effectively mirror the visual-semantic gap, steering the model towards enhanced performance in Zero-Shot Instance Retrieval. Our models achieve state-of-the-art performance on both the SUN (92.8% and 82.2%) and CUB (59.92% and 48.82%) datasets for bi-directional image-attribute retrieval accuracy. This work not only benchmarks the cognition alignment of AI but also sets a new precedent for the development of visual language models attuned to the complexities of human cognition.
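The abstract does not specify the exact form of the unified similarity function, so the following is only a minimal illustrative sketch, assuming that bi-directional image-attribute retrieval is scored by cosine similarity between image features and class-level attribute vectors or LLM-generated word embeddings projected into a shared space. All names and parameters below (unified_similarity, alpha, the feature dimensions) are hypothetical and are not taken from the paper.

import numpy as np

def cosine_similarity_matrix(a, b):
    # Pairwise cosine similarity between rows of a (n x d) and rows of b (m x d).
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T

def unified_similarity(image_feats, attr_embeds, word_embeds, alpha=0.5):
    # Hypothetical "unified" score: a convex combination of the similarity of image
    # features to human-annotated attribute vectors and to LLM-generated word
    # embeddings (alpha is an assumed mixing weight, not taken from the paper).
    s_attr = cosine_similarity_matrix(image_feats, attr_embeds)
    s_word = cosine_similarity_matrix(image_feats, word_embeds)
    return alpha * s_attr + (1 - alpha) * s_word

def retrieve(image_feats, attr_embeds, word_embeds, k=5):
    # Bi-directional zero-shot retrieval: for each image, rank classes by the
    # unified score (image -> attribute), and for each class, rank images
    # (attribute -> image).
    scores = unified_similarity(image_feats, attr_embeds, word_embeds)
    img_to_class = np.argsort(-scores, axis=1)[:, :k]    # top-k classes per image
    class_to_img = np.argsort(-scores.T, axis=1)[:, :k]  # top-k images per class
    return img_to_class, class_to_img

# Toy usage with random vectors standing in for encoder outputs; in practice the
# attribute and word-embedding vectors would be projected into the same space as
# the visual features.
rng = np.random.default_rng(0)
images = rng.normal(size=(8, 512))   # visual encoder features
attrs = rng.normal(size=(4, 512))    # projected class attribute vectors
words = rng.normal(size=(4, 512))    # projected LLM word embeddings
i2c, c2i = retrieve(images, attrs, words, k=2)

A convex mixing weight is only one simple way to combine the two semantic sources; the paper's actual unified similarity function may take a different form.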
Pages: 20
Related Papers
50 records in total
  • [21] Towards Zero-shot Commonsense Reasoning with Self-supervised Refinement of Language Models
    Klein, Tassilo
    Nabi, Moin
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8737 - 8743
  • [22] CLAREL: Classification via retrieval loss for zero-shot learning
    Oreshkin, Boris N.
    Rostamzadeh, Negar
    Pinheiro, Pedro O.
    Pal, Christopher
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 3989 - 3993
  • [23] Zero-shot Visual Question Answering with Language Model Feedback
    Du, Yifan
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Wen, Ji-Rong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9268 - 9281
  • [24] Visual Language Based Succinct Zero-Shot Object Detection
    Zheng, Ye
    Huang, Xi
    Cui, Li
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5410 - 5418
  • [25] Towards Zero-Shot Knowledge Distillation for Natural Language Processing
    Rashid, Ahmad
    Lioutas, Vasileios
    Ghaddar, Abbas
    Rezagholizadeh, Mehdi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6551 - 6561
  • [26] Extensible Prompts for Language Models on Zero-shot Language Style Customization
    Ge, Tao
    Hu, Jing
    Dong, Li
    Mao, Shaoguang
    Xia, Yan
    Wang, Xun
    Chen, Si-Qing
    Wei, Furu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [27] Towards Visual Explainable Active Learning for Zero-Shot Classification
    Jia, Shichao
    Li, Zeyu
    Chen, Nuo
    Zhang, Jiawan
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2022, 28 (01) : 791 - 801
  • [28] Ambiguous Learning from Retrieval: Towards Zero-shot Semantic Parsing
    Wu, Shan
    Xin, Chunlei
    Lin, Hongyu
    Han, Xianpei
    Liu, Cao
    Chen, Jiansong
    Yang, Fan
    Wan, Guanglu
    Sun, Le
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14081 - 14094
  • [29] Large Language Models are Zero-Shot Rankers for Recommender Systems
    Hou, Yupeng
    Zhang, Junjie
    Lin, Zihan
    Lu, Hongyu
    Xie, Ruobing
    McAuley, Julian
    Zhao, Wayne Xin
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT II, 2024, 14609 : 364 - 381
  • [30] Large Language Models Are Zero-Shot Time Series Forecasters
    Gruver, Nate
    Finzi, Marc
    Qiu, Shikai
    Wilson, Andrew Gordon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,