Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval

Cited by: 1
Authors
Ma, Teng [1]
Organisciak, Daniel [2]
Ma, Wenbao [1]
Long, Yang [3]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Humanities & Social Sci, Xian 710049, Peoples R China
[2] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
[3] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
Keywords
large visual language models; zero-shot instance retrieval; cognition alignment
DOI
10.3390/electronics13091660
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The pursuit of Artificial Intelligence (AI) that emulates human cognitive processes is a cornerstone of ethical AI development, ensuring that emerging technologies can seamlessly integrate into societal frameworks requiring nuanced understanding and decision-making. Zero-Shot Instance Retrieval (ZSIR) stands at the forefront of this endeavour, potentially providing a robust platform for AI systems, particularly large visual language models, to demonstrate and refine cognition-aligned learning without the need for direct experience. In this paper, we critically evaluate current cognition alignment methodologies within traditional zero-shot learning paradigms, using visual attributes and word embeddings generated by large AI models. We propose a unified similarity function that quantifies the level of cognitive alignment, bridging the gap between AI processes and human-like understanding. Through extensive experimentation, our findings illustrate that this similarity function can effectively mirror the visual-semantic gap, steering the model towards enhanced performance in Zero-Shot Instance Retrieval. Our models achieve state-of-the-art bi-directional image-attribute retrieval accuracy on both the SUN (92.8% and 82.2%) and CUB (59.92% and 48.82%) datasets. This work not only benchmarks the cognition alignment of AI but also sets a new precedent for the development of visual language models attuned to the complexities of human cognition.
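To make the evaluation setting concrete, the following is a minimal sketch of how bi-directional image-attribute retrieval accuracy could be computed. It is not the paper's method: the abstract does not define the unified similarity function, so plain cosine similarity is used here as a stand-in. The names `cosine`, `top1_accuracy`, `image_embeddings`, and `attribute_vectors`, and all the toy numbers, are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (a stand-in
    for the paper's unified similarity function, which is not given here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top1_accuracy(queries, gallery):
    """Fraction of queries whose most similar gallery item has the same index."""
    hits = 0
    for i, q in enumerate(queries):
        scores = [cosine(q, g) for g in gallery]
        best = max(range(len(gallery)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(queries)

# Hypothetical toy data: entry i of each list describes the same instance.
image_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]
attribute_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Retrieval is scored in both directions, mirroring the two numbers
# reported per dataset in the abstract.
image_to_attribute = top1_accuracy(image_embeddings, attribute_vectors)
attribute_to_image = top1_accuracy(attribute_vectors, image_embeddings)
print(image_to_attribute, attribute_to_image)
```

The two directions are scored separately because, in general, the nearest attribute vector for an image need not be the attribute vector whose nearest image is that same image, so the two accuracies can differ.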
Pages: 20