Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval

被引:1
|
作者
Ma, Teng [1 ]
Organisciak, Daniel [2 ]
Ma, Wenbao [1 ]
Long, Yang [3 ]
机构
[1] Xi An Jiao Tong Univ, Sch Humanities & Social Sci, Xian 710049, Peoples R China
[2] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
[3] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
关键词
large visual language models; zero-shot instance retrieval; cognition alignment;
D O I
10.3390/electronics13091660
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The pursuit of Artificial Intelligence (AI) that emulates human cognitive processes is a cornerstone of ethical AI development, ensuring that emerging technologies can seamlessly integrate into societal frameworks requiring nuanced understanding and decision-making. Zero-Shot Instance Retrieval (ZSIR) stands at the forefront of this endeavour, potentially providing a robust platform for AI systems, particularly large visual language models, to demonstrate and refine cognition-aligned learning without the need for direct experience. In this paper, we critically evaluate current cognition alignment methodologies within traditional zero-shot learning paradigms using visual attributes and word embedding generated by large AI models. We propose a unified similarity function that quantifies the cognitive alignment level, bridging the gap between AI processes and human-like understanding. Through extensive experimentation, our findings illustrate that this similarity function can effectively mirror the visual-semantic gap, steering the model towards enhanced performance in Zero-Shot Instance Retrieval. Our models achieve state-of-the-art performance on both the SUN (92.8% and 82.2%) and CUB datasets (59.92% and 48.82%) for bi-directional image-attribute retrieval accuracy. This work not only benchmarks the cognition alignment of AI but also sets a new precedent for the development of visual language models attuned to the complexities of human cognition.
引用
收藏
页数:20
相关论文
共 50 条
  • [1] Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
    Lu, Ming Y.
    Chen, Bowen
    Zhang, Andrew
    Williamson, Drew F. K.
    Chen, Richard J.
    Ding, Tong
    Le, Long Phi
    Chuang, Yung-Sung
    Mahmood, Faisal
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19764 - 19775
  • [2] Zero-shot learning via visual-semantic aligned autoencoder
    Wei, Tianshu
    Huang, Jinjie
    Jin, Cong
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14081 - 14095
  • [3] Towards Zero-shot Language Modeling
    Ponti, Edoardo M.
    Vulic, Ivan
    Cotterell, Ryan
    Reichart, Roi
    Korhonen, Anna
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2900 - +
  • [4] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
    Li, Lin
    Xiao, Jun
    Chen, Guikun
    Shao, Jian
    Zhuang, Yueting
    Chen, Long
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
    Meng, Yu
    Huang, Jiaxin
    Zhang, Yu
    Han, Jiawei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [6] Zero-Shot Nuclei Detection via Visual-Language Pre-trained Models
    Wu, Yongjian
    Zhou, Yang
    Saiyin, Jiya
    Wei, Bingzheng
    lai, Maode
    Shou, Jianzhong
    Fan, Yubo
    Xu, Yan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 693 - 703
  • [7] PqE: Zero-Shot Document Expansion for Dense Retrieval with Large Language Models
    Liu, Jiyuan
    Zou, Dongsheng
    Chai, Naiquan
    Yang, Yuming
    Wang, Hao
    Song, Xinyi
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT I, NLPCC 2024, 2025, 15359 : 97 - 109
  • [8] Towards Zero-Shot Sign Language Recognition
    Bilge, Yunus Can
    Cinbis, Ramazan Gokberk
    Ikizler-Cinbis, Nazli
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 1217 - 1232
  • [9] Towards Affordable Semantic Searching: Zero-Shot Retrieval via Dominant Attributes
    Long, Yang
    Liu, Li
    Shen, Yuming
    Shao, Ling
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 7210 - 7217
  • [10] Large Language Models are Zero-Shot Reasoners
    Kojima, Takeshi
    Gu, Shixiang Shane
    Reid, Machel
    Matsuo, Yutaka
    Iwasawa, Yusuke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,