MyVLM: Personalizing VLMs for User-Specific Queries

Authors
Alaluf, Yuval [1, 2]
Richardson, Elad [2]
Tulyakov, Sergey [1]
Aberman, Kfir [1]
Cohen-Or, Daniel [1, 2]
Affiliations
[1] Snap Inc, Santa Monica, CA 90405 USA
[2] Tel Aviv Univ, Tel Aviv, Israel
Funding
Israel Science Foundation
Keywords
Vision-Language Models; Personalization
DOI
10.1007/978-3-031-72624-8_5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In this work, we take a first step toward the personalization of VLMs, enabling them to learn and reason over user-provided concepts. For example, we explore whether these models can learn to recognize you in an image and communicate what you are doing, tailoring the model to reflect your personal experiences and relationships. To effectively recognize a variety of user-specific concepts, we augment the VLM with external concept heads that function as toggles for the model, enabling the VLM to identify the presence of specific target concepts in a given image. Having recognized the concept, we learn a new concept embedding in the intermediate feature space of the VLM. This embedding is tasked with guiding the language model to naturally integrate the target concept in its generated response. We apply our technique to BLIP-2 and LLaVA for personalized image captioning and further show its applicability for personalized visual question-answering. Our experiments demonstrate our ability to generalize to unseen images of learned concepts while preserving the model behavior on unrelated inputs. Code and data will be made available upon acceptance.
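The two-stage idea in the abstract (a concept head acting as a toggle, followed by a learned concept embedding placed in the VLM's intermediate feature space) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ConceptHead` and `personalize_visual_tokens`, the linear-plus-sigmoid form of the head, and the choice to append the embedding to the list of visual tokens are all assumptions made here for illustration, since the abstract does not specify these details.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ConceptHead:
    """Hypothetical external detector for one user-specific concept.

    Assumption: a linear classifier with a sigmoid over a pooled image
    feature. The abstract only says the head signals concept presence.
    """
    weights: Vector
    bias: float
    threshold: float = 0.5

    def is_present(self, image_feature: Vector) -> bool:
        logit = sum(w * x for w, x in zip(self.weights, image_feature)) + self.bias
        probability = 1.0 / (1.0 + math.exp(-logit))
        return probability >= self.threshold


def personalize_visual_tokens(
    visual_tokens: List[Vector],
    image_feature: Vector,
    head: ConceptHead,
    concept_embedding: Vector,
) -> List[Vector]:
    """Return the token sequence that would be passed to the language model.

    If the head fires, the learned concept embedding is appended
    (assumed injection strategy). Otherwise the tokens are returned
    unchanged, so unrelated images keep the original model behavior.
    """
    if head.is_present(image_feature):
        return list(visual_tokens) + [concept_embedding]
    return list(visual_tokens)


# Toy values, chosen only to exercise both branches.
head = ConceptHead(weights=[1.0, -1.0], bias=0.0)
tokens = [[0.1, 0.2], [0.3, 0.4]]
concept_embedding = [0.9, 0.9]

with_concept = personalize_visual_tokens(tokens, [2.0, 0.0], head, concept_embedding)
without_concept = personalize_visual_tokens(tokens, [0.0, 2.0], head, concept_embedding)
```

In this sketch the head acts purely as a gate: when it does not fire, the token sequence is identical to the input, which mirrors the abstract's claim that behavior on unrelated inputs is preserved.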
Pages: 73-91
Page count: 19