LifeGraph 4 - Lifelog Retrieval using Multimodal Knowledge Graphs and Vision-Language Models

Cited by: 0
Authors
Rossetto, Luca [1]
Kyriakou, Athina [1]
Lange, Svenja [1]
Ruosch, Florian [1]
Wang, Ruijie [1]
Wardatzky, Kathrin [1]
Bernstein, Abraham [1]
Affiliations
[1] University of Zurich, Department of Informatics, Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
Lifelogging; Lifelog Search Challenge; Multimodal Knowledge Graphs; Graph-based Retrieval; Multi-modal Retrieval; Vision-Language Models
DOI
10.1145/3643489.3661127
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the scope of the 7th Lifelog Search Challenge (LSC'24), we present the 4th iteration of LifeGraph, a multimodal knowledge-graph approach with data augmentations using Vision-Language Models (VLMs). We extend the LifeGraph model presented at previous editions of the LSC with event-based clustering that uses temporal and spatial relations, as well as information extracted from VLM-generated descriptions of lifelog images.
Pages: 88-92
Page count: 5