LifeGraph 4 - Lifelog Retrieval using Multimodal Knowledge Graphs and Vision-Language Models

Cited by: 0

Authors:

Rossetto, Luca [1]

Kyriakou, Athina [1]

Lange, Svenja [1]

Ruosch, Florian [1]

Wang, Ruijie [1]

Wardatzky, Kathrin [1]

Bernstein, Abraham [1]

Affiliations:

[1] Univ Zurich, Dept Informat, Zurich, Switzerland

Source:

PROCEEDINGS OF 2024 ACM WORKSHOP ON THE LIFELOG SEARCH CHALLENGE, LSC 2024 | 2024

Funding:

Swiss National Science Foundation;

Keywords:

Lifelogging; Lifelog Search Challenge; Multimodal Knowledge Graphs; Graph-based Retrieval; Multi-modal Retrieval; Vision-Language Models;

DOI:

10.1145/3643489.3661127

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the scope of the 7th Lifelog Search Challenge (LSC'24), we present the 4th iteration of LifeGraph, a multimodal knowledge-graph approach with data augmentations using Vision-Language Models (VLMs). We extend the LifeGraph model presented in previous LSC editions with event-based clustering that uses temporal and spatial relations, as well as information extracted from descriptions of Lifelog image captions produced by VLMs.


Pages: 88 - 92

Page count: 5

