Demonstrating CAESURA: Language Models as Multi-Modal Query Planners

Cited by: 0
Authors
Urban, Matthias [1 ]
Binnig, Carsten [1 ,2 ]
Affiliations
[1] Tech Univ Darmstadt, Darmstadt, Germany
[2] DFKI, Darmstadt, Germany
Keywords
Multi-Modal; Query Planning; Large Language Models;
DOI
10.1145/3626246.3654732
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
In many domains, multi-modal data plays an important role, and modern question-answering systems based on LLMs allow users to query this data using simple natural language queries. Retrieval Augmented Generation (RAG) is a recent approach that extends Large Language Models (LLMs) with database technology to enable such multi-modal QA systems. In RAG, relevant data is first retrieved from a vector database and then fed into an LLM that computes the query result. However, RAG-based approaches have severe issues concerning efficiency and scalability, since LLMs have high inference costs and can only process limited amounts of data. Therefore, in this demo paper, we propose CAESURA, a database-first approach that extends databases with LLMs. The main idea is that CAESURA utilizes the reasoning capabilities of LLMs to translate natural language queries into execution plans. Using such execution plans allows CAESURA to process multi-modal data outside the LLM using query operators and optimization strategies that are grounded in scalable query execution strategies of databases. Our demo allows users to experience CAESURA on two example data sets containing tables, texts, and images.
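To make the database-first idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): an LLM planner, mocked here as a hard-coded function, translates a natural-language query into an execution plan of operators, and the plan is then executed over plain data structures entirely outside the model. All names (`plan_query`, `execute`, the operator vocabulary) are illustrative assumptions.

```python
# Hypothetical sketch of LLM-as-query-planner, assuming a toy operator set.
# Only the planning step would touch the LLM; execution runs over the data
# outside the model, which is the scalability argument made in the abstract.

def plan_query(nl_query: str):
    """Stand-in for the LLM planner: maps a natural-language query to an
    execution plan. A real system would prompt an LLM here; this version
    is hard-coded for illustration."""
    if "average" in nl_query and "price" in nl_query:
        return [("scan", "listings"), ("extract", "price"), ("aggregate", "avg")]
    raise ValueError("query not supported in this sketch")

def execute(plan, database):
    """Toy executor: interprets each (operator, argument) pair over plain
    Python data, i.e., without any LLM inference per row."""
    rows = None
    for op, arg in plan:
        if op == "scan":          # load a table from the database
            rows = database[arg]
        elif op == "extract":     # project a single column
            rows = [row[arg] for row in rows]
        elif op == "aggregate" and arg == "avg":
            rows = sum(rows) / len(rows)
    return rows

db = {"listings": [{"price": 100}, {"price": 200}, {"price": 300}]}
plan = plan_query("What is the average price?")
print(execute(plan, db))  # 200.0
```

The design point is that the LLM is invoked once per query to produce the plan, while the (potentially large) data is processed by ordinary operators, avoiding the per-row inference cost of feeding retrieved data through the model as in RAG.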
Pages: 472 / 475
Number of pages: 4