Demonstrating CAESURA: Language Models as Multi-Modal Query Planners

Cited by: 0
Authors
Urban, Matthias [1]
Binnig, Carsten [1,2]
Affiliations
[1] Tech Univ Darmstadt, Darmstadt, Germany
[2] DFKI, Darmstadt, Germany
Keywords
Multi-Modal; Query Planning; Large Language Models;
DOI
10.1145/3626246.3654732
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In many domains, multi-modal data plays an important role, and modern question-answering systems based on LLMs allow users to query this data using simple natural language queries. Retrieval-Augmented Generation (RAG) is a recent approach that extends Large Language Models (LLMs) with database technology to enable such multi-modal QA systems. In RAG, relevant data is first retrieved from a vector database and then fed into an LLM that computes the query result. However, RAG-based approaches have severe issues regarding efficiency and scalability, since LLMs have high inference costs and can only process limited amounts of data. Therefore, in this demo paper, we propose CAESURA, a database-first approach that extends databases with LLMs. The main idea is that CAESURA utilizes the reasoning capabilities of LLMs to translate natural language queries into execution plans. Such execution plans allow CAESURA to process multi-modal data outside the LLM, using query operators and optimization strategies that are grounded in the scalable query execution strategies of databases. Our demo allows users to experience CAESURA on two example data sets containing tables, texts, and images.
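To illustrate the database-first idea the abstract describes, the sketch below shows how a query plan produced once by an LLM could then be executed over the data by ordinary operators, with no further LLM calls per row. All names, operators, and data here are hypothetical illustrations, not CAESURA's actual API or plan format:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

@dataclass
class Operator:
    """One step of a (hypothetical) LLM-generated execution plan."""
    name: str
    fn: Callable[[List[Row]], List[Row]]

def execute_plan(plan: List[Operator], rows: List[Row]) -> List[Row]:
    """Run each operator in sequence over the intermediate result,
    entirely outside the LLM."""
    for op in plan:
        rows = op.fn(rows)
    return rows

# Toy multi-modal table: image content is assumed to have been
# extracted into a text label by an upstream operator.
paintings = [
    {"title": "Starry Night", "image_label": "night sky", "year": 1889},
    {"title": "Sunflowers", "image_label": "flowers", "year": 1888},
]

# A plan the planner might emit for the natural language query
# "paintings depicting the sky, newest first":
plan = [
    Operator("filter_by_image_label",
             lambda rows: [r for r in rows if "sky" in r["image_label"]]),
    Operator("sort_by_year_desc",
             lambda rows: sorted(rows, key=lambda r: r["year"], reverse=True)),
]

result = execute_plan(plan, paintings)
print([r["title"] for r in result])  # → ['Starry Night']
```

The point of this structure is the one the abstract makes: the expensive LLM reasoning happens once, at planning time, while the data itself flows through cheap, scalable operators.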
Pages: 472-475
Page count: 4