Demonstrating CAESURA: Language Models as Multi-Modal Query Planners

Cited: 0
Authors
Urban, Matthias [1 ]
Binnig, Carsten [1 ,2 ]
Affiliations
[1] Tech Univ Darmstadt, Darmstadt, Germany
[2] DFKI, Darmstadt, Germany
Keywords
Multi-Modal; Query Planning; Large Language Models
DOI
10.1145/3626246.3654732
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In many domains, multi-modal data plays an important role, and modern question-answering systems based on LLMs allow users to query this data with simple natural language queries. Retrieval Augmented Generation (RAG) is a recent approach that extends Large Language Models (LLMs) with database technology to enable such multi-modal QA systems. In RAG, relevant data is first retrieved from a vector database and then fed into an LLM that computes the query result. However, RAG-based approaches have severe issues regarding efficiency and scalability, since LLMs have high inference costs and can only process limited amounts of data. Therefore, in this demo paper, we propose CAESURA, a database-first approach that extends databases with LLMs. The main idea is that CAESURA utilizes the reasoning capabilities of LLMs to translate natural language queries into execution plans. Using such execution plans allows CAESURA to process multi-modal data outside the LLM, using query operators and optimization strategies that are grounded in the scalable query execution strategies of databases. Our demo allows users to experience CAESURA on two example data sets containing tables, texts, and images.
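The core idea in the abstract — an LLM emits an execution plan of operators that process multi-modal data outside the LLM — can be sketched as follows. This is a minimal illustration, not CAESURA's actual operator set or API: the operator names (`TextExtract`, `Filter`), the toy rows, and the keyword-based extraction function are all hypothetical stand-ins (a real system would invoke a model per row instead of keyword matching, and the plan would be produced by the LLM planner rather than hardcoded).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy multi-modal "database": each row pairs structured fields with raw text.
ROWS = [
    {"id": 1, "report": "The bridge shows severe corrosion."},
    {"id": 2, "report": "No issues found during inspection."},
]

@dataclass
class Operator:
    """One step of an execution plan (names are illustrative only)."""
    name: str
    fn: Callable[[List[Dict]], List[Dict]]

def text_extract(rows: List[Dict]) -> List[Dict]:
    # Stand-in for a per-row text-understanding operator; keyword matching
    # here substitutes for a small model call applied outside the main LLM.
    return [{**r, "damaged": "corrosion" in r["report"].lower()} for r in rows]

def filter_damaged(rows: List[Dict]) -> List[Dict]:
    # Ordinary relational filter over the newly extracted column.
    return [r for r in rows if r["damaged"]]

# The plan an LLM planner might emit for the natural-language query
# "Which reports mention damage?" -- hardcoded here for illustration.
plan = [Operator("TextExtract", text_extract), Operator("Filter", filter_damaged)]

def execute(plan: List[Operator], rows: List[Dict]) -> List[Dict]:
    # Pipe the data through each operator in order, as a database executor would.
    for op in plan:
        rows = op.fn(rows)
    return rows

result = execute(plan, ROWS)
print([r["id"] for r in result])  # -> [1]
```

The point of the sketch is the separation of concerns the abstract argues for: the LLM only plans (a few tokens), while the data itself flows through cheap, scalable operators.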
Pages: 472-475
Page count: 4