Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

Authors
Ko, Hyung-Kwon [1]
Jeon, Hyeon [2]
Park, Gwanmo [2]
Kim, Dae Hyun [1]
Kim, Nam Wook [3]
Kim, Juho [1]
Seo, Jinwook [2]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
[3] Boston Coll, Chestnut Hill, MA 02167 USA
Funding
National Research Foundation of Singapore
Keywords
Vega-Lite; natural language datasets; large language models; framework; natural language interfaces; data visualization; OF-THE-ART; VEGA;
DOI
10.1145/3613904.3642943
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce VL2NL, a Large Language Model (LLM) framework that generates rich and diverse natural language (NL) datasets from Vega-Lite specifications, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. To synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, we leverage 1) guided discovery incorporated into prompting, so that LLMs can steer themselves to create faithful NL datasets in a self-directed manner, and 2) score-based paraphrasing to augment NL syntax along four language axes. We also present a new collection of 1,981 real-world Vega-Lite specifications that are more diverse and complex than those in existing chart collections. When tested on our chart collection, VL2NL extracted chart semantics and generated L1/L2 captions with 89.4% and 76.0% accuracy, respectively. It also generated and paraphrased utterances and questions with greater diversity than the benchmarks. Finally, we discuss how our NL datasets and framework can be utilized in real-world scenarios. The code and chart collection are available at https://github.com/hyungkwonko/chart-llm.
Pages: 22