Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

Cited by: 0

Authors:

Ko, Hyung-Kwon [1]

Jeon, Hyeon [2]

Park, Gwanmo [2]

Kim, Dae Hyun [1]

Kim, Nam Wook [3]

Kim, Juho [1]

Seo, Jinwook [2]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea

[2] Seoul Natl Univ, Seoul, South Korea

[3] Boston Coll, Chestnut Hill, MA 02167 USA

Source:

PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024) | 2024

Funding:

National Research Foundation, Singapore;

Keywords:

Vega-Lite; natural language datasets; large language models; framework; natural language interfaces; data visualization; OF-THE-ART; VEGA;

DOI:

10.1145/3613904.3642943

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce VL2NL, a Large Language Model (LLM) framework that generates rich and diverse natural language (NL) datasets using Vega-Lite specifications as input, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. To synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, we leverage 1) guided discovery incorporated into prompting, so that LLMs can steer themselves to create faithful NL datasets in a self-directed manner; and 2) score-based paraphrasing to augment NL syntax along four language axes. We also present a new collection of 1,981 real-world Vega-Lite specifications that offers greater diversity and complexity than existing chart collections. When tested on our chart collection, VL2NL extracted chart semantics and generated L1/L2 captions with 89.4% and 76.0% accuracy, respectively. It also generated and paraphrased utterances and questions with greater diversity than the benchmarks. Finally, we discuss how our NL datasets and framework can be utilized in real-world scenarios. The code and chart collection are available at https://github.com/hyungkwonko/chart-llm.


Pages: 22
