TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study

Cited by: 0

Authors:

Kanburoglu, Ali Bugra [1]

Tek, Faik Boray [2]

Affiliations:

[1] Isik Univ, Dept Comp Engn, TR-34980 Istanbul, Turkiye

[2] Istanbul Tech Univ, Dept Artificial Intelligence & Data Engn, TR-34467 Istanbul, Turkiye

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Training; Structured Query Language; Accuracy; Error analysis; Benchmark testing; Cognition; Encoding; Text-to-SQL; LLM; large language models; Turkish; dataset; TURSpider;

DOI:

10.1109/ACCESS.2024.3498841

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.


Pages: 169379-169387

Page count: 9
