TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study

Authors
Kanburoglu, Ali Bugra [1]
Tek, Faik Boray [2]
Affiliations
[1] Isik Univ, Dept Comp Engn, TR-34980 Istanbul, Turkiye
[2] Istanbul Tech Univ, Dept Artificial Intelligence & Data Engn, TR-34467 Istanbul, Turkiye
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Training; Structured Query Language; Accuracy; Error analysis; Benchmark testing; Cognition; Encoding; Text-to-SQL; LLM; large language models; Turkish; dataset; TURSpider
DOI
10.1109/ACCESS.2024.3498841
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
Pages: 169379-169387
Page count: 9