Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark

Authors
Wretblad, Niklas [1]
Riseby, Fredrik Gordh [1]
Biswas, Rahul [2]
Ahmadi, Amin [2]
Holmstrom, Oskar [1]
Affiliations
[1] Linkoping Univ, Linkoping, Sweden
[2] Silo AI, Helsinki, Finland
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-SQL, which involves translating natural language into Structured Query Language (SQL), is crucial for enabling broad access to structured databases without expert knowledge. However, designing models for such tasks is challenging due to numerous factors, including the presence of 'noise,' such as ambiguous questions and syntactical errors. This study provides an in-depth analysis of the distribution and types of noise in the widely used BIRD-Bench benchmark and the impact of noise on models. While BIRD-Bench was created to model dirty and noisy database values, it was not created to contain noise and errors in the questions and gold SQL queries. We found that noise in questions and gold queries is prevalent in the dataset, with varying amounts across domains, and with an uneven distribution between noise types. The presence of incorrect gold SQL queries, which then generate incorrect gold answers, has a significant impact on the benchmark's reliability. Surprisingly, when evaluating models on corrected SQL queries, zero-shot baselines surpassed the performance of state-of-the-art prompting methods. We conclude that informative noise labels and reliable benchmarks are crucial to developing new Text-to-SQL methods that can handle varying types of noise. All datasets, annotations, and code are available at this URL.
Pages: 356-369
Number of pages: 14
Related papers
2 items in total
  • [1] Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
    Gao, Dawei
    Wang, Haibin
    Li, Yaliang
    Sun, Xiuyu
    Qian, Yichen
    Ding, Bolin
    Zhou, Jingren
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (05): 1132-1145
  • [2] EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
    Lee, Gyubok
    Hwang, Hyeonji
    Bae, Seongsu
    Kwon, Yeonsu
    Shin, Woncheol
    Yang, Seongjun
    Seo, Minjoon
    Kim, Jongyeup
    Choi, Edward
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022