MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Cited by: 0
Authors
Dou, Longxu [1]
Gao, Yan [2]
Pan, Mingyang [1]
Wang, Dingzirui [1]
Che, Wanxiang [1]
Zhan, Dechen [1]
Lou, Jian-Guang [2]
Affiliations
[1] Harbin Institute of Technology, Harbin, People's Republic of China
[2] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MULTISPIDER, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MULTISPIDER, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVE (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Pages: 12745-12753
Number of pages: 9
Published in
Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, 2023, 37: 12745-12753