MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Cited by: 0
Authors
Dou, Longxu [1]
Gao, Yan [2]
Pan, Mingyang [1]
Wang, Dingzirui [1]
Che, Wanxiang [1]
Zhan, Dechen [1]
Lou, Jian-Guang [2]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MULTISPIDER, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MULTISPIDER, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVE (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Pages: 12745-12753
Page count: 9
Related papers
50 in total
  • [31] Towards Text-to-SQL over Aggregate Tables
    Li, Shuqin
    Zhou, Kaibin
    Zhuang, Zeyang
    Wang, Haofen
    Ma, Jun
    DATA INTELLIGENCE, 2023, 5 (02) : 457 - 474
  • [32] Semantic Evaluation for Text-to-SQL with Distilled Test Suites
    Zhong, Ruiqi
    Yu, Tao
    Klein, Dan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 396 - 411
  • [34] Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing
    Bogin, Ben
    Gardner, Matt
    Berant, Jonathan
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 4560 - 4565
  • [35] Leveraging Explicit Lexico-logical Alignments in Text-to-SQL Parsing
    Sun, Runxin
    He, Shizhu
    Zhu, Chong
    He, Yaohan
    Li, Jinlong
    Zhao, Jun
    Liu, Kang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, 2022, : 283 - 289
  • [36] Importance of Synthesizing High-quality Data for Text-to-SQL Parsing
    Hu, Yiqun
    Zhao, Yiyun
    Jiang, Jiarong
    Lan, Wuwei
    Zhu, Henry
    Chauhan, Anuj
    Li, Alexander
    Pan, Lin
    Wang, Jun
    Hang, Chung-Wei
    Zhang, Sheng
    Guo, Jiang
    Dong, Marvin
    Lilien, Joe
    Ng, Patrick
    Wang, Zhiguo
    Castelli, Vittorio
    Xiang, Bing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1327 - 1343
  • [37] Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques
    Rai, Daking
    Wang, Bailin
    Zhou, Yilun
    Yao, Ziyu
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 150 - 160
  • [38] Multi-hop Relational Graph Attention Network for Text-to-SQL Parsing
    Liu, Hu
    Shi, Yuliang
    Zhang, Jianlin
    Wang, Xinjun
    Li, Hui
    Kong, Fanyu
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [39] Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing
    Wu, Kun
    Wang, Lijie
    Li, Zhenghua
    Zhang, Ao
    Xiao, Xinyan
    Wu, Hua
    Zhang, Min
    Wang, Haifeng
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8974 - 8983
  • [40] Towards Robustness of Text-to-SQL Models against Synonym Substitution
    Gan, Yujian
    Chen, Xinyun
    Huang, Qiuping
    Purver, Matthew
    Woodward, John R.
    Xie, Jinxia
    Huang, Pengsheng
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2505 - 2515