Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Cited by: 0
Authors
Wu, Kun [1 ,2 ]
Wang, Lijie [2 ]
Li, Zhenghua [1 ]
Zhang, Ao [2 ]
Xiao, Xinyan [2 ]
Wu, Hua [2 ]
Zhang, Min [1 ]
Wang, Haifeng [2 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Inst Artificial Intelligence, Suzhou, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
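Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract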
Data augmentation has attracted a lot of research attention in the deep learning era for its ability to alleviate data sparseness. The lack of labeled data for unseen evaluation databases is precisely the major challenge for cross-domain text-to-SQL parsing. Previous work either requires human intervention to guarantee the quality of generated data, or fails to handle complex SQL queries. This paper presents a simple yet effective data augmentation framework. First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar. For better distribution matching, we require that at least 80% of the SQL patterns in the training data are covered by the generated queries. Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions, which is the major contribution of this work. Finally, we design a simple sampling strategy that can greatly improve training efficiency given large amounts of generated data. Experiments on three cross-domain datasets, i.e., WikiSQL and Spider in English, and DuSQL in Chinese, show that our proposed data augmentation framework consistently improves performance over strong baselines, and that the hierarchical generation component is the key to the improvement.
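The 80% coverage requirement mentioned in the abstract can be illustrated with a rough sketch. The abstract does not define what counts as a "SQL pattern", so the version below assumes a pattern is the query with every table name, column name, and literal value masked out; the helper names `sql_pattern` and `pattern_coverage` and the keyword list are hypothetical and are not taken from the paper.

```python
import re

# Assumption: a small, illustrative keyword set; a real grammar would be larger.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "HAVING", "LIMIT",
    "JOIN", "ON", "AS", "AND", "OR", "NOT", "IN", "LIKE", "BETWEEN",
    "DISTINCT", "ASC", "DESC", "COUNT", "SUM", "AVG", "MIN", "MAX",
    "UNION", "INTERSECT", "EXCEPT",
}

# Quoted strings, numbers, identifiers/keywords, then any single symbol.
TOKEN_RE = re.compile(r"'[^']*'|\d+(?:\.\d+)?|\w+|[^\s\w]")


def sql_pattern(query: str) -> str:
    """Mask schema names and literal values, keeping keywords and symbols."""
    masked = []
    for tok in TOKEN_RE.findall(query):
        if tok.upper() in SQL_KEYWORDS:
            masked.append(tok.upper())
        elif tok.startswith("'") or tok[0].isdigit():
            masked.append("VALUE")   # string or numeric literal
        elif tok[0].isalpha() or tok[0] == "_":
            masked.append("NAME")    # table or column name
        else:
            masked.append(tok)       # operator or punctuation
    return " ".join(masked)


def pattern_coverage(train_queries, generated_queries) -> float:
    """Fraction of distinct training patterns that also occur in generated queries."""
    train = {sql_pattern(q) for q in train_queries}
    generated = {sql_pattern(q) for q in generated_queries}
    if not train:
        return 1.0
    return len(train & generated) / len(train)
```

Under this assumed definition, a generator would keep producing queries for a database until `pattern_coverage(train_queries, generated_queries)` reaches at least 0.8.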
Pages: 8974-8983
Number of pages: 10