Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Authors
Wu, Kun [1 ,2 ]
Wang, Lijie [2 ]
Li, Zhenghua [1 ]
Zhang, Ao [2 ]
Xiao, Xinyan [2 ]
Wu, Hua [2 ]
Zhang, Min [1 ]
Wang, Haifeng [2 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Inst Artificial Intelligence, Suzhou, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data augmentation has attracted considerable research attention in the deep learning era for its ability to alleviate data sparseness. The lack of labeled data for unseen evaluation databases is exactly the major challenge for cross-domain text-to-SQL parsing. Previous works either require human intervention to guarantee the quality of generated data, or fail to handle complex SQL queries. This paper presents a simple yet effective data augmentation framework. First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar. For better distribution matching, we require that at least 80% of SQL patterns in the training data are covered by generated queries. Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions, which is the major contribution of this work. Finally, we design a simple sampling strategy that can greatly improve training efficiency given large amounts of generated data. Experiments on three cross-domain datasets, i.e., WikiSQL and Spider in English and DuSQL in Chinese, show that our proposed data augmentation framework consistently improves performance over strong baselines, and that the hierarchical generation component is the key to the improvement.
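The abstract states the 80% coverage requirement but does not say how a "SQL pattern" is defined or how coverage is counted. The following is a minimal sketch under two loudly stated assumptions: a pattern is the keyword skeleton of a query (table names, column names, and literal values replaced by a `_` placeholder), and coverage is the fraction of distinct training patterns that also occur among the generated queries. The names `sql_pattern`, `pattern_coverage`, and `SQL_KEYWORDS` are hypothetical illustrations, not the authors' implementation.

```python
import re
from typing import Iterable

# Hypothetical keyword list (assumption): any token not listed here or in
# OPERATORS (tables, columns, literals) is abstracted to a "_" slot.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "as", "and", "or", "not", "in", "like", "between",
    "distinct", "count", "sum", "avg", "min", "max", "asc", "desc",
    "union", "intersect", "except",
}
OPERATORS = {"=", "!=", "<", ">", "<=", ">=", "(", ")", ",", "*"}

# Single-quoted literals, two-char comparison operators, single-char
# symbols, then identifiers/numbers (dotted names like t1.name included).
TOKEN_RE = re.compile(r"'[^']*'|<=|>=|!=|[=<>(),*]|[\w.]+")


def sql_pattern(sql: str) -> str:
    """Abstract a SQL query into a keyword skeleton (assumed definition)."""
    tokens = TOKEN_RE.findall(sql.lower())
    return " ".join(
        tok if tok in SQL_KEYWORDS or tok in OPERATORS else "_"
        for tok in tokens
    )


def pattern_coverage(train_sqls: Iterable[str], generated_sqls: Iterable[str]) -> float:
    """Fraction of distinct training patterns also produced by the generator."""
    train_patterns = {sql_pattern(q) for q in train_sqls}
    generated_patterns = {sql_pattern(q) for q in generated_sqls}
    if not train_patterns:
        return 1.0  # nothing to cover
    return len(train_patterns & generated_patterns) / len(train_patterns)
```

For example, if the training set contains a `SELECT ... ORDER BY ... DESC LIMIT ...` query and the generator never produces that skeleton, coverage stays below 1.0; under the assumed rule, generation would continue (or the grammar would be extended) until `pattern_coverage(...) >= 0.8`.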
Pages: 8974-8983
Page count: 10