Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

Cited by: 0
Authors
Wu, Kun [1 ,2 ]
Wang, Lijie [2 ]
Li, Zhenghua [1 ]
Zhang, Ao [2 ]
Xiao, Xinyan [2 ]
Wu, Hua [2 ]
Zhang, Min [1 ]
Wang, Haifeng [2 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Inst Artificial Intelligence, Suzhou, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
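Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes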
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data augmentation has attracted a lot of research attention in the deep learning era for its ability to alleviate data sparseness. The lack of labeled data for unseen evaluation databases is precisely the major challenge in cross-domain text-to-SQL parsing. Previous works either require human intervention to guarantee the quality of generated data or fail to handle complex SQL queries. This paper presents a simple yet effective data augmentation framework. First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar. For better distribution matching, we require that at least 80% of SQL patterns in the training data are covered by generated queries. Second, we propose a hierarchical SQL-to-question generation model, the major contribution of this work, to obtain high-quality natural language questions. Finally, we design a simple sampling strategy that can greatly improve training efficiency given large amounts of generated data. Experiments on three cross-domain datasets, i.e., WikiSQL and Spider in English, and DuSQL in Chinese, show that our proposed data augmentation framework consistently improves performance over strong baselines, and that the hierarchical generation component is key to the improvement.
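The 80% pattern-coverage requirement in the abstract can be illustrated with a minimal sketch. The abstract does not define how a SQL pattern is extracted, so everything below is an assumption for illustration: the helper names `sql_pattern` and `pattern_coverage` are hypothetical, a pattern is taken to be the query with schema items and literal values replaced by placeholders, and coverage is computed over distinct training patterns. It is not the authors' implementation.

```python
import re

# Assumption: a small keyword list is enough for this illustration.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "and", "or", "not", "in", "like", "between", "as",
    "distinct", "asc", "desc", "count", "sum", "avg", "min", "max",
    "union", "intersect", "except",
}

# Quoted strings, identifiers (optionally dotted), numbers, single punctuation.
TOKEN_RE = re.compile(r"'[^']*'|\"[^\"]*\"|[A-Za-z_][\w.]*|\d+(?:\.\d+)?|[^\s\w]")


def sql_pattern(sql: str) -> str:
    """Abstract a SQL string into a pattern (hypothetical definition):
    keywords and punctuation are kept, schema items become ITEM,
    and literal values become VALUE."""
    out = []
    for tok in TOKEN_RE.findall(sql):
        low = tok.lower()
        if low in SQL_KEYWORDS:
            out.append(low)
        elif tok[0] in "'\"" or tok[0].isdigit():
            out.append("VALUE")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ITEM")
        else:
            out.append(tok)
    return " ".join(out)


def pattern_coverage(train_sqls, generated_sqls) -> float:
    """Fraction of distinct training patterns that also occur
    among the patterns of the generated queries."""
    train = {sql_pattern(s) for s in train_sqls}
    generated = {sql_pattern(s) for s in generated_sqls}
    if not train:
        return 1.0
    return len(train & generated) / len(train)


if __name__ == "__main__":
    train = [
        "SELECT name FROM singer WHERE age > 30",
        "SELECT count(*) FROM concert",
        "SELECT name FROM singer ORDER BY age DESC",
    ]
    generated = [
        "SELECT title FROM book WHERE price > 9.5",
        "SELECT count(*) FROM author",
    ]
    coverage = pattern_coverage(train, generated)
    # Under the abstract's requirement, generation would continue
    # until coverage reaches at least 0.8.
    print(f"coverage = {coverage:.2f}, meets 80% target: {coverage >= 0.8}")
```

In this toy run, two of the three training patterns are matched by generated queries, so the 80% target is not yet met and more queries would need to be generated.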
Pages: 8974-8983
Number of pages: 10