NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions

Cited by: 0
Authors
Chen, Fuxiang [1 ,4 ]
Hwang, Seung-won [2 ]
Choo, Jaegul [3 ]
Ha, Jung-Woo [1 ]
Kim, Sunghun [1 ,4 ]
Affiliations
[1] NAVER, Clova AI Res, Seongnam Si, South Korea
[2] Yonsei Univ, Seoul, South Korea
[3] Korea Univ, Seoul, South Korea
[4] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords
SPARSE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generating SQL code from natural language questions (NL2SQL) is an emerging research area. Existing studies have mainly focused on clear scenarios in which all the information needed to generate a SQL query is fully specified. However, in developer forums such as Stack Overflow, questions cover more diverse tasks, including table manipulation and performance issues, where a table is not specified. The SQL queries posted on Stack Overflow, referred to as Pseudo-SQL (pSQL), usually do not contain table schemas and are not necessarily executable, yet they are sufficient to guide developers. Here we describe a new task, NL2pSQL, which generates pSQL code from natural language questions on under-specified database issues. In addition, we define two new metrics suitable for the proposed NL2pSQL task, Canonical-BLEU and SQL-BLEU, to be used instead of the conventional BLEU. With a baseline model that uses a sequence-to-sequence architecture integrated with a denoising autoencoder, we confirm the validity of our task. Experiments show that the proposed NL2pSQL approach yields well-formed queries (up to 43% more than a standard Seq2Seq model). Our code and datasets are publicly available at http://github.com/clovaai/nl2psql.
Pages: 2603-2613
Number of pages: 11