GTR: An SQL Generator With Transition Representation in Cross-Domain Database Systems

Times cited: 0
Authors
Qiao, Shaojie [1]
Liu, Chenxu [1]
Yang, Guoping [1]
Han, Nan [2]
Peng, Yuhan [1]
Wu, Lingchun [1]
Li, He [3]
Yuan, Guan [4]
Affiliations
[1] Chengdu Univ Informat Technol, Sch Software Engn, Chengdu 610225, Peoples R China
[2] Chengdu Univ Informat Technol, Sch Management, Chengdu 610225, Peoples R China
[3] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[4] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Automatic SQL generator; cross-domain database; grammar-based neural model; natural language (NL); NL-to-SQL learning system; transition representation (TR); TEXT-TO-SQL; NATURAL-LANGUAGE;
DOI
10.1109/TNNLS.2023.3309824
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies have focused on using natural language (NL) to automatically retrieve useful data from database (DB) systems. As an important component of autonomous DB systems, NL-to-SQL techniques can help DB administrators write high-quality SQL statements and help people with no SQL background learn complex SQL. However, existing studies do not address two issues: the way a question is expressed in NL inevitably mismatches the implementation details of the corresponding SQL, and the large number of out-of-domain (OOD) words makes it difficult to predict table columns. In particular, it is difficult to convert NL into SQL accurately in an end-to-end fashion. Intuitively, the model can understand these relations more easily if a "bridge," a transition representation (TR) compatible with both NL and SQL, is employed during conversion. In this article, we propose GTR, an automatic SQL generator with TR for cross-domain DB systems. Specifically, GTR generates SQL in three steps: 1) it learns the relations between questions and DB schemas; 2) it uses a grammar-based model to synthesize a TR; and 3) it predicts SQL from the TR based on rules. We conduct extensive experiments on two commonly used datasets, WikiSQL and Spider. On the Spider and WikiSQL test sets, GTR achieves exact matching accuracies of 58.32% and 71.29%, respectively, outperforming state-of-the-art methods.
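The abstract describes GTR only at the level of its three steps. As a rough illustration of steps 1 and 3, the Python sketch below uses a hypothetical, simplified TR (the names `TR`, `Agg`, `Filter`, `link_schema`, `tr_to_sql`, and `quote` are illustrative and not taken from the paper), naive whole-word string matching in place of GTR's learned question-schema relation model, and a hand-built TR in place of the grammar-based synthesis of step 2. It is a sketch under these assumptions, not the authors' implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical, simplified transition representation (TR). The abstract does not
# give GTR's actual TR grammar, so these node types are assumptions.
@dataclass
class Agg:
    func: Optional[str]  # aggregate such as "max" or "count"; None for a bare column
    column: str


@dataclass
class Filter:
    column: str
    op: str     # comparison operator such as "=", ">", "<"
    value: str  # literal value; always rendered as a quoted string (simplification)


@dataclass
class TR:
    table: str
    select: List[Agg]
    where: List[Filter] = field(default_factory=list)


def link_schema(question: str, columns: List[str]) -> List[str]:
    """Step 1 stand-in: return the columns whose names occur as whole words in
    the question. GTR learns question-schema relations; plain string matching
    is an assumed substitute for illustration only."""
    q = question.lower()
    linked = []
    for col in columns:
        phrase = col.replace("_", " ").lower()
        if re.search(rf"\b{re.escape(phrase)}\b", q):
            linked.append(col)
    return linked


def quote(value: str) -> str:
    # Escape embedded single quotes by doubling them, as standard SQL requires.
    return "'" + value.replace("'", "''") + "'"


def tr_to_sql(tr: TR) -> str:
    """Step 3 stand-in: map each TR node to an SQL clause with fixed rules."""
    def render(a: Agg) -> str:
        return f"{a.func.upper()}({a.column})" if a.func else a.column

    sql = f"SELECT {', '.join(render(a) for a in tr.select)} FROM {tr.table}"
    if tr.where:
        conds = " AND ".join(f"{c.column} {c.op} {quote(c.value)}" for c in tr.where)
        sql += f" WHERE {conds}"
    return sql


if __name__ == "__main__":
    question = "What is the maximum age of singers whose country is France?"
    print(link_schema(question, ["name", "age", "country"]))  # ['age', 'country']
    # Step 2 is replaced here by a hand-written TR (no neural model involved).
    tr = TR(table="singer", select=[Agg("max", "age")],
            where=[Filter("country", "=", "France")])
    print(tr_to_sql(tr))  # SELECT MAX(age) FROM singer WHERE country = 'France'
```

Because the TR-to-SQL rules are deterministic, a correct TR yields the SQL mechanically; this is one way to read the "bridge" intuition, although the paper's actual rules are not described in the abstract.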
Pages: 17908-17920
Number of pages: 13