Counterexample Guided Inductive Synthesis Using Large Language Models and Satisfiability Solving

被引:3
|
作者
Jha, Sumit Kumar [1 ]
Jha, Susmit [2 ]
Lincoln, Patrick [2 ]
Bastian, Nathaniel D. [3 ]
Velasquez, Alvaro [4 ,5 ]
Ewetz, Rickard
Neema, Sandeep [6 ]
机构
[1] Florida Int Univ, Dept Comp Sci, Miami, FL 33199 USA
[2] SRI Int, Comp Sci Lab, Menlo Pk, CA USA
[3] US Mil Acad, Army Cyber Inst, West Point, NY USA
[4] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[5] Univ Cent Florida, Elect & Comp Engn, Orlando, FL USA
[6] Vanderbilt Univ, Elect & Comp Engn, Nashville, TN USA
基金
美国国家科学基金会;
关键词
D O I
10.1109/MILCOM58377.2023.10356332
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Generative large language models (LLMs) can follow human-provided instruction prompts and generate human-like responses. Apart from natural language responses, they have been found to be effective at generating formal artifacts such as code, plans, and logical specifications. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence - a phenomenon often referred to as hallucinations. This limitation makes it difficult to use these models to synthesize formal artifacts used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.
引用
收藏
页数:6
相关论文
共 50 条
  • [31] An Investigation into the Utility of Large Language Models in Geotechnical Education and Problem Solving
    Chen, Liuxin
    Tophel, Amir
    Hettiyadura, Umidu
    Kodikara, Jayantha
    GEOTECHNICS, 2024, 4 (02): : 470 - 498
  • [32] Causal-Guided Active Learning for Debiasing Large Language Models
    Sun, Zhouhao
    Li Du
    Ding, Xiao
    Ma, Yixuan
    Zhao, Yang
    Qiu, Kaitao
    Liu, Ting
    Qin, Bing
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 14455 - 14469
  • [33] ELAD: Explanation-Guided Large Language Models Active Distillation
    Zhang, Yifei
    Pan, Bo
    Ling, Chen
    Hu, Yuntong
    Zhao, Liang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 4463 - 4475
  • [34] Multi-stage guided code generation for Large Language Models
    Han, Yewei
    Lyu, Chen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139
  • [35] Self-Knowledge Guided Retrieval Augmentation for Large Language Models
    Wang, Yile
    Li, Peng
    Sun, Maosong
    Liu, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 10303 - 10315
  • [36] Oracle-Guided Program Selection from Large Language Models
    Fan, Zhiyu
    Ruan, Haifeng
    Mechtaev, Sergey
    Roychoudhury, Abhik
    PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024, 2024, : 628 - 640
  • [37] Using Large Language Models in Business Processes
    Grisold, Thomas
    vom Brocke, Jan
    Kratsch, Wolfgang
    Mendling, Jan
    Vidgof, Maxim
    BUSINESS PROCESS MANAGEMENT, BPM 2023, 2023, 14159 : XXIX - XXXI
  • [38] Accelerating Pharmacovigilance using Large Language Models
    Prakash, Mukkamala Venkata Sai
    Parab, Ganesh
    Veeramalla, Meghana
    Reddy, Siddartha
    Varun, V.
    Gopalakrishnan, Saisubramaniam
    Pagidipally, Vishal
    Vaddina, Vishal
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 1182 - 1183
  • [39] Guiding Enumerative Program Synthesis with Large Language Models
    Li, Yixuan
    Parsert, Julian
    Polgreen, Elizabeth
    COMPUTER AIDED VERIFICATION, PT II, CAV 2024, 2024, 14682 : 280 - 301
  • [40] Jigsaw: Large Language Models meet Program Synthesis
    Jain, Naman
    Vaidyanath, Skanda
    Iyer, Arun
    Natarajan, Nagarajan
    Parthasarathy, Suresh
    Rajamani, Sriram
    Sharma, Rahul
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 1219 - 1231