Counterexample Guided Inductive Synthesis Using Large Language Models and Satisfiability Solving

Cited by: 3
Authors
Jha, Sumit Kumar [1 ]
Jha, Susmit [2 ]
Lincoln, Patrick [2 ]
Bastian, Nathaniel D. [3 ]
Velasquez, Alvaro [4 ,5 ]
Ewetz, Rickard
Neema, Sandeep [6 ]
Affiliations
[1] Florida Int Univ, Dept Comp Sci, Miami, FL 33199 USA
[2] SRI Int, Comp Sci Lab, Menlo Pk, CA USA
[3] US Mil Acad, Army Cyber Inst, West Point, NY USA
[4] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[5] Univ Cent Florida, Elect & Comp Engn, Orlando, FL USA
[6] Vanderbilt Univ, Elect & Comp Engn, Nashville, TN USA
Funding
US National Science Foundation;
DOI
10.1109/MILCOM58377.2023.10356332
CLC Number
TP [Automation & Computer Technology];
Discipline Code
0812;
Abstract
Generative large language models (LLMs) can follow human-provided instruction prompts and generate human-like responses. Apart from natural language responses, they have been found to be effective at generating formal artifacts such as code, plans, and logical specifications. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results beneath their syntactic coherence, a phenomenon often referred to as hallucination. This limitation makes it difficult to use these models to synthesize formal artifacts for safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plans, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use satisfiability modulo theories (SMT) solvers as deductive reasoning engines to analyze the solutions generated by the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs by exploiting their dialog capability. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM toward the correct response. In our experiments, we use planning over the blocks-world domain as the synthesis task for evaluating our approach. We use GPT-4, GPT-3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.
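The abstract's propose-verify-refine loop can be sketched in a toy form. Here `propose_plan` is a stand-in for the LLM and `find_counterexample` for the Z3-based verifier; the function names and the trivial block-stacking "semantics" are illustrative assumptions, not the paper's actual implementation, which checks plans against SMT encodings of the planning domain.

```python
# Minimal sketch of a counterexample-guided synthesis loop,
# assuming a hypothetical propose/verify interface.

def find_counterexample(plan, goal_stack):
    """Stand-in verifier: return the index of the first step where the
    executed plan deviates from the goal stack, or None if correct."""
    stack = list(plan)  # toy "execution": simply stack each block
    for i, (got, want) in enumerate(zip(stack, goal_stack)):
        if got != want:
            return i
    if len(stack) != len(goal_stack):
        return min(len(stack), len(goal_stack))
    return None

def propose_plan(goal_stack, feedback):
    """Stand-in for the LLM: repair the plan at the reported step,
    as if re-prompted with the counterexample."""
    plan = list(feedback["last_plan"])
    if feedback["counterexample"] is not None:
        i = feedback["counterexample"]
        plan[i:] = goal_stack[i:]  # replace the offending suffix
    return plan

def cegis(goal_stack, initial_plan, max_iters=10):
    """Alternate between the (mock) LLM and the (mock) verifier until
    the verifier finds no counterexample."""
    plan = initial_plan
    for _ in range(max_iters):
        cex = find_counterexample(plan, goal_stack)
        if cex is None:
            return plan  # verified correct
        plan = propose_plan(
            goal_stack, {"last_plan": plan, "counterexample": cex})
    raise RuntimeError("no correct plan found within iteration budget")

print(cegis(["A", "B", "C"], ["A", "C"]))  # → ['A', 'B', 'C']
```

In the paper's setting, `find_counterexample` would instead encode the candidate plan and the domain axioms as SMT queries for Z3, and the returned counterexample would be serialized into the next natural-language prompt.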
Pages: 6