Counterexample Guided Inductive Synthesis Using Large Language Models and Satisfiability Solving

被引:3
|
作者
Jha, Sumit Kumar [1 ]
Jha, Susmit [2 ]
Lincoln, Patrick [2 ]
Bastian, Nathaniel D. [3 ]
Velasquez, Alvaro [4 ,5 ]
Ewetz, Rickard
Neema, Sandeep [6 ]
机构
[1] Florida Int Univ, Dept Comp Sci, Miami, FL 33199 USA
[2] SRI Int, Comp Sci Lab, Menlo Pk, CA USA
[3] US Mil Acad, Army Cyber Inst, West Point, NY USA
[4] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[5] Univ Cent Florida, Elect & Comp Engn, Orlando, FL USA
[6] Vanderbilt Univ, Elect & Comp Engn, Nashville, TN USA
基金
美国国家科学基金会;
关键词
D O I
10.1109/MILCOM58377.2023.10356332
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Generative large language models (LLMs) can follow human-provided instruction prompts and generate human-like responses. Apart from natural language responses, they have been found to be effective at generating formal artifacts such as code, plans, and logical specifications. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence - a phenomenon often referred to as hallucinations. This limitation makes it difficult to use these models to synthesize formal artifacts used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.
引用
收藏
页数:6
相关论文
共 50 条
  • [1] Counterexample guided inductive optimization based on satisfiability modulo theories
    Araujo, Rodrigo F.
    Albuquerque, Higo E.
    de Bessa, Iury V.
    Cordeiro, Lucas C.
    Chaves Filho, Joao E.
    SCIENCE OF COMPUTER PROGRAMMING, 2018, 165 : 3 - 23
  • [2] Counterexample Guided Inductive Synthesis Modulo Theories
    Abate, Alessandro
    David, Cristina
    Kesseli, Pascal
    Kroening, Daniel
    Polgreen, Elizabeth
    COMPUTER AIDED VERIFICATION (CAV 2018), PT I, 2018, 10981 : 270 - 288
  • [3] Counterexample-guided inductive synthesis for probabilistic systems
    Ceska, Milan
    Hensel, Christian
    Junges, Sebastian
    Katoen, Joost-Pieter
    FORMAL ASPECTS OF COMPUTING, 2021, 33 (4-5) : 637 - 667
  • [4] Counterexample-Guided Synthesis of Perception Models and Control
    Ghosh, Shromona
    Pant, Yash Vardhan
    Ravanbakhsh, Hadi
    Seshia, Sanjit A.
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 3447 - 3454
  • [5] A Novel Counterexample-Guided Inductive Synthesis Framework for Barrier Certificate Generation
    Ding, Mi
    Lin, Kaipeng
    Lin, Wang
    Ding, Zuohua
    2022 IEEE 33RD INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2022), 2022, : 263 - 273
  • [6] SAT-Based Counterexample-Guided Inductive Synthesis of Distributed Controllers
    Chukharev, Konstantin
    Suvorov, Dmitrii
    Chivilikhin, Daniil
    Vyatkin, Valeriy
    IEEE ACCESS, 2020, 8 : 207485 - 207498
  • [7] Inductive reasoning in humans and large language models
    Han, Simon Jerome
    Ransom, Keith J.
    Perfors, Andrew
    Kemp, Charles
    COGNITIVE SYSTEMS RESEARCH, 2024, 83
  • [8] Solving Proof Block Problems Using Large Language Models
    Poulsen, Seth
    Sarsa, Sami
    Prather, James
    Leinonen, Juho
    Becker, Brett A.
    Hellas, Arto
    Denny, Paul
    Reeves, Brent N.
    PROCEEDINGS OF THE 55TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE 2024, VOL. 1, 2024, : 1063 - 1069
  • [9] Solving satisfiability problems using logic synthesis and reconfigurable hardware
    Suyama, T
    Yokoo, M
    Sawada, H
    PROCEEDINGS OF THE THIRTY-FIRST HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOL VII: SOFTWARE TECHNOLOGY TRACK, 1998, : 179 - 186
  • [10] Dehallucinating Large Language Models Using Formal Methods Guided Iterative Prompting
    Jha, Susmit
    Jha, Sumit Kumar
    Lincoln, Patrick
    Bastian, Nathaniel D.
    Velasquez, Alvaro
    Neema, Sandeep
    2023 IEEE INTERNATIONAL CONFERENCE ON ASSURED AUTONOMY, ICAA, 2023, : 149 - 152